Friday, November 14, 2008

About Datasets and Rules in Data Mining

  • Datasets are the very basis of data mining, and they are typically huge in size.
  • Instances in a dataset are characterized by the values of features, or attributes, that measure different aspects of the instance.
  • Attributes can have values that are symbolic categories or numeric values.
  • Rules are generated from attributes and are meant to be interpreted in order.
  • A set of rules that are intended to be interpreted in sequence is called a decision list.
  • Interpreted as a decision list, the rules help classify correctly, whereas taken individually, out of context, some of the rules could be incorrect.
  • Any learning method must create simple equality tests involving attributes when the attribute values are symbolic categories. When attribute values are numeric, however, the learning method must create inequalities. This is referred to as the numeric-attribute problem or, if only some of the attributes are numeric, the mixed-attribute problem.
  • Rules could be categorized as classification rules and association rules.
  • Classification rules predict the class (the decision) for an instance, typically by way of a decision list, whereas association rules describe strong associations between different attribute values.
  • Rules could be complete and deterministic: in such cases, they give a unique prescription for every conceivable case. Generally, this is not the case. Sometimes there are situations in which no rule applies; other times more than one rule may apply, resulting in conflicting recommendations. Sometimes probabilities or weights may be associated with the rules themselves to indicate that some are more important, or more reliable, than others.
  • A decision tree is a more concise and perspicuous representation of the rules and has the advantage that it can be visualized more easily.
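
The points above about decision lists, equality and inequality tests, and rules that conflict or fail to apply can be illustrated with a small sketch. This is only one possible way to write it, not a standard implementation: the attribute names (outlook, humidity, windy), the threshold of 80, and the helper names classify and matching_labels are all assumptions made up for this example.

```python
# A minimal sketch of a decision list, under the assumptions stated above.
# An instance is a dict mapping attribute names to values. The attributes
# "outlook" and "windy" are symbolic; "humidity" is numeric (hypothetical data).

RULES = [
    # Mixed-attribute rule: an equality test on a symbolic attribute
    # combined with an inequality test on a numeric attribute.
    (lambda x: x["outlook"] == "sunny" and x["humidity"] > 80, "no"),
    # Purely symbolic rule: equality tests only.
    (lambda x: x["outlook"] == "rainy" and x["windy"] == "true", "no"),
    # Default rule: only sensible when read after the rules above.
    (lambda x: True, "yes"),
]


def classify(instance, rules, default=None):
    """Interpret the rules in order and return the label of the first match."""
    for test, label in rules:
        if test(instance):
            return label
    return default  # no rule applies


def matching_labels(instance, rules):
    """Ignore the ordering and collect the label of every rule that fires."""
    return [label for test, label in rules if test(instance)]


humid_sunny = {"outlook": "sunny", "humidity": 90, "windy": "false"}
overcast = {"outlook": "overcast", "humidity": 65, "windy": "false"}

print(classify(humid_sunny, RULES))         # read in order: first match wins
print(matching_labels(humid_sunny, RULES))  # out of order: rules conflict
print(classify(overcast, RULES[:2]))        # without a default: no rule applies
```

Read in sequence, the list gives one answer for the humid sunny instance. Taken out of order, the same instance matches two rules with different labels, which is the conflict described above. Dropping the default rule leaves the overcast instance with no applicable rule at all.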
