Artificial Intelligence and Data Mining, Lecture 3 (l3.ppt)

Chapter 3  Basic Data Mining Techniques
3.1 Decision Trees (for classification)

Introduction: Classification, a Two-Step Process
• 1. Model construction: build a model that describes a set of predetermined classes
  – Preparation: each tuple/sample is assumed to belong to a predefined class, given by the output attribute (class label attribute)
  – The set of examples used for model construction is the training set
  – The model can be represented as classification rules, decision trees, or mathematical formulae
  – Estimate the accuracy of the model:
    • The known label of each test sample is compared with the model's classification
    • The accuracy rate is the percentage of test-set samples correctly classified by the model
    • Note: the test set must be independent of the training set, otherwise over-fitting will occur
• 2. Model usage: use the model to classify future or unknown objects

Classification Process (1): Model Construction
[Figure: training data is fed to a classification algorithm, which outputs a classifier (model), e.g. IF rank = "professor" OR years > 6 THEN tenured = "yes"]

Classification Process (2): Use the Model in Prediction
[Figure: the classifier is applied to testing data and then to unseen data, e.g. (Jeff, Professor, 4) → Tenured?]

1. Example (1): Training Dataset
An example from Quinlan's ID3 (1986).
[Figure: the buys_computer training table]

1. Example (2): Output: A Decision Tree for "buys_computer"
[Figure: decision tree rooted at age; the <=30 branch tests student, the 31...40 branch is a "yes" leaf, and the >40 branch tests credit_rating]

2. Algorithm for Decision Tree Building
• Basic algorithm (a greedy algorithm)
  – The tree is constructed in a top-down, recursive, divide-and-conquer manner
  – At the start, all training examples are at the root
  – Attributes are categorical (continuous-valued attributes are discretized in advance)
  – Examples are partitioned recursively based on selected attributes
  – Test attributes are selected on the basis of a heuristic or statistical measure (e.g., information gain)
• Conditions for stopping partitioning
  – All samples at a given node belong to the same class
  – There are no remaining attributes for further partitioning (majority voting labels the leaf)
  – There are no samples left
  – The pre-set accuracy has been reached

Information Gain (ID3/C4.5)
• Select the attribute with the highest information gain
• Assume there are two classes, P and N
  – Let the set of examples S contain p elements of class P and n elements of class N
  – The amount of information needed to decide whether an arbitrary example in S belongs to P or N is
    I(p, n) = -(p / (p + n)) log2(p / (p + n)) - (n / (p + n)) log2(n / (p + n))

Information Gain in Decision Tree Building
• Assume that using attribute A, the set S is partitioned into subsets {S1, S2, ..., Sv}
  – If S_i contains p_i examples of P and n_i examples of N, the entropy (the expected information needed to classify objects across all subsets S_i) is
    E(A) = Σ_i ((p_i + n_i) / (p + n)) · I(p_i, n_i)
• The encoding information gained by branching on A is
    Gain(A) = I(p, n) - E(A)

Attribute Selection by Information Gain Computation
• Class P: buys_computer = "yes"; class N: buys_computer = "no"
• Compute I(p, n) for the whole training set, then E(age) and hence Gain(age); compute the gain of the remaining attributes in the same way and select the attribute with the largest gain
[Figure: worked gain computations for each attribute]
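As a quick sanity check on I(p, n), E(A) and Gain(A) above, here is a minimal Python sketch. The function names and the three-way split used in the demo are illustrative assumptions (the actual buys_computer table appeared only as a figure on the original slides).

```python
from math import log2

def info(p, n):
    """I(p, n): bits needed to decide whether an example from a set with
    p class-P members and n class-N members belongs to P or N."""
    total = p + n
    bits = 0.0
    for count in (p, n):
        if count:                       # treat 0 * log2(0) as 0
            bits -= count / total * log2(count / total)
    return bits

def expected_info(partitions):
    """E(A): weighted entropy of the subsets {S1, ..., Sv} produced by
    splitting on A; `partitions` is a list of (p_i, n_i) pairs."""
    p = sum(pi for pi, _ in partitions)
    n = sum(ni for _, ni in partitions)
    return sum((pi + ni) / (p + n) * info(pi, ni) for pi, ni in partitions)

def gain(p, n, partitions):
    """Gain(A) = I(p, n) - E(A)."""
    return info(p, n) - expected_info(partitions)

# Hypothetical three-way split of a 14-example set (9 in P, 5 in N),
# e.g. an attribute whose values yield subsets (2,3), (4,0) and (3,2).
print(f"I(9, 5) = {info(9, 5):.3f}")                             # 0.940
print(f"Gain(A) = {gain(9, 5, [(2, 3), (4, 0), (3, 2)]):.3f}")   # 0.247
```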
3. Decision Tree Rules
• Automate rule creation
• Rule simplification and elimination
• A default rule is chosen

3.1 Extracting Classification Rules from Trees
• Represent the knowledge in the form of IF-THEN rules
• One rule is created for each path from the root to a leaf (a small sketch of this extraction appears at the end of these notes)
• Rules are easier for humans to understand
• Example:
  IF age = "<=30" AND student = "no" THEN buys_computer = "no"
  IF age = "<=30" AND student = "yes" THEN buys_computer = "yes"
  IF age = "31...40" THEN buys_computer = "yes"
  IF age = ">40" AND credit_rating = "excellent" THEN buys_computer = "yes"
  IF age = ">40" AND credit_rating = "fair" THEN buys_computer = "no"

3.2 Rule Simplification and Elimination
• Original rule:
  IF Age <= 43 AND Sex = Male AND Credit Card Insurance = No
  THEN Life Insurance Promotion = No   (accuracy = 75%, Figure 3.4)
• A simplified rule obtained by removing attribute Age:
  IF Sex = Male AND Credit Card Insurance = No
  THEN Life Insurance Promotion = No   (accuracy = 83.3% = 5/6, Figure 3.5)

[Figure 3.5: a two-node decision tree for the credit card database]

4. Further Discussion
• Attributes with more values are favoured by plain information gain (more splits, higher apparent accuracy); C4.5 corrects this with GainRatio(A) = Gain(A) / SplitInfo(A)
• Numerical attributes: use a binary split
• Stopping condition
• More than 2 values
• Other methods for building decision trees: ID3, C4.5, CART, CHAID

5. General Considerations
Advantages of Decision Trees
• Easy to understand.
• Map nicely to a set of production rules.
• Have been applied to real problems.
• Make no prior assumptions about the data.
• Able to process both numerical and categorical data.

Disadvantages of Decision Trees
• The output attribute must be categorical.
• Limited to one output attribute.
• Decision tree algorithms are unstable.
• Trees created from numeric datasets can be complex.

Appendix C: Decision Tree Attribute Selection
• Computing Gain Ratio: GainRatio(A) = Gain(A) / SplitInfo(A)
• Computing Gain(A): Gain(A) = Info(I) - Info(I, A)
• Computing Info(I): Info(I) = -Σ_j (|C_j| / |I|) log2(|C_j| / |I|), the entropy of set I over its classes C_j
• Computing Info(I, A): Info(I, A) = Σ_i (|I_i| / |I|) · Info(I_i), the expected information after attribute A partitions I into subsets I_i
• Computing Split Info(A): SplitInfo(A) = -Σ_i (|I_i| / |I|) log2(|I_i| / |I|)
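The Appendix C quantities are equally short in code. A minimal sketch, assuming each attribute value's class counts are passed as a plain list; the function names are illustrative, not taken from any library:

```python
from math import log2

def info(class_counts):
    """Info(I): entropy of a set whose class sizes are `class_counts`."""
    total = sum(class_counts)
    return -sum(c / total * log2(c / total) for c in class_counts if c)

def info_after_split(partitions):
    """Info(I, A): expected information after splitting I on attribute A.
    `partitions` holds one list of class counts per attribute value."""
    total = sum(sum(p) for p in partitions)
    return sum(sum(p) / total * info(p) for p in partitions)

def split_info(partitions):
    """SplitInfo(A): entropy of the subset sizes themselves (class labels
    ignored); it penalises attributes that create many small subsets."""
    return info([sum(p) for p in partitions])

def gain_ratio(partitions):
    """GainRatio(A) = Gain(A) / SplitInfo(A), Gain(A) = Info(I) - Info(I, A)."""
    merged = [sum(counts) for counts in zip(*partitions)]   # class counts of I
    gain = info(merged) - info_after_split(partitions)
    return gain / split_info(partitions)

# Hypothetical three-valued attribute over 14 examples, with [P, N] class
# counts per value; the subsets have sizes 5, 4 and 5.
print(gain_ratio([[2, 3], [4, 0], [3, 2]]))   # ≈ 0.156
```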

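Finally, the sketch referenced in Section 3.1: extracting one IF-THEN rule per root-to-leaf path. The nested-dict tree below hand-codes the buys_computer tree exactly as the slides present it; the dict layout and helper name are illustrative assumptions, not a particular library's API.

```python
# Leaves are plain strings; internal nodes name a test attribute and its branches.
TREE = {
    "attribute": "age",
    "branches": {
        "<=30": {
            "attribute": "student",
            "branches": {"no": 'buys_computer = "no"', "yes": 'buys_computer = "yes"'},
        },
        "31...40": 'buys_computer = "yes"',
        ">40": {
            "attribute": "credit_rating",
            "branches": {"excellent": 'buys_computer = "yes"', "fair": 'buys_computer = "no"'},
        },
    },
}

def extract_rules(node, conditions=()):
    """Yield one IF-THEN rule for every path from the root to a leaf."""
    if isinstance(node, str):                        # leaf: the path so far becomes a rule
        yield "IF " + " AND ".join(conditions) + " THEN " + node
        return
    for value, child in node["branches"].items():    # internal node: extend the path
        test = f'{node["attribute"]} = "{value}"'
        yield from extract_rules(child, conditions + (test,))

for rule in extract_rules(TREE):
    print(rule)   # prints the five rules listed in Section 3.1
```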