
Artificial Intelligence and Data Mining, Lecture 3 (l3.ppt)
Chapter 3 Basic Data Mining Techniques
3.1 Decision Trees (for classification)

Introduction: Classification, a Two-Step Process
• 1. Model construction: build a model that can describe a set of predetermined classes
– Preparation: each tuple/sample is assumed to belong to a predefined class, given by the output attribute (class label attribute)
– The set of examples used for model construction is the training set
– The model can be represented as classification rules, decision trees, or mathematical formulae
– Estimate the accuracy of the model:
• The known label of each test sample is compared with the classification result produced by the model
• The accuracy rate is the percentage of test-set samples correctly classified by the model
• Note: the test set must be independent of the training set, otherwise over-fitting will occur
• 2. Model usage: use the model to classify future or unknown objects

Classification Process (1): Model Construction
• Training data are fed to a classification algorithm, which outputs a classifier (the model), for example the rule:
IF rank = 'professor' OR years > 6 THEN tenured = 'yes'

Classification Process (2): Use the Model in Prediction
• The classifier is checked on testing data and then applied to unseen data, e.g. (Jeff, Professor, 4) -> Tenured?

1 Example (1): Training Dataset
• An example from Quinlan's ID3 (1986); the training table carries the class label buys_computer

1 Example (2): Output: A Decision Tree for "buys_computer"
• Root test: age?
– age <= 30: test student? (no -> buys_computer = no; yes -> buys_computer = yes)
– age 31..40: buys_computer = yes
– age > 40: test credit_rating? (excellent -> buys_computer = yes; fair -> buys_computer = no)

2 Algorithm for Decision Tree Building
• Basic algorithm (a greedy algorithm)
– The tree is constructed in a top-down, recursive, divide-and-conquer manner
– At the start, all the training examples are at the root
– Attributes are categorical (continuous-valued attributes are discretized in advance)
– Examples are partitioned recursively based on selected attributes
– Test attributes are selected on the basis of a heuristic or statistical measure (e.g., information gain)
• Conditions for stopping partitioning
– All samples at a given node belong to the same class
– There are no remaining attributes for further partitioning (majority voting is used to label the leaf)
– There are no samples left
– The pre-set accuracy has been reached

Information Gain (信息增益) (ID3/C4.5)
• Select the attribute with the highest information gain
• Assume there are two classes, P and N
– Let the set of examples S contain p elements of class P and n elements of class N
– The amount of information needed to decide whether an arbitrary example in S belongs to P or N is defined as
I(p, n) = -(p/(p+n)) log2(p/(p+n)) - (n/(p+n)) log2(n/(p+n))

Information Gain in Decision Tree Building
• Assume that using attribute A, the set S is partitioned into subsets {S1, S2, ..., Sv}
– If Si contains pi examples of P and ni examples of N, the entropy (熵), i.e. the expected information needed to classify objects across all subsets Si, is
E(A) = sum over i = 1..v of ((pi + ni)/(p + n)) * I(pi, ni)
• The encoding information that would be gained by branching on A is
Gain(A) = I(p, n) - E(A)

Attribute Selection by Information Gain Computation
• Class P: buys_computer = "yes"
• Class N: buys_computer = "no"
• Compute I(p, n) for the whole training set, then the entropy E(age) over the age partitions, and hence Gain(age); similarly compute the gain for the remaining attributes and split on the attribute with the highest gain, which is age (the root of the tree above)
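A minimal runnable sketch in Python of the information-gain computation on the preceding slide. The class counts per age group are assumed from the standard 14-example buys_computer illustration (9 "yes", 5 "no" overall) and should be replaced by the counts in the actual training table if they differ:

import math

def info(p, n):
    # I(p, n): bits needed to decide the class of an example drawn from a
    # set containing p examples of class P and n examples of class N.
    total = p + n
    bits = 0.0
    for count in (p, n):
        if count:
            frac = count / total
            bits -= frac * math.log2(frac)
    return bits

# Assumed (yes, no) counts per value of age; illustrative values only.
age_counts = {"<=30": (2, 3), "31..40": (4, 0), ">40": (3, 2)}
p, n = 9, 5

e_age = sum((pi + ni) / (p + n) * info(pi, ni) for pi, ni in age_counts.values())
gain_age = info(p, n) - e_age

print(f"I(9, 5)   = {info(p, n):.3f}")   # about 0.940
print(f"E(age)    = {e_age:.3f}")        # about 0.694
print(f"Gain(age) = {gain_age:.3f}")     # about 0.246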
3. Decision Tree Rules
• Automate rule creation
• Rule simplification and elimination
• A default rule is chosen

3.1 Extracting Classification Rules from Trees
• Represent the knowledge in the form of IF-THEN rules
• One rule is created for each path from the root to a leaf
• Rules are easier for humans to understand
• Example:
IF age = "<=30" AND student = "no" THEN buys_computer = "no"
IF age = "<=30" AND student = "yes" THEN buys_computer = "yes"
IF age = "31..40" THEN buys_computer = "yes"
IF age = ">40" AND credit_rating = "excellent" THEN buys_computer = "yes"
IF age = ">40" AND credit_rating = "fair" THEN buys_computer = "no"

3.2 Rule Simplification and Elimination
• Original rule (Figure 3.4):
IF Age <= 43 & Sex = Male & Credit Card Insurance = No
THEN Life Insurance Promotion = No (accuracy = 75%)
• A simplified rule obtained by removing the attribute Age (Figure 3.5):
IF Sex = Male & Credit Card Insurance = No
THEN Life Insurance Promotion = No (accuracy = 83.3%, 5/6)
• Figure 3.5: a two-node decision tree for the credit card database

4. Further Discussion
• Attributes with many values tend to yield many splits and deceptively high accuracy; the gain ratio compensates for this: GainRatio(A) = Gain(A) / SplitInfo(A)
• Numerical attributes: handled with a binary split
• Stopping conditions
• More than two values
• Other methods for building decision trees: ID3, C4.5, CART, CHAID

5. General Considerations: Advantages of Decision Trees
• Easy to understand.
• Map nicely to a set of production rules.
• Have been applied to real problems.
• Make no prior assumptions about the data.
• Able to process both numerical and categorical data.

Disadvantages of Decision Trees
• The output attribute must be categorical.
• Limited to one output attribute.
• Decision tree algorithms are unstable.
• Trees created from numeric datasets can be complex.

Decision Tree Attribute Selection (Appendix C)
• Computing Info(I): Info(I) = -sum over classes c of (|Ic|/|I|) * log2(|Ic|/|I|), the entropy of the class distribution in the instance set I
• Computing Info(I, A): Info(I, A) = sum over j = 1..v of (|Ij|/|I|) * Info(Ij), where attribute A partitions I into subsets I1, ..., Iv
• Computing Gain(A): Gain(A) = Info(I) - Info(I, A)
• Computing Split Info(A): SplitInfo(A) = -sum over j = 1..v of (|Ij|/|I|) * log2(|Ij|/|I|)
• Computing Gain Ratio: GainRatio(A) = Gain(A) / SplitInfo(A)
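A minimal numeric sketch of the Appendix C quantities in Python, reusing the assumed buys_computer counts from the earlier sketch (illustrative values, not taken from this deck's training table):

import math

def entropy(counts):
    # -sum p * log2(p) over the non-zero proportions in counts.
    total = sum(counts)
    return -sum(c / total * math.log2(c / total) for c in counts if c)

def gain_ratio(class_counts, branches):
    # class_counts: per-class counts for the whole set I, e.g. [9, 5].
    # branches: per-branch class counts after splitting on A, e.g. [(2, 3), (4, 0), (3, 2)].
    total = sum(class_counts)
    info_i = entropy(class_counts)                                 # Info(I)
    info_i_a = sum(sum(b) / total * entropy(b) for b in branches)  # Info(I, A)
    gain = info_i - info_i_a                                       # Gain(A)
    split_info = entropy([sum(b) for b in branches])               # SplitInfo(A)
    return gain / split_info if split_info else 0.0                # GainRatio(A)

# Assumed buys_computer counts for the age split (illustrative only).
print(round(gain_ratio([9, 5], [(2, 3), (4, 0), (3, 2)]), 3))      # about 0.156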

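For completeness, a minimal runnable sketch in Python of the greedy top-down procedure outlined in Section 2, using information gain to select the test attribute. All names and the tiny data set are illustrative, and attributes are assumed categorical, as on the slides:

import math
from collections import Counter

def entropy(labels):
    total = len(labels)
    return -sum(c / total * math.log2(c / total) for c in Counter(labels).values())

def information_gain(rows, labels, attr):
    # Gain(attr) = entropy before the split - expected entropy after splitting on attr.
    branches = {}
    for row, y in zip(rows, labels):
        branches.setdefault(row[attr], []).append(y)
    remainder = sum(len(ys) / len(labels) * entropy(ys) for ys in branches.values())
    return entropy(labels) - remainder

def build_tree(rows, labels, attributes):
    # Stopping conditions from Section 2: pure node, or no attributes left
    # (in which case the leaf is labeled by majority vote).
    if len(set(labels)) == 1:
        return labels[0]
    if not attributes:
        return Counter(labels).most_common(1)[0][0]
    best = max(attributes, key=lambda a: information_gain(rows, labels, a))
    node = {"attribute": best, "branches": {}}
    groups = {}
    for row, y in zip(rows, labels):
        groups.setdefault(row[best], ([], []))
        groups[row[best]][0].append(row)
        groups[row[best]][1].append(y)
    remaining = [a for a in attributes if a != best]
    for value, (sub_rows, sub_labels) in groups.items():
        node["branches"][value] = build_tree(sub_rows, sub_labels, remaining)
    return node

# Tiny illustrative call (rows are dicts of categorical attribute values):
rows = [{"age": "<=30", "student": "no"}, {"age": "<=30", "student": "yes"},
        {"age": "31..40", "student": "no"}]
labels = ["no", "yes", "yes"]
print(build_tree(rows, labels, ["age", "student"]))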

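A small companion sketch for Section 3.1: turning each root-to-leaf path of the nested-dict tree from the previous sketch into an IF-THEN rule (that tree representation is the illustrative one used there, not part of the original slides):

def extract_rules(tree, conditions=()):
    # One IF-THEN rule per root-to-leaf path, as described in Section 3.1.
    if not isinstance(tree, dict):                        # leaf: a class label
        body = " AND ".join(f'{a} = "{v}"' for a, v in conditions)
        return [f'IF {body} THEN class = "{tree}"' if body else f'class = "{tree}"']
    rules = []
    for value, subtree in tree["branches"].items():
        rules.extend(extract_rules(subtree, conditions + ((tree["attribute"], value),)))
    return rules

# Applied to the tree built in the previous sketch:
# for rule in extract_rules(build_tree(rows, labels, ["age", "student"])):
#     print(rule)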









