COP5992 – DATA MINING TERM PROJECT
RANDOM SUBSPACE METHOD + CO-TRAINING
by Selim Kalayci

RANDOM SUBSPACE METHOD (RSM)
  • Proposed by Ho, "The Random Subspace Method for Constructing Decision Forests", 1998.
  • Another combining technique for weak classifiers, like Bagging and Boosting.

RSM ALGORITHM
  1. Repeat for b = 1, 2, ..., B:
     (a) Select an r-dimensional random subspace X̃_b from the original p-dimensional feature space X.
     (b) Construct a classifier C_b(x) in X̃_b.
  2. Combine the classifiers C_b(x), b = 1, 2, ..., B, by simple majority voting into a final decision rule.
  (A minimal code sketch of this loop appears after the deck.)

MOTIVATION FOR RSM
  • Redundancy in the data feature space:
    - a completely redundant feature set, or
    - redundancy spread over many features.
  • Weak classifiers that have critical training sample sizes.

RSM PERFORMANCE ISSUES
  RSM performance depends on:
  • the training sample size;
  • the choice of the base classifier;
  • the choice of the combining rule (simple vs. weighted majority);
  • the degree of redundancy in the dataset;
  • the number of features chosen.

DECISION FORESTS (by Ho)
  • A combination of trees instead of a single tree.
  • Assumption: the dataset has some redundant features.
  • Works efficiently with any decision tree algorithm and data splitting method.
  • Ideally, look for the best individual trees with the lowest tree similarity.

UNLABELED DATA
  • A small number of labeled documents.
  • A large pool of unlabeled documents.
  • How can we classify the unlabeled documents accurately?

EXPECTATION-MAXIMIZATION (E-M)

CO-TRAINING
  • Blum and Mitchell, "Combining Labeled and Unlabeled Data with Co-Training", 1998.
  • Requirements: two sufficiently strong feature sets that are conditionally independent given the class.

APPLICATION OF CO-TRAINING TO A SINGLE FEATURE SET
  Algorithm (a code sketch follows the deck):
  • Obtain a small set L of labeled examples.
  • Obtain a large set U of unlabeled examples.
  • Obtain two sets F1 and F2 of features that are sufficiently redundant.
  • While U is not empty:
    - Learn classifier C1 from L based on F1.
    - Learn classifier C2 from L based on F2.
    - For each classifier Ci:
      - Ci labels examples from U based on Fi.
      - Ci chooses the most confidently predicted examples E from U.
      - E is removed from U and added (with the labels Ci assigned) to L.

THINGS TO DO
  • How can we measure redundancy and use it efficiently?
  • Can we improve co-training?
  • How can we apply RSM efficiently to:
    - supervised learning,
    - semi-supervised learning, and
    - unsupervised learning?

QUESTIONS
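
The RSM ALGORITHM slide maps directly to code. Below is a minimal sketch, assuming scikit-learn's DecisionTreeClassifier as the base classifier; the names rsm_fit and rsm_predict and the defaults B=50 and r = p/2 are illustrative choices, not taken from the slides.

```python
import numpy as np
from sklearn.tree import DecisionTreeClassifier

def rsm_fit(X, y, B=50, r=None, seed=0):
    """Step 1: train B base classifiers, each on its own r-dimensional
    random subspace of the p feature columns."""
    p = X.shape[1]
    r = r or max(1, p // 2)              # assumed default; the slides fix no value
    rng = np.random.default_rng(seed)
    ensemble = []
    for _ in range(B):
        idx = rng.choice(p, size=r, replace=False)        # 1(a): pick the subspace
        clf = DecisionTreeClassifier().fit(X[:, idx], y)  # 1(b): build C_b on it
        ensemble.append((clf, idx))
    return ensemble

def rsm_predict(ensemble, X):
    """Step 2: combine the C_b by simple (unweighted) majority voting.
    Assumes integer class labels 0..K-1."""
    votes = np.stack([clf.predict(X[:, idx]) for clf, idx in ensemble])
    return np.array([np.bincount(col.astype(int)).argmax() for col in votes.T])
```

With decision trees as the base classifier this ensemble is, in spirit, one of Ho's decision forests; swapping in any other classifier changes only the line marked 1(b). Note the contrast with Bagging, which resamples the rows of X, while RSM resamples the feature columns.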
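The co-training loop from the APPLICATION OF CO-TRAINING TO A SINGLE FEATURE SET slide can be sketched the same way. The version below assumes the two views F1 and F2 are given as column-index arrays over a single feature matrix, uses Gaussian naive Bayes as the base learner, and takes the maximum posterior probability as the confidence score; co_train and n_confident are illustrative names, not from the slides.

```python
import numpy as np
from sklearn.naive_bayes import GaussianNB

def co_train(X, y_init, labeled_idx, unlabeled_idx, F1, F2, n_confident=5):
    """Grow the labeled pool L by letting two view-specific classifiers
    label the examples from U they are most confident about."""
    L = list(labeled_idx)                 # labeled pool L (row indices into X)
    y = dict(zip(labeled_idx, y_init))    # known labels plus assigned labels
    U = list(unlabeled_idx)               # unlabeled pool U (row indices into X)
    while U:
        for Fi in (F1, F2):               # learn C1 on F1, then C2 on F2
            Ci = GaussianNB().fit(X[np.array(L)][:, Fi],
                                  np.array([y[j] for j in L]))
            # Ci labels U based on Fi; pick the most confident examples E
            proba = Ci.predict_proba(X[np.array(U)][:, Fi])
            E = np.argsort(proba.max(axis=1))[::-1][:n_confident]
            for e in sorted(E, reverse=True):   # move E from U to L
                y[U[e]] = Ci.classes_[proba[e].argmax()]
                L.append(U.pop(e))
            if not U:
                break
    return L, y
```

Blum and Mitchell's original formulation additionally replenishes the unlabeled pool from a larger reservoir and selects confident examples per class; the sketch keeps the simpler "while U is not empty" loop stated on the slide.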
