Data Mining Lecture Slides: chap4-basic-classification.ppt
Data Mining
Classification: Basic Concepts, Decision Trees, and Model Evaluation
Lecture Notes for Chapter 4, Introduction to Data Mining, by Tan, Steinbach, Kumar
Tan, Steinbach, Kumar, Introduction to Data Mining, 4/18/2004

Classification: Definition
- Given a collection of records (training set):
  - Each record contains a set of attributes; one of the attributes is the class.
- Find a model for the class attribute as a function of the values of the other attributes.
- Goal: previously unseen records should be assigned a class as accurately as possible.
  - A test set is used to determine the accuracy of the model. Usually, the given data set is divided into training and test sets, with the training set used to build the model and the test set used to validate it.

Illustrating Classification Task
[Figure not reproduced]
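The training/test workflow on the Classification: Definition slide can be sketched in Python as follows. This is a minimal sketch under stated assumptions. `split_train_test`, `fit_majority`, and `accuracy` are hypothetical helpers written for illustration; they come from neither the slides nor any library. The majority-class "model" is only a placeholder for a real classifier such as a decision tree. The ten records are the training data from the decision-tree example slides.

```python
import random
from collections import Counter

# Training data from the decision-tree example slides:
# (Tid, Refund, Marital Status, Taxable Income in K, Cheat).
ROWS = [
    (1, "Yes", "Single", 125, "No"),   (2, "No", "Married", 100, "No"),
    (3, "No", "Single", 70, "No"),     (4, "Yes", "Married", 120, "No"),
    (5, "No", "Divorced", 95, "Yes"),  (6, "No", "Married", 60, "No"),
    (7, "Yes", "Divorced", 220, "No"), (8, "No", "Single", 85, "Yes"),
    (9, "No", "Married", 75, "No"),    (10, "No", "Single", 90, "Yes"),
]
RECORDS = [dict(zip(("Tid", "Refund", "MarSt", "TaxInc", "Cheat"), row)) for row in ROWS]

def split_train_test(records, test_fraction=0.3, seed=0):
    """Shuffle a copy of the records and hold out `test_fraction` of them as the test set."""
    shuffled = list(records)
    random.Random(seed).shuffle(shuffled)
    n_test = round(len(shuffled) * test_fraction)
    return shuffled[n_test:], shuffled[:n_test]  # (training set, test set)

def fit_majority(training_set, class_attr):
    """Placeholder learner (hypothetical): always predicts the most frequent class it saw."""
    label = Counter(r[class_attr] for r in training_set).most_common(1)[0][0]
    return lambda record: label

def accuracy(model, test_set, class_attr):
    """Fraction of test records whose predicted class matches the recorded class."""
    return sum(model(r) == r[class_attr] for r in test_set) / len(test_set)

train, test = split_train_test(RECORDS, test_fraction=0.3, seed=42)
model = fit_majority(train, "Cheat")                          # build the model on the training set only
print(len(train), len(test), accuracy(model, test, "Cheat"))  # validate it on the held-out test set
```

The key point is that `model` never sees the test records. The accuracy reported on them estimates how the model would do on previously unseen records.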
Examples of Classification Task
- Predicting tumor cells as benign or malignant
- Classifying credit card transactions as legitimate or fraudulent
- Classifying secondary structures of protein as alpha-helix, beta-sheet, or random coil
- Categorizing news stories as finance, weather, entertainment, sports, etc.

Classification Techniques
- Decision Tree-based Methods
- Rule-based Methods
- Memory-based Reasoning
- Neural Networks
- Naïve Bayes and Bayesian Belief Networks
- Support Vector Machines

Example of a Decision Tree
Training Data (Refund and Marital Status are categorical, Taxable Income is continuous, Cheat is the class):

| Tid | Refund | Marital Status | Taxable Income | Cheat |
|-----|--------|----------------|----------------|-------|
| 1   | Yes    | Single         | 125K           | No    |
| 2   | No     | Married        | 100K           | No    |
| 3   | No     | Single         | 70K            | No    |
| 4   | Yes    | Married        | 120K           | No    |
| 5   | No     | Divorced       | 95K            | Yes   |
| 6   | No     | Married        | 60K            | No    |
| 7   | Yes    | Divorced       | 220K           | No    |
| 8   | No     | Single         | 85K            | Yes   |
| 9   | No     | Married        | 75K            | No    |
| 10  | No     | Single         | 90K            | Yes   |

Model: Decision Tree (splitting attributes: Refund, MarSt, TaxInc)
- Refund = Yes → NO
- Refund = No → MarSt
  - Married → NO
  - Single, Divorced → TaxInc
    - < 80K → NO
    - > 80K → YES

Another Example of Decision Tree
Same training data as above.
- MarSt = Married → NO
- MarSt = Single, Divorced → Refund
  - Yes → NO
  - No → TaxInc
    - < 80K → NO
    - > 80K → YES

There could be more than one tree that fits the same data!
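The two trees above can be written down and checked directly. The following is a sketch under stated assumptions. The encoding is hypothetical: a tree is either a class-label string (a leaf) or a `(test, children)` pair whose `test(record)` returns the branch key. The slides label the income branches "< 80K" and "> 80K", so sending exactly 80K to the upper branch is also an assumption. The code confirms that both trees fit all ten training records. It also classifies the record (Refund = No, Married, 80K) used in the "Apply Model to Test Data" slides by following branches from the root to a leaf.

```python
# Training data from the slides: (Tid, Refund, Marital Status, Taxable Income in K, Cheat).
ROWS = [
    (1, "Yes", "Single", 125, "No"),   (2, "No", "Married", 100, "No"),
    (3, "No", "Single", 70, "No"),     (4, "Yes", "Married", 120, "No"),
    (5, "No", "Divorced", 95, "Yes"),  (6, "No", "Married", 60, "No"),
    (7, "Yes", "Divorced", 220, "No"), (8, "No", "Single", 85, "Yes"),
    (9, "No", "Married", 75, "No"),    (10, "No", "Single", 90, "Yes"),
]
RECORDS = [dict(zip(("Tid", "Refund", "MarSt", "TaxInc", "Cheat"), row)) for row in ROWS]

def refund_test(r):
    return r["Refund"]

def marst_test(r):
    # Binary split on a nominal attribute: {Married} vs {Single, Divorced}.
    return "Married" if r["MarSt"] == "Married" else "Single, Divorced"

def taxinc_test(r):
    # Assumption: a Taxable Income of exactly 80K goes to the upper branch.
    return "< 80K" if r["TaxInc"] < 80 else ">= 80K"

TAXINC = (taxinc_test, {"< 80K": "No", ">= 80K": "Yes"})

# "Example of a Decision Tree": Refund at the root.
TREE_1 = (refund_test, {
    "Yes": "No",
    "No": (marst_test, {"Married": "No", "Single, Divorced": TAXINC}),
})

# "Another Example of Decision Tree": MarSt at the root.
TREE_2 = (marst_test, {
    "Married": "No",
    "Single, Divorced": (refund_test, {"Yes": "No", "No": TAXINC}),
})

def classify(tree, record):
    """Start from the root and follow the branch matching the record until a leaf is reached."""
    while not isinstance(tree, str):
        test, children = tree
        tree = children[test(record)]
    return tree

for name, tree in (("TREE_1", TREE_1), ("TREE_2", TREE_2)):
    fits = all(classify(tree, r) == r["Cheat"] for r in RECORDS)
    print(name, "fits all training records:", fits)

test_record = {"Refund": "No", "MarSt": "Married", "TaxInc": 80}
print("Cheat =", classify(TREE_1, test_record), classify(TREE_2, test_record))
```

Both trees classify all ten training records correctly, and both assign Cheat = "No" to the test record. They reach that leaf along different paths, because each tree puts a different attribute at the root.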
Decision Tree Classification Task
[Figure not reproduced]

Apply Model to Test Data
Model: the decision tree from "Example of a Decision Tree" (Refund at the root, then MarSt, then TaxInc).
Test Data:

| Refund | Marital Status | Taxable Income | Cheat |
|--------|----------------|----------------|-------|
| No     | Married        | 80K            | ?     |

- Start from the root of the tree.
- Refund = No, so follow the "No" branch to MarSt.
- MarSt = Married, so follow the "Married" branch, which ends in the leaf NO.
- Assign Cheat to "No".

Decision Tree Induction
- Many algorithms:
  - Hunt's Algorithm (one of the earliest)
  - CART
  - ID3, C4.5
  - SLIQ, SPRINT

General Structure of Hunt's Algorithm
- Let Dt be the set of training records that reach a node t.
- General procedure:
  - If Dt contains records that all belong to the same class yt, then t is a leaf node labeled as yt.
  - If Dt is an empty set, then t is a leaf node labeled by the default class, yd.
  - If Dt contains records that belong to more than one class, use an attribute test to split the data into smaller subsets. Recursively apply the procedure to each subset.

Hunt's Algorithm (on the Refund / Marital Status / Taxable Income / Cheat training data)
- Step 1: a single leaf, Don't Cheat
- Step 2: Refund: Yes → Don't Cheat; No → Don't Cheat
- Step 3: Refund: Yes → Don't Cheat; No → Marital Status (Single, Divorced → Cheat; Married → Don't Cheat)
- Step 4: Refund: Yes → Don't Cheat; No → Marital Status (Single, Divorced → Taxable Income (< 80K → Don't Cheat; >= 80K → Cheat); Married → Don't Cheat)

Tree Induction
- Greedy strategy:
  - Split the records based on an attribute test that optimizes a certain criterion.
- Issues:
  - Determine how to split the records
    - How to specify the attribute test condition?
    - How to determine the best split?
  - Determine when to stop splitting
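Hunt's recursive procedure and the greedy strategy above can be sketched in Python. The slides leave the choice of attribute test open at this point, so several details here are assumptions, not the lecture's method:
- Categorical attributes get a multi-way split on every value seen in the training data.
- Taxable Income gets binary `A < v` tests at midpoints between adjacent values.
- The chosen test is the one that would misclassify the fewest training records, with ties keeping the earlier candidate.
- The default class yd for an empty Dt is the parent's majority class.

All function names are hypothetical. On this data the first two splits happen to be Refund, then Marital Status, but the rest of the tree need not match the slide.

```python
from collections import Counter

# Training data from the slides: (Tid, Refund, Marital Status, Taxable Income in K, Cheat).
ROWS = [
    (1, "Yes", "Single", 125, "No"),   (2, "No", "Married", 100, "No"),
    (3, "No", "Single", 70, "No"),     (4, "Yes", "Married", 120, "No"),
    (5, "No", "Divorced", 95, "Yes"),  (6, "No", "Married", 60, "No"),
    (7, "Yes", "Divorced", 220, "No"), (8, "No", "Single", 85, "Yes"),
    (9, "No", "Married", 75, "No"),    (10, "No", "Single", 90, "Yes"),
]
RECORDS = [dict(zip(("Tid", "Refund", "MarSt", "TaxInc", "Cheat"), row)) for row in ROWS]
CLASS = "Cheat"
CATEGORICAL = ("Refund", "MarSt")  # multi-way split: one branch per value in the training data
CONTINUOUS = ("TaxInc",)           # binary split (A < v) at midpoints of adjacent distinct values
DOMAINS = {a: sorted({r[a] for r in RECORDS}) for a in CATEGORICAL}

def majority(records):
    return Counter(r[CLASS] for r in records).most_common(1)[0][0]

def candidate_tests(records):
    """Yield (description, branch function, branch keys) for every attribute test considered."""
    for a in CATEGORICAL:
        yield a, (lambda r, a=a: r[a]), DOMAINS[a]
    for a in CONTINUOUS:
        values = sorted({r[a] for r in records})
        for lo, hi in zip(values, values[1:]):
            v = (lo + hi) / 2
            yield f"{a} < {v}", (lambda r, a=a, v=v: r[a] < v), (True, False)

def partition(records, branch, keys):
    groups = {k: [] for k in keys}  # keys with no records become empty subsets
    for r in records:
        groups[branch(r)].append(r)
    return groups

def errors(groups):
    """Training records misclassified if each subset predicted its own majority class."""
    return sum(len(g) - Counter(r[CLASS] for r in g).most_common(1)[0][1]
               for g in groups.values() if g)

def hunt(records, default):
    if not records:                    # Dt is empty: leaf labeled with the default class yd
        return default
    labels = {r[CLASS] for r in records}
    if len(labels) == 1:               # all of Dt has one class yt: leaf labeled yt
        return labels.pop()
    best = None
    for desc, branch, keys in candidate_tests(records):
        groups = partition(records, branch, keys)
        if sum(1 for g in groups.values() if g) < 2:
            continue                   # this test does not actually split Dt
        score = errors(groups)
        if best is None or score < best[0]:  # greedy choice; ties keep the earlier test
            best = (score, desc, branch, groups)
    if best is None:                   # no test separates the records: majority leaf
        return majority(records)
    _, desc, branch, groups = best
    yd = majority(records)             # assumption: default class = parent's majority class
    return (desc, branch, {k: hunt(g, yd) for k, g in groups.items()})

def classify(tree, record):
    while isinstance(tree, tuple):
        _, branch, children = tree
        tree = children[branch(record)]
    return tree

TREE = hunt(RECORDS, default=majority(RECORDS))
print(TREE[0], classify(TREE, {"Refund": "No", "MarSt": "Married", "TaxInc": 80}))
```

The recursion stops only at pure or empty subsets, or when no test splits Dt. Because every training record here has a distinct income, the induced tree fits all ten training records.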
How to Specify Test Condition?
- Depends on attribute types:
  - Nominal
  - Ordinal
  - Continuous
- Depends on number of ways to split:
  - 2-way split
  - Multi-way split

Splitting Based on Nominal Attributes
- Multi-way split: use as many partitions as there are distinct values.
  - CarType → Family | Sports | Luxury
- Binary split: divides the values into two subsets; need to find the optimal partitioning.
  - CarType → {Family, Luxury} | {Sports}, or CarType → {Sports, Luxury} | {Family}

Splitting Based on Ordinal Attributes
- Multi-way split: use as many partitions as there are distinct values.
  - Size → Small | Medium | Large
- Binary split: divides the values into two subsets; need to find the optimal partitioning.
  - Size → {Medium, Large} | {Small}, or Size → {Small, Medium} | {Large}
- What about this split? Size → {Small, Large} | {Medium}

Splitting Based on Continuous Attributes
- Different ways of handling:
  - Discretization to form an ordinal categorical attribute
    - Static: discretize once at the beginning
    - Dynamic: ranges can be found by equal interval bucketing, equal frequency bucketing (percentiles), or clustering
  - Binary decision: (A < v) or (A ≥ v)
    - Consider all possible splits and find the best cut
    - Can be more compute-intensive
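The candidate test conditions for each attribute type can be enumerated mechanically. The sketch below makes the splitting slides concrete. It covers binary partitions of nominal values, order-preserving binary splits of ordinal values, candidate cut points v for (A < v) versus (A ≥ v), and the equal-interval and equal-frequency bucketing named on the continuous-attribute slide. The function names are hypothetical. Using midpoints between adjacent distinct values as cut points is a common choice, assumed here rather than taken from the slides.

```python
from itertools import combinations

def nominal_binary_splits(values):
    """Every way to divide k nominal values into two non-empty groups: 2**(k-1) - 1 splits."""
    values = list(values)
    first, rest = values[0], values[1:]
    splits = []
    # Keeping `first` on the left produces each unordered pair of groups once;
    # stopping before len(rest) keeps the right-hand group non-empty.
    for size in range(len(rest)):
        for extra in combinations(rest, size):
            left = [first, *extra]
            right = [v for v in rest if v not in extra]
            splits.append((left, right))
    return splits

def ordinal_binary_splits(ordered_values):
    """Order-preserving binary splits: a prefix of the ordered values versus the remaining suffix."""
    v = list(ordered_values)
    return [(v[:i], v[i:]) for i in range(1, len(v))]

def binary_cut_points(values):
    """Candidate thresholds v for (A < v) vs (A >= v): midpoints of adjacent distinct values."""
    distinct = sorted(set(values))
    return [(a + b) / 2 for a, b in zip(distinct, distinct[1:])]

def equal_interval_bins(values, k):
    """Equal-interval bucketing: k bins of equal width over [min, max]; returns a bin index per value."""
    lo, hi = min(values), max(values)
    width = (hi - lo) / k or 1  # avoid dividing by zero when all values are equal
    return [min(int((x - lo) / width), k - 1) for x in values]

def equal_frequency_bins(values, k):
    """Equal-frequency (percentile) bucketing: roughly len(values) / k values per bin."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    bins = [0] * len(values)
    for rank, i in enumerate(order):
        bins[i] = rank * k // len(values)
    return bins

print(nominal_binary_splits(["Family", "Sports", "Luxury"]))
print(ordinal_binary_splits(["Small", "Medium", "Large"]))
income = [125, 100, 70, 120, 95, 60, 220, 85, 75, 90]  # Taxable Income (K) from the training data
print(binary_cut_points(income))
print(equal_interval_bins(income, 4), equal_frequency_bins(income, 2))
```

`ordinal_binary_splits` never yields {Small, Large} | {Medium}, because Small and Large are not adjacent in the order. The number of nominal binary splits grows as 2^(k-1) - 1. Picking the best candidate still needs a split-quality measure, which these slides only raise as "How to determine the best split?".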