Introduction to Data Mining: Course Slides (数据挖掘概论.课件.ppt)
Research & Development about Data Mining (Friday, August 12, 2022)

What is Data Mining?
Introduction to Data Mining
Nanjing University of Aeronautics and Astronautics, College of Information Science and Technology
Pi Dechang (皮德常), Professor and doctoral supervisor

Why Mine Data? Commercial Viewpoint
- Lots of data is being collected and warehoused
  - Web data, e-commerce
  - Purchases at department/grocery stores
  - Bank/credit card transactions
- Computers have become cheaper and more powerful
- Competitive pressure is strong
  - Provide better, customized services for an edge (e.g., in Customer Relationship Management)

Why Mine Data? Scientific Viewpoint
- Data is collected and stored at enormous speeds (GB/hour)
  - Remote sensors on a satellite
  - Telescopes scanning the skies
  - Microarrays generating gene expression data
  - Scientific simulations generating terabytes of data
- Traditional techniques are infeasible for raw data
- Data mining may help scientists in classifying and segmenting data

Mining Large Data Sets: Motivation
- Data rich but information poor: we are drowning in data, but starving for knowledge!
- "Wow, so much data! How can we put it to use?" "Mine it!"
- "Necessity is the mother of invention"
- Data mining: automated analysis of massive data sets

Mining Large Data Sets: Motivation
- A famous story: the product most often bought together with diapers is beer! (diapers -> beer)

The success of Google
- Search engine: analyzing data on the internet to find what meets your demand.
- Larry Page (born March 26, 1973) and Sergey Brin (born August 21, 1973): fortunes of US$16.6 billion and US$14.1 billion; they share a Boeing 767.

What is Data Mining?
- Data mining is the non-trivial process of identifying valid, novel, potentially useful, and ultimately understandable patterns from huge volumes of data. (U. Fayyad et al.'s definition of KDD at KDD'96)

What is (not) Data Mining?
- What is data mining? Discovering that certain names are more prevalent in certain US locations (O'Brien, O'Rourke, O'Reilly in the Boston area).
- What is not data mining? Looking up a phone number in a phone directory.

Origins of Data Mining
- Draws ideas from machine learning/AI, pattern recognition, statistics, and database systems
- Traditional techniques may be unsuitable due to
  - Enormity of data
  - High dimensionality of data
  - Heterogeneous, distributed nature of data
[Figure: Data Mining at the intersection of Machine Learning/Pattern Recognition, Statistics/AI, and Database systems]

Architecture: Typical Data Mining System
[Figure: components labeled Graphical User Interface; Pattern Evaluation; Data Mining Engine; Database or Data Warehouse Server; data cleaning, integration, and selection; Knowledge Base; data sources DB, DW, WWW, and Other Info Repositories]

Data Mining Tasks
- Prediction: use some variables to predict unknown or future values of other variables.
- Description: find human-interpretable patterns that describe the data.
From Fayyad et al., Advances in Knowledge Discovery and Data Mining, 1996

Data Mining Tasks...
- Classification
- Clustering
- Association Rule Discovery
- Sequential Pattern Discovery
- Regression
- Deviation Detection

Classification Example
Training set (Refund: categorical; Marital Status: categorical; Taxable Income: continuous; Cheat: class):

Tid  Refund  Marital Status  Taxable Income  Cheat
1    Yes     Single          125K            No
2    No      Married         100K            No
3    No      Single          70K             No
4    Yes     Married         120K            No
5    No      Divorced        95K             Yes
6    No      Married         60K             No
7    Yes     Divorced        220K            No
8    No      Single          85K             Yes
9    No      Married         75K             No
10   No      Single          90K             Yes

Test set (class unknown):

Refund  Marital Status  Taxable Income  Cheat
No      Single          75K             ?
Yes     Married         50K             ?
No      Married         150K            ?
Yes     Divorced        90K             ?
No      Single          40K             ?
No      Married         80K             ?

[Figure: the training set feeds "Learn Classifier", which produces a Model; the Model is then applied to the test set]

Classification: Application
- Direct Marketing
  - Goal: reduce the cost of mailing by targeting a set of consumers likely to buy a new cell-phone product.
  - Approach:
    - Use the data for a similar product introduced before.
    - We know which customers decided to buy and which decided otherwise. This {buy, don't buy} decision forms the class attribute.
    - Collect some related information about the customers: type of business, where they stay, how much they ear…
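The "training set -> learn classifier -> model -> test set" flow from the Classification Example slide can be sketched in a few lines of Python. The slides do not say which learning algorithm is meant, so this sketch assumes a small decision tree that greedily picks the split with the lowest weighted Gini impurity. The function names (`gini`, `candidate_tests`, `passes`, `build`, `predict`) are hypothetical and chosen only for illustration; the records are copied from the slide's training and test tables.

```python
from collections import Counter

# Training set from the slide: (Refund, Marital Status, Taxable Income in K, Cheat)
TRAIN = [
    ("Yes", "Single", 125, "No"), ("No", "Married", 100, "No"),
    ("No", "Single", 70, "No"), ("Yes", "Married", 120, "No"),
    ("No", "Divorced", 95, "Yes"), ("No", "Married", 60, "No"),
    ("Yes", "Divorced", 220, "No"), ("No", "Single", 85, "Yes"),
    ("No", "Married", 75, "No"), ("No", "Single", 90, "Yes"),
]
# Test set from the slide: the class label is unknown
TEST = [
    ("No", "Single", 75), ("Yes", "Married", 50), ("No", "Married", 150),
    ("Yes", "Divorced", 90), ("No", "Single", 40), ("No", "Married", 80),
]

def gini(rows):
    """Gini impurity of the class labels (last field of each row)."""
    n = len(rows)
    counts = Counter(r[-1] for r in rows)
    return 1.0 - sum((c / n) ** 2 for c in counts.values())

def candidate_tests(rows, col):
    """Yield binary tests (col, op, value) for one attribute column."""
    values = sorted({r[col] for r in rows})
    if isinstance(values[0], str):
        # Categorical attribute: one equality test per observed value
        for v in values:
            yield (col, "==", v)
    else:
        # Continuous attribute: thresholds halfway between neighbouring values
        for lo, hi in zip(values, values[1:]):
            yield (col, "<", (lo + hi) / 2)

def passes(row, test):
    """Return True if the row satisfies the test."""
    col, op, value = test
    return row[col] == value if op == "==" else row[col] < value

def build(rows):
    """Grow a tree: a leaf is a label string, a node is (test, yes, no)."""
    labels = Counter(r[-1] for r in rows)
    if len(labels) == 1:
        return next(iter(labels))  # pure node becomes a leaf
    best = None
    for col in range(len(rows[0]) - 1):
        for test in candidate_tests(rows, col):
            left = [r for r in rows if passes(r, test)]
            right = [r for r in rows if not passes(r, test)]
            if not left or not right:
                continue  # this test does not separate anything
            score = (len(left) * gini(left) + len(right) * gini(right)) / len(rows)
            if best is None or score < best[0]:
                best = (score, test, left, right)
    if best is None:
        # No usable split: fall back to the majority label
        return labels.most_common(1)[0][0]
    _, test, left, right = best
    return (test, build(left), build(right))

def predict(tree, row):
    """Follow the tests from the root down to a leaf label."""
    while isinstance(tree, tuple):
        test, yes_branch, no_branch = tree
        tree = yes_branch if passes(row, test) else no_branch
    return tree

tree = build(TRAIN)                             # "Learn Classifier" -> "Model"
predictions = [predict(tree, r) for r in TEST]  # apply the model to the test set
for record, label in zip(TEST, predictions):
    print(record, "->", label)
```

Because every training record has a distinct taxable income, this tree is grown until each leaf is pure and therefore reproduces all ten training labels. That is a property of the sketch, not evidence of accuracy on unseen records; a practical classifier would usually be pruned or validated on held-out data.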