
Introduction to Data Mining (数据挖掘概论): lecture slides, PPT

  • Uploader (seller): 三亚风情
  • Document ID: 3251717
  • Upload date: 2022-08-13
  • Format: PPT
  • Pages: 30
  • Size: 4.15 MB

Research & Development about Data Mining
Friday, August 12, 2022

What is Data Mining?
Introduction to Data Mining
Nanjing University of Aeronautics and Astronautics, College of Information Science and Technology
皮德常 (Pi Dechang), Professor and Doctoral Supervisor

Why Mine Data? Commercial Viewpoint
• Lots of data is being collected and warehoused
    - Web data, e-commerce
    - Purchases at department/grocery stores
    - Bank/credit card transactions
• Computers have become cheaper and more powerful
• Competitive pressure is strong
    - Provide better, customized services for an edge (e.g. in Customer Relationship Management)

Why Mine Data? Scientific Viewpoint
• Data collected and stored at enormous speeds (GB/hour)
    - Remote sensors on a satellite
    - Telescopes scanning the skies
    - Microarrays generating gene expression data
    - Scientific simulations generating terabytes of data
• Traditional techniques are infeasible for raw data
• Data mining may help scientists in classifying and segmenting data

Mining Large Data Sets: Motivation
• Data rich but information poor! We are drowning in data, but starving for knowledge!
• Wow, so much data! How can we put it to use? Mine it!
• "Necessity is the mother of invention"
• Data mining: automated analysis of massive data sets

Mining Large Data Sets: Motivation
• A famous story: the item most often bought together with diapers is beer!
  diapers → beer

The success of Google
• Search engine: analyzing data on the internet to find what meets your demand.
• Larry Page (1973.3.26) & Sergey Brin (1973.8.21): fortunes of US$16.6 billion and US$14.1 billion respectively; they share a Boeing 767.

What is Data Mining?
• Data mining is the non-trivial process of identifying valid, novel, potentially useful, and ultimately understandable patterns from huge volumes of data.
  (U. Fayyad et al.'s definition of KDD at KDD-96)

What is (not) Data Mining?
• What is Data Mining? Certain names are more prevalent in certain US locations (O'Brien, O'Rourke, O'Reilly in the Boston area)
• What is not Data Mining? Looking up a phone number in a phone directory

Origins of Data Mining
• Draws ideas from machine learning/AI, pattern recognition, statistics, and database systems
• Traditional techniques may be unsuitable due to
    - Enormity of data
    - High dimensionality of data
    - Heterogeneous, distributed nature of data
[Figure: Data Mining at the overlap of Machine Learning/Pattern Recognition, Statistics/AI, and Database systems]

Architecture: Typical Data Mining System
[Figure: system components]
• Graphical User Interface
• Pattern Evaluation
• Data Mining Engine
• Database or Data Warehouse Server
• Data cleaning, integration, and selection
• Knowledge Base
• Data sources: DB, DW, WWW, other info repositories

Data Mining Tasks
• Prediction: use some variables to predict unknown or future values of other variables.
• Description: find human-interpretable patterns that describe the data.
From Fayyad et al., Advances in Knowledge Discovery and Data Mining, 1996

Data Mining Tasks...
• Classification
• Clustering
• Association Rule Discovery
• Sequential Pattern Discovery
• Regression
• Deviation Detection

Classification Example
Training set (Refund: categorical; Marital Status: categorical; Taxable Income: continuous; Cheat: class):

Tid   Refund   Marital Status   Taxable Income   Cheat
1     Yes      Single           125K             No
2     No       Married          100K             No
3     No       Single           70K              No
4     Yes      Married          120K             No
5     No       Divorced         95K              Yes
6     No       Married          60K              No
7     Yes      Divorced         220K             No
8     No       Single           85K              Yes
9     No       Married          75K              No
10    No       Single           90K              Yes

Test set:

Refund   Marital Status   Taxable Income   Cheat
No       Single           75K              ?
Yes      Married          50K              ?
No       Married          150K             ?
Yes      Divorced         90K              ?
No       Single           40K              ?
No       Married          80K              ?

[Figure: Training Set → Learn Classifier → Model; the Model is then applied to the Test Set]

Classification: Application
• Direct Marketing
    Goal: reduce the cost of mailing by targeting a set of consumers likely to buy a new cell-phone product.
    Approach:
    - Use the data for a similar product introduced before.
    - We know which customers decided to buy and which decided otherwise. This {buy, don't buy} decision forms the class attribute.
    - Collect some related information about the customers: type of business, where they stay, how much they earn, etc.
    - Use this information as input attributes to learn a classifier model.

Clustering Definition
• Given a set of data points, each having a set of attributes, and a similarity measure among them, find clusters such that
    - Data points in one cluster are more similar to one another.
    - Data points in separate clusters are less similar to one another.

Clustering
• Euclidean distance based clustering in 3-D space.
    - Intra-cluster distances are minimized
    - Inter-cluster distances are maximized
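The clustering goal stated above (points within a cluster close together, clusters far apart, measured by Euclidean distance in 3-D) can be illustrated with a minimal sketch. The slides do not name an algorithm, so the use of k-means here is an assumption, as are the six sample points, the starting centroids, and all function names.

```python
import math

def euclidean(p, q):
    """Euclidean distance between two points of equal dimension."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(p, q)))

def mean_point(points):
    """Component-wise mean of a non-empty list of points."""
    n = len(points)
    return tuple(sum(coords) / n for coords in zip(*points))

def kmeans(points, centroids, max_iter=100):
    """Plain k-means: assign each point to its nearest centroid, then move
    each centroid to the mean of its points, until nothing changes."""
    centroids = list(centroids)
    clusters = []
    for _ in range(max_iter):
        clusters = [[] for _ in centroids]
        for p in points:
            nearest = min(range(len(centroids)),
                          key=lambda i: euclidean(p, centroids[i]))
            clusters[nearest].append(p)
        # An empty cluster keeps its previous centroid.
        updated = [mean_point(c) if c else centroids[i]
                   for i, c in enumerate(clusters)]
        if updated == centroids:
            break
        centroids = updated
    return centroids, clusters

def mean_intra_distance(centroids, clusters):
    """Average distance from each point to the centroid of its own cluster."""
    total = sum(euclidean(p, c)
                for c, cluster in zip(centroids, clusters) for p in cluster)
    count = sum(len(cluster) for cluster in clusters)
    return total / count

# Hypothetical 3-D data: two visually separate groups of three points each.
points = [(1.0, 1.0, 1.0), (1.5, 2.0, 1.0), (1.0, 1.5, 2.0),
          (8.0, 8.0, 9.0), (9.0, 8.5, 8.0), (8.5, 9.0, 8.5)]
centroids, clusters = kmeans(points, [points[0], points[3]])

print("cluster sizes:", [len(c) for c in clusters])
print("mean intra-cluster distance:", mean_intra_distance(centroids, clusters))
print("inter-cluster (centroid) distance:", euclidean(centroids[0], centroids[1]))
```

For well-separated data like this, the mean intra-cluster distance comes out much smaller than the distance between the two centroids, which is the property the slide describes. Real data would need a principled choice of the number of clusters and of the starting centroids.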

Clustering: Application
• Document Clustering
    Goal: to find groups of documents that are similar to each other based on the important terms appearing in them.
    Approach: identify frequently occurring terms in each document. Form a similarity measure based on the frequencies of different terms. Use it to cluster.
    Gain: Information Retrieval can utilize the clusters to relate a new document or search term to clustered documents.

Illustrating Document Clustering
• Clustering points: 3204 articles of the Los Angeles Times.
• Similarity measure: how many words are common in these documents (after some word filtering).

Category        Total Articles   Correctly Placed
Financial       555              364
Foreign         341              260
National        273              36
Metro           943              746
Sports          738              573
Entertainment   354              278

Association Rule Discovery
• Given a set of records, each of which contains some number of items from a given collection, produce dependency rules that predict the occurrence of an item based on occurrences of other items.

TID   Items
1     Bread, Coke, Milk
2     Beer, Bread
3     Beer, Coke, Diaper, Milk
4     Beer, Bread, Diaper, Milk
5     Coke, Diaper, Milk

Rules discovered: Diaper, Milk → Beer

Association Rule Discovery: Application 1
• Supermarket shelf management
    Goal: to identify items that are bought together by sufficiently many customers.
    Approach: process the point-of-sale data collected with barcode scanners to find dependencies among items.
    A classic rule:
    - If a customer buys diapers and milk, then he is very likely to buy beer.
• So, don't be surprised if you find six-packs stacked next to diapers!

Regression
• Predict the value of a given continuous-valued variable based on the values of other variables, assuming a linear or nonlinear model of dependency.
• Extensively studied in the statistics and neural network fields.
• Examples:
    - Predicting sales amounts of a new product based on advertising expenditure.
    - Predicting wind velocities as a function of temperature, humidity, air pressure, etc.
    - Time series prediction of stock market indices.

Deviation/Anomaly Detection
• Detect significant deviations from normal behavior
• Applications:
    - Credit card fraud detection
    - Network intrusion detection
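Returning to the association-rule slides: the rule Diaper, Milk → Beer can be checked against the five transactions in the table with a short sketch. Support and confidence are the usual measures for such rules, but the slides do not define them, so the definitions and the function names below are assumptions for illustration.

```python
from fractions import Fraction

# The five transactions from the Association Rule Discovery table.
transactions = [
    {"Bread", "Coke", "Milk"},
    {"Beer", "Bread"},
    {"Beer", "Coke", "Diaper", "Milk"},
    {"Beer", "Bread", "Diaper", "Milk"},
    {"Coke", "Diaper", "Milk"},
]

def support(itemset, transactions):
    """Fraction of transactions that contain every item in itemset."""
    itemset = set(itemset)
    hits = sum(1 for t in transactions if itemset <= t)
    return Fraction(hits, len(transactions))

def confidence(antecedent, consequent, transactions):
    """Among transactions containing the antecedent, the fraction that
    also contain the consequent. Returns 0 if the antecedent never occurs."""
    base = support(antecedent, transactions)
    if base == 0:
        return Fraction(0)
    both = support(set(antecedent) | set(consequent), transactions)
    return both / base

rule_support = support({"Diaper", "Milk", "Beer"}, transactions)
rule_confidence = confidence({"Diaper", "Milk"}, {"Beer"}, transactions)
print("support:", rule_support)
print("confidence:", rule_confidence)
```

In this table, diaper and milk appear together in transactions 3, 4, and 5, and beer accompanies them in transactions 3 and 4, so the rule has support 2/5 and confidence 2/3.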

Challenges of Data Mining
• Scalability
• Dimensionality
• Complex and heterogeneous data
• Data quality
• Data ownership and distribution
• Privacy preservation
• Streaming data

My hope
• Data mining research has been under way for nearly 15 years. To promote wide application of the technology:
    1. Industry has begun to pay attention to data mining technology
        - What should research departments do?
    2. Research on the technology itself:
        - Ease of use
        - Usability
    3. Integration with application domains:
        - Finance
        - Bioinformatics
        - Information retrieval
        - Aircraft fault diagnosis and prediction, reliability

My research in recent years
    1. Mining Acceleration-like Association Rule
    2. Interior-oriented Intrusion Detection System Based on Multi-agents
    3. Fuzzy Clustering Algorithm
    4. A Fast Trajectory Clustering Algorithm with Sampling
    5. An improved C-means clustering algorithm: employs the theory of gravity to distribute the instances.
    6. A Neighborhood-Based Trajectory Clustering Algorithm: our key insight is that neighborhood-based local density is quite different from the absolute global density used in TRACLUS.
       [Figure: (a) TRACLUS's result for Deer95; (b) NBTC's result for Deer95]
    7. Unifying Density-Based Clustering and Outlier Detection: to discover density-based clusters and assign to each density-based outlier a degree of being an outlier.
    8. Applications of DM
        A. Information systems: database and information system security, software security
        B. Aerospace: reliability analysis, fault detection and prediction
        C. Electronic components: fault diagnosis

Thank you! Questions?
