Enhancing Business Intelligence Applications through Knowledge Mining (Xie Bangchang), lecture slides.ppt
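From Knowledge Mining to Business Intelligence: Advanced Statistics Application
(Enhancing business intelligence applications through knowledge mining: advanced value-added applications of statistical analysis)

Dr. Xie Bangchang (谢邦昌)
- Chair Professor and doctoral supervisor, Xiamen University
- Chair Professor and doctoral supervisor, Capital University of Economics and Business
- Chair Professor and doctoral supervisor, Central University of Finance and Economics
- Chair Professor, Southwestern University of Finance and Economics
- Adjunct Professor, Renmin University of China
- Professor, Department of Statistics and Information Science and Graduate Institute of Applied Statistics, Fu Jen Catholic University
- President, Chinese Data Mining Association

Outline:
- Knowledge mining (integrating data mining and text mining) and the development of business intelligence
- The knowledge-mining process: steps, outputs, and applications
- How to integrate data mining and text mining
- A review of technical developments in knowledge mining

The value of preserving knowledge:
- Reduce: cycle time, response time, duplicated investment, operating costs, meeting time, outside consultants, etc.
- Increase: productivity and quality, conversion of enterprise knowledge, fast and effective decisions, course innovation, collective problem-solving, etc.

Why is knowledge so urgent? Investment in knowledge assets, downsizing and retirement, staff rotation, capability, duplicated effort, too many meetings, communication problems, organizational goals, feasibility, speed, informality.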
“The chief economic priority for developed countries is to raise the productivity of knowledge. The country that does this first will dominate the twenty-first century economically.” (Peter F. Drucker)

From data to knowledge (figure): raw data is integrated into a data warehouse; then Data Warehouse → Selection/cleansing → Target Data → Preprocessing → Preprocessed Data → Transformation → Transformed Data → Data Mining → Patterns → Interpretation/Evaluation → Knowledge (understanding).

BI architecture (figure): data sources (operational DBs, other sources) pass through Extract, Transform, Load, and Refresh, coordinated by a Monitor & Integrator with metadata, into a complete data warehouse and data marts; an OLAP server then serves the front-end tools:
1. Comprehensive performance management
2. Analysis
3. Query
4. Reports
5. Data mining

Data mining / exploration tasks:
- Classification: categorize your customers or clients
- Prediction: forecast future sales or usage
- Segmentation: group similar customers or clients
- Association: discover products that are purchased together
- Sequence: find patterns and trends over time

Gaining market intelligence from news feeds (Sreekumar Sukumaran and Ashish Sureka).

Integrated BI systems (figure, after Sreekumar Sukumaran and Ashish Sureka): structured data (DBMS, file system, XML, EA, legacy) flows through ETL into the complete data warehouse; unstructured data (CMS, scanned documents, email) passes through a text tagger & annotator into intermediate data (RDBMS, XML) and then through ETL.

Sources and value of knowledge:
“On average, professional users spend 11 hours per week looking for information. Seventy-one percent said they could not find what they were looking for.” (Information Management Software, Lazard Freres & Co. LLC, February 2001)
“The volume of digitized information will double every year from 2000 to 2005 (an increase to 30 times today's volume).” (Knowledge Management vs. Information Management, Gartner Group, September 2000)
Sources shown: web information, news reports, patents, email, documents, literature, questions. Publication statistics: 8 TB (books), 25 TB (news), 20 TB (magazines), 2 TB (journals). On average, 2,000 pages of scientific knowledge are added every minute; reading the new material would take 5 years at 24 hrs/day.
How Can I Keep Up With the Literature?

Evolution: “To study history one must know in advance that one is attempting something fundamentally impossible, yet necessary and highly important.” (Father Jacobus, in Hesse's Magister Ludi, Das Glasperlenspiel (The Glass Bead Game))

Techniques for document knowledge discovery and management: retrieval, filtering, classification, summarization, clustering, natural-language content analysis, extraction, mining, visualization, extraction applications, and mining applications, grouped under information access, knowledge understanding, information structuring, and knowledge generation.

Text-mining pipeline (figure): raw text → tokenized text → stemming & stop-word removal → term weighting, giving a matrix of weights $W = (w_{ij})$, $i = 1,\dots,m$, $j = 1,\dots,n$, where row $i$ is document $d_i$ and column $j$ is term $t_j$. Term similarity, document similarity, and vector centroids computed from $W$ feed clustering and classification; sentence selection feeds summarization; results are attached to documents as metadata/annotation.

Text ETL to mining (example from IBM TAKMI; Nasukawa and Nagano, 1999):
- Unstructured (original) data: "Call Taker: James; Date: Aug. 30, 2002; Duration: 10 min.; CustomerID: ADC00123; Q: cust sys has stopped working. A: checked cust bios and it need updated."
- Structured (meta) data: Call Taker: James; Date: 2002/08/30; Duration: 10 min.; CustomerID: ADC00123; Noun: Customer, Software, BIOS; Subj.-Verb: customer system...stop; SW Problem: BIOS...need.
- Linguistic analysis: tagging, dependency analysis, named-entity extraction, and intention analysis, using a category dictionary and a synonym dictionary to produce category/item pairs for visualization & interactive mining.
- Mining target: individual texts. Mining unit: category-labeled items extracted from texts using NLP.

Text is tough:
- Text is an abstract concept that is extremely hard to represent (AI-complete).
- It is an endless combination of abstract, complex relationships among many concepts.
- One noun can represent many different concepts (e.g., CELL, IV).
- Similar concepts can be expressed in many ways (aliases): space ship, flying saucer, UFO, figment of imagination.
- Concepts are hard to visualize.
- High dimensionality: there may be hundreds or thousands of dimensions to analyze.

Text mining is easy:
- Text is highly repetitive.
- A few simple algorithms can produce good results even from very rough input: finding important phrases, finding meaningful related words, building summaries from articles.
- Main problem: evaluating the results; goals and objectives must be defined.

Traditional IR-based extraction (figure): each document vector (doc vector 1 ... doc vector n) is scored against a profile vector; a document whose score passes the threshold is accepted, otherwise it is rejected. Relevance judgments drive vector learning and threshold learning under a utility function. Components: ontology, vector initialization, threshold initialization, reuse of retrieval algorithms, new threshold algorithms, text DB, lexicons.

Luhn's ideas: “It is here proposed that the frequency of word occurrence in an article furnishes a useful measurement of word significance. It is further proposed that the relative position within a sentence of words having given values of significance furnish a useful measurement for determining the significance of sentences. The significance factor of a sentence will therefore be based on a combination of these two measurements.”

Information extraction (example output record):
Job 2
JobTitle: Ice Cream Guru
Employer:
JobCategory: Travel/Hospitality
JobFunction: Food Services
JobLocation: Upper Midwest
Contact Phone: 800-488-2611
DateExtracted: January 8, 2001
Source:
OtherCompanyJobs: Job 1

Information extraction:
- Given: a source of textual documents and a well-defined, limited (text-based) query.
- Find: sentences with relevant information.
- Extract the relevant information and ignore non-relevant information (important!).
- Link related information and output it in a predetermined format.

Example source document: "Advisory Programmer - Oracle (Austin, TX). Response Code: 1008-0074-97-iexc-jcn. Responsibilities: This is an exciting opportunity with Siemens Wireless Terminals; a start-up venture fully capitalized by a Global Leader in Advanced Technologies. Qualified candidates will: Responsible for assisting with requirements definition, analysis, design and implementation that meet objectives, codes difficult and sophisticated routines. Develops project plans, schedules and cost data. Develop test plans and implement physical design of databases. Develop shell scripts for administrative and background tasks, stored procedures and triggers. Using Oracle's Designer 2000, assist with Data Model maintenance and assist with applications development using Oracle Forms. Qualifications: BSCS, BSMIS or closely related field or related equivalent knowledge normally obtained through technical education programs. 5-8 years of professional experience in development, system design analysis, programming, installation using Oracle development"

Automatic pattern-learning systems:
- Pros: portable across domains; tend to have broad coverage; robust in the face of degraded input; automatically find appropriate statistical patterns; system knowledge is not needed by those who supply the domain knowledge.
- Cons: annotated training data, and lots of it, is needed; not necessarily better or cheaper than a hand-built solution.
- Examples: Riloff et al., AutoSlog, Soderland WHISK (UMass); Mooney et al., Rapier (UTexas); Ciravegna (Sheffield).

Learn lexico-syntactic patterns from templates (figure): a trainer builds a model from language input paired with answers; a decoder applies the model to new language input to produce answers.

Text analysis spectrum: from "What is this document about?" (classification, clustering, concept identification) to "Who did what to whom, when, where, etc.?" (entity extraction, targeted facts and events).

Why is getting dimensional data so hard?
Example sentence: "Hank bought plastic explosives from Henry in Tucson yesterday."
Named entity extraction (people, weapons, vehicles, dates): the NER engine returns Hank, Henry, plastic explosives, Tucson, 11/01/07 (FrameNet).

Name extraction via MMs (figure): text, or speech passed through speech recognition, goes to an extractor that uses NE models to output entities (locations, persons, organizations). Example text: The delegation, which included the commander of the U.N. troops in Bosnia, Lt. Gen. Sir Michael Rose, went to the Serb stronghold of
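The slides outline a text-mining pipeline: raw text is tokenized, stop words are removed and terms stemmed, terms are weighted into a document-term matrix of weights w_ij, and document similarity and vector centroids then drive clustering and classification. The Python below is a minimal sketch of that flow under assumptions the source does not state: TF-IDF (tf × log(N/df)) as the term weighting, cosine as the document similarity, and nearest-centroid assignment as the classifier. Stemming is omitted, and the stop-word list, all function names, and the sample call-center notes and labels are hypothetical.

```python
import math
import re
from collections import Counter

# Hypothetical stop-word list, kept tiny for illustration.
STOP_WORDS = {"a", "an", "and", "the", "of", "to", "in", "on", "is", "it", "has", "was"}


def tokenize(text):
    """Lower-case, split into runs of letters, and drop stop words (no stemming)."""
    return [t for t in re.findall(r"[a-z]+", text.lower()) if t not in STOP_WORDS]


def build_idf(token_lists):
    """Inverse document frequency log(N / df) for every term seen in training."""
    n = len(token_lists)
    df = Counter(term for tokens in token_lists for term in set(tokens))
    return {term: math.log(n / count) for term, count in df.items()}


def weigh(tokens, idf):
    """One row w_i of the weight matrix: tf * idf; terms unseen in training are dropped."""
    tf = Counter(tokens)
    return {t: c * idf[t] for t, c in tf.items() if t in idf}


def cosine(u, v):
    """Cosine similarity between two sparse {term: weight} vectors."""
    dot = sum(w * v.get(t, 0.0) for t, w in u.items())
    norm_u = math.sqrt(sum(w * w for w in u.values()))
    norm_v = math.sqrt(sum(w * w for w in v.values()))
    return dot / (norm_u * norm_v) if norm_u and norm_v else 0.0


def centroid(vectors):
    """Component-wise mean of a list of sparse vectors."""
    total = Counter()
    for vec in vectors:
        total.update(vec)  # Counter.update adds values for shared keys
    return {t: w / len(vectors) for t, w in total.items()}


def train(labelled_docs):
    """Build the IDF table and one centroid per label from (text, label) pairs."""
    token_lists = [tokenize(text) for text, _ in labelled_docs]
    idf = build_idf(token_lists)
    by_label = {}
    for tokens, (_, label) in zip(token_lists, labelled_docs):
        by_label.setdefault(label, []).append(weigh(tokens, idf))
    return idf, {label: centroid(vecs) for label, vecs in by_label.items()}


def classify(text, idf, centroids):
    """Assign the label whose centroid is most cosine-similar to the text."""
    vec = weigh(tokenize(text), idf)
    return max(centroids, key=lambda label: cosine(vec, centroids[label]))


# Hypothetical call-center notes and labels, loosely in the spirit of the TAKMI example.
TRAIN_DOCS = [
    ("customer system stopped working, bios needs update", "hardware"),
    ("bios update failed after reboot", "hardware"),
    ("invoice amount is wrong on the bill", "billing"),
    ("customer charged twice on invoice", "billing"),
]

if __name__ == "__main__":
    idf, centroids = train(TRAIN_DOCS)
    print(classify("bios stopped after update", idf, centroids))  # hardware
```

Adding a real stemmer (for example a Porter stemmer) or a different weighting scheme would only change `tokenize` or `weigh`. Nearest-centroid is chosen here because the slides mention a vector centroid, not because the source specifies this classifier.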