Enhancing Business Intelligence Applications through Knowledge Mining (Xie Bangchang), lecture slides (.ppt)

  • Uploader (seller): 三亚风情
  • Document ID: 3214433
  • Upload date: 2022-08-06
  • Format: PPT
  • Pages: 82
  • Size: 7.86 MB

    Keywords:
    knowledge, mining, enhancing, business, intelligence, applications, Xie Bangchang (谢邦昌), lecture slides
    Resource description:

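    Enhancing Business Intelligence Applications through Knowledge Mining: Advanced Value-Added Applications of Statistical Analysis
    From Knowledge Mining to Business Intelligence - Advanced Statistics Application
    Dr. Xie Bangchang (谢邦昌)
    • Chair Professor and Doctoral Supervisor, Xiamen University
    • Chair Professor and Doctoral Supervisor, Capital University of Economics and Business
    • Chair Professor and Doctoral Supervisor, Central University of Finance and Economics
    • Chair Professor, Southwestern University of Finance and Economics
    • Adjunct Professor, Renmin University of China
    • Professor, Department of Statistics and Information Science and Graduate Institute of Applied Statistics, Fu Jen Catholic University
    • President, Chinese Data Mining Association (中华资料采矿协会)

    Outline
    • Knowledge mining (integrating data mining and text mining) and the development of business intelligence
    • Knowledge mining: procedures, steps, outputs and applications
    • How to integrate data mining and text mining
    • A review of technical developments in knowledge mining

    Why is knowledge so urgent?
    The value of preserving knowledge:
    • Reduce: cycle time, response time, duplicated investment, operating costs, meeting time, outside consultants, etc.
    • Increase: productivity and quality, transfer of enterprise knowledge, fast and effective decisions, course innovation, collective problem-solving, etc.
    Investment in knowledge assets; related issues: downsizing and retirement, staff rotation, capability, duplicated effort, too many meetings, communication problems, organizational goals, feasibility, speed, informality.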

    “The chief economic priority for developed countries is to raise the productivity of knowledge. The country that does this first will dominate the twenty-first century economically.”
    Peter F. Drucker

    From data to knowledge
    Raw data → integration → data warehouse → selection/cleansing → target data → preprocessing → preprocessed data → transformation → transformed data → data mining → patterns → interpretation/evaluation → knowledge (understanding)

    BI architecture
    • Data sources: operational DBs, other sources
    • Extract, transform, load, refresh, with a monitor & integrator and metadata
    • Complete data warehouse and data marts, served through an OLAP server
    • Tools: 1. comprehensive performance management, 2. analysis, 3. query, 4. reports, 5. data mining

    Business intelligence: data mining/exploration
    • Categorize your customers or clients: classification
    • Forecast future sales or usage: prediction
    • Group similar customers or clients: segmentation
    • Discover products that are purchased together: association
    • Find patterns and trends over time: sequence

    Gaining market intelligence from news feeds (Sreekumar Sukumaran and Ashish Sureka)
    Integrated BI systems (Sreekumar Sukumaran and Ashish Sureka)
    • Structured data (DBMS, file system, XML, EA, legacy) → ETL → complete data warehouse
    • Unstructured data (CMS, scanned documents, email) → text tagger & annotator → intermediate data (RDBMS, XML) → ETL

    Sources and value of knowledge
    “On average, professional users spend 11 hours per week looking for information. Seventy-one percent said they could not find what they were looking for.” (Information Management Software, Lazard Freres & Co. LLC, February 2001)
    “The volume of digitized information will double every year from 2000 to 2005 (an increase to 30 times today's volume).” (Knowledge Management vs. Information Management, Gartner Group, September 2000)
    Sources: web messages, news reports, patents, email, documents, literature, questions.
    Publication statistics: 8 TB (books), 25 TB (news), 20 TB (magazines), 2 TB (journals).
    On average, 2,000 pages of scientific knowledge are added every minute; reading the new material would take 5 years (24 hrs/day).
    How can I keep up with the literature?

    Evolution
    “To study history one must know in advance that one is attempting something fundamentally impossible, yet necessary and highly important.” Father Jacobus, in Hesse's Magister Ludi (Das Glasperlenspiel, The Glass Bead Game)

    Technologies for document knowledge discovery and management
    Techniques: retrieval, filtering, classification, summarization, clustering, natural-language content analysis, extraction, mining, visualization, extraction applications, mining applications.
    Layers: information access, information structure, knowledge generation, knowledge cognition.
    Text processing pipeline: raw text → tokenized text → stemming & stop-word removal → term weighting → term-document matrix $W = (w_{ij})$, with one row per document $d_1, \dots, d_m$ and one column per term $t_1, \dots, t_n$. The matrix supports term similarity, document similarity and vector centroids, which feed clustering and classification; sentence selection feeds summarization. Documents carry metadata/annotations.

    Text ETL to mining
    Original call record (unstructured): Call Taker: James; Date: Aug. 30, 2002; Duration: 10 min.; Customer ID: ADC00123; Q: "cust sys has stopped working." A: "checked cust bios and it need updated."
    Structured result:
    • Metadata: Call Taker = James; Date = 2002/08/30; Duration = 10 min.; Customer ID = ADC00123
    • Nouns: customer, software, BIOS
    • Subject.verb: customer system.stop
    • SW problem: BIOS.need
    Processing: linguistic analysis (tagging, dependency analysis, named entity extraction, intention analysis) with a category dictionary and a synonym dictionary, producing category items for visualization & interactive mining.
    IBM TAKMI (Nasukawa, Nagano, 1999). Mining target: individual texts. Mining unit: category-labeled items extracted from each text using NLP.

    Text is tough
    • Text expresses abstract concepts that are extremely hard to represent (AI-complete).
    • It is an endless web of abstract, complex relationships among many concepts.
    • One noun can stand for many different concepts (CELL, IV).
    • One concept can be expressed in many ways (aliases): space ship, flying saucer, UFO, figment of imagination.
    • Concepts are hard to visualize.
    • High dimensionality: an analysis may involve hundreds or thousands of dimensions.
    Text mining is easy
    • Text is highly repetitive.
    • A few simple algorithms give good results even from very crude processing: finding important phrases, finding meaningful related words, building summaries of articles.
    • Main problem: evaluating results, which requires defining goals and objectives.

    Traditional IR-based extraction
    Each document vector (doc vector 1 to doc vector n) is scored against a profile vector. If score ≥ threshold, the document is accepted; otherwise it is rejected. Relevance judgments feed vector learning and threshold learning, guided by a utility function. The profile vector is initialized from an ontology (vector initialization), and the threshold has its own initialization. Resources: text DB, lexicons. Approach: reuse retrieval algorithms; develop new threshold algorithms.

    Luhn's ideas
    “It is here proposed that the frequency of word occurrence in an article furnishes a useful measurement of word significance. It is further proposed that the relative position within a sentence of words having given values of significance furnish a useful measurement for determining the significance of sentences. The significance factor of a sentence will therefore be based on a combination of these two measurements.”

    Information extraction: Job 2
    JobTitle: Ice Cream Guru
    Employer:
    JobCategory: Travel/Hospitality
    JobFunction: Food Services
    JobLocation: Upper Midwest
    Contact Phone: 800-488-2611
    DateExtracted: January 8, 2001
    Source:
    OtherCompanyJobs: Job1

    Information extraction
    Given: a source of textual documents; a well-defined, limited query (text based).
    Find: sentences with relevant information.
    Extract the relevant information and ignore non-relevant information (important!); link related information and output it in a predetermined format.
    Example input (job posting): "Advisory Programmer - Oracle (Austin, TX) Response Code: 1008-0074-97-iexc-jcn Responsibilities: This is an exciting opportunity with Siemens Wireless Terminals; a start-up venture fully capitalized by a Global Leader in Advanced Technologies. Qualified candidates will: Responsible for assisting with requirements definition, analysis, design and implementation that meet objectives, codes difficult and sophisticated routines. Develops project plans, schedules and cost data. Develop test plans and implement physical design of databases. Develop shell scripts for administrative and background tasks, stored procedures and triggers. Using Oracle's Designer 2000, assist with Data Model maintenance and assist with applications development using Oracle Forms. Qualifications: BSCS, BSMIS or closely related field or related equivalent knowledge normally obtained through technical education programs. 5-8 years of professional experience in development, system design analysis, programming, installation using Oracle development"

    Automatic pattern-learning systems
    Pros:
    • Portable across domains
    • Tend to have broad coverage
    • Robust in the face of degraded input
    • Automatically find appropriate statistical patterns
    • System knowledge not needed by those who supply the domain knowledge
    Cons:
    • Annotated training data, and lots of it, is needed
    • Isn't necessarily better or cheaper than a hand-built solution
    Examples: Riloff et al., AutoSlog,
    Soderland WHISK (UMass); Mooney et al., Rapier (UTexas); Ciravegna (Sheffield)

    Learning lexico-syntactic patterns from templates
    Training: language input plus answers go to a trainer, which builds a model. Decoding: a decoder applies the model to new language input and produces answers.

    Text analysis spectrum
    From "What is this document about?" (classification, clustering, concept identification) to "Who did what to whom, when, where, etc.?" (entity extraction, targeted facts and events).

    Why is getting dimensional data so hard?
    "Hank bought plastic explosives from Henry in Tucson yesterday."
    Named entity extraction (people, weapons, vehicles, dates): the NER engine returns Hank, Henry, plastic explosives, Tucson, 11/01/07. (FrameNet)

    Name extraction via HMMs
    Text, or speech passed through speech recognition, goes to an extractor that applies NE models and outputs entities: locations, persons, organizations.
    Example sentence: "The delegation, which included the commander of the U.N. troops in Bosnia, Lt. Gen. Sir Michael Rose, went to the Serb stronghold of Pale, near Sarajevo, for talks with Bosnian Serb leader Radovan Karadzic."
    Training program: training sentences plus answers, i.e. the same sentence with its names marked: U.N. (organization); Bosnia, Pale, Sarajevo (locations); Michael Rose, Radovan Karadzic (persons).

    An easy but successful HMM application
    • Prior to 1997, no learning approach was competitive with hand-built rule systems.
    • Since 1997, statistical approaches (BBN (Bikel et al. 1997), NYU, MITRE, CMU/JustSystems) achieve state-of-the-art performance in NER.

    Mining workflow for a document database
    Document collection → frequent term sets (surf, fun, sun, beach) → clusters C1 to C5. Clustering: C1, C2, C4, C5. Clustering description: surf, sun, beach, fun.
    [Figure: Anopheles]

    Feedback as model interpolation
    A query Q is represented by a language model $\theta_Q$ and each document D by $\theta_D$; results are ranked by the divergence $D(\theta_{Q'} \,\|\, \theta_D)$, where a smaller divergence ranks higher. A feedback model $\theta_F$ is estimated from the feedback documents $F = \{d_1, d_2, \dots, d_n\}$ (generative model or divergence minimization), and the query model is updated as
        $\theta_{Q'} = (1-\alpha)\,\theta_Q + \alpha\,\theta_F$
    where $\alpha = 0$ means no feedback and $\alpha = 1$ means full feedback.

    Heterogeneous data (非单调性资料)
    Tens of thousands of historical TDR records. Large-scale analysis: document clustering (1000 in the figure), solutions, case base.
    Document data clustering: Mooter; 科学人 (Scientific American, Chinese edition), March issue.

    Annotation and tagging
    "On November 16, 2005, IBM announced it had acquired Collation, a privately held company based in Redwood City, California for undisclosed amount."
    Tags: date, acquiring organization, acquisition event, acquired organization, place, amount.
    The text annotator produces a record that is output to an RDBMS or as XML:
    Date: Nov. 16 | Organization: IBM | Place: Redwood City, CA | Amount: Undisclosed

    Linguistic concept extraction from customer service records
    • Bag-of-"words" extraction: Cstmr ID, Customer, Yellow, Inc, Happy, Not, Switch, Cell, Phone
    • Expressions extraction: Cstmr ID, Customer, Yellow Inc, switch, Cell Phone, Not happy
    • Named entities extraction: Customer (CRM term; Cstmr?), Yellow Inc (telco company), Cell Phone (telco term), Not happy, Switch
    • Events/sentiment extraction: customer (cstmr), cell phone, unhappy (negative), switch to (negative predicate), yellow inc (competition)
    • Combined with structured data for decision making: churner → special offer
    Layers: information retrieval, information extraction, knowledge inference.

    Extracting information from text
    Structuring knowledge from text (text → database): tagging, compounds, grammatical analysis, ontological interpretation, regular expressions, pattern recognition; minimal recursion semantics representations (Deep Thought EU project).

    Knowledge construction
    Goal: extract prominent concepts/relations from text using tagging, compounds, NP recognition, term frequencies, stop words, language identification (Brasethvik & Gulla, DKE, 38/1, 2001). A domain document collection goes through statistical & linguistic analyses to build an ontology.

    Patterns construction
    Figure nodes: Taipei, Tokyo, New York; repository; tagging & annotation; CDW; knowledge repository or structured data; patterns; Patterns Explorer; web browser.
    Example concepts and relations: hard disk, Windows XP, desktop computer, hard disk size 40 GB, products, laptop computers, operating system, Linux, Macintosh; "is a", "crashes", "installed from http:/.".

    Metadata for people, events, time, places and objects
    Core classes: conceptual objects, physical entities, temporal entities; people (with properties), events, objects, places, time; resource index; applications.
    Relations: participate in; affect or refer to; refer to / refine; refer to / identify; location; at; within.
    Supporting layers: derived knowledge data (e.g. RDF), thesauri, extent, CRM entities, ontology expansion, sources and metadata (XML/RDF), background knowledge/authorities, CIDOC CRM or DC.

    Concept lattice
    Given the context (D1, T1), where D1 = {d1, d2, d3, d4} and T1 = {t1, t2, t3, t4, t5, t6}, and the input relation R = documents × keywords:

        R   t1 t2 t3 t4 t5 t6
        d1   1  0  1  0  1  1
        d2   1  0  1  0  1  1
        d3   0  1  0  1  0  0
        d4   1  0  0  1  0  1

    The formal concepts (extent, intent) are:
    C1: (D1, ∅)
    C2: ({d1, d2, d4}, {t1, t6})
    C3: ({d3, d4}, {t4})
    C4: ({d1, d2}, {t1, t3, t5, t6})
    C5: ({d4}, {t1, t4, t6})
    C6: ({d3}, {t2, t4})
    C7: (∅, T1)
    The formal concept C4 has two own terms, t3 and t5, and two inherited terms, t1 and t6. The slide draws the lattice as a Hasse diagram.

    CIDOC CRM example: the Yalta Agreement
    Entities: E31 Document "Yalta Agreement"; E7 Activity "Crimea Conference"; E65 Creation Event*; E38 Image; three E39 Actors; E53 Place 7012124; E52 Time-Span February 1945; E52 Time-Span 11-2-1945.
    Properties: P14 performed, P11 participated in, P94 has created, P86 falls within, P7 took place at, P67 is referred to by, P81 ongoing throughout, P82 at some time within.
    Illustrates explicit events, object identity, symmetry.

    Rules extraction
    The formal concept C4 makes the following rules possible:
    R1: t3 → t1, t6
    R2: t5 → t1, t6
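    R3: t3 ↔ t5
    Interpretation of R1 and R2: the use of term t3 or t5 is always associated with the use of terms t1 and t6. Rule R3 expresses the mutual equivalence of terms t3 and t5: every document that contains t3 also contains t5, and vice versa.

    Literature, knowledge groups, experts and decision-making, knowledge presentation
    Real-time clustering: a real-time index over the metadata of search results.

    Official-document data
    • Subsidies for low- and middle-income households
    • Cause-and-effect diagram: children left without support
    • Welfare in each county/city; establishment of trust funds
    • Status of children left without support in each county/city
    • Intervention by county/city governments, social affairs bureaus, etc.
    • Subsidies for single-parent families
    • Post-disaster reconstruction and use of the related funds
    • Rules of the post-disaster reconstruction fund

    Clustering example (customer comments on a laundry product)
    Very suitable for machine washing; fragrance smells good; strong stain removal; washing takes little effort; fresh scent; removes 99 kinds of stains; washes especially clean; fragrance smells good; gets white socks cleanest; smells very fragrant; gentle on hands; removes stains well; clothes don't fade easily; washing is effortless; removes 99 kinds of stains; small amount needed; washes clean; less irritating to skin; cleans all kinds of stains; washes clean; reasonable price; better washing results; smell is not bad; have always used this brand; washed clothes come out whiter; smells good; memorable advertising; washes clean; easy to rinse; not too harsh on hands; washes clean; small amount needed; washes clean; uses less than other brands; big advertising; washes clean; small amount needed; good quality; small amount needed; washes clean; good packaging; lots of ads, attractive; fragrance smells good; washes clean and white; good promotion, fun ads; many people say it's good.

    Knowledge lineage, knowledge map, event tracking, information retrieval, knowledge concepts
    Kuhn's descriptive project: immature science → normal science → anomalies → crisis → revolution
    Tasks in news detection: news feeds; detection (on-line and retrospective); segmentation; tracking; "might be relevant".
    Example event record: Location: Aden, Yemen; Date: October 12, 2000, 11:18 am (UTC+3); Attack type: suicide bombing; Deaths: 19 (including the 2 perpetrators); Injured: 39; Perpetrator(s): al-Qaeda, carried out by Ibrahim al-Thawr and Abdullah al-Misawa.

    The 9/11 attacks were preventable
    • FBI agent in Minnesota: Zacarias Moussaoui's personal computer
    • FBI Phoenix memo (George Will)
    • Dr. Bhandari (Virtual Gold, Inc.): data mining could have prevented the 9/11 tragedy
    • Terrorists; the 9/11 terrorist network

    Red Army Faction threat
    • Horst Herold (head of Germany's federal criminal police) built a data-mining information network: Germany's Bundeskriminalamt, 1972
    • Data sources: housing sales, energy companies
    • Result: Rolf Heissler (RAF member)
    • Consequences: Herold was reported to have violated human rights; he retired; the criminal statutes were amended in 1986
    • Three of the 9/11 pilots came from Hamburg

    Epidemic alert and notification system
    Years ago, the World Health Organization set up an epidemic alert and notification system (Epidemic Alert and Response). Because some countries may play down reports of outbreaks out of concern for the economic impact, the WHO system includes software that collects relevant material from the websites of each country's media; twenty experts then analyze the information in that material.

    Information and knowledge: HighWire.stanford.edu; Amazon digital camera sales; news events (Washington Times); popular research at the U.S. National Institutes of Health (NIH): Proposals by Funding/Date across IRGs and Activity Types.
    Clinical practice guidelines: Athena/EON (Stanford)
    • Athena clinical guidelines (R.D. Shankar, et al. 2001)
    • Athena Hypertension Guideline (A. Advani, et al. 2003)
    Example, earthquake-disaster assistance (generative vs. discriminative): affected households (financial assistance policy); loans (affected households, temporary housing); home reconstruction program; bank loans; Temporary Statute for Earthquake Disaster Reconstruction (震灾重建暂行条例); affected households, houses, interest, damage.
    Knowledge representation: object, method; object: attribute; object: condition; object: attribute (condition); operations: specify, generalize.

    Integrating distributed knowledge
    • Adaptive knowledge infrastructure is in place
    • Knowledge resources identified and shared appropriately
    • Timely knowledge gets to the right person to make decisions
    • Intelligent tools for authoring through archiving
    • Cohesive knowledge development between JPL, its partners, and customers
    • Instrument design is semi-automatic based on knowledge repositories
    • Mission software auto-instantiates based on unique mission parameters
    • KM principles are part of Lab culture and supported by layered COTS products
    • Remote data management allows spacecraft to self-command
    • Knowledge gathered anyplace from hand-held devices using standard formats on interplanetary Internet
    • Expert systems on spacecraft analyze and upload data
    • Autonomous agents operate across existing sensor and telemetry products
    • Industry and academia supply spacecraft parts based on collaborative designs derived from JPL's knowledge system
    Capturing knowledge, sharing knowledge (MarsNet, Europa Orbiter, Space Interferometry Mission):
    • Enables capture of knowledge at the point of origin, human or robotic, without invasive technology
    • Enables seamless integration of systems throughout the world and with robotic spacecraft
    • Enables sharing of essential knowledge to complete Agency tasks
    Modeling expert knowledge: systems model experts' patterns and behaviors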
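    Two slides can be wired together in a small sketch. The "Traditional IR-based extraction" slide scores each document vector against a profile vector and accepts a document when the score reaches a threshold. The text-processing pipeline slide builds the weights w_ij by term weighting. This is one minimal way to combine them, under assumptions the slides do not state:
    • Weights are TF-IDF, tf × log(m/df).
    • Scoring uses cosine similarity.
    • The threshold is fixed at 0.2.
    The sample documents, the profile and the function names are hypothetical. Stemming and the vector/threshold learning loops are left out.

```python
import math
import re
from collections import Counter

# Hypothetical stop-word list; the slides name "stemming & stop words" but give no list.
STOP_WORDS = {"the", "a", "an", "and", "of", "to", "in", "is", "it"}


def tokenize(text):
    """Lower-case, keep alphabetic tokens, drop stop words (stemming omitted for brevity)."""
    return [t for t in re.findall(r"[a-z]+", text.lower()) if t not in STOP_WORDS]


def tfidf_vectors(docs):
    """One sparse {term: w_ij} vector per document, with w_ij = tf_ij * log(m / df_j)."""
    counts = [Counter(tokenize(d)) for d in docs]
    m = len(docs)
    df = Counter(term for c in counts for term in c)  # documents containing each term
    return [{t: tf * math.log(m / df[t]) for t, tf in c.items()} for c in counts]


def cosine(u, v):
    """Cosine similarity of two sparse vectors; 0.0 if either vector is all zeros."""
    dot = sum(w * v.get(t, 0.0) for t, w in u.items())
    nu = math.sqrt(sum(w * w for w in u.values()))
    nv = math.sqrt(sum(w * w for w in v.values()))
    return dot / (nu * nv) if nu and nv else 0.0


def filter_docs(doc_vectors, profile, threshold=0.2):
    """Accept document i if cosine(doc_i, profile) >= threshold, otherwise reject it."""
    accepted, rejected = [], []
    for i, vec in enumerate(doc_vectors):
        (accepted if cosine(vec, profile) >= threshold else rejected).append(i)
    return accepted, rejected


docs = [
    "The customer system has stopped working",
    "BIOS needs to be updated for the customer",
    "Ice cream guru job in the Upper Midwest",
]
profile = Counter(tokenize("customer BIOS problem"))  # hypothetical user profile
accepted, rejected = filter_docs(tfidf_vectors(docs), profile)
print("accepted:", accepted, "rejected:", rejected)  # accepted: [1] rejected: [0, 2]
```

    The slide describes an adaptive filter, in which the threshold would itself be learned from relevance judgments against a utility function. Here it is a constant to keep the sketch short.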

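    The "Luhn's ideas" quotation combines word frequency with the relative position of significant words inside a sentence. This is a minimal sketch based on the rule usually attributed to Luhn's abstracting method:
    • A cluster is a run of significant words separated by at most four non-significant words.
    • A cluster's score is (number of significant words)² divided by the cluster length.
    • A sentence takes its best cluster score.
    The frequency cutoff (min_freq=2), the stop-word list and the sample article are assumptions for illustration.

```python
import re
from collections import Counter

# Hypothetical stop-word list; Luhn discarded very common words, but none is given here.
STOP_WORDS = {"the", "a", "an", "and", "of", "to", "in", "is", "it", "for"}


def significant_words(text, min_freq=2):
    """Non-stop words that occur at least `min_freq` times in the whole article."""
    words = [w for w in re.findall(r"[a-z]+", text.lower()) if w not in STOP_WORDS]
    return {w for w, c in Counter(words).items() if c >= min_freq}


def sentence_score(sentence, significant, max_gap=4):
    """Luhn-style significance factor: best (significant words)^2 / span over clusters.

    A cluster begins and ends with a significant word; consecutive significant
    words stay in one cluster when at most `max_gap` other words lie between them.
    """
    words = re.findall(r"[a-z]+", sentence.lower())
    positions = [i for i, w in enumerate(words) if w in significant]
    if not positions:
        return 0.0
    best, start, prev, count = 0.0, positions[0], positions[0], 1
    for pos in positions[1:]:
        if pos - prev - 1 <= max_gap:  # close enough: extend the current cluster
            count += 1
        else:  # gap too wide: score the finished cluster and start a new one
            best = max(best, count ** 2 / (prev - start + 1))
            start, count = pos, 1
        prev = pos
    return max(best, count ** 2 / (prev - start + 1))


def summarize(text, n=1, min_freq=2):
    """Return the n highest-scoring sentences, kept in their original order."""
    sentences = [s for s in re.split(r"(?<=[.!?])\s+", text.strip()) if s]
    significant = significant_words(text, min_freq)
    top = sorted(range(len(sentences)),
                 key=lambda i: sentence_score(sentences[i], significant),
                 reverse=True)[:n]
    return [sentences[i] for i in sorted(top)]


article = ("Text mining finds patterns in text. The weather was nice today. "
           "Mining data needs good tools.")
print(summarize(article, n=1))  # ['Text mining finds patterns in text.']
```

    Word frequency decides which words are significant. Their positions inside each sentence decide how tightly they cluster. Together these give the two measurements the quotation combines into a sentence's significance factor.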
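    The "Concept lattice" and "Rules extraction" slides give a complete toy context, so the concepts C1 to C7 and the rules R1 to R3 can be checked mechanically. The sketch below is a brute-force closure over every subset of documents. That is fine for four documents but exponential in general. It is not necessarily the algorithm behind the slides, which is not stated. Only the relation R comes from the slide; the function names are illustrative.

```python
from itertools import combinations

# Input relation R (documents x keywords), copied from the "Concept lattice" slide.
R = {
    "d1": {"t1", "t3", "t5", "t6"},
    "d2": {"t1", "t3", "t5", "t6"},
    "d3": {"t2", "t4"},
    "d4": {"t1", "t4", "t6"},
}
TERMS = {"t1", "t2", "t3", "t4", "t5", "t6"}


def intent(docs):
    """Terms shared by every document in `docs` (all terms for the empty set)."""
    common = set(TERMS)
    for d in docs:
        common &= R[d]
    return frozenset(common)


def extent(terms):
    """Documents that contain every term in `terms`."""
    return frozenset(d for d, ts in R.items() if terms <= ts)


def formal_concepts():
    """Close every subset of documents; each (extent, intent) pair is a formal concept."""
    concepts = set()
    docs = sorted(R)
    for k in range(len(docs) + 1):
        for subset in combinations(docs, k):
            b = intent(subset)
            concepts.add((extent(b), b))
    return concepts


def implies(premise, conclusion):
    """True if every document containing all of `premise` also contains all of `conclusion`."""
    return extent(frozenset(premise)) <= extent(frozenset(conclusion))


for ext, itt in sorted(formal_concepts(), key=lambda c: (-len(c[0]), sorted(c[0]))):
    print(sorted(ext), sorted(itt))
print(implies({"t3"}, {"t1", "t6"}), implies({"t5"}, {"t1", "t6"}), implies({"t3"}, {"t5"}))
```

    Running it prints seven concepts, which match C1 to C7 on the slide. The `implies` checks confirm R1, R2 and one direction of R3; swapping the arguments confirms the other direction.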
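    The "Feedback as model interpolation" slide updates the query model as θ_Q' = (1 − α)θ_Q + αθ_F and ranks documents by the divergence D(θ_Q' || θ_D). This is a minimal sketch under three assumptions:
    • All models are unigram language models.
    • Document models use Jelinek-Mercer smoothing with λ = 0.5.
    • θ_F is the plain maximum-likelihood model of the feedback documents. This simplifies the generative-model and divergence-minimization estimators named on the slide.
    The three documents and the feedback set F = {d1} are hypothetical.

```python
import math
from collections import Counter


def mle(tokens):
    """Maximum-likelihood unigram model: p(w) = count(w) / number of tokens."""
    counts = Counter(tokens)
    total = sum(counts.values())
    return {w: c / total for w, c in counts.items()}


def interpolate(theta_q, theta_f, alpha):
    """Updated query model: theta_Q' = (1 - alpha) * theta_Q + alpha * theta_F."""
    vocab = set(theta_q) | set(theta_f)
    return {w: (1 - alpha) * theta_q.get(w, 0.0) + alpha * theta_f.get(w, 0.0) for w in vocab}


def doc_model(tokens, collection, lam=0.5):
    """Jelinek-Mercer smoothing with the collection model, so no collection word gets 0."""
    ml = mle(tokens)
    return {w: (1 - lam) * ml.get(w, 0.0) + lam * p for w, p in collection.items()}


def score(theta_q, theta_d):
    """-D(theta_Q || theta_D): higher means the document model is closer to the query.

    Assumes every query word with p > 0 also occurs in the collection (hence in theta_d).
    """
    return -sum(p * math.log(p / theta_d[w]) for w, p in theta_q.items() if p > 0)


docs = {
    "d1": "sun beach surf fun".split(),
    "d2": "sun beach holiday".split(),
    "d3": "stock market report".split(),
}
collection = mle([w for toks in docs.values() for w in toks])
theta_q = mle(["beach"])   # original query model
theta_f = mle(docs["d1"])  # hypothetical: d1 was judged relevant, F = {d1}


def best_doc(alpha):
    """Top-ranked document after interpolating the query model with weight alpha."""
    q = interpolate(theta_q, theta_f, alpha)
    return max(docs, key=lambda d: score(q, doc_model(docs[d], collection)))


print(best_doc(0.0), best_doc(0.5))  # d2 d1
```

    With α = 0 (no feedback), the query "beach" ranks d2 first. With α = 0.5, the feedback terms sun, surf and fun move d1 to the top, which is the effect the slide's interpolation is meant to produce.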
    Link: https://www.163wenku.com/p-3214433.html
