Special Forum: Big Data Courseware (专题论坛大数据课件.ppt)
Big Data vs Smart Model: Beauty and the Beast
Prof. Yike Guo
Department of Computing
Imperial College London

Model: Mathematical Representation of a Simplified Physical World
- Modelling is an essential and inseparable part of all scientific activity. A scientific model seeks to represent empirical objects, phenomena, and physical processes in a logical and objective way.
- To understand the world or an object (called a target T), a model M is a simplified mathematical representation of it.
- A model is the result of abstraction from the observations made, and it is used to give predictions.
[Figure labels: Human/Sensor, Human/Machine, Human/Machine]
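As a minimal sketch of this idea (not taken from the slides), the Python below abstracts a straight-line model M from a handful of noisy, discrete observations of a hypothetical target T and then uses the model to give a prediction. The target (a temperature rising 2 degrees per hour from 10 degrees), the noise level, and the names `fit_line` and `predict` are all illustrative assumptions.

```python
import random


def fit_line(xs, ys):
    """Ordinary least squares for y = a*x + b, using the closed-form solution."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    a = sxy / sxx
    b = mean_y - a * mean_x
    return a, b


def predict(model, x):
    """Use the fitted model M = (a, b) to predict the target at x."""
    a, b = model
    return a * x + b


# Hypothetical target T: temperature rising 2 degrees per hour from 10 degrees.
rng = random.Random(0)  # fixed seed so the sketch is repeatable
hours = [0, 1, 2, 3, 4, 5]  # observations are made at discrete points only
readings = [10 + 2 * h + rng.gauss(0, 0.3) for h in hours]

model = fit_line(hours, readings)  # abstraction from observations
forecast = predict(model, 6)       # prediction for an unobserved point
print(model, forecast)
```

The fitted slope and intercept only approximate the true values, because the model is learned from a few noisy samples of a continuous process.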
No Model Is Perfect
- Inherent uncertainty: These targets consist of a set of continuous phenomena (in both time and space), and they typically produce rich signals. Because the target is continuous in both time and space, the signals are in principle infinite. But observations (e.g. sensor readings) are made at discrete points in time and space, so they are incomplete and approximate, which brings the "uncertainty".
- Overfitting or underfitting: When learning a model from observations, such as learning a nonlinear regression model, we need to choose parameters such as K. Because the information from observations is partial, it is hard to make a perfect choice of K. This imperfection causes model error, such as underfitting (small K) and overfitting (large K).
- Simplification: From observations, we project from a multi-dimensional world a simplified model with significantly reduced dimensionality, to focus on the features or properties we are interested in.
[Figure: nonlinear regression with a K-order polynomial]

George Box (statistician): "All models are wrong, but some are useful." Only models, from cosmological equations to theories of human behavior, seemed to be able to consistently, if imperfectly, explain the world around us. (1980)
Peter Norvig (Google): "All models are wrong, and increasingly you can succeed without them." (2008)
Chris Anderson (Wired): "There is now a better way. Petabytes allow us to say: 'Correlation is enough.' We can stop looking for models. We can analyze the data without hypotheses about what it might show. We can throw the numbers into the biggest computing clusters the world has ever seen and let statistical algorithms find patterns where science cannot." (The Data Deluge Makes the Scientific Method Obsolete, 2012)

So, Why Model? The Google Argument
At the petabyte scale, information is not a matter of simple three- and four-dimensional taxonomy and order but of dimensionally agnostic statistics. It calls for an entirely different approach, one that requires us to lose the tether of data as something that can be visualized in its totality. It forces us to view data mathematically first and establish a context for it later.
For instance, Google conquered the advertising world with nothing more than applied mathematics. It didn't pretend to know anything about the culture and conventions of advertising; it just assumed that better data, with better analytical tools, would win the day. And Google was right.
Google's founding philosophy is that we don't know why this page is better than that one: if the statistics of incoming links say it is, that's good enough. No semantic or causal analysis is required. That's why Google can translate languages without actually knowing them (given equal corpus data, Google can translate Klingon into Farsi as easily as it can translate French into German). And why it can match ads to content without any knowledge or assumptions about the ads or the content.

Model Free Sensor Informatics: Query Driven
The sensor network feeds a database table, raw-data:

time | id  | temp
10am | 1   | 20
10am | 2   | 21
...  | ... | ...
10am | 7   | 29

The user then works through the following loop:
1. Extract all readings into a file
2. Run MATLAB/R/other data processing tools
3. Write output to a file/back to the database
4. Write data processing tools to process/aggregate the output (maybe using the DB)
5. Decide new data to acquire
Repeat.
Model-free sensing treats the sensory system as a database, and sensing as querying to fetch data from the physical world. One of the leading vendors, Crossbow, is bundling a query processor with their devices.

Wikisensing: A Model Free Sensor Informatics System Based on Big Data Architecture

Model Free Sensing is Super Inefficient
- Data misrepresentation without a model
- Latent information missing without a model
- High demand for computation/storage without a model
- Requires too much interoperability between sensors and analytics

Bayesian: Data Is Not the Enemy of Models, Rather a Great Supporter!
Bayesian probability is a formalism that allows us to reason about beliefs in models under conditions of uncertainty, based on the observations (data).
If we have observed that a particular event has happened, such as Britain coming 10th in the medal table at the 2004 Olympics, then there is no uncertainty about it.
However, suppose a is the statement "Britain sweeps the boards at the 2012 London Olympics, winning more than 30 Gold Medals!", made before the 28th of July. Since this is a statement about a future event, nobody can state with any certainty whether or not it is true. Different people may have different beliefs in the statement, depending on their specific knowledge of factors that might affect its likelihood.
Beliefs in the model were changing daily, based on the performance data available each day. By the 10th of August, most people's belief in this model should have been almost 80%.
Thus, in general, a person's subjective belief in a statement a will depend on some body of knowledge K. We write this as P(a|K). Henry's belief in a is different from Marcel's because they are using different Ks. However, even if they were using the same K, they might still have different beliefs in a.
The expression P(a|K) thus represents a belief measure. Sometimes, for simplicity, when K remains constant we just write P(a), but you must be aware that this is a simplification.

Model and Data Interaction: Bayesian Inference
- Bayes' rule: interaction between data and model
- Learning as a sequence of interactions
P(θ|Y) = p(Y|θ) p(θ) / p(Y)

Big Data Meets Smart Models: A Bayesian Approach towards Sensor Informatics
- We need models: a model is the representation of our knowledge so far
- Data: the observations, which may revise our belief in the models we have
- Analysis: assessing our belief and updating our models to make them more believable
- Sensing: acquiring the data needed to update (enrich) models
- Models are learned from data (observations) by scientists (theoretical abstraction) or by machine (machine learning)
- Models are hypotheses (when making new observations)
- Models are knowledge (when belief is established)
Sensor Informatics:
- Sensing management: managing the "neediness", i.e. when and where to sense
- Sensing analytics: managing model updating, i.e. how to enrich models with observations
- Reasoning: decision making based on integration of trusted models
P(M|D) = P(D|M) P(M) / P(D)

Surprising Event: When an Observation Does Not Fit a Known Model
- When the posterior P(M|D) and the prior P(M) differ greatly: surprise!
- How great is a great difference? A surprise threshold can be set using:
  - Kullback-Leibler divergence
  - Other methods: significance level, Chebyshev's Theorem
- From the model, we get C(A, B) (e.g. a multivariate Gaussian distribution):
  - A: 100mm, B: 50mm: model consistent
  - A: 100mm, B: 500mm: surprise!

Camera example: Image -> Analog Signal -> Digital Data -> Compressed Data -> Information
Why sense so much data and then throw it away? Why not sense information directly?

Using Compressive Sensing Technology to Optimize Observations
- Compressive sensing: take advantage of sparseness to recover under-determined signals from just a small number of measurements.
- Unobserved behavior (behavior not captured by the current model) is typically sparse.
Reconstructio
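The compressive sensing idea above can be illustrated with a small, hedged sketch: a sparse signal of length 32 is recovered from only 16 measurements. The recovery algorithm used here is orthogonal matching pursuit (OMP), which the slides do not name. The measurement matrix (an identity block next to a scaled Hadamard block, chosen for its low mutual coherence), the signal values, and all function names are illustrative assumptions.

```python
import math


def sensing_columns(m):
    """Columns of A = [I | H/sqrt(m)], an m x 2m matrix (m a power of two)."""
    identity = [[1.0 if r == c else 0.0 for r in range(m)] for c in range(m)]
    scale = 1.0 / math.sqrt(m)
    hadamard = [[scale * (-1) ** bin(r & c).count("1") for r in range(m)]
                for c in range(m)]
    return identity + hadamard  # 2m unit-norm columns, each of length m


def dot(u, v):
    return sum(a * b for a, b in zip(u, v))


def solve(G, b):
    """Solve the small system G z = b by Gaussian elimination with pivoting."""
    n = len(b)
    M = [row[:] + [rhs] for row, rhs in zip(G, b)]
    for i in range(n):
        p = max(range(i, n), key=lambda r: abs(M[r][i]))
        M[i], M[p] = M[p], M[i]
        for r in range(i + 1, n):
            f = M[r][i] / M[i][i]
            for c in range(i, n + 1):
                M[r][c] -= f * M[i][c]
    z = [0.0] * n
    for i in reversed(range(n)):
        z[i] = (M[i][n] - sum(M[i][c] * z[c] for c in range(i + 1, n))) / M[i][i]
    return z


def omp(columns, y, k):
    """Greedily pick k columns; refit all chosen coefficients by least squares."""
    support, coef = [], []
    residual = y[:]
    for _ in range(k):
        best = max((j for j in range(len(columns)) if j not in support),
                   key=lambda j: abs(dot(columns[j], residual)))
        support.append(best)
        G = [[dot(columns[a], columns[b]) for b in support] for a in support]
        rhs = [dot(columns[a], y) for a in support]
        coef = solve(G, rhs)  # normal equations restricted to the support
        residual = [yi - sum(c * columns[j][i] for c, j in zip(coef, support))
                    for i, yi in enumerate(y)]
    x_hat = [0.0] * len(columns)
    for c, j in zip(coef, support):
        x_hat[j] = c
    return x_hat


m = 16
A = sensing_columns(m)
x_true = [0.0] * (2 * m)           # unknown signal of length 32 ...
x_true[3], x_true[20] = 2.0, -1.5  # ... with only two nonzero entries
y = [sum(A[j][i] * x_true[j] for j in range(2 * m)) for i in range(m)]
x_hat = omp(A, y, k=2)             # recover from just 16 measurements
print([(j, v) for j, v in enumerate(x_hat) if v != 0.0])
```

The system has 32 unknowns but only 16 equations, so it is under-determined. Recovery works here because the signal is sparse and the columns of the assumed matrix are nearly orthogonal to each other.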
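Returning to Bayes' rule P(M|D) = P(D|M) P(M) / P(D) and the surprise threshold from the slides, the following is a minimal sketch of learning as a sequence of interactions. Each observation turns the current prior into a posterior, and the Kullback-Leibler divergence between the two is compared with a threshold. The three candidate models, the prior, the observations, and the 0.2-bit threshold are hypothetical values chosen only for illustration.

```python
import math


def bayes_update(prior, likelihoods):
    """Return P(M|D) for every model M, given P(M) and P(D|M)."""
    evidence = sum(prior[m] * likelihoods[m] for m in prior)  # P(D)
    return {m: prior[m] * likelihoods[m] / evidence for m in prior}


def kl_divergence(p, q):
    """KL(p || q) in bits for two distributions over the same models."""
    return sum(p[m] * math.log2(p[m] / q[m]) for m in p if p[m] > 0)


# Hypothetical models: the probability that a reading exceeds an alarm level.
models = {"rarely": 0.1, "sometimes": 0.5, "often": 0.9}
belief = {"rarely": 0.8, "sometimes": 0.15, "often": 0.05}  # assumed prior P(M)
SURPRISE_THRESHOLD_BITS = 0.2  # arbitrary illustrative threshold

observations = [False, False, True, True, True]  # did the reading exceed it?
flags = []
for exceeded in observations:
    likelihoods = {m: (p if exceeded else 1 - p) for m, p in models.items()}
    posterior = bayes_update(belief, likelihoods)
    surprise = kl_divergence(posterior, belief)
    flags.append(surprise > SURPRISE_THRESHOLD_BITS)
    belief = posterior  # today's posterior becomes tomorrow's prior

print(flags)
print(belief)
```

With these numbers, the first two readings agree with the prior belief and barely move it. The later readings contradict it, so the belief shifts away from "rarely" toward "sometimes" and the divergence crosses the threshold.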