Multivariate Statistical Analysis: Course Slides
Preface to the 1st Edition

Most of the observable phenomena in the empirical sciences are of a multivariate nature. In financial studies, assets in stock markets are observed simultaneously and their joint development is analyzed to better understand general tendencies and to track indices. The underlying theoretical structure of these and many other quantitative studies of applied sciences is multivariate. This book on Applied Multivariate Statistical Analysis presents the tools and concepts of multivariate data analysis with a strong focus on applications.

The aim of the book is to present multivariate data analysis in a way that is understandable for non-mathematicians and practitioners who are confronted by statistical data analysis. This is achieved by focusing on the practical relevance and through the e-book character of this text. All practical examples may be recalculated and modified by the reader using a standard web browser and without reference to or application of any specific software.

The book is divided into three main parts. The first part is devoted to graphical techniques
describing the distributions of the variables involved. The second part deals with multivariate random variables and presents, from a theoretical point of view, distributions, estimators and tests for various practical situations. The last part is on multivariate techniques and introduces the reader to the wide selection of tools available for multivariate data analysis. All data sets are given in the appendix and are downloadable from www.md-. The text contains a wide variety of exercises, the solutions of which are given in a separate textbook. In addition, a full set of transparencies on www.md- is provided, making it easier for an instructor to present the materials in this book. All transparencies contain hyperlinks to the statistical web service so that students and instructors alike may recompute all examples via a standard web browser.

Weeks 1-2: UNIT-I Descriptive Techniques
1 Comparison of Batches
1.1 Boxplots
1.2 Histograms
1.3 Scatterplots
1.4 Data Set - Boston Housing

1 Comparison of Batches

Multivariate statistical analysis is concerned with analyzing and understanding data in high dimensions. We suppose that we are given a set $\{x_i\}_{i=1}^n$ of $n$ observations of a variable vector $X$ in $\mathbb{R}^p$. That is, we suppose that each observation $x_i$ has $p$ dimensions: $x_i = (x_{i1}, x_{i2}, \ldots, x_{ip})$, and that it is an observed value of a variable vector $X \in \mathbb{R}^p$. Therefore, $X$ is composed of $p$ random variables: $X = (X_1, X_2, \ldots, X_p)$, where $X_j$, for $j = 1, \ldots, p$, is a one-dimensional random variable.

How do we begin to analyze this kind of data? Before we investigate questions on what inferences we can reach from the data, we should think about how to look at the data. This involves descriptive
techniques. Questions that we could answer by descriptive techniques are:
- Are there components of $X$ that are more spread out than others?
- Are there some elements of $X$ that indicate subgroups of the data?
- Are there outliers in the components of $X$?
- How "normal" is the distribution of the data?

1.1 Boxplots

[Figure: boxplots; labels: Genuine, X6, X1]

Summary
- The median and mean bars are measures of location.
- The relative location of the median (and the mean) in the box is a measure of skewness.
- The length of the box and whiskers is a measure of spread.
- The length of the whiskers indicates the tail length of the distribution.
- The outlying points are marked with one of two symbols, depending on whether they lie outside $F_{UL} \pm 1.5 d_F$ or $F_{UL} \pm 3 d_F$, respectively.
- Boxplots do not indicate multimodality or clusters.
- If we compare the relative size and location of the boxes, we are comparing distributions.

Reading material
21. data capacity: 数据容量
22. data handling: 数据处理
23. data reduction: 数据缩减分析
24. data transformation: 数据变换
25. density function: 密度函数
26. description: 描述
27. descriptive: 描述性的
28. deviation from average: 均值离差
29. Df. Fit: 拟合差值
30. df. (degree of freedom): 自由度
31. distribution shape: 分布形状
32. double logarithmic: 双对数
33. eigenvector: 特征向量
34. error of estimate: 估计误差
35. estimation: 估计量
36. Euclidean distance: 欧式距离
37. expected value: 期望值
38. experimental sampling: 实验抽样
39. explanatory variable: 说明变量
40. explore; summarize: 探索; 摘要

1.2 Histograms

[Figure: histogram; labels: Diagonal, h = 0.4]

Histograms are density estimates. A density estimate gives a good impression of the distribution of the data. In contrast to boxplots, density estimates show possible multimodality of the data. The idea is to locally represent the data density by counting the number of observations in a sequence of consecutive intervals (bins) with origin $x_0$. Let $B_j(x_0, h)$ denote the bin of length $h$ which is the element of a bin grid starting at $x_0$:

$$B_j(x_0, h) = [x_0 + (j-1)h,\; x_0 + jh), \quad j \in \mathbb{Z},$$

where $[\cdot,\cdot)$ denotes a left-closed and right-open interval. If $\{x_i\}_{i=1}^n$ is an i.i.d. sample with density $f$, the histogram is defined as follows:

$$\hat{f}_h(x) = n^{-1} h^{-1} \sum_{j \in \mathbb{Z}} \sum_{i=1}^{n} I\{x_i \in B_j(x_0, h)\}\, I\{x \in B_j(x_0, h)\}. \tag{1.7}$$

In sum (1.7) the first
indicator function $I\{x_i \in B_j(x_0, h)\}$ counts the number of observations falling into bin $B_j(x_0, h)$. The second indicator function $I\{x \in B_j(x_0, h)\}$ is responsible for "localizing" the counts around $x$. The parameter $h$ is a smoothing or localizing parameter and controls the width of the histogram bins.
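
The histogram estimate described above can be sketched in a few lines of Python. This is a minimal illustration, not code from the course: the function names `bin_index` and `histogram_estimate` and the toy sample are assumptions made for the example. Because the bins partition the real line, evaluating the estimate at a point x amounts to counting the observations that fall into the same bin as x and dividing by n times h.

```python
import math


def bin_index(x, x0, h):
    """Return j such that x lies in B_j(x0, h) = [x0 + (j-1)h, x0 + jh)."""
    return math.floor((x - x0) / h) + 1


def histogram_estimate(x, sample, x0, h):
    """Histogram density estimate at x: (count in x's bin) / (n * h)."""
    j = bin_index(x, x0, h)
    # Count the observations that share the bin of x.
    count = sum(1 for xi in sample if bin_index(xi, x0, h) == j)
    return count / (len(sample) * h)


# Hypothetical toy data, chosen only for illustration.
sample = [0.1, 0.2, 0.7, 1.1, 1.3, 1.4, 2.6]
x0, h = 0.0, 0.5

print(histogram_estimate(1.2, sample, x0, h))
```

With these values the bins [0, 0.5), [0.5, 1.0) and [1.0, 1.5) hold 2, 1 and 3 observations. Changing h or x0 regroups the observations, which is how the bin width shapes the resulting histogram.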
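
Returning to the boxplot summary in Section 1.1, the two outlier thresholds can be sketched as follows. This is a hedged illustration under stated assumptions: $F_L$ and $F_U$ are taken to be the lower and upper fourths, $d_F = F_U - F_L$ is their distance, and the fourths are approximated by the inclusive-method quartiles of Python's `statistics.quantiles`, which may differ slightly from the depth-based fourths a given textbook uses. The function name `boxplot_outliers` and the toy data are hypothetical.

```python
import statistics


def boxplot_outliers(data):
    """Classify points as mild (beyond 1.5 d_F) or extreme (beyond 3 d_F)."""
    # Assumption: fourths are approximated by inclusive-method quartiles.
    f_l, _, f_u = statistics.quantiles(data, n=4, method="inclusive")
    d_f = f_u - f_l

    def outside(x, k):
        # True if x lies outside [F_L - k*d_F, F_U + k*d_F].
        return x < f_l - k * d_f or x > f_u + k * d_f

    extreme = [x for x in data if outside(x, 3.0)]
    mild = [x for x in data if outside(x, 1.5) and not outside(x, 3.0)]
    return {"F_L": f_l, "F_U": f_u, "d_F": d_f, "mild": mild, "extreme": extreme}


# Hypothetical toy data, chosen only for illustration.
toy = [1, 2, 3, 4, 5, 6, 7, 8, 9, 17, 30]
result = boxplot_outliers(toy)
print(result)
```

Points in `mild` and `extreme` correspond to the two plotting symbols mentioned in the summary; everything else is covered by the box and whiskers.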