Multivariate Statistical Analysis Courseware (多元统计分析课件.ppt)

  • Uploader (seller): 晟晟文业
  • Document ID: 4978031
  • Upload date: 2023-01-29
  • Format: PPT
  • Pages: 38
  • Size: 2.37MB

Keywords: multivariate, statistical analysis, courseware

Resource description:

Preface to the 1st Edition

Most of the observable phenomena in the empirical sciences are of a multivariate nature. In financial studies, assets in stock markets are observed simultaneously and their joint development is analyzed to better understand general tendencies and to track indices. The underlying theoretical structure of these and many other quantitative studies of applied sciences is multivariate. This book on Applied Multivariate Statistical Analysis presents the tools and concepts of multivariate data analysis with a strong focus on applications.

The aim of the book is to present multivariate data analysis in a way that is understandable for non-mathematicians and practitioners who are confronted by statistical data analysis. This is achieved by focusing on the practical relevance and through the e-book character of this text. All practical examples may be recalculated and modified by the reader using a standard web browser and without reference or application of any specific software.

The book is divided into three main parts. The first part is devoted to graphical techniques describing the distributions of the variables involved. The second part deals with multivariate random variables and presents, from a theoretical point of view, distributions, estimators and tests for various practical situations. The last part is on multivariate techniques and introduces the reader to the wide selection of tools available for multivariate data analysis. All data sets are given in the appendix and are downloadable from www.md-. The text contains a wide variety of exercises, the solutions of which are given in a separate textbook. In addition, a full set of transparencies on www.md- is provided, making it easier for an instructor to present the materials in this book. All transparencies contain hyperlinks to the statistical web service so that students and instructors alike may recompute all examples via a standard web browser.

Weeks 1-2, UNIT I: Descriptive Techniques

1 Comparison of Batches
1.1 Boxplots
1.2 Histograms
1.3 Scatterplots
1.4 Data Set: Boston Housing

1 Comparison of Batches

Multivariate statistical analysis is concerned with analyzing and understanding data in high dimensions. We suppose that we are given a set $\{x_i\}_{i=1}^n$ of $n$ observations of a variable vector $X$ in $\mathbb{R}^p$. That is, we suppose that each observation $x_i$ has $p$ dimensions:

$$x_i = (x_{i1}, x_{i2}, \ldots, x_{ip}),$$

and that it is an observed value of a variable vector $X \in \mathbb{R}^p$. Therefore, $X$ is composed of $p$ random variables:

$$X = (X_1, X_2, \ldots, X_p),$$

where $X_j$, for $j = 1, \ldots, p$, is a one-dimensional random variable.

How do we begin to analyze this kind of data? Before we investigate questions on what inferences we can reach from the data, we should think about how to look at the data. This involves descriptive techniques. Questions that we could answer by descriptive techniques are:

  • Are there components of $X$ that are more spread out than others?
  • Are there some elements of $X$ that indicate subgroups of the data?
  • Are there outliers in the components of $X$?
  • How "normal" is the distribution of the data?

1.1 Boxplots

[Figure: boxplot panels; surviving labels X1, X6 and "Genuine"]

Summary

  • The median and mean bars are measures of location.
  • The relative location of the median (and the mean) in the box is a measure of skewness.
  • The length of the box and whiskers is a measure of spread.
  • The length of the whiskers indicates the tail length of the distribution.
  • The outlying points are marked with one of two symbols, depending on whether they lie outside $F_{UL} \pm 1.5\,d_F$ or $F_{UL} \pm 3\,d_F$, respectively.
  • The boxplots do not indicate multimodality or clusters.
  • If we compare the relative size and location of the boxes, we are comparing distributions.

Reading material: vocabulary items 21 to 40 of the New Words list below.

1.2 Histograms

Histograms are density estimates. A density estimate gives a good impression of the distribution of the data. In contrast to boxplots, density estimates show possible multimodality of the data. The idea is to locally represent the data density by counting the number of observations in a sequence of consecutive intervals (bins) with origin $x_0$. Let $B_j(x_0, h)$ denote the bin of length $h$ which is the element of a bin grid starting at $x_0$:

$$B_j(x_0, h) = [x_0 + (j-1)h,\; x_0 + jh), \quad j \in \mathbb{Z},$$

where $[\cdot,\cdot)$ denotes a left-closed and right-open interval. If $\{x_i\}_{i=1}^n$ is an i.i.d. sample with density $f$, the histogram is defined as follows:

$$\hat{f}_h(x) = n^{-1} h^{-1} \sum_{j \in \mathbb{Z}} \sum_{i=1}^{n} I\{x_i \in B_j(x_0, h)\}\, I\{x \in B_j(x_0, h)\}. \qquad (1.7)$$

In sum (1.7) the first indicator function $I\{x_i \in B_j(x_0, h)\}$ counts the number of observations falling into bin $B_j(x_0, h)$. The second indicator function $I\{x \in B_j(x_0, h)\}$ is responsible for "localizing" the counts around $x$. The parameter $h$ is a smoothing or localizing parameter and controls the width of the histogram bins. An $h$ that is too large leads to very big blocks and thus to a very unstructured histogram. On the other hand, an $h$ that is too small gives a very variable estimate with many unimportant peaks.

[Figure 1.6: histograms of the diagonal of the counterfeit bank notes; panels h = 0.1, h = 0.2, h = 0.3, h = 0.4]

The effect of $h$ is given in detail in Figure 1.6. It contains the histogram (upper left) for the diagonal of the counterfeit bank notes for $x_0 = 137.8$ (the minimum of these observations) and $h = 0.1$. Increasing $h$ to $h = 0.2$ and using the same origin, $x_0 = 137.8$, results in the histogram shown in the lower left of the figure. This density histogram is somewhat smoother due to the larger $h$. The binwidth is next set to $h = 0.3$ (upper right). From this histogram, one has the impression that the distribution of the diagonal is bimodal with peaks at about 138.5 and 139.9. The detection of modes requires a fine tuning of the binwidth. Using methods from smoothing methodology one can find an "optimal" binwidth $h$ for $n$ observations:

$$h_{opt} = \left(\frac{24\sqrt{\pi}}{n}\right)^{1/3}.$$

In Figure 1.7, we show histograms with $x_0 = 137.65$ (upper left), $x_0 = 137.75$ (lower left), $x_0 = 137.85$ (upper right), and $x_0 = 137.95$ (lower right). All the graphs have been scaled equally on the y-axis to allow comparison. One sees that, despite the fixed binwidth $h$, the interpretation is not facilitated. The shift of the origin $x_0$ (to 4 different locations) created 4 different histograms. This property of histograms strongly contradicts the goal of presenting data features.

Summary

  • Modes of the density are detected with a histogram.
  • Modes correspond to strong peaks in the histogram.
  • Histograms with the same $h$ need not be identical. They also depend on the origin $x_0$ of the grid.
  • The influence of the origin $x_0$ is drastic. Changing $x_0$ creates different looking histograms.
  • The consequence of an $h$ that is too large is an unstructured histogram that is too flat.
  • A binwidth $h$ that is too small results in an unstable histogram.
  • There is an "optimal" $h = (24\sqrt{\pi}/n)^{1/3}$.
  • It is recommended to use averaged histograms. They are kernel densities.

1.4 Scatterplots

Scatterplots are bivariate or trivariate plots of variables against each other. They help us understand relationships among the variables of a data set. A downward-sloping scatter indicates that as we increase the variable on the horizontal axis, the variable on the vertical axis decreases. An analogous statement can be made for upward-sloping scatters.

Figure 1.12 plots the 5th column (upper inner frame) of the bank data against the 6th column (diagonal). The scatter is downward-sloping. As we already know from the previous section on marginal comparison, a good separation between genuine and counterfeit bank notes is visible for the diagonal variable. The sub-cloud in the upper half (circles) of Figure 1.12 corresponds to the true bank notes. As noted before, this separation is not distinct, since the two groups overlap somewhat.

Summary

  • Scatterplots in two and three dimensions help in identifying separated points, outliers or sub-clusters.
  • Scatterplots help us in judging positive or negative dependencies.
  • Draftman scatterplot matrices help detect structures conditioned on values of other variables.
  • As the brush of a scatterplot matrix moves through a point cloud, we can study conditional dependence.

1.8 Data Set: Boston Housing

First Step: New Words

Category 1: high-frequency words (160 items)

1. absolute deviation 绝对离差
2. absolute residuals 绝对残差
3. among groups 组间
4. analysis of correlation 相关分析
5. analysis of covariance 协方差分析
6. analysis of regression 回归分析
7. Bayesian estimation Bayes 估计
8. bivariate 双变量的
9. bivariate Correlate 二变量相关
10. boxplot 箱线图
11. canonical correlation 典型相关
12. categorical variable 分类变量
13. central tendency 集中趋势
14. chance statistics 随机统计量
15. chance variable 随机变量
16. classified variable 分类变量
17. coefficient of skewness 偏度系数
18. confidence limit 置信限
19. cumulative probability 累计概率
20. curvature 曲率
21. data capacity 数据容量
22. data handling 数据处理
23. data reduction 数据缩减分析
24. data transformation 数据变换
25. density function 密度函数
26. description 描述
27. descriptive 描述性的
28. deviation from average 离均差
29. Df. Fit 拟合差值
30. df. (degree of freedom) 自由度
31. distribution shape 分布形状
32. double logarithmic 双对数
33. eigenvector 特征向量
34. error of estimate 估计误差
35. estimation 估计量
36. Euclidean distance 欧式距离
37. expected value 期望值
38. experimental sampling 实验抽样
39. explanatory variable 说明变量
40. explore Summarize 探索摘要
41. extreme value 极值
42. factor score 因子得分
43. factorial designs 因子设计
44. factorial experiment 因子实验
45. finite population 有限总体
46. finite-sample 有限样本
47. F-test F检验
48. function 函数
49. function relationship 函数关系
50. gamma distribution 伽马分布
51. geometric mean 几何均值
52. goodness-of-fit 拟合优度
53. group averages 分组平均
54. grouped data 分组资料
55. grouped median 组中值
56. hypothesis 假设
57. hypothesis test 假设检验
58. hypothetical universe 假设总体
59. impossible event 不可能事件
60. independent samples 独立样本
61. independent variable 自变量
62. infinitely great 无穷大
63. interclass correlation 组内相关
64. inter-item correlation 样本内相关
65. item means 样本均值
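The boxplot quantities named in the summary of section 1.1 can be sketched in a few lines of Python. This is a minimal illustration, not the slides' own procedure: it assumes that F_U and F_L denote the upper and lower fourths (quartiles), that d_F = F_U - F_L, and that the fourths are found with Tukey's depth rule, which is only one of several quartile conventions. The function names, the labels "outside" and "far outside", and the sample data are hypothetical.

```python
import math


def value_at_depth(sorted_x, depth):
    """Order statistic at a 1-based depth; half-integer depths average two neighbours."""
    lo = sorted_x[math.floor(depth) - 1]
    hi = sorted_x[math.ceil(depth) - 1]
    return (lo + hi) / 2


def boxplot_summary(data):
    """Median, fourths and F-spread (assumed convention: Tukey's depth rule)."""
    x = sorted(data)
    n = len(x)
    median_depth = (n + 1) / 2
    fourth_depth = (math.floor(median_depth) + 1) / 2
    f_lower = value_at_depth(x, fourth_depth)
    f_upper = value_at_depth(x, n + 1 - fourth_depth)
    return {
        "median": value_at_depth(x, median_depth),
        "f_lower": f_lower,
        "f_upper": f_upper,
        "d_f": f_upper - f_lower,  # F-spread: length of the box
    }


def classify(value, s):
    """Label a value relative to the bars F_UL +/- 1.5 d_F and F_UL +/- 3 d_F."""
    d = s["d_f"]
    if value > s["f_upper"] + 3 * d or value < s["f_lower"] - 3 * d:
        return "far outside"
    if value > s["f_upper"] + 1.5 * d or value < s["f_lower"] - 1.5 * d:
        return "outside"
    return "inside"


# Made-up batch with one extreme value.
summary = boxplot_summary([1, 2, 3, 4, 5, 6, 7, 8, 9, 30])
print(summary, classify(30, summary))
```

A plotting routine would draw the two labels with the two different marker symbols that the summary mentions.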

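The histogram estimator of section 1.2 can also be written out directly. The sketch below follows the bin definition B_j(x0, h) = [x0 + (j-1)h, x0 + jh) and evaluates the estimate at a point x by counting the observations that share its bin; it also includes the "optimal" binwidth (24 sqrt(pi) / n)^(1/3). The function names and the six-point sample are illustrative assumptions, and floating-point rounding can misplace values that fall exactly on a bin edge.

```python
import math


def bin_index(x, x0, h):
    """Index j of the bin B_j(x0, h) = [x0 + (j-1)h, x0 + jh) that contains x."""
    return math.floor((x - x0) / h) + 1


def histogram_density(x, sample, x0, h):
    """Histogram estimate at x: observations sharing x's bin, divided by n*h."""
    j = bin_index(x, x0, h)
    count = sum(1 for xi in sample if bin_index(xi, x0, h) == j)
    return count / (len(sample) * h)


def reference_binwidth(n):
    """Binwidth (24*sqrt(pi)/n)^(1/3) for n observations."""
    return (24 * math.sqrt(math.pi) / n) ** (1 / 3)


# Made-up sample: same binwidth, two origins, two different estimates at x = 0.5.
sample = [0.1, 0.2, 0.6, 1.4, 1.5, 1.9]
print(histogram_density(0.5, sample, x0=0.0, h=1.0))
print(histogram_density(0.5, sample, x0=-0.5, h=1.0))
```

Shifting only the origin changes the estimate at the same point, which is the sensitivity to x0 that the summary of section 1.2 describes.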
About this document
Title: 多元统计分析课件.ppt
Link: https://www.163wenku.com/p-4978031.html
