Lecture 5: Strategies for Handling Missing Values (course slides, .pptx)
Outline of the problem
- Missing values in longitudinal trials are a big issue
- The first aim should be to reduce the proportion of missing values
- Ethics dictate that it can't be avoided
- There is no magic method to fix it
- The magnitude of the problem varies across areas
- 8-week depression trial: 25% to 50% may drop out by the final visit
- 12-week asthma trial: maybe only 5% to 10%

Outline of the lecture
- Part I: Missing data
- Part II: Multiple imputation
- Example: The analgesic trial

Part I: Missing data
In real datasets, e.g. surveys and clinical trials, it is quite common to have observations with missing values for one or more input features. The first issue in dealing with the problem is determining whether the missing data mechanism has distorted the observed data.
Little and Rubin (1987) and Rubin (1987) distinguish between basically three missing data mechanisms. Data are said to be missing at random (MAR) if the mechanism resulting in their omission is independent of their (unobserved) value. If their omission is also independent of the observed values, then the missingness process is said to be missing completely at random (MCAR). In any other case the process is missing not at random (MNAR, also written NMAR), i.e., the missingness process depends on the unobserved values.
http://www.emea.europa.eu/pdfs/human/ewp/177699EN.pdf

1. Introduction to missing data
[Figure: data matrix of cases by variables; "?" = missing]

What is missing data?
The missingness hides a real value that is useful for analysis purposes.
Survey questions:
1. What is your total annual income for FY 2008?
2. Who are you voting for in the 2009 election for the European Parliament?

What is missing data?
Clinical trials:
[Figure: time axis from Start to Finish, with one observation "censored at this point in time"]

Missingness
It matters why data are missing. Suppose you are modelling weight (Y) as a function of sex (X). Some respondents wouldn't disclose their weight, so you are missing some values for Y. There are three possible mechanisms for the nondisclosure:
1. There may be no particular reason why some respondents told you their weights and others didn't. That is, the probability that Y is missing has no relationship to X or Y. In this case the data are missing completely at random.
2. One sex may be less likely to disclose its weight. That is, the probability that Y is missing depends only on the value of X. Such data are missing at random.
3. Heavy (or light) people may be less likely to disclose their weight. That is, the probability that Y is missing depends on the unobserved value of Y itself. Such data are not missing at random.

Missing data patterns & mechanisms
- Pattern: Which values are missing?
- Mechanism: Is missingness related to the response?
- $Y = (Y_{ij})$: data matrix, with complete data
- $R = (R_{ij})$: missing data indicator matrix, where $R_{ij} = 1$ if $Y_{ij}$ is missing and $R_{ij} = 0$ if $Y_{ij}$ is observed
- $Y_o$: observed part of $Y$
- $Y_m$: missing part of $Y$

Missing data patterns & mechanisms
"Pattern" concerns the distribution of $R$.
"Mechanism" concerns the distribution of $R$ given $Y$.
Rubin (Biometrika 1976) distinguishes between:
- Missing Completely at Random (MCAR): $P(R \mid Y) = P(R)$ for all $Y$
- Missing at Random (MAR): $P(R \mid Y) = P(R \mid Y_o)$ for all $Y_m$
- Not Missing at Random (NMAR): $P(R \mid Y)$ depends on $Y_m$

Missing At Random (MAR)
What are the most general conditions under which a valid analysis can be done using only the observed data, and no information about the missing value mechanism? The answer is: when, given the observed data, the missingness mechanism does not depend on the unobserved data. Mathematically,
$P(R \mid Y_o, Y_m) = P(R \mid Y_o)$
This is termed Missing At Random, and is equivalent to saying that two units that share observed values have the same statistical behaviour on the other observations, whether observed or not.

Example
As units 1 and 2 have the same values where both are observed, given these observed values, under MAR, variables 3, 5 and 6 from unit 2 have the same distribution (NB not the same value!) as variables 3, 5 and 6 from unit 1.
Note that under MAR the probability of a value being missing will generally depend on observed values, so it does not correspond to the intuitive notion of random. The important idea is that the missing value mechanism can be expressed solely in terms of observations that are observed. Unfortunately, this can rarely be definitively determined from the data at hand!

If data are MCAR or MAR, you can ignore the missing data mechanism and use multiple imputation and maximum likelihood.
If data are NMAR, you can't ignore the missing data mechanism; two approaches to NMAR data are selection models and pattern mixture.

Suppose Y is weight in pounds; if someone has a heavy weight, they may be less inclined to report it. So the value of Y affects whether Y is missing; the data are NMAR. Two possible approaches for such data are selection models and pattern mixture.
Selection models. In a selection model, you simultaneously model Y and the probability that Y is missing. Unfortunately, a number of practical difficulties are often encountered in estimating selection models.
Pattern mixture (Rubin 1987). When data are NMAR, an alternative to selection models is multiple imputation with pattern mixture. In this approach, you perform multiple imputations under a variety of assumptions about the missing data mechanism. In ordinary multiple imputation, you assume that those people who report their weights are similar to those who don't. In a pattern-mixture model, you may assume that people who don't report their weights are an average of 20 pounds heavier. This is of course an arbitrary assumption; the idea of pattern mixture is to try out a variety of plausible assumptions and see how much they affect your results. Pattern mixture is a more natural, flexible, and interpretable approach.

Simple analysis strategies
(1) Complete Case (CC) analysis
[Figure: data matrix split into complete cases and cases containing "?", which are discarded]
Advantages:
- Easy
- Does no
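
A minimal Python sketch of the weight (Y) and sex (X) example from the Missingness slide, assuming made-up group means, spread, and nondisclosure probabilities (none of these numbers come from the slides). It hides weights under each of the three mechanisms and compares the complete-case mean with the full-data mean. The helper names `simulate`, `hide_weights`, `complete_case_mean` and `shifted_mean` are hypothetical. `shifted_mean` is only a crude stand-in for the pattern-mixture idea: it fills each missing weight with its sex group's observed mean plus a fixed shift, rather than performing multiple imputation.

```python
import random
import statistics


def simulate(n=20000, seed=0):
    """Draw (sex, weight in pounds) pairs; the means and spread are made up."""
    rng = random.Random(seed)
    rows = []
    for _ in range(n):
        sex = rng.choice(["F", "M"])
        centre = 150.0 if sex == "F" else 180.0
        rows.append((sex, rng.gauss(centre, 25.0)))
    return rows


def hide_weights(rows, mechanism, seed=1):
    """Replace some weights with None according to the named mechanism."""
    rng = random.Random(seed)
    hidden = []
    for sex, weight in rows:
        if mechanism == "MCAR":      # unrelated to X and Y
            p_missing = 0.3
        elif mechanism == "MAR":     # depends only on the observed X (sex)
            p_missing = 0.5 if sex == "F" else 0.1
        elif mechanism == "MNAR":    # depends on the unobserved Y (weight)
            p_missing = 0.5 if weight > 165.0 else 0.1
        else:
            raise ValueError(f"unknown mechanism: {mechanism}")
        hidden.append((sex, None if rng.random() < p_missing else weight))
    return hidden


def complete_case_mean(rows):
    """Mean weight using only the rows where the weight was disclosed."""
    return statistics.fmean([w for _, w in rows if w is not None])


def shifted_mean(rows, delta=0.0):
    """Fill each missing weight with its sex group's observed mean plus delta."""
    total = 0.0
    for sex in ("F", "M"):
        group = [w for s, w in rows if s == sex]
        observed = [w for w in group if w is not None]
        n_missing = len(group) - len(observed)
        total += sum(observed) + n_missing * (statistics.fmean(observed) + delta)
    return total / len(rows)


full = simulate()
print("full-data mean:", round(statistics.fmean([w for _, w in full]), 1))
for mechanism in ("MCAR", "MAR", "MNAR"):
    partial = hide_weights(full, mechanism)
    print(mechanism,
          "complete-case:", round(complete_case_mean(partial), 1),
          "sex-adjusted:", round(shifted_mean(partial), 1),
          "shifted +20 lb:", round(shifted_mean(partial, 20.0), 1))
```

In this setup the complete-case mean should stay close to the full-data mean only under MCAR. Under MAR the sex-adjusted figure (a shift of 0) recovers it, and under MNAR neither does, which is why the shift is varied as a sensitivity check rather than treated as known.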