A-Hierarchical-Approach-to-POMDP-Planning-and-ExecutionPOMDP规划与执行一个分层的方法.ppt

上传人（卖家）：三亚风情

文档编号：3228742

上传时间：2022-08-08

格式：PPT

页数：61

大小：676.52KB

文档加载中……请稍候！
如果长时间未打开，您也可以点击刷新试试。

下载文档到电脑，查找使用更方便

28 文币

交易提醒：下载本文档，相应价格的文币将全额进入上传人（卖家）的账号。立即下载优惠套餐（点此详情）

【下载声明】
1. 本站全部试题类文档，若标题没写含答案，则无答案；标题注明含答案的文档，主观题也可能无答案。请谨慎下单，一旦售出，不予退换。
2. 本站全部PPT文档均不含视频和音频，PPT中出现的音频或视频标识（或文字）仅表示流程，实际无音频或视频文件。请谨慎下单，一旦售出，不予退换。
3. 本页资料《A-Hierarchical-Approach-to-POMDP-Planning-and-ExecutionPOMDP规划与执行一个分层的方法.ppt》由用户（三亚风情）主动上传，其收益全归该用户。163文库仅提供信息存储空间，仅对该用户上传内容的表现方式做保护处理，对上传内容本身不做任何修改或编辑。若此文所含内容侵犯了您的版权或隐私，请立即通知163文库（点击联系客服），我们立即给予删除！
4. 请根据预览情况，自愿下载本文。本站不保证下载资源的准确性、安全性和完整性, 同时也不承担用户因使用这些下载资源对自己和他人造成任何形式的伤害或损失。
5. 本站所有资源如无特殊说明，都需要本地电脑安装OFFICE2007及以上版本和PDF阅读器，压缩文件请下载最新的WinRAR软件解压。

配套讲稿：: 如PPT文件的首页显示word图标，表示该PPT已包含配套word讲稿。双击word图标可打开word文档。
特殊限制：: 部分文档作品中含有的国旗、国徽等图片，仅作为作品整体效果示例展示，禁止商用。设计者仅对作品中独创性部分享有著作权。
关键词：: Hierarchical Approach to POMDP Planning and ExecutionPOMDP 规划执行一个分层方法

资源描述：: 1、Hierarchical Methods forPlanning under UncertaintyThesis ProposalJoelle PineauThesis Committee:Sebastian Thrun,ChairMatthew MasonAndrew MooreCraig Boutilier,U.of TorontoJoelle PineauThesis Proposal:Hierarchical Methods for Planning under UncertaintyIntegrating robots in living environmentsThe robots
2、 role:-Social interaction-Mobile manipulation-Intelligent reminding-Remote-operation-Data collection/monitoringJoelle PineauThesis Proposal:Hierarchical Methods for Planning under UncertaintyA broad perspectiveGOAL=Selecting appropriate actionsUSER+WORLD+ROBOTACTIONS OBSERVATIONSBeliefstateSTATEJoel
3、le PineauThesis Proposal:Hierarchical Methods for Planning under UncertaintyCause#1:Non-deterministic effects of actionsCause#2:Partial and noisy sensor informationCause#3:Inaccurate model of the world and the userWhy is this a difficult problem?UNCERTAINTYJoelle PineauThesis Proposal:Hierarchical M
4、ethods for Planning under UncertaintyCause#1:Non-deterministic effects of actionsCause#2:Partial and noisy sensor informationCause#3:Inaccurate model of the world and the userWhy is this a difficult problem?UNCERTAINTYA solution:Partially Observable MarkovDecision Processes(POMDPs)S3o1,o2S1o1,o2S2o1
5、,o2a1a2Joelle PineauThesis Proposal:Hierarchical Methods for Planning under UncertaintyThe truth about POMDPs Bad news:Finding an optimal POMDP action selection policy is computationally intractable for complex problems.Joelle PineauThesis Proposal:Hierarchical Methods for Planning under Uncertainty
6、The truth about POMDPs Bad news:Finding an optimal POMDP action selection policy is computationally intractable for complex problems.Good news:Many real-world decision-making problems exhibit structure inherent to the problem domain.By leveraging structure in the problem domain,I propose an algorith
7、m that makes POMDPs tractable,even for large domains.Joelle PineauThesis Proposal:Hierarchical Methods for Planning under UncertaintyHow is it done?Use a“Divide-and-conquer”approach:We decompose a large monolithic problem into a collection of loosely-related smaller problems.DialoguemanagerHealthman
8、agerSocialmanagerRemindingmanagerJoelle PineauThesis Proposal:Hierarchical Methods for Planning under UncertaintyThesis statementDecision-making under uncertaintycan be made tractable for complex problemsby exploiting hierarchical structurein the problem domain.Joelle PineauThesis Proposal:Hierarchi
9、cal Methods for Planning under UncertaintyOutline Problem motivationPartially observable Markov decision processes The hierarchical POMDP algorithm Proposed researchJoelle PineauThesis Proposal:Hierarchical Methods for Planning under UncertaintyPOMDPs within the family of Markov modelsMarkov ChainHi
10、dden Markov Model(HMM)Markov Decision Process(MDP)Partially Observable MDP(POMDP)Uncertainty in sensor input?nonoControlproblem?yesyesJoelle PineauThesis Proposal:Hierarchical Methods for Planning under UncertaintyPOMDP parameters:Initial belief:b0(s)=Pr(so=s)Observation probabilities:O(s,a,o)=Pr(o|
11、s,a)Transition probabilities:T(s,a,s)=Pr(s|s,a)Rewards:R(s,a)HMMWhat are POMDPs?Components:Set of states:sSSet of actions:aASet of observations:oO 0.50.51MDPS2Pr(o1)=0.9Pr(o2)=0.1S1Pr(o1)=0.5Pr(o2)=0.5a1a2S3Pr(o1)=0.2Pr(o2)=0.8Joelle PineauThesis Proposal:Hierarchical Methods for Planning under Unce
12、rtaintyA POMDP example:The tiger problemS1“tiger-left”Pr(o=growl-left)=0.85Pr(o=growl-right)=0.15S2“tiger-right”Pr(o=growl-left)=0.15Pr(o=growl-right)=0.85Actions=listen,open-left,open-rightReward Function:R(a=listen)=-1R(a=open-right,s=tiger-left)=10R(a=open-left,s=tiger-left)=-100Joelle PineauThes
13、is Proposal:Hierarchical Methods for Planning under UncertaintyWhat can we do with POMDPs?1)State tracking:After an action,what is the state of the world,st?2)Computing a policy:Which action,aj,should the controller apply next?Very hard!Not so hard.bt-1?at-1otRobot:St-1stWorld:Control layer:.?Joelle
14、 PineauThesis Proposal:Hierarchical Methods for Planning under UncertaintyThe tiger problem:State trackingS1“tiger-left”S2“tiger-right”Belief vectorb0BeliefJoelle PineauThesis Proposal:Hierarchical Methods for Planning under UncertaintyThe tiger problem:State trackingS1“tiger-left”S2“tiger-right”Bel
15、ief vectorb0Beliefobs=growl-leftaction=listenJoelle PineauThesis Proposal:Hierarchical Methods for Planning under UncertaintyThe tiger problem:State trackingb1obs=growl-leftS1“tiger-left”S2“tiger-right”Belief vectorBeliefb0action=listen baoPsbassPasoPsbSsjjiiij,|,|,|01Joelle PineauThesis Proposal:Hi
16、erarchical Methods for Planning under UncertaintyPolicy OptimizationWhich action,aj,should the controller apply next?In MDPs:Policy is a mapping from state to action,:si aj In POMDPs:Policy is a mapping from belief to action,:b ajRecursively calculate expected long-term reward for each state/belief:
17、Find the action that maximizes the expected reward:)(),|Pr(),(max)(1jNjijiaisVassasRsV)(),|Pr(),(maxarg)(1jNjijiaisVassasRsJoelle PineauThesis Proposal:Hierarchical Methods for Planning under UncertaintyThe tiger problem:Optimal policyBelief vector:open-leftopen-rightlistenS1“tiger-left”S2“tiger-rig
18、ht”Optimal policy:Joelle PineauThesis Proposal:Hierarchical Methods for Planning under UncertaintyFinite-horizon POMDPs are in worse-case doubly exponential:Infinite-horizon undiscounted stochastic POMDPs are EXPTIME-hard,and may not be decidable(|n|).POMDPComplexity(per step ofvalue iteration)MDPre
19、cursiveupper-boundTimeSpaceComplexity of policy optimizationnOAS|2|nOA|12|OnAS|1|OnA|2AS|S|nJoelle PineauThesis Proposal:Hierarchical Methods for Planning under UncertaintyThe essence of the problem How can we find good policies for complex POMDPs?Is there a principled way to provide near-optimal po
20、licies in reasonable time?Joelle PineauThesis Proposal:Hierarchical Methods for Planning under UncertaintyOutline Problem motivation Partially observable Markov decision processesThe hierarchical POMDP algorithm Proposed researchJoelle PineauThesis Proposal:Hierarchical Methods for Planning under Un
21、certaintyA hierarchical approach to POMDP planningKey Idea:Exploit hierarchical structure in the problem domain to break a problem into many“related”POMDPs.What type of structure?Action set partitioningActInvestigateHealthMoveNavigateCheckPulseAskWhereLeft Right Forward BackwardCheckMedssubtaskabstr
22、actactionJoelle PineauThesis Proposal:Hierarchical Methods for Planning under UncertaintyAssumptionsEach POMDP controller has a subset of Ao.Each POMDP controller has full state set S0,observation set O0.Each controller includes discriminative reward information.We are given the action set partition
23、ing graph.We are given a full POMDP model of the problem:So,Ao,Oo,Mo.Joelle PineauThesis Proposal:Hierarchical Methods for Planning under UncertaintyThe tiger problem:An action hierarchyPinvestigate=S0,Ainvestigate,O0,MinvestigateAinvestigate=listen,open-rightactopen-leftinvestigateopen-rightlistenJ
24、oelle PineauThesis Proposal:Hierarchical Methods for Planning under UncertaintyOptimizing the“investigate”controllerS1“tiger-left”S2“tiger-right”Locally optimal policy:Belief vector:open-rightlistenJoelle PineauThesis Proposal:Hierarchical Methods for Planning under UncertaintyThe tiger problem:An a
25、ction hierarchyPact=S0,Aact,O0,MactAact=open-left,investigateactopen-leftinvestigateopen-rightlistenBut.R(s,a=investigate)is not defined!Joelle PineauThesis Proposal:Hierarchical Methods for Planning under UncertaintyModeling abstract actionsInsight:Use the local policy of corresponding low-level co
26、ntroller.General form:R(si,ak)=R(si,Policy(controllerk,si)Example:R(s=tiger-left,ak=investigate)=open-right listen open-lefttiger-left 10-1 -100tiger-right -100-1 10Policy(investigate,s=tiger-left)=open-rightJoelle PineauThesis Proposal:Hierarchical Methods for Planning under UncertaintyOptimizing t
27、he“act”controllerS1“tiger-left”S2“tiger-right”Locally optimal policy:investigateBelief vector:open-leftJoelle PineauThesis Proposal:Hierarchical Methods for Planning under UncertaintyThe complete hierarchical policyS1“tiger-left”S2“tiger-right”Hierarchical policy:Belief vector:open-leftopen-rightlis
28、tenJoelle PineauThesis Proposal:Hierarchical Methods for Planning under UncertaintyThe complete hierarchical policyS1“tiger-left”S2“tiger-right”Hierarchical policy:open-leftopen-rightlistenOptimal policy:Belief vector:Joelle PineauThesis Proposal:Hierarchical Methods for Planning under UncertaintyRe
29、sults for larger simulation domainsPOMDPH-POMDPMDPNavigation Problem:|S|=11,|A|=6,|O|=6 CPU Time(secs):1119.932.840.000654 Average Reward:12.512.20.0Dialogue Problem:|S|=20,|A|=30,|O|=27 CPU Time(secs):24hrs77.996.46 Average Reward:64.4353.33%Correct actions:93.280.0Joelle PineauThesis Proposal:Hier
30、archical Methods for Planning under UncertaintyRelated work on hierarchical methodsHierarchical HMMsFine et al.,2019Hierarchical MDPsDayan&Hinton,1993;Dietterich,2019;McGovern et al.,2019;Parr&Russell,2019;Singh,1992.Loosely-coupled MDPsBoutilier et al.,2019;Dean&Lin,2019;Meuleau et al.2019;Singh&Co
31、hn,2019;Wang&Mahadevan,2019.Factored state POMDPsBoutilier et al.,2019;Boutilier&Poole,2019;Hansen&Feng,2000.Hierarchical POMDPsCastanon,2019;Hernandez-Gardiol&Mahadevan,2019;Theocharous et al.,2019;Wiering&Schmidhuber,2019.Joelle PineauThesis Proposal:Hierarchical Methods for Planning under Uncerta
32、intyOutline Problem motivation Partially observable Markov decision processes The hierarchical POMDP algorithmProposed researchJoelle PineauThesis Proposal:Hierarchical Methods for Planning under UncertaintyProposed research1)Algorithmic design2)Algorithmic analysis3)Model learning4)System developme
33、nt and application Joelle PineauThesis Proposal:Hierarchical Methods for Planning under UncertaintyResearch block#1:Algorithmic designGoal 1.1:Developing/implementing hierarchical POMDP algorithm.Goal 1.2:Extending H-POMDP for factorized state representation.Goal 1.3:Using state/observation abstract
34、ion.Goal 1.4:Planning for controllers with no local reward information.Joelle PineauThesis Proposal:Hierarchical Methods for Planning under UncertaintyAssumption#2:“Each POMDP controller has full state set S0,and observation set O0.”Can we reduce the number of states/observations,|S|and|O|?Goal 1.3:
35、State/observation abstractionJoelle PineauThesis Proposal:Hierarchical Methods for Planning under UncertaintyAssumption#2:“Each POMDP controller has full state set S0,and observation set O0.”Can we reduce the number of states/observations,|S|and|O|?Yes!Each controller only needs subset of state/obse
36、rvation features.What is the computational speed-up?Goal 1.3:State/observation abstractionNavigateLeftRight Forward BackwardInvestigateHealthCheckPulseCheckMeds POMDP recursiveupper-boundTime complexity:nOAS|2|12|OnASJoelle PineauThesis Proposal:Hierarchical Methods for Planning under UncertaintyGoa
37、l 1.4:Local controller reward informationAssumption#3:“Each controller includes some amount of discriminative reward information.”Can we relax this assumption?Joelle PineauThesis Proposal:Hierarchical Methods for Planning under UncertaintyGoal 1.4:Local controller reward informationAssumption#3:“Eac
38、h controller includes some amount of discriminative reward information.”Can we relax this assumption?Possibly.Use reward shaping to select policy-invariant reward function.What is the benefit?H-POMDP could solve problems with sparse reward functions.Joelle PineauThesis Proposal:Hierarchical Methods
39、for Planning under UncertaintyResearch block#2:Algorithmic analysisGoal 2.1:Evaluating performance of the H-POMDP algorithm.Goal 2.2:Quantifying the loss due to the hierarchy.Goal 2.3:Comparing different possible decompositions of a problem.Joelle PineauThesis Proposal:Hierarchical Methods for Plann
40、ing under UncertaintyGoal 2.1:Performance evaluationHow does the hierarchical POMDP algorithm compare to:Exact value function methods Sondik,1971;Monahan,1982;Littman,2019;Cassandra et al,2019.Policy search methods Hansen,2019;Kearns et al.,2019;Ng&Jordan,2000;Baxter&Bartlett,2000.Value approximatio
41、n methods Parr&Russell,2019;Thrun,2000.Belief approximation methods Nourbakhsh,2019;Koenig&Simmons,2019;Hauskrecht,2000;Roy&Thrun,2000.Memory-based methods McCallum,2019.Consider problems from POMDP literature and dialogue management domain.Joelle PineauThesis Proposal:Hierarchical Methods for Plann
42、ing under UncertaintyGoal 2.2:Quantifying the lossThe hierarchical POMDP planning algorithm provides an approximately-optimal policy.How“near-optimal”is the policy?Subject to some(very restrictive)conditions:“The value function of top-level controlleris an upper-bound on the valueof the approximatio
43、n.”Can we loosen the restrictions?Tighten the bound?Find a lower-bound?AtopA1.Vtop(b)Vactual(b)Joelle PineauThesis Proposal:Hierarchical Methods for Planning under UncertaintyGoal 2.3:Comparing different decompositionAssumption#4:“We are given an action set partitioning graph.”What makes a good hier
44、archical action decomposition?Comparing decompositions is the first step towards automatic decomposition.ManufactureExamineInspectReplacea1a2ManufactureReplaceExamine Inspecta1a2a3Joelle PineauThesis Proposal:Hierarchical Methods for Planning under UncertaintyResearch block#3:Model learningGoal 3.1:
45、Automatically generating good action hierarchies.Assumption#4:“We are given an action set partitioning graph.”Can we automatically generate a good hierarchical decomposition?Maybe.It is being done for hierarchical MDPs.Goal 3.2:Including parameter learning.Assumption#5:“We are given a full POMDP mod
46、el of the problem.”Can we introduce parameter learning?Yes!Maximum-likelihood parameter optimization(Baum-Welch)can be used for POMDPs.Joelle PineauThesis Proposal:Hierarchical Methods for Planning under UncertaintyTouchscreen inputSpeech utteranceResearch block#4:System development and applicationG
47、oal 4.1:Building an extensive dialogue manager Touchscreen messageSpeech utteranceDialogue ManagerRemindermessageRobot sensor readingsMotion commandStatus informationFacemail operationsRobot moduleReminding moduleTeleoperation moduleUserRemote-controlcommandJoelle PineauThesis Proposal:Hierarchical
48、Methods for Planning under UncertaintyAn implemented scenarioPhysiotherapyPatientroomRobothomeProblem size:|S|=288,|A|=14,|O|=15State Features:RobotLocation,UserLocation,UserStatus,ReminderGoal,UserMotionGoal,UserSpeechGoalTest subjects:3 elderly residents in assisted living facilityJoelle PineauThe
49、sis Proposal:Hierarchical Methods for Planning under UncertaintyContributions Algorithmic contribution:A novel POMDP algorithm based on hierarchical structure.Enables use of POMDPs for much larger problems.Application contribution:Application of POMDPs to dialogue management is novel.Allows design o
50、f robust robot behavioural managers.Joelle PineauThesis Proposal:Hierarchical Methods for Planning under UncertaintyResearch schedule1)Algorithmic design/implementation2)Algorithmic analysis3)Model learning4)System development and application5)Thesis writingfall 01spring/summer 02spring/summer/fall

展开阅读全文

163文库所有资源均是用户自行上传分享，仅供网友学习交流，未经上传用户书面授权，请勿作他用。

关于本文

本文标题：A-Hierarchical-Approach-to-POMDP-Planning-and-ExecutionPOMDP规划与执行一个分层的方法.ppt
链接地址：https://www.163wenku.com/p-3228742.html

三亚风情

内容提供者

实名认证

联系作者