Complex Event Detection in Large-Scale Video Data: Courseware (.pptx)
Complex Event Detection in Large-Scale Video Data

Outline
- Introduction
- Standard pipeline
- MED with few exemplars
- A discriminative CNN representation for MED
- A new pooling method for MED

Introduction
- Challenge 1: An event is usually characterized by a longer video clip.
- 10 years ago: constrained videos, e.g., news videos
- Now: unconstrained videos
- The length of videos in the TRECVID MED dataset varies from one minute to one hour.
- The videos are unconstrained.

Introduction (cont'd)
- Challenge 2: Multimedia events are higher-level descriptions.
- Example event: landing a fish

Introduction (cont'd)
- Challenge 3: Huge intra-class variations
- [Figure: Video 1 and Video 2, two examples of the event "Marriage proposal"]

Section: Standard pipeline

Standard Components in CDR Pipeline

Phase                         Process
Visual Analysis               SIFT; Color SIFT (CSIFT); Transformed Color Histogram (TCH); Motion SIFT (MoSIFT); STIP; Dense Trajectory; CNN
Audio Analysis                MFCC; Acoustic Unit Descriptors (AUDs)
Text Analysis                 OCR; ASR
High Level Concept Analysis   SIN 11 Concepts; Object Bank

[Figure: CDR pipeline diagram with the nodes Video, Visual Analysis, Audio Analysis, Text Analysis, High Level Concept Analysis, Low-Level Feature Vectors, and CDR; the legend distinguishes processes from objects.]

Section: MED with few exemplars

Motivation
- There are three tasks in MED:
  - EK 100 (100 positive exemplars per event)
  - EK 10 (10 positive exemplars per event)
  - EK 0 (no positive exemplars, only text descriptions)
- Solutions for event detection with few (i.e., 10) exemplars:
  - Knowledge adaptation
  - Related exemplars

Leveraging related videos
- A video related to "marriage proposal": a girl plays music, dances down a hallway in school, and asks a boy to prom.
- A video related to "marriage proposal": a large crowd cheers after a boy asks his girlfriend to go to prom with him with a bouquet of flowers and a huge sign.

Our solution
- Automatically assess the relatedness of each related video for event detection.

Experiment Results
- [Figure: frames sampled from two video sequences marked by NIST as related exemplars for the event "birthday party".]

Experiment Results (cont'd)
- [Figure: frames sampled from two video sequences marked by NIST as related to the event "town hall meeting".]

Take home messages
- Exact positive training exemplars are difficult to obtain, but related samples are easier to obtain.
- Appropriately leveraging related samples helps event detection.
- The performance gain is more significant when the exact positive exemplars are few.
- There are also many other cases where related samples are largely available.
- For details, refer to our paper: Yi Yang, Zhigang Ma, Zhongwen Xu, Shuicheng Yan, and Alexander Hauptmann. "How Related Exemplars Help Complex Event Detection in Web Videos?" ICCV 2013.

Section: A discriminative CNN representation for MED

Video analysis costs a lot
- Dense Trajectories and its enhanced version, improved Dense Trajectories (IDT), have dominated complex event detection.
- They show superior performance over other features such as the motion feature STIP and the static appearance feature Dense SIFT.
- Credits: Heng Wang

Video analysis costs a lot (cont'd)
- Even when parallelized over 1,000 cores, it takes about one week to extract the IDT features for the 200,000 videos (8,000 hours in total) in the TRECVID MEDEval 14 collection.

Video analysis costs a lot (cont'd)
- Because of the unaffordable computation cost (a cluster with 1,000 cores), it would be extremely difficult for a relatively small research group with limited computational resources to process large-scale MED datasets.
- It becomes important to propose an efficient representation for complex event detection that needs only affordable computational resources, e.g., a single machine.

Turn to CNN?
- One instinctive idea is to use deep learning, especially Convolutional Neural Networks (CNNs), given their overwhelming accuracy in image analysis and their fast processing speed, which is achieved by leveraging the massively parallel processing power of GPUs.

Turn to CNN? (cont'd)
- However, it has been reported that the event detection performance of CNN-based video representations was worse than that of improved Dense Trajectories in TRECVID MED 2013.

Technical problems of utilizing CNNs for MED
- Firstly, a CNN requires a large amount of labeled video data to train good models from scratch. TRECVID MED datasets have only 100 positive examples for each event.
- Secondly, fine-tuning from ImageNet to video data requires changing the structure of the network, e.g., the convolutional pooling layer proposed in "Beyond Short Snippets: Deep Networks for Video Classification".
- Finally, average pooling over the frames to generate the video representation is not effective for CNN features.

Average Pooling for Videos
- Winning solution for the TRECVID MED 2013 competition

Average Pooling of CNN frame features
- Convolutional Neural Networks (CNNs) with the standard approach (average pooling) to generate the video representation from frame-level features

Method                         MEDTest 13   MEDTest 14
Improved Dense Trajectories    34.0         27.6
CNN in CMUMED 2013             29.0         N.A.
CNN from VGG-16                32.7         24.8

Video Pooling on CNN Descriptors
- Video pooling computes video repre
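
The average-pooling baseline named on these slides (averaging frame-level CNN features into one video-level vector) can be sketched as follows. This is only a minimal illustration, not the authors' implementation: the function name `average_pool_video` is hypothetical, the frame descriptors are toy numbers standing in for real CNN outputs, and the optional L2 normalization is an assumption that the slides do not state. Plain Python lists keep the sketch dependency-free; an array library would normally be used in practice.

```python
from math import sqrt
from typing import List, Sequence


def average_pool_video(frame_features: Sequence[Sequence[float]],
                       l2_normalize: bool = True) -> List[float]:
    """Average per-frame descriptors into a single video-level vector."""
    if not frame_features:
        raise ValueError("need at least one frame descriptor")
    dim = len(frame_features[0])
    if any(len(f) != dim for f in frame_features):
        raise ValueError("all frame descriptors must have the same length")

    num_frames = len(frame_features)
    # Element-wise mean over frames; the temporal order of frames is discarded.
    pooled = [sum(f[d] for f in frame_features) / num_frames for d in range(dim)]

    if l2_normalize:
        # Assumption: L2 normalization is a common follow-up step,
        # not something the slides specify.
        norm = sqrt(sum(v * v for v in pooled))
        if norm > 0:
            pooled = [v / norm for v in pooled]
    return pooled


# Toy stand-in for CNN outputs: 3 frames, 4-dimensional descriptors.
frames = [
    [1.0, 0.0, 2.0, 0.0],
    [3.0, 0.0, 0.0, 0.0],
    [2.0, 0.0, 1.0, 0.0],
]
pooled_raw = average_pool_video(frames, l2_normalize=False)
pooled_unit = average_pool_video(frames)
print(pooled_raw)  # [2.0, 0.0, 1.0, 0.0]
```

Whatever the number of frames, the result has the same length as a single frame descriptor, so videos of different durations become comparable fixed-size vectors. The averaging also blends all frames together, which is one intuition for why the slides call this pooling ineffective for CNN features.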