生物资讯相关演算法AlgorithmsinBioinformatics-课件.ppt
- 【下载声明】
1. 本站全部试题类文档,若标题没写含答案,则无答案;标题注明含答案的文档,主观题也可能无答案。请谨慎下单,一旦售出,不予退换。
2. 本站全部PPT文档均不含视频和音频,PPT中出现的音频或视频标识(或文字)仅表示流程,实际无音频或视频文件。请谨慎下单,一旦售出,不予退换。
3. 本页资料《生物资讯相关演算法AlgorithmsinBioinformatics-课件.ppt》由用户(晟晟文业)主动上传,其收益全归该用户。163文库仅提供信息存储空间,仅对该用户上传内容的表现方式做保护处理,对上传内容本身不做任何修改或编辑。 若此文所含内容侵犯了您的版权或隐私,请立即通知163文库(点击联系客服),我们立即给予删除!
4. 请根据预览情况,自愿下载本文。本站不保证下载资源的准确性、安全性和完整性, 同时也不承担用户因使用这些下载资源对自己和他人造成任何形式的伤害或损失。
5. 本站所有资源如无特殊说明,都需要本地电脑安装OFFICE2007及以上版本和PDF阅读器,压缩文件请下载最新的WinRAR软件解压。
- 配套讲稿:
如PPT文件的首页显示word图标,表示该PPT已包含配套word讲稿。双击word图标可打开word文档。
- 特殊限制:
部分文档作品中含有的国旗、国徽等图片,仅作为作品整体效果示例展示,禁止商用。设计者仅对作品中独创性部分享有著作权。
- 关 键 词:
- 生物 资讯 相关 演算法 AlgorithmsinBioinformatics 课件
- 资源描述:
-
1、n呂學一(中央研究院 資訊科學所)nwww.iis.sinica.edu.tw/hil/hil/nDont forget to do Homework 2!nConstructing suffix tree in linear time.nApplications of suffix trees Substring problem(暖身)“Exact string matching”revisited Linearization of circular string(挪移乾坤)Longest common substring(異中求同)nIntermission 小巨s magic show:
2、“flaming”nCase-1 StepnAlways occurs at a leaf(growing point).nAt the beginning of iteration k,the path above a leaf has label?,k1.nEach Case-1 step in iteration k simply changes the label?,k1 to?,k.?,k1?,knUse?,-to label the path above each leaf.Thus,no need to do anything for each case-1 step.?,-nC
3、ase-2 Steps:節外生枝nAlways occurs at a growing point that is not a leaf,which is not necessarily an internal node.nWhen the growing point is an internal node Label=k,-,where k is the current iteration index.k,-nWhen the growing point is not an internal node k,-ti,ji+t,ji,i+t-1nCase-3 Steps:若無其事nAlways
4、occurs at a growing point that is not a leaf,which is not necessarily an internal node.nWhen the new position of the growing point is an internal node ti,j0nWhen the new position of the growing point is not an internal node ti,jt+1S=b b a b b a a b1,-2,-3,-3,-3,37,-4,-3,37,-4,-2,37,-4,-11,11 2111nWe
5、 can simply ignore all Case-1 steps.nRecall that the number of Case-2 steps is at most|S|.nQ:Is this good enough?Case 1:長此以往Case 2:節外生枝Case 3:若無其事1 2 3 4 5 6 7 8b b a b b a a b b a b b a a b a b b a a b b b a a b b a a b a a b a b b nGive a sequence S such that the number of Case-3 steps throughout
6、the process of growing its suffix trie(or suffix tree)is W(|S|2).nCompletely ignore themn以若無其事的態度處理若無其事的狀況ti,j0ti,jt+11.For correctly maintaining the position of each growing point.(Why?)2.For correctly running Case-2 steps.(By what?)k,-ti,ji+t,ji,i+t-1nSaving the book keeping efforts in all Case-3
7、steps Case 1:長此以往Case 2:節外生枝Case 3:若無其事1 2 3 4 5 6 7 8b b a b b a a b b a b b a a b a b b a a b b b a a b b a a b a a b a b b nJust keep one current growing point throughout the execution.nDeriving the new position of the current growing point from its previous position(with the help of suffix links
8、(斷頭指標)Case 2:節外生枝Case 3:若無其事1 2 3 4 5 6 7 8b b a b b a a b b a b b a a b a b b a a b b b a a b b a a b a a b a b b The challenges:How do we derive the position of the current growing point?Vertically(case 2)Horizontally(case 3)nQ:Which one is easier?nMoving from iteration k 1 to iteration k.nThe gro
9、wing point does not move!nThis is the easier case.Case 2:節外生枝Case 3:若無其事1 2 3 4 5 6 7 8b b a b b a a b b a b b a a b a b b a a b b b a a b b a a b a a b a b b 1 2 3 4 5 6 7 8b b a b b a a b b a b b a a b a b b a a b b b a a b b a a b a a b a b b Case 1:長此以往Case 2:節外生枝Case 3:若無其事nMoving from Step i t
10、o Step i+1 in the same iteration.nThe growing point moves dramatically.nThis is the tougher case.Case 2:節外生枝Case 3:若無其事1 2 3 4 5 6 7 8b b a b b a a b b a b b a a b a b b a a b b b a a b b a a b a a b a b b 1 2 3 4 5 6 7 8b b a b b a a b b a b b a a b a b b a a b b b a a b b a a b a a b a b b Case 1:
11、長此以往Case 2:節外生枝Case 3:若無其事n前人種樹後人涼的哲學n每次千辛萬苦找到vertical movement的目的時,把這個movement的起點與終點用一個link記錄下來.n下回遇到這個起點時,就可以直接走到終點去,不用再重新找一次.n這些link就叫做suffix link(斷頭指標).n終點所對應的字串,是起點所對應之字串的斷頭字串(second suffix)Case 2:節外生枝Case 3:若無其事1 2 3 4 5 6 7 8b b a b b a a b b a b b a a b a b b a a b b b a a b b a a b a a b a
12、b b n每個斷頭指標的起點一定是個internal node,不會葉子 不會是某個suffix tree edge的中間nWhy?Case 2:節外生枝Case 3:若無其事1 2 3 4 5 6 7 8b b a b b a a b b a b b a a b a b b a a b b b a a b b a a b a a b a b b n每個internal node一定是某個斷頭指標的起點nWhy?Case 2:節外生枝Case 3:若無其事1 2 3 4 5 6 7 8b b a b b a a b b a b b a a b a b b a a b b b a a b b a
13、 a b a a b a b b S=b b a b b a a b1,-2,-3,-3,-3,37,-4,-3,37,-4,-2,37,-4,-11,1121111 2 3 4 5 6 7 8b b a b b a a b b a b b a a b a b b a a b b b a a b b a a b a a b a b b n遇到必須節外生枝(case 2),就在長好新樹枝之後,移到斷頭位置,繼續從那裡檢查目前這個Sk是case 2還是case 3.n遇到若無其事(case 3),就直接按照Sk的方向移動,繼續從那裡檢查下一個字元Sk+1是case 2還是case 3.11,13,
14、1,12,3,127,2,34,7,14,17,3,34,1210,4,56,110,2,23,323453,3nGoing up to a closest internal node(whose suffix link must be available).Suppose this upward traversal passes through t characters.nFollowing the suffix link that starts from this internal node.ti,jnGoing down by matching the t-character subst
15、ring Si,i+t 1 of S.ti,jnNavely:O(t).nCleverly:O(1+d),where d is the number of internal nodes being went through during phase(2).ti,jnSuppose di is the d in the i-th Case-2-step traversal.nIt suffices to show d1+d2+d|S|=O(|S|).Case 2:節外生枝Case 3:若無其事1 2 3 4 5 6 7 8b b a b b a a b b a b b a a b a b b a
16、 a b b b a a b b a a b a a b a b b nThe slack means the distance between a position P and the closest internal node above P.ti,jnEach case-3 traversal(i.e.,horizontal movement)can only increase the value of by at most one.n(It can even decrease the value of.)Case 2:節外生枝Case 3:若無其事1 2 3 4 5 6 7 8b b
17、a b b a a b b a b b a a b a b b a a b b b a a b b a a b a a b a b b nThe i-th case-2 traversal(i.e.,vertical movement)decreases the value of by at least di.Case 2:節外生枝Case 3:若無其事1 2 3 4 5 6 7 8b b a b b a a b b a b b a a b a b b a a b b b a a b b a a b a a b a b b nInitial =O(1).n can be increased b
18、y one for at most|S|times(because there are at most|S|horizontal movements(i.e.,case-3 traversals).nSince is always non-negative,the above bound is proved.nConstructing suffix tree in linear time.nApplications of suffix trees Substring problem(暖身)“Exact string matching”revisited Linearization of cir
19、cular string(挪移乾坤)Longest common substring(異中求同)nIntermission 小巨s magic shownSubstring Problem(recap as a warm-up)nInput:two strings P and S,where S is allowed to be preprocessed in O(|S|)time.nOutput:an occurrence of P in S.nObjective:done in O(|P|)time.1,13,1,12,3,127,2,34,7,14,17,3,34,13,31,17,2,
展开阅读全文