书签 分享 收藏 举报 版权申诉 / 95
上传文档赚钱

类型生物资讯相关演算法AlgorithmsinBioinformatics-课件.ppt

  • 上传人(卖家):晟晟文业
  • 文档编号:5205248
  • 上传时间:2023-02-17
  • 格式:PPT
  • 页数:95
  • 大小:735.06KB
  • 【下载声明】
    1. 本站全部试题类文档,若标题没写含答案,则无答案;标题注明含答案的文档,主观题也可能无答案。请谨慎下单,一旦售出,不予退换。
    2. 本站全部PPT文档均不含视频和音频,PPT中出现的音频或视频标识(或文字)仅表示流程,实际无音频或视频文件。请谨慎下单,一旦售出,不予退换。
    3. 本页资料《生物资讯相关演算法AlgorithmsinBioinformatics-课件.ppt》由用户(晟晟文业)主动上传,其收益全归该用户。163文库仅提供信息存储空间,仅对该用户上传内容的表现方式做保护处理,对上传内容本身不做任何修改或编辑。 若此文所含内容侵犯了您的版权或隐私,请立即通知163文库(点击联系客服),我们立即给予删除!
    4. 请根据预览情况,自愿下载本文。本站不保证下载资源的准确性、安全性和完整性, 同时也不承担用户因使用这些下载资源对自己和他人造成任何形式的伤害或损失。
    5. 本站所有资源如无特殊说明,都需要本地电脑安装OFFICE2007及以上版本和PDF阅读器,压缩文件请下载最新的WinRAR软件解压。
    配套讲稿:

    如PPT文件的首页显示word图标,表示该PPT已包含配套word讲稿。双击word图标可打开word文档。

    特殊限制:

    部分文档作品中含有的国旗、国徽等图片,仅作为作品整体效果示例展示,禁止商用。设计者仅对作品中独创性部分享有著作权。

    关 键  词:
    生物 资讯 相关 演算法 AlgorithmsinBioinformatics 课件
    资源描述:

    1、n呂學一(中央研究院 資訊科學所)nwww.iis.sinica.edu.tw/hil/hil/nDont forget to do Homework 2!nConstructing suffix tree in linear time.nApplications of suffix trees Substring problem(暖身)“Exact string matching”revisited Linearization of circular string(挪移乾坤)Longest common substring(異中求同)nIntermission 小巨s magic show:

    2、“flaming”nCase-1 StepnAlways occurs at a leaf(growing point).nAt the beginning of iteration k,the path above a leaf has label?,k1.nEach Case-1 step in iteration k simply changes the label?,k1 to?,k.?,k1?,knUse?,-to label the path above each leaf.Thus,no need to do anything for each case-1 step.?,-nC

    3、ase-2 Steps:節外生枝nAlways occurs at a growing point that is not a leaf,which is not necessarily an internal node.nWhen the growing point is an internal node Label=k,-,where k is the current iteration index.k,-nWhen the growing point is not an internal node k,-ti,ji+t,ji,i+t-1nCase-3 Steps:若無其事nAlways

    4、occurs at a growing point that is not a leaf,which is not necessarily an internal node.nWhen the new position of the growing point is an internal node ti,j0nWhen the new position of the growing point is not an internal node ti,jt+1S=b b a b b a a b1,-2,-3,-3,-3,37,-4,-3,37,-4,-2,37,-4,-11,11 2111nWe

    5、 can simply ignore all Case-1 steps.nRecall that the number of Case-2 steps is at most|S|.nQ:Is this good enough?Case 1:長此以往Case 2:節外生枝Case 3:若無其事1 2 3 4 5 6 7 8b b a b b a a b b a b b a a b a b b a a b b b a a b b a a b a a b a b b nGive a sequence S such that the number of Case-3 steps throughout

    6、the process of growing its suffix trie(or suffix tree)is W(|S|2).nCompletely ignore themn以若無其事的態度處理若無其事的狀況ti,j0ti,jt+11.For correctly maintaining the position of each growing point.(Why?)2.For correctly running Case-2 steps.(By what?)k,-ti,ji+t,ji,i+t-1nSaving the book keeping efforts in all Case-3

    7、steps Case 1:長此以往Case 2:節外生枝Case 3:若無其事1 2 3 4 5 6 7 8b b a b b a a b b a b b a a b a b b a a b b b a a b b a a b a a b a b b nJust keep one current growing point throughout the execution.nDeriving the new position of the current growing point from its previous position(with the help of suffix links

    8、(斷頭指標)Case 2:節外生枝Case 3:若無其事1 2 3 4 5 6 7 8b b a b b a a b b a b b a a b a b b a a b b b a a b b a a b a a b a b b The challenges:How do we derive the position of the current growing point?Vertically(case 2)Horizontally(case 3)nQ:Which one is easier?nMoving from iteration k 1 to iteration k.nThe gro

    9、wing point does not move!nThis is the easier case.Case 2:節外生枝Case 3:若無其事1 2 3 4 5 6 7 8b b a b b a a b b a b b a a b a b b a a b b b a a b b a a b a a b a b b 1 2 3 4 5 6 7 8b b a b b a a b b a b b a a b a b b a a b b b a a b b a a b a a b a b b Case 1:長此以往Case 2:節外生枝Case 3:若無其事nMoving from Step i t

    10、o Step i+1 in the same iteration.nThe growing point moves dramatically.nThis is the tougher case.Case 2:節外生枝Case 3:若無其事1 2 3 4 5 6 7 8b b a b b a a b b a b b a a b a b b a a b b b a a b b a a b a a b a b b 1 2 3 4 5 6 7 8b b a b b a a b b a b b a a b a b b a a b b b a a b b a a b a a b a b b Case 1:

    11、長此以往Case 2:節外生枝Case 3:若無其事n前人種樹後人涼的哲學n每次千辛萬苦找到vertical movement的目的時,把這個movement的起點與終點用一個link記錄下來.n下回遇到這個起點時,就可以直接走到終點去,不用再重新找一次.n這些link就叫做suffix link(斷頭指標).n終點所對應的字串,是起點所對應之字串的斷頭字串(second suffix)Case 2:節外生枝Case 3:若無其事1 2 3 4 5 6 7 8b b a b b a a b b a b b a a b a b b a a b b b a a b b a a b a a b a

    12、b b n每個斷頭指標的起點一定是個internal node,不會葉子 不會是某個suffix tree edge的中間nWhy?Case 2:節外生枝Case 3:若無其事1 2 3 4 5 6 7 8b b a b b a a b b a b b a a b a b b a a b b b a a b b a a b a a b a b b n每個internal node一定是某個斷頭指標的起點nWhy?Case 2:節外生枝Case 3:若無其事1 2 3 4 5 6 7 8b b a b b a a b b a b b a a b a b b a a b b b a a b b a

    13、 a b a a b a b b S=b b a b b a a b1,-2,-3,-3,-3,37,-4,-3,37,-4,-2,37,-4,-11,1121111 2 3 4 5 6 7 8b b a b b a a b b a b b a a b a b b a a b b b a a b b a a b a a b a b b n遇到必須節外生枝(case 2),就在長好新樹枝之後,移到斷頭位置,繼續從那裡檢查目前這個Sk是case 2還是case 3.n遇到若無其事(case 3),就直接按照Sk的方向移動,繼續從那裡檢查下一個字元Sk+1是case 2還是case 3.11,13,

    14、1,12,3,127,2,34,7,14,17,3,34,1210,4,56,110,2,23,323453,3nGoing up to a closest internal node(whose suffix link must be available).Suppose this upward traversal passes through t characters.nFollowing the suffix link that starts from this internal node.ti,jnGoing down by matching the t-character subst

    15、ring Si,i+t 1 of S.ti,jnNavely:O(t).nCleverly:O(1+d),where d is the number of internal nodes being went through during phase(2).ti,jnSuppose di is the d in the i-th Case-2-step traversal.nIt suffices to show d1+d2+d|S|=O(|S|).Case 2:節外生枝Case 3:若無其事1 2 3 4 5 6 7 8b b a b b a a b b a b b a a b a b b a

    16、 a b b b a a b b a a b a a b a b b nThe slack means the distance between a position P and the closest internal node above P.ti,jnEach case-3 traversal(i.e.,horizontal movement)can only increase the value of by at most one.n(It can even decrease the value of.)Case 2:節外生枝Case 3:若無其事1 2 3 4 5 6 7 8b b

    17、a b b a a b b a b b a a b a b b a a b b b a a b b a a b a a b a b b nThe i-th case-2 traversal(i.e.,vertical movement)decreases the value of by at least di.Case 2:節外生枝Case 3:若無其事1 2 3 4 5 6 7 8b b a b b a a b b a b b a a b a b b a a b b b a a b b a a b a a b a b b nInitial =O(1).n can be increased b

    18、y one for at most|S|times(because there are at most|S|horizontal movements(i.e.,case-3 traversals).nSince is always non-negative,the above bound is proved.nConstructing suffix tree in linear time.nApplications of suffix trees Substring problem(暖身)“Exact string matching”revisited Linearization of cir

    19、cular string(挪移乾坤)Longest common substring(異中求同)nIntermission 小巨s magic shownSubstring Problem(recap as a warm-up)nInput:two strings P and S,where S is allowed to be preprocessed in O(|S|)time.nOutput:an occurrence of P in S.nObjective:done in O(|P|)time.1,13,1,12,3,127,2,34,7,14,17,3,34,13,31,17,2,

    20、34,7,4,7,3,34,3,3Q:Where are abba,baa,bb?121,1347,2,34,57,4,67,3,34,3,33211Q:Where are abba,baa,bb?nExact String MatchingnInput:two strings P and S,where S is allowed to be preprocessed in O(|S|)time.nOutput:all occurrences of P in S.nChallenge:solving this in O(|P|+k)time,where k is the number of o

    21、ccurrences of P in S.nEach internal node keeps the labels of all its descendant leaves.121,1347,2,34,57,4,67,3,34,3,3Q:Somethings missing?6,35,24,15,2,4,1Q:How do we fix this problem?121,1347,2,34,57,4,67,3,34,3,35,24,15,2,4,1,8179,4,45,89,99,6,3,7Q:Obtainable in O(|S|)time?nS=a a a a a$1654321,2,3,

    22、4,51,2,3,41,2,31,2nConsider the sequence L of leaves from left to right.The descendant leaves of each internal node has to be consecutive in L.121,1347,2,34,57,4,67,3,33,35,24,15,2,4,1,879,4,45,89,99,6,3,71,34,84,56,7nCircular String Linearization(挪移乾坤)nLet挪(S,i)denote the string Si|S|S1i 1.Si挪(S,i)

    23、nInput a string S.nOutput an index i that maximizes the alphabetical order of 挪(S,i).1 2 3 4 5 6 7 8挪(S,1)=b b a b b a a b挪(S,2)=b a b b a a b b挪(S,3)=a b b a a b b b挪(S,4)=b b a a b b b a挪(S,5)=b a a b b b a b挪(S,6)=a a b b b a b b挪(S,7)=a b b b a b b a挪(S,8)=b b b a b b a alet j=1;for i=2 to|S|do

    24、if(挪(S,i)挪(S,j)let j=i;output j;Time complexity?1 2 3 4 5 6 7 8挪(S,1)=b b a b b a a b挪(S,2)=b a b b a a b b挪(S,3)=a b b a a b b b挪(S,4)=b b a a b b b a挪(S,5)=b a a b b b a b挪(S,6)=a a b b b a b b挪(S,7)=a b b b a b b a挪(S,8)=b b b a b b a a1 2 3 4 5 6 7 8b b a b b a a b b a b b a a b a b b a a b b b

    25、a a b b a a b a a b a b b Q:How to fix the problem?nSuffix tree for SSnEach length-|S|substring of SS is a 挪(S,j)for some index j with 1 j|S|.nEach 挪(S,j)with 1 j|S|is a length-|S|substring of SS.11,13,1,12,3,127,2,34,7,14,17,3,34,1210,4,56,110,2,23,323453,31,17,4,7,4,7,3,310,4,56,10,2,23,33,3Q:How

    26、to use this suffix tree?nOutput the index i such that SSi|SS|corresponds to the rightmost leaf of the suffix tree for SS.Clearly,this takes O(|S|)time.nLongest common substring(異中求同)nInput:two strings A and B.nOutput:a longest string C that occurs in both A and B.nA=b b b a b b a a bnB=b a a b b a b

    27、 b a bnC=b bnC=b a a bnC=a b b anC=b b a b b abuild suffix tree for B;for L=|A|downto 1 do for i=1 to|A|-L+1 do if Aii+L-1 occurs in B output Aii+L-1 and halt;output“no common substring”;Time complexity?()3|12|1|1|)()1|(|AOAiiAOiOiAALALAL=+-=+-=nThe for-loop takes timeCan we do better than this?buil

    28、d suffix tree for B;for i=1 to|A|do find the largest integer L(i)such that Aii+L(i)-1 occurs in B by binary search;output AiL(i)for the i with the largest L(i);Time complexity?nThe for-loop takes O(|A|2 log|A|)time.Each binary search takes time O(|A|log|A|).There are overall O(|A|)binary searches.Ca

    29、n we do better than this?nit is impossible to solve this longest common substring problem in O(|A|+|B|)time.nin O(|A|+|B|)time via suffix treenConstruct a suffix tree T for A#B$,where#and$are two characters not in A and B.nThere are exactly|A|+|B|+2 leaves in T,each leaf corresponds to a suffix of A

    30、#B$.A-leaf:with label in 1,2,|A|ncorresponds to an A-suffix.B-leaf:with label in|A|+2,|A|+|B|+1ncorresponds to a B-suffix.$#ABA-suffixB-suffixnLet v be an arbitrary position of T(i.e.,v is not necessarily a node of T.)v has a descendant A-leaf if and only if v corresponds to a prefix of an A-suffix

    31、of A#B$.v has a descendant B-leaf if and only if v corresponds to a prefix of a B-suffix of A#B$.rootvnLet v be a position of T.v has descendant A-leaf and B-suffix if and only if v corresponds to a common substring of A and B.root$#ABA-suffixB-suffixvnDo we really need#to separate A and B in the co

    32、ncatenated string A#B$?root$ABA-suffixB-suffixvnConstruct the suffix tree T of A#B$.nOutput the string corresponding to a deepest internal node v such that the subtree of T rooted at v contains both A-leaf and B-leaf.nQ:why not checking leaves?nQ:why not checking positions of T that are not internal

    33、 nodes of T?nIf the position v contains both kinds of descendant leaves,then so does its closest internal node below.rootvnO(|A|+|B|)time for constructing T.nO(|A|+|B|)time for marking the colors of each node,including each leaf and each internal nodesnO(|A|+|B|)time for computing the depths of all

    34、nodesnO(|A|+|B|)time to find a deepest internal node with both colors.nQ:Can we further improve the time and space complexity?n“No”for the time complexity.n“Yes”for the space complexity.nInput:two strings A and B.nOutput:a longest string C that occurs in both A and B.nObjective:O(|A|+|B|)time O(|A|)

    35、spacenIdea:Construct the suffix tree of A only.nConstruct the suffix tree T of A,keeping all the suffix links.nFor i=1 to|B|do Find the largest integer 深(i)such that Bii+深(i)1 occurs in B.nOutput Bii+深(i)1 where i is the index with maximum 深(i).nFinding 深(i)for each i takes O(深(i)+1)time by traversi

    36、ng T from the root.nBut all these|B|iterations would take O(|A|B|)time in total.depth=深(i)n深(i+1)深(i)1.深(i)深(i)1What if this suffix link does not exist?121,1347,82,34,857,84,867,83,34,83,31Record:深(1)=31Record:深(3)=4Record:深(5)=6nClearly,the space complexity is O(|A|).nThe time complexity still O(|A

    37、|+|B|).We first show that the time is O(|A|+|B|)without considering that of suffix link traversal.We then show that the time for suffix link traversal is also O(|A|+|B|).nThe for-loop has exactly|B|iterations.nSuppose the i-th iteration moves the arrow to the right by d(i)units.nd(1)+d(2)+d(|B|)=|B|

    38、,because the arrow never goes left.nThe i-th iteration takes time O(d(i)+1).nSo,the overall time complexity is O(|B|).nLet(i)denote the distance between the position of at the end of the i-th iteration and the closest internal node above.nLet t(i)be the number of internal nodes touched in the downward suffix link traversal of the i-th iteration.n(i)(i 1)+d(i)t(i).d(i)t(i)nt(i)(i 1)+d(i)(i)nTherefore,t(1)+t(2)+t(|B|)is at most d(1)+d(2)+d(|B|)+(0)(|B|),which is clearly O(|B|+|A|).(0)|A|,and d(1)+d(2)+d(|B|)=|B|.

    展开阅读全文
    提示  163文库所有资源均是用户自行上传分享,仅供网友学习交流,未经上传用户书面授权,请勿作他用。
    关于本文
    本文标题:生物资讯相关演算法AlgorithmsinBioinformatics-课件.ppt
    链接地址:https://www.163wenku.com/p-5205248.html

    Copyright@ 2017-2037 Www.163WenKu.Com  网站版权所有  |  资源地图   
    IPC备案号:蜀ICP备2021032737号  | 川公网安备 51099002000191号


    侵权投诉QQ:3464097650  资料上传QQ:3464097650
       


    【声明】本站为“文档C2C交易模式”,即用户上传的文档直接卖给(下载)用户,本站只是网络空间服务平台,本站所有原创文档下载所得归上传人所有,如您发现上传作品侵犯了您的版权,请立刻联系我们并提供证据,我们将在3个工作日内予以改正。

    163文库