Abstract
The large number of repeats, especially high-copy repeats, in the genomes of higher animals and plants makes whole genome assembly (WGA) quite difficult. To address this problem, we tried to identify repeats and mask them prior to assembly, even at the genome survey stage. Repeats of different copy number are known to have different probabilities of appearing in shotgun data; based on this principle, we constructed a statistical model and inferred criteria for mathematically defined repeats (MDRs) at different shotgun coverages. According to these criteria, we developed the software MDRmasker to identify and mask MDRs in shotgun data. With repeats masked prior to assembly, the speed of assembly increased and the error probability decreased. In addition, clone-insert size affects the accuracy of repeat assembly and scaffold construction. We also designed the length distribution of clone inserts using our model. In our simulated human and rice genomes, the length distributions of repeats differ, so their optimal clone-insert length distributions were not the same. Thus, with an optimal length distribution of clone inserts, a given genome could be assembled better at lower coverage.
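The abstract does not spell out the statistical model or the MDR criteria, so the following is only an illustrative sketch of the general idea, not the MDRmasker implementation. It assumes (our assumption, not the paper's) that the number of times a single-copy k-mer appears in shotgun reads is roughly Poisson with mean equal to the coverage, so that a k-mer seen far more often than this predicts is flagged as a repeat and masked with "N". The function names, the parameter alpha, and the choice of k are all hypothetical, and reverse complements are ignored for brevity.

```python
import math
from collections import Counter


def poisson_sf(k, lam):
    """Return P(X >= k) for X ~ Poisson(lam)."""
    term = math.exp(-lam)  # P(X = 0)
    cdf = 0.0
    for i in range(k):
        cdf += term              # accumulate P(X = i)
        term *= lam / (i + 1)    # advance to P(X = i + 1)
    return max(0.0, 1.0 - cdf)


def repeat_threshold(coverage, alpha=1e-3):
    """Smallest count c that a single-copy k-mer reaches with probability < alpha.

    Assumption: single-copy k-mer counts are Poisson(coverage).
    """
    c = 1
    while poisson_sf(c, coverage) >= alpha:
        c += 1
    return c


def mask_repeats(reads, k, threshold):
    """Replace every base covered by a k-mer seen >= threshold times with 'N'."""
    counts = Counter(
        read[i:i + k] for read in reads for i in range(len(read) - k + 1)
    )
    masked = []
    for read in reads:
        chars = list(read)
        for i in range(len(read) - k + 1):
            # Look up the k-mer in the original read, not the partly masked copy.
            if counts[read[i:i + k]] >= threshold:
                chars[i:i + k] = "N" * k
        masked.append("".join(chars))
    return masked


if __name__ == "__main__":
    toy_reads = ["ACGTACGT", "TTACGTAA", "GGACGTCC", "CCCCGGGG"]
    # The threshold is set by hand here because the toy data set is tiny;
    # with real data it would come from repeat_threshold(coverage).
    for original, result in zip(toy_reads, mask_repeats(toy_reads, k=4, threshold=3)):
        print(original, "->", result)
```

Under this simplified model the threshold rises with coverage, which is consistent with the abstract's point that the criteria for MDRs must be inferred separately for each shotgun coverage.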
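Publication date
11 January 2003 (date first posted online on the China Journal Net platform; this does not represent the paper's publication date)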