NBT: OPERA-MS, a hybrid assembler for second- and third-generation metagenomic sequencing data

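Contents

  • OPERA-MS: hybrid assembly software for second- and third-generation metagenomic sequencing
    • 热心肠日报 (Rexinchang Daily)
    • Abstract
    • Main results
      • Fig. 1. OPERA-MS workflow
      • Fig. 2. Benchmarking hybrid assembly of genomes from metagenomes
      • Fig. 3. Assembling the virtual gut microbiome
      • Fig. 4. Mobile elements and association with host species in the human gut microbiome
    • Summary
    • Reference
    • Related reading
    • You may also like
    • Afterword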

Fig. 1: OPERA-MS workflow.

image

Short reads are first assembled by a metagenomic assembler into contigs, and short and long reads are mapped to them to obtain coverage information and spanning reads (Step 1). Spanning reads are then bundled to get edges between contigs for an assembly graph that represents the contiguity information of the whole metagenome (Step 2). Contigs are organized into a hierarchical clustering where the distance between contigs increases with genomic distance and their difference in coverage (Step 3). The tree is then cut into optimal clusters based on the BIC (Bayesian information criterion) (Step 4). Optionally, to improve the clustering for species where a reference genome is available, the Mash genomic distance between each cluster and a database of complete bacterial genomes is computed (Step 5). Clusters are then merged if there is supporting information in the assembly graph to form species-specific super-clusters (Step 6). These super-clusters are further analyzed to deconvolute contigs that come from distinguishable subspecies genomes (Step 7). Finally, each cluster is independently scaffolded and gap-filled using a program meant for isolate genomes (OPERA-LG; Step 8).
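To make Step 2 more concrete, here is a minimal Python sketch of bundling spanning reads into contig-to-contig edges. It is not OPERA-MS code: the input format (one `(contig_a, contig_b, gap_estimate)` tuple per spanning read), the function name `bundle_spanning_reads` and the `min_support` threshold are all assumptions made for illustration, and contig orientation is ignored for brevity.

```python
from collections import defaultdict
from statistics import median


def bundle_spanning_reads(links, min_support=2):
    """Group read-level links by contig pair and keep well-supported edges.

    links: iterable of (contig_a, contig_b, gap_estimate) tuples, one per
           spanning read (hypothetical input format).
    Returns {(contig_a, contig_b): (support, median_gap)}; each pair is
    stored in sorted order so that A-B and B-A links land in one bundle.
    """
    gaps_by_pair = defaultdict(list)
    for contig_a, contig_b, gap in links:
        if contig_a == contig_b:
            continue  # a read inside a single contig gives no edge
        pair = tuple(sorted((contig_a, contig_b)))
        gaps_by_pair[pair].append(gap)

    edges = {}
    for pair, gaps in gaps_by_pair.items():
        # Keep only edges backed by enough independent reads.
        if len(gaps) >= min_support:
            edges[pair] = (len(gaps), median(gaps))
    return edges


# Toy data: three reads link c1 and c2, one read links c2 and c3.
links = [
    ("c1", "c2", 120), ("c2", "c1", 100), ("c1", "c2", 140),
    ("c2", "c3", 50),
    ("c3", "c3", 0),
]
edges = bundle_spanning_reads(links)
print(edges)  # {('c1', 'c2'): (3, 120)}
```

With the default threshold, the single read linking c2 and c3 is discarded as too weakly supported, while the three c1/c2 links collapse into one edge with a median gap estimate. A real scaffolder would also track read orientation and the spread of the gap estimates.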

Fig. 2: Benchmarking hybrid assembly of genomes from metagenomes.

image

a, Construction of a virtual gut microbiome that represents a complex metagenomic data set while retaining the ability to evaluate assemblies against gold-standard references.

b, Improvement in assembly contiguity (NGA50) obtained using OPERA-MS compared with other assemblers over different coverage ranges. Dots represent species that have at least two strains in the metagenome (species present in GIS20 and S2 with an abundance >0.1% as reported by MetaPhlAn2 (ref. 49) (v.2.6.0)). The number of assembled genomes, in ascending order of coverage, was 1 for Canu and 2, 6, 4 and 5 for the other methods. Data are presented as box plots (center line, median; box limits, upper and lower quartiles; whiskers, 1.5× interquartile range; points, outliers).

c, Comparison of misassembly rates (one dot per genome) for different assemblers, with solid lines indicating median values.

d, Evaluation of Illumina-only (M, MEGAHIT) and hybrid (H, hybridSPAdes; O, OPERA-MS) metagenomic assemblies after binning for their utility in downstream analysis. Bins that contained the largest fraction of a reference genome (GIS20 references; species with bold names have at least two strains in the metagenome) were evaluated for (1) genome completeness, the fraction of the genome represented in the bin, (2) genome purity, the percentage of bases in the bin that correspond to the correct reference, (3) gene completeness, the fraction of genes that were fully assembled in the bin and (4) pathway completeness, the fraction of pathways with over 90% of their constituent genes being assembled and binned together.
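The contiguity and bin-quality metrics named in panels b and d can be illustrated with a short Python sketch. This is a simplified illustration, not the benchmarking pipeline used in the paper: the function names and input formats are hypothetical, NGA50 is computed in the usual way from aligned block lengths (contigs already split at misassemblies by an alignment step not shown here), and the per-genome base counts in a bin are assumed to be non-overlapping.

```python
def nga50(aligned_block_lengths, reference_length):
    """Largest length L such that aligned blocks of length >= L together
    cover at least half of the reference; 0 if half is never reached."""
    covered = 0
    for length in sorted(aligned_block_lengths, reverse=True):
        covered += length
        if covered * 2 >= reference_length:
            return length
    return 0


def bin_completeness_and_purity(bases_by_genome, target, target_length):
    """Completeness and purity of one bin with respect to one reference.

    bases_by_genome: {genome_name: bases in the bin aligned to that genome}
                     (hypothetical input, assumed non-overlapping).
    """
    total = sum(bases_by_genome.values())
    on_target = bases_by_genome.get(target, 0)
    completeness = on_target / target_length      # share of the genome in the bin
    purity = on_target / total if total else 0.0  # share of the bin from the genome
    return completeness, purity


# Toy example: cumulative block lengths 400, 700, 900 reach half of 1500 at 200.
print(nga50([400, 300, 200, 100], 1500))  # 200

# Toy bin: 900 bases from genome A (length 1200) and 100 bases from genome B.
print(bin_completeness_and_purity({"A": 900, "B": 100}, "A", 1200))  # (0.75, 0.9)
```

Because NGA50 is normalized by the reference length rather than the assembly length, an assembly that covers less than half of the genome scores 0 no matter how long its individual contigs are.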

Fig. 4: Mobile elements and association with host species in the human gut microbiome.

image


Source: 刘永鑫Adam
