Mingzhi Liao et al., Nature Communications, 2026

Published: 2026-05-15

Title: Large-scale data-driven pre-trained DNA models enhance performance across diverse genomics tasks

Authors: Canzhuang Sun, Zhijie He, Shifei Zhang, Kang Xu, Yu Sun, Yuyang Wang, Pengzhen Hu, Xiaochen Bo, Mingzhi Liao, Hao Li & Hebing Chen

Abstract: Sequence-based deep learning has advanced genome interpretation, yet most models remain task-specific and rely on retraining, limiting scalability across biological contexts. Here we present SUCCEED, a supervised multi-task DNA foundation model pretrained on 6,389 ENCODE functional genomics tracks to learn transferable regulatory representations. By integrating convolutional layers with a Transformer architecture, SUCCEED captures both local sequence motifs and long-range regulatory dependencies, achieving performance comparable to or exceeding Enformer across benchmark tasks. Through transfer learning, it predicts cell-type-specific epigenomic profiles, denoises sparse chromatin accessibility signals, and predicts three-dimensional chromatin contacts without CTCF input across data scales and cell types. Across diverse genomics tasks, SUCCEED performs comparably to supervised foundation models such as Sei and outperforms self-supervised models trained solely on DNA sequence. Overall, SUCCEED is a transferable and scalable foundation model that provides a unified framework for genome-scale regulatory modeling in complex biological contexts.

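Sequence-based deep learning has pushed genome interpretation forward, but most existing models are still tied to a specific task and must be retrained for each new one, which makes them hard to apply at scale across different biological settings. This study introduces SUCCEED, a supervised multi-task DNA foundation model pretrained on 6,389 ENCODE functional genomics tracks so that it learns transferable regulatory representations. SUCCEED combines convolutional layers with a Transformer architecture, which lets it capture both local sequence motifs and long-range regulatory dependencies, and it matches or exceeds Enformer on the benchmark tasks. With transfer learning, the model works across data scales and cell types to predict cell-type-specific epigenomic profiles, denoise sparse chromatin accessibility signals, and predict three-dimensional chromatin contacts without CTCF input. Across the genomics tasks evaluated, SUCCEED performs on par with supervised foundation models such as Sei and outperforms self-supervised models trained on DNA sequence alone.

In short, SUCCEED is a transferable and scalable foundation model that offers a unified framework for genome-scale regulatory modeling in complex biological contexts.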

Link: https://doi.org/10.1038/s41467-026-73129-6