《生命科学》 2025, 37(12): 1605-1614
Progress and challenges in the application of large language models in synthetic biology
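Abstract (translated from Chinese):
Synthetic biology is a discipline that designs living systems according to engineering principles. Its core challenge is to build biological functional modules and systems "from the bottom up" by assembling genetic components, and to establish predictive models of the sequence-function mapping. Large language models (LLMs), drawing on self-supervised pre-training and attention-based architectures, parse the grammatical rules and semantic features of DNA/RNA sequences and thereby provide new tools for cross-scale modeling of biological sequences at multiple microscopic levels, from genetic components to genome-scale systems. This review focuses on how LLMs help address key design problems in synthetic biology. It surveys their applications and research progress at several levels, including genetic components, genetic circuits, gene cluster reconstruction, and genomes; discusses how they can be combined with traditional models and engineering methods to improve rational design for building modular living systems; and analyzes current challenges and future directions.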
Corresponding author: 娄春波, Email: cb.lou@siat.ac.cn
Abstract:
Synthetic biology is a discipline that designs living systems according to engineering principles. Its core challenge is to construct biological functional modules and systems "from the bottom up" by assembling genetic components, and to establish predictive models of the sequence-function mapping. Large language models (LLMs), drawing on self-supervised pre-training and attention mechanisms, decipher the grammatical rules and semantic features of DNA/RNA sequences and thereby provide new tools for cross-scale modeling of biological sequences at multiple microscopic levels, from genetic components to genomic systems. This review focuses on how LLMs help address key design challenges in synthetic biology. It summarizes their applications and research progress at several levels, including genetic components, genetic circuits, gene cluster reconstruction, and phage genomes. It also explores how LLMs can be integrated with traditional models and engineering approaches to advance rational design for modular life system construction, and it discusses current challenges and future directions.
Corresponding author: LOU Chun-Bo, Email: cb.lou@siat.ac.cn