大语言模型在合成生物学中的应用进展与挑战

李璐炜1,2, 王子陌1,3, 娄春波1,*
1中国科学院深圳先进技术研究院,深圳合成生物学创新研究院,定量合成生物学全国重点实验室,细胞与基因线路设计中心,深圳 518000
2南方科技大学工学院,深圳 518055
3宁波诺丁汉大学理工学院,宁波 315000

摘 要:

合成生物学作为基于工程化理念的生命系统工程设计学科,其核心挑战在于基于工程化原理,“由下至上”建立由基因元件组装而成的各种生物功能模块和系统,并建立序列-功能映射的预测模型。大语言模型(large language models, LLMs) 凭借其自监督预训练机制与注意力架构优势,通过解析DNA/RNA 序列中的语法规则与语义特征,在从基因元件到基因组系统的多个微观层次,为生物序列的跨尺度建模提供了新工具。本综述聚焦LLMs 如何辅助解决合成生物学中的关键设计难题,系统地综述了LLMs 在合成生物学中的创新应用,系统梳理了其在基因元件、基因线路、基因簇重构、基因组等多个层次的研究进展,并探讨其如何与传统模型及工程化方法结合,共同提升面向模块化生命系统构建的理性设计能力,并分析当前面临的挑战与未来发展方向。

通讯作者:娄春波 , Email:cb.lou@siat.ac.cn

Advances and challenges in the application of large language models in synthetic biology
LI Lu-Wei1,2, WANG Zi-Mo1,3, LOU Chun-Bo1,*
1Center for Cell and Genetic Circuit Design, State Key Laboratory of Quantitative Synthetic Biology, Shenzhen Institute of Synthetic Biology, Shenzhen Institutes of Advanced Technology, Chinese Academy of Sciences, Shenzhen 518055, China
2Faculty of Engineering, Southern University of Science and Technology, Shenzhen 518055, China
3Faculty of Science and Engineering, University of Nottingham Ningbo China, Ningbo 315000, China

Abstract:

Synthetic biology is an engineering discipline for designing living systems. Its core challenge is to construct biological functional modules and systems "from the bottom up" by assembling genetic components, and to establish predictive models of the sequence-function mapping. Large language models (LLMs), with their self-supervised pre-training and attention-based architectures, offer new tools for cross-scale modeling of biological sequences: they capture grammatical rules and semantic features in DNA/RNA sequences at multiple levels, from genetic components to genomic systems. This review focuses on how LLMs help address key design challenges in synthetic biology. It systematically surveys their applications and research progress at several levels, including genetic components, genetic circuits, gene cluster reconstruction, and phage genomes. It also explores how LLMs can be combined with traditional models and engineering approaches to advance rational design for the modular construction of living systems, and it analyzes current challenges and future directions.

Corresponding author: LOU Chun-Bo, Email: cb.lou@siat.ac.cn
