Chinese Bulletin of Life Sciences (《生命科学》), 2025, 37(12): 1634-1644
Large-Model-Driven Integration of Multi-Omics Cohort Data and Disease Prediction
Abstract:
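Cohort studies are a foundation of modern medical research and play an important role in exploring disease mechanisms and predicting health risks. With the rapid development of high-throughput technologies, multi-omics cohort studies have become one of the most effective approaches for uncovering the mechanisms behind the onset and progression of many complex diseases. The rapid rise of large language models (LLMs) and multimodal large models has opened new possibilities for handling the complex data generated by cohort studies, including time-series modeling, missing-data handling, and cross-modal data integration and analysis. This review comprehensively summarizes key cases of large-model applications in three representative cohorts, the UK Biobank, All of Us, and the China Kadoorie Biobank, which are helping move cohort research from the stage of "correlation discovery" toward that of "causal inference". We review cutting-edge methodological innovations, typical use cases, and likely challenges together with strategies for addressing them, providing a systematic framework for large-model-driven multi-omics cohort research. We also discuss the main open problems and future directions, which are of significance for advancing evidence-based decision-making in precision medicine and public health.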
Corresponding author: FANG Jin-Wu, Email: fangjinwu007@126.com
Abstract:
Cohort studies serve as a cornerstone of modern medical research, enabling mechanistic insights into disease etiology and risk prediction. Advances in high-throughput technologies have made multi-omics cohort studies one of the most effective tools for elucidating the pathogenesis of complex diseases. The emergence of large language models (LLMs) and multimodal large models now offers new capabilities for handling complex cohort data, including longitudinal trajectory modeling, imputation of missing entries, and cross-modal integration. This review systematically summarizes pivotal applications of large models in three landmark cohorts, the UK Biobank, All of Us, and the China Kadoorie Biobank (CKB), which are helping shift cohort research from correlative analysis toward causal inference. We present a methodological framework that brings together cutting-edge innovations, practical application scenarios, and solutions to technical challenges, and we discuss the main open problems and future directions. Our analysis highlights the potential of large-model-driven cohort studies to advance precision medicine and public health decision-making through mechanistic interpretability and actionable biomarker discovery.