Realizing the value of biomedical big data as a production factor: starting from data elements

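ZHANG Guo-Qing1, ZHAO Guo-Ping1, LI Yi-Xue1,2,*
1Bio-Med Big Data Center, Shanghai Institute of Nutrition and Health, Chinese Academy of Sciences, Shanghai 200031
2Guangzhou Laboratory, Guangzhou 510005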

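Abstract:

Biomedical big data, accumulated since the advent of genomics/systems biomedicine, translational medicine, and precision medicine, is not only the cornerstone of data-intensive research in biomedicine and a strategic resource bearing on population health, social development, and national security; it is also a core production factor (often abbreviated as "data factor") for using artificial intelligence to empower the "big health" industry. Biomedical data elements have the features of a heterogeneous complex system tied to biology and medicine: they are cross-scale, multi-source, high-dimensional, and fine-grained. Therefore, the data elements of massive biomedical data with the 4V characteristics (Volume, Velocity, Variety, Veracity) must be standardized, integrated, and made available for shared analysis before massive biomedical data can undergo the qualitative change into biomedical big data, function as a production factor, and realize its value as one. This value-releasing process of turning data into production factors faces its own opportunities and challenges, particularly for multi-omics and multimodal data, which have become the core foundation of biological and healthcare big data. Compared with Europe and the United States, China's data are "abundant but not strong": because the degree of open sharing is low and the data are not well centralized, data quality is difficult to assess. Databases are the main vehicle for sharing biomedical data, and their data sources and sharing models directly affect how the value of data factors is released. Data centers build, operate, and maintain databases, and they are important participants in and drivers of the conversion of data elements into data factors suited to various application scenarios; they occupy an indispensable core position in this conversion. In converting data elements into data factors, we face two mismatches: one between the scale of existing data and the governance capacity for standardized data integration, and another between the scale of data already opened and the governance capacity for data analysis and mining. Basic work such as data governance and data sharing therefore needs to be strengthened at three levels: data, databases, and data centers. We have built a biomedical big data technology system structured as 1 (data resource service system oriented toward integration, interaction, and sharing)-2 (standardized data analysis platforms)-3 (health and medical data governance platforms driven by scientific/technical questions)-X (classes of application-scenario-oriented intelligent analysis services). Guided by the principles of "secure management, information sharing, value added through standards, technological innovation, respect for property rights, and efficient utilization", we strive to turn the data center from a cost center into a value center, which may serve as a reference for turning biomedical big data into a production factor.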

Corresponding author: LI Yi-Xue, Email: yxli@sibs.ac.cn

Value realization of biomedicine big data as the production factor: starting from data elements
ZHANG Guo-Qing1, ZHAO Guo-Ping1, LI Yi-Xue1,2,*
1Bio-Med Big Data Center, Shanghai Institute of Nutrition and Health, Chinese Academy of Sciences, Shanghai 200031, China
2Guangzhou Laboratory, Guangzhou 510005, China

Abstract:

Biomedical big data (BMBD), which has accumulated since the advent of genomics, systems biology, translational medicine, and precision medicine, not only forms the foundation of data-intensive research in the biomedical field but also serves as a strategic resource for public health, social development, and national security. It is also a core production factor for applying artificial intelligence to the growth of the "big health" industry. Biomedical data elements form a heterogeneous, complex system: they span multiple scales, come from multiple sources, and are high-dimensional and fine-grained. Therefore, massive biomedical data with the 4V characteristics (Volume, Velocity, Variety, Veracity) must be standardized, integrated, and shared for collaborative analysis. Only then can the sheer volume of biomedical data be transformed into biomedical big data that functions as a production factor and realizes its value in healthcare and medical research. This transformation presents distinctive opportunities and challenges, especially for multi-omics and multimodal data, which now serve as the foundation of biological and healthcare big data. Compared with Europe and the United States, China's data landscape is "abundant but not robust": low levels of openness and integration make it difficult to search data interactively and to assess data quality. Databases are the primary vehicles for sharing biomedical data, and their data sources and sharing models directly influence how value is extracted from data elements.
Data centers build and maintain databases, and they are key contributors to and advocates for transforming diverse data elements into production factors suited to various application scenarios. In converting data elements into production factors, we face two mismatches: one between the scale of existing data and the governance capacity for standardized data integration, and another between the scale of open data and the governance capacity for data analysis and mining. It is therefore imperative to strengthen foundational work on data governance and data sharing at three levels: data, databases, and data centers. We have implemented a BMBD technology system structured as 1 (data resource system oriented toward integrated, interactive sharing)-2 (standardized data analysis platforms)-3 (governance platforms for health and medical big data driven by scientific and technological questions)-X (classes of application-scenario-based AI analysis systems for knowledge mining), guided by the principles of "secure management, information sharing, value-added standardization, technological innovation, respect for intellectual property, and efficient utilization". Our ongoing efforts aim to turn the data center from a cost center into a value center, which may offer useful insights for unlocking the value of BMBD.

Corresponding author: LI Yi-Xue, Email: yxli@sibs.ac.cn
