Abstract: Protein-protein interactions play a central role in biological processes and are fundamental to virtually every aspect of cellular function. High-throughput experimental techniques, together with advances in in silico prediction methods, have greatly increased the volume of protein-protein interaction data from both direct and indirect sources. Systematically integrating these data and extracting meaningful information from them remains a real challenge, and this has prompted the development of many computational integration approaches. This review surveys eight methods for integrating protein-protein interaction data sources: voting, support vector machine, naive Bayes, logistic regression, decision tree, random forest (RF), RF-based k-nearest-neighbor, and mixture of feature experts.
Key words: protein-protein interaction; data integration; binary classifier
CLC number: Q51; Q811.4; Q816 Document code: A