Journal of Shanghai University (Natural Science Edition)


An optimization method based on support vector machine for Ramachandran plot in protein structures annotation

Wang Bo¹, Su Tianhao², Xu Yanting¹, Gao Heng¹, Guo Cong¹, Li Yongle¹, Wu Wei¹

  1. International Centre for Quantum and Molecular Structures, College of Sciences, Shanghai University, Shanghai 200444, China; 2. Materials Genome Institute, Shanghai University, Shanghai 200444, China
  • Received: 2022-08-09 Revised: 2022-12-22 Accepted: 2023-02-17 Online: 2023-04-26 Published: 2023-04-26
  • Corresponding author: Li Yongle (1983-), male, associate professor, Ph.D., doctoral supervisor; research interests: quantum and molecular dynamics. E-mail: yongleli@shu.edu.cn
  • Funding:
    Innovation Program of the Science and Technology Commission of Shanghai Municipality, "Development of Machine-Learning-Assisted First-Principles Methods for Strongly Correlated Calculations" (21JC1402700); Shanghai "Science and Technology Innovation Action Plan" Rising-Star Program, Sailing Special Project (22YF1413300)


Abstract: The Ramachandran plot is one of the most central tools for validating the conformations of protein structures and plays an important role in structural biology. However, the favored regions defined by the traditional Ramachandran plot are too broad: their high error tolerance lets some inaccurate structures pass. To address this shortcoming, we propose SVM-Rama, a method based on a support vector machine (SVM) and Bayesian optimization that optimizes and subdivides the favored regions of the Ramachandran plot so that each subdivided region corresponds to a specific class of protein secondary structure. SVM-Rama can thus improve the accuracy of protein structure validation and annotate secondary structures simply and accurately. The results show that, in secondary structure annotation, SVM-Rama achieves an accuracy close to the best performance of traditional methods at far lower training and computational costs.

Key words: Ramachandran plot, support vector machine, protein structure annotation
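The core idea, assigning a backbone dihedral pair (φ, ψ) to a secondary-structure class with a learned SVM decision boundary instead of a fixed hand-drawn favored region, can be illustrated with a minimal sketch. This is not the SVM-Rama implementation. It assumes a binary α-helix vs β-strand task, a linear SVM trained with the Pegasos sub-gradient method in pure Python, synthetic training angles scattered around approximate canonical values (α ≈ (−63°, −43°), β ≈ (−120°, +130°)), and a fixed regularization strength `lam`, whereas the paper tunes hyperparameters by Bayesian optimization. The helper names `features`, `synthetic_samples`, `train_pegasos` and `predict` are hypothetical. Angles are encoded as (cos, sin) pairs so that −180° and +180° map to the same point.

```python
import math
import random

def features(phi_deg, psi_deg):
    """Map (phi, psi) in degrees to periodic features plus a constant bias term.

    The cos/sin encoding places -180 and +180 degrees at the same point,
    which raw angle values would not.
    """
    phi, psi = math.radians(phi_deg), math.radians(psi_deg)
    return [math.cos(phi), math.sin(phi), math.cos(psi), math.sin(psi), 1.0]

def synthetic_samples(n_per_class=200, spread=15.0, seed=1):
    """Hypothetical toy data: Gaussian scatter around approximate canonical angles.

    Label +1 = alpha-helix-like, -1 = beta-strand-like. Not real PDB data.
    """
    rng = random.Random(seed)
    samples = []
    for _ in range(n_per_class):
        samples.append((features(rng.gauss(-63, spread), rng.gauss(-43, spread)), +1))
        samples.append((features(rng.gauss(-120, spread), rng.gauss(130, spread)), -1))
    return samples

def train_pegasos(samples, lam=0.01, epochs=50, seed=0):
    """Linear SVM (hinge loss + L2 penalty) trained with Pegasos sub-gradient steps."""
    rng = random.Random(seed)
    data = list(samples)  # copy so shuffling does not reorder the caller's list
    w = [0.0] * len(data[0][0])
    t = 0
    for _ in range(epochs):
        rng.shuffle(data)
        for x, y in data:
            t += 1
            eta = 1.0 / (lam * t)  # Pegasos step size
            margin = y * sum(wi * xi for wi, xi in zip(w, x))
            # Gradient step on the L2 term shrinks the weights ...
            w = [(1.0 - eta * lam) * wi for wi in w]
            # ... and a hinge-loss sub-gradient step applies only to margin violators.
            if margin < 1.0:
                w = [wi + eta * y * xi for wi, xi in zip(w, x)]
    return w

def predict(w, phi_deg, psi_deg):
    """Return 'alpha' or 'beta' from the sign of the linear decision function."""
    score = sum(wi * xi for wi, xi in zip(w, features(phi_deg, psi_deg)))
    return "alpha" if score >= 0 else "beta"

if __name__ == "__main__":
    w = train_pegasos(synthetic_samples())
    print(predict(w, -60, -45))   # a helix-like (phi, psi) pair
    print(predict(w, -130, 140))  # a strand-like (phi, psi) pair
```

A kernel SVM (for example with an RBF kernel) combined with a one-vs-rest scheme would be the natural extension toward curved, multi-class regions for several secondary-structure types; the periodic encoding matters in any variant because φ and ψ wrap around at ±180°.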
