Application of a support vector machine-based Ramachandran plot optimization method to protein structure annotation

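  • 1. International Centre for Quantum and Molecular Structures, College of Sciences, Shanghai University, Shanghai 200444, China; 2. Materials Genome Institute, Shanghai University, Shanghai 200444, China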

Received: 2022-08-09

  Revised: 2022-12-22

  Accepted: 2023-02-17

  Published online: 2023-02-17

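Funding

Innovation Project of the Science and Technology Commission of Shanghai Municipality, "Development of machine-learning-assisted first-principles computational methods for strongly correlated systems" (21JC1402700); Shanghai "Science and Technology Innovation Action Plan" Rising-Star Program, Sailing Special Project (22YF1413300)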

An optimization method based on support vector machine for Ramachandran plot in protein structures annotation

  • 1. International Centre for Quantum and Molecular Structures, College of Sciences, Shanghai University, Shanghai 200444, China; 2. Materials Genome Institute, Shanghai University, Shanghai 200444, China

Received date: 2022-08-09

  Revised date: 2022-12-22

  Accepted date: 2023-02-17

  Online published: 2023-02-17

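Abstract

The Ramachandran plot is a classical tool for validating protein structures and is widely used in protein structure research. However, the favored regions defined by the traditional Ramachandran plot are too broad and too tolerant of errors, so they include some inaccurate structures. To address this problem, this paper proposes SVM-Rama, a method based on support vector machines and Bayesian optimization that optimizes and subdivides the traditional definition of the favored regions, so that each subdivided favored region corresponds to a specific type of secondary structure. SVM-Rama can improve the accuracy of protein structure validation and annotate secondary structures simply and precisely. The results show that, in secondary structure annotation, the accuracy of the method is close to the best result achieved by traditional methods, while its training and computational costs are far lower.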

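Cite this article

Wang Bo (王博), Su Tianhao (苏天昊), Xu Yanting (徐妍婷), Gao Heng (高恒), Guo Cong (郭聪), Li Yongle (李永乐), Wu Wei (吴伟). An optimization method based on support vector machine for Ramachandran plot in protein structures annotation[J]. Journal of Shanghai University (Natural Science Edition), 0 : 1 . DOI: 10.12066/j.issn.1007-2861.2462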

Abstract

The Ramachandran plot is one of the central tools for validating the conformation of protein structures and plays an important role in structural biology. However, the favored regions defined by the traditional Ramachandran plot are too broad and contain inaccurate structures. To address this shortcoming, a method based on support vector machines and Bayesian optimization, SVM-Rama, is proposed to optimize and subdivide the definition of the favored regions of the Ramachandran plot. The method refines the favored regions so that each corresponds to a specific type of protein secondary structure, which allows protein secondary structures to be validated and annotated simply and accurately. The results show that, in secondary structure annotation, its accuracy is close to the best performance of traditional methods, at lower training and computational cost.
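To make the idea of assigning each backbone dihedral pair (φ, ψ) to a region tied to one secondary structure type more concrete, here is a minimal Python sketch. It is not SVM-Rama. A simple nearest-centre rule stands in for the decision boundaries that the paper learns with a support vector machine. The region centres are approximate textbook values, and the 60° cutoff and the names `CENTRES`, `angular_diff` and `annotate` are assumptions made for illustration only.

```python
import math

# Approximate textbook centres (phi, psi) in degrees for three common
# backbone conformations. Illustrative assumptions, not regions learned
# by SVM-Rama.
CENTRES = {
    "alpha-helix": (-60.0, -45.0),
    "beta-strand": (-120.0, 130.0),
    "left-handed helix": (60.0, 45.0),
}


def angular_diff(a, b):
    """Smallest absolute difference between two angles in degrees.

    Dihedral angles are periodic, so -170 and +170 are only 20 degrees apart.
    """
    d = (a - b) % 360.0
    return min(d, 360.0 - d)


def annotate(phi, psi, max_dist=60.0):
    """Label a residue by its nearest centre, or 'other' if none is close.

    max_dist is a hypothetical cutoff in degrees, chosen for illustration.
    """
    best_label, best_dist = "other", max_dist
    for label, (c_phi, c_psi) in CENTRES.items():
        dist = math.hypot(angular_diff(phi, c_phi), angular_diff(psi, c_psi))
        if dist < best_dist:
            best_label, best_dist = label, dist
    return best_label


print(annotate(-63.0, -43.0))    # close to the helical centre
print(annotate(-130.0, 140.0))   # close to the extended-strand centre
print(annotate(-170.0, -170.0))  # far from every centre
```

A learned classifier replaces the fixed centres and cutoff with boundaries fitted to labelled structures. The periodic treatment of the angles is still needed in either case.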