Journal of Shanghai University (Natural Science Edition)

• Life Sciences •

UBROAD: A Batch Annotation System for Genes and Proteins

郭景康;张祥云;杨旭智   

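  1. 1. School of Life Sciences, Shanghai University, Shanghai 200444; 2. Shanghai Institute of Hematology, Ruijin Hospital, Shanghai Jiaotong University, Shanghai 200025
  • Received: 2006-04-25 Publication date: 2007-02-28 Release date: 2007-02-28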

UBROAD: A Web-Based Platform of Gene and Protein Data

GUO Jing-kang; ZHANG Xiang-yun; YANG Xu-zhi

  1. 1. School of Life Sciences, Shanghai University, Shanghai 200444, China; 2. Shanghai Institute of Hematology, Ruijin Hospital, Shanghai Jiaotong University, Shanghai 200025, China
  • Received:2006-04-25 Online:2007-02-28 Published:2007-02-28

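Abstract: High-throughput transcriptome and proteome techniques, typified by DNA microarrays, two-dimensional electrophoresis, two-dimensional high-pressure liquid chromatography and mass spectrometry, generate massive amounts of gene and protein data. Annotating these genes or proteins is the basis of, and a prerequisite for, downstream processing of the data. Annotation at this scale is impractical by hand, and the annotations returned by existing batch annotation websites for genes and proteins are often incomplete. After comparing commonly used batch annotation websites, we developed UBROAD (Unified Batch Retriever of Annotation Data), a batch annotation system for genes and proteins. The system integrates six gene and protein data sources: NCBI, Swiss-Prot, BIND, enzyme-expasy, gene2accession and gene2unigene. It supports mixed queries across eight identifier types: Uniprot/trEMBL AC, Uniprot Entryname, Genbank Protein Accession Number, Genbank mRNA gi, Genbank mRNA Accession Number, Gene name, Gene ID and Unigene ID. Users can choose from 38 annotation items, covering the various accession numbers as well as basic information, functional classification and interactions of the genes or proteins. Annotation results are provided as a Microsoft Excel spreadsheet. The system is freely available at http://www.bioscience.org.cn/ubroad.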

Keywords: batch annotation, bioinformatics, database integration

Abstract: High-throughput experiments such as microarray analysis or protein mass spectrometry can produce huge amounts of data efficiently. After some basic preprocessing, these data are often presented as a list of gene/protein identifiers accompanied by (semi-)quantitative experimental data such as expression profiles. Detailed annotation of these genes/proteins is needed to analyze the data further and to extract biological meaning. We developed a web-based platform, UBROAD (Unified Batch Retriever of Annotation Data), to facilitate quick and efficient retrieval of annotation data for genes and proteins from various sources. UBROAD integrates several biological data sources, including NCBI, Swiss-Prot, BIND, Enzyme, gene2unigene and gene2accession, and supports mixed searches of eight types of identifiers: Uniprot/trEMBL AC, Uniprot Entryname, Genbank Protein Accession Number, Genbank mRNA gi, Genbank mRNA Accession Number, Gene name, Gene ID and Unigene ID. The output offers 38 annotation items and can be downloaded as a Microsoft Excel file. UBROAD is freely available at http://www.bioscience.org.cn/ubroad.

Key words: batch annotation, bioinformatics, database integration
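
The abstract states that UBROAD accepts a mixed list of eight identifier types but does not say how the types are told apart. The following is a minimal sketch of one possible approach, sorting a mixed input list by identifier format before lookup. The function names, the regular expressions and the matching order are all assumptions made for illustration and are not taken from UBROAD. Two limits of format-based sorting show up here: a purely numeric identifier could be either a Gene ID or a Genbank mRNA gi, so the sketch reports both possibilities in one label, and anything that matches no pattern is treated as a gene name.

```python
import re
from collections import defaultdict

# Hypothetical format patterns, checked in this order (first match wins).
# UBROAD's real matching rules are not described in the abstract.
PATTERNS = [
    ("UniProt/TrEMBL AC", re.compile(
        r"[OPQ][0-9][A-Z0-9]{3}[0-9]|[A-NR-Z][0-9](?:[A-Z][A-Z0-9]{2}[0-9]){1,2}")),
    ("UniProt entry name", re.compile(r"[A-Z0-9]{1,10}_[A-Z0-9]{1,5}")),
    ("GenBank protein accession", re.compile(r"[A-Z]{3}[0-9]{5}(?:\.[0-9]+)?")),
    ("GenBank mRNA accession", re.compile(
        r"(?:[A-Z][0-9]{5}|[A-Z]{2}[0-9]{6})(?:\.[0-9]+)?")),
    ("UniGene ID", re.compile(r"[A-Z][a-z]{1,2}\.[0-9]+")),
    # A bare integer cannot be resolved by format alone.
    ("Gene ID or GenBank mRNA gi", re.compile(r"[0-9]+")),
]


def classify(identifier: str) -> str:
    """Return the first matching identifier type, or 'Gene name' as fallback."""
    token = identifier.strip()
    for label, pattern in PATTERNS:
        if pattern.fullmatch(token):
            return label
    return "Gene name"


def group_by_type(identifiers):
    """Group a mixed list of identifiers by guessed type, skipping blanks."""
    groups = defaultdict(list)
    for raw in identifiers:
        token = raw.strip()
        if token:
            groups[classify(token)].append(token)
    return dict(groups)


if __name__ == "__main__":
    # Example identifiers chosen only to illustrate the formats.
    mixed = ["P12345", "ALBU_HUMAN", "AAA12345", "U12345", "Hs.12345", "7157", "TP53"]
    for id_type, members in group_by_type(mixed).items():
        print(f"{id_type}: {', '.join(members)}")
```

Checking the UniProt pattern first is an arbitrary choice in this sketch: an identifier made of O, P or Q followed by five digits also fits the one-letter-plus-five-digit nucleotide accession format, so a real system would need to try both lookups or ask the user.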