Journal of Shanghai University (Natural Science Edition), 2016, Vol. 22, Issue (1): 45-57. doi: 10.3969/j.issn.1007-2861.2015.04.018


Survey of clustering methods for big data in biology

LU Dongfang, XU Junfu, XIANG Chaojuan, XIE Jiang   

  1. School of Computer Engineering and Science, Shanghai University, Shanghai 200444, China
  • Received: 2015-11-30 Online: 2016-02-29 Published: 2016-02-29

Abstract:

With the implementation of the Human Genome Project and the rapid development of biological experimental technology, biological data are growing sharply and accumulating continuously. The age of big data in biology is arriving. In the post-genomic era, single statistical models are gradually being replaced by combinations of intelligent and comprehensive analyses. Clustering is at the core of data mining. This paper describes the state of the art in big data technology for bioinformatics and summarizes several popular clustering methods for gene expression profiling and biological networks. Furthermore, experiments comparing different clustering methods on time-series data from mouse embryonic fibroblasts show that different methods yield different results. To reach more reliable conclusions from highly noisy biological data, investigators need to carry out comprehensive analyses by selecting and combining appropriate clustering methods.

Key words: big data in biology, clustering method, data analysis