Precision medicine and big data

GUO Yike1,2, YANG Xian2

doi:10.3969/j.issn.1007-2861.2015.05.015

Journal of Shanghai University (Natural Science Edition), 2016, Vol. 22, Issue 1: 17-27





1. School of Computer Engineering and Science, Shanghai University, Shanghai 200444, China; 2. Data Science Institute, Imperial College London, London SW7 2AZ, UK

Received date: 2016-01-12

Online published: 2016-02-29



Cite this article

GUO Yike, YANG Xian. Precision medicine and big data[J]. Journal of Shanghai University (Natural Science Edition), 2016, 22(1): 17-27. DOI: 10.3969/j.issn.1007-2861.2015.05.015

Abstract

To achieve precision medicine, large and varied data sets must be collected and analysed to quantify each individual patient. This paper first discusses the need to use data from the molecular level to the pathway level, and to incorporate medical imaging data as well. Different data types require different preprocessing methods, but once preprocessing is complete, the data can usually be analysed with general-purpose methods such as classification and network analysis. From the perspective of research questions, the paper then examines methods for answering five typical questions, ordered from simple to complex: detecting associations, identifying groups, constructing classifiers, deriving connectivity, and building dynamic models.

Key words: analysis methods; big data; precision medicine

