Precision medicine and big data
Received date: 2016-01-12
Online published: 2016-02-29
Guo Yike1,2, Yang Xian2. Precision medicine and big data [J]. Journal of Shanghai University (Natural Science Edition), 2016, 22(1): 17-27. DOI: 10.3969/j.issn.1007-2861.2015.05.015
Precision medicine requires collecting and analysing many kinds of big data in order to characterise individual patients quantitatively. This paper first discusses the need to use data ranging from the molecular level to the pathway level, and to incorporate medical imaging data as well. Each data type calls for its own preprocessing methods, whereas some postprocessing steps, such as classification and network analysis, can be handled by a generalized approach across data types. From the perspective of research questions, the paper then examines methods for answering five typical questions, ordered from simple to complex: detecting associations, identifying groups, constructing classifiers, deriving connectivity, and building dynamic models.
Key words: analysis methods; big data; precision medicine
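The simplest of the five questions listed in the abstract, detecting associations, can be illustrated with a minimal sketch. The example below computes a Pearson correlation coefficient between two measurements; this is one common association measure and is not necessarily the method the paper recommends. The function name `pearson_r`, the variable names, and the numbers are assumptions made for illustration only, and the values are synthetic rather than patient data.

```python
import math


def pearson_r(x, y):
    """Return the Pearson correlation coefficient of two equal-length sequences."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("need two sequences of equal length >= 2")
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    # Sum of products of deviations (unnormalised covariance).
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    # Sums of squared deviations (unnormalised variances).
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined for a constant sequence")
    return cov / math.sqrt(var_x * var_y)


# Hypothetical, synthetic values: a biomarker level and a clinical score
# measured on the same five (imaginary) patients.
biomarker_level = [1.0, 2.0, 3.0, 4.0, 5.0]
clinical_score = [2.1, 3.9, 6.2, 8.1, 9.8]

r = pearson_r(biomarker_level, clinical_score)
print(f"Pearson r = {r:.4f}")
```

A coefficient near 1 or -1 suggests a strong linear association, while a value near 0 only rules out a linear one; nonlinear dependence would need a different measure.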