Abstract
Integrative clustering is a clustering approach for multiple datasets that provide different views of a common group of subjects. It enables the joint analysis of multi-omics data to identify, for example, subtypes of diseases or cells, and so captures the complex underlying biological processes more precisely. Separately, over the past decade there has been a great deal of interest in incorporating prior structural knowledge about the features into statistical analyses. Knowledge of the gene regulatory network (pathways) can potentially be incorporated into many genomic studies. In this paper, we propose a novel integrative clustering method that can incorporate prior graph knowledge. We first develop a generalized Bayesian factor analysis (GBFA) framework, a sparse Bayesian factor analysis that can take the graph information into account. The GBFA framework employs the spike and slab lasso (SSL) prior to impose sparsity on the factor loadings and the Markov random field (MRF) prior to encourage smoothing over adjacent factor loadings, which together establish a unified shrinkage that adapts to both the loading size and the graph structure. We then use the framework to extend iCluster+, a factor analysis based integrative clustering approach. A novel variational EM algorithm is proposed to efficiently compute the MAP estimate of the factor loadings. Extensive simulation studies and an application to the NCI60 cell line dataset demonstrate that the proposed method is superior and delivers more biologically meaningful outcomes.
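To illustrate what "shrinkage adaptive to the loading size" can mean, the following is a minimal sketch of the standard spike and slab lasso mixture of two Laplace densities, not the GBFA implementation from the paper. The function names and the parameters `theta` (prior slab weight), `lam0` (spike rate), and `lam1` (slab rate) are assumptions introduced for illustration. How the MRF prior enters the model is not specified in the abstract; the sketch only assumes that graph information could raise `theta` for a feature whose neighbors are active.

```python
import math


def laplace_density(beta, lam):
    """Laplace (double-exponential) density with rate lam, evaluated at beta."""
    return 0.5 * lam * math.exp(-lam * abs(beta))


def slab_probability(beta, theta, lam0, lam1):
    """Conditional probability that beta was drawn from the slab component.

    theta is the prior weight on the slab; lam0 >> lam1 makes the spike
    concentrate near zero while the slab stays diffuse.
    """
    slab = theta * laplace_density(beta, lam1)
    spike = (1.0 - theta) * laplace_density(beta, lam0)
    return slab / (slab + spike)


def effective_penalty(beta, theta, lam0, lam1):
    """Adaptive L1 penalty weight: a mix of lam1 and lam0 weighted by the slab probability."""
    p = slab_probability(beta, theta, lam0, lam1)
    return lam1 * p + lam0 * (1.0 - p)


if __name__ == "__main__":
    lam0, lam1 = 20.0, 1.0  # illustrative values, not taken from the paper
    for beta in (0.0, 0.1, 0.3, 2.0):
        print(beta, effective_penalty(beta, theta=0.5, lam0=lam0, lam1=lam1))
    # Hypothetical graph effect: a larger theta (e.g. active neighbors)
    # lowers the penalty applied to the same loading.
    print(effective_penalty(0.3, theta=0.2, lam0=lam0, lam1=lam1))
    print(effective_penalty(0.3, theta=0.8, lam0=lam0, lam1=lam1))
```

In this sketch, loadings near zero receive a penalty close to `lam0` and are shrunk hard, while large loadings receive a penalty close to `lam1` and are left nearly untouched.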
Original language | English |
---|---|
Title of host publication | Proceedings - 2018 IEEE 5th International Conference on Data Science and Advanced Analytics, DSAA 2018 |
Editors | Francesco Bonchi, Foster Provost, Tina Eliassi-Rad, Wei Wang, Ciro Cattuto, Rayid Ghani |
Publisher | Institute of Electrical and Electronics Engineers Inc. |
Pages | 109-119 |
Number of pages | 11 |
ISBN (Electronic) | 9781538650905 |
State | Published - 2 Jul 2018 |
Event | 5th IEEE International Conference on Data Science and Advanced Analytics, DSAA 2018 - Turin, Italy. Duration: 1 Oct 2018 → 4 Oct 2018 |
Publication series
Name | Proceedings - 2018 IEEE 5th International Conference on Data Science and Advanced Analytics, DSAA 2018 |
---|
Conference
Conference | 5th IEEE International Conference on Data Science and Advanced Analytics, DSAA 2018 |
---|---|
Country/Territory | Italy |
City | Turin |
Period | 1/10/18 → 4/10/18 |
Bibliographical note
Publisher Copyright: © 2018 IEEE.
Keywords
- Generalized Bayesian Factor Analysis
- High Dimensional Data
- Integrative Analysis
- Integrative Clustering
- Markov Random Field (MRF)
- NCI60
- Network Information
- Omics Data
- Spike and Slab Lasso (SSL)
- Structural Information
- Variational EM Algorithm