Generalized Bayesian factor analysis for integrative clustering with applications to multi-omics data

Eun Jeong Min, Changgee Chang, Qi Long

Research output: Chapter in Book/Report/Conference proceeding › Conference contribution › peer-review

16 Scopus citations

Abstract

Integrative clustering is an approach to clustering multiple datasets that provide different views of a common group of subjects. It enables joint analysis of multi-omics data to, for example, identify subtypes of diseases or cells, capturing the complex underlying biological processes more precisely. At the same time, there has been a great deal of interest over the past decade in incorporating prior structural knowledge of the features into statistical analyses. Knowledge of gene regulatory networks (pathways) can potentially be incorporated into many genomic studies. In this paper, we propose a novel integrative clustering method that can incorporate prior graph knowledge. We first develop a generalized Bayesian factor analysis (GBFA) framework, a sparse Bayesian factor analysis that can take graph information into account. Our GBFA framework employs the spike and slab lasso (SSL) prior to impose sparsity on the factor loadings and the Markov random field (MRF) prior to encourage smoothing over adjacent factor loadings, which establishes a unified shrinkage adaptive to the loading size and the graph structure. We then use the framework to extend iCluster+, a factor-analysis-based integrative clustering approach. A novel variational EM algorithm is proposed to efficiently compute the MAP estimator of the factor loadings. Extensive simulation studies and an application to the NCI60 cell line dataset demonstrate that the proposed method is superior and delivers more biologically meaningful outcomes.
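
As a rough illustration of what "shrinkage adaptive to the loading size" can mean, the sketch below computes the loading-dependent penalty implied by a generic spike and slab lasso prior, that is, a two-component mixture of Laplace densities. This is not the paper's implementation: the function name `ssl_adaptive_penalty`, the parameters `theta`, `lam_spike`, and `lam_slab`, and all numeric values are assumptions chosen for illustration, and the graph (MRF) coupling described in the abstract is deliberately left out.

```python
import numpy as np


def ssl_adaptive_penalty(beta, theta=0.5, lam_spike=20.0, lam_slab=1.0):
    """Effective L1 penalty of a generic spike and slab lasso prior.

    The prior is assumed to be a mixture of two Laplace densities:
    a sharp "spike" (rate lam_spike) and a diffuse "slab" (rate lam_slab),
    with prior slab weight theta. All names and defaults are hypothetical.
    """
    abs_beta = np.abs(np.asarray(beta, dtype=float))
    # Spike-to-slab odds at beta, written with a single exponential so that
    # large |beta| does not produce 0/0.
    odds = ((1.0 - theta) * lam_spike) / (theta * lam_slab)
    odds = odds * np.exp(-(lam_spike - lam_slab) * abs_beta)
    p_slab = 1.0 / (1.0 + odds)  # probability that beta came from the slab
    # Weighted average of the two Laplace rates: near zero the spike rate
    # dominates (strong shrinkage); for large loadings the slab rate does.
    return p_slab * lam_slab + (1.0 - p_slab) * lam_spike


# Illustrative loading magnitudes, from exactly zero to clearly non-zero.
loadings = np.array([0.0, 0.05, 0.5, 2.0])
penalties = ssl_adaptive_penalty(loadings)
```

With these illustrative settings, loadings near zero receive a penalty close to the spike rate while larger loadings are penalized at roughly the slab rate. In the method the abstract describes, the MRF prior would additionally tie this behaviour to the graph structure; how that coupling is parameterized is not stated here and is not modelled in the sketch.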

Original language: English
Title of host publication: Proceedings - 2018 IEEE 5th International Conference on Data Science and Advanced Analytics, DSAA 2018
Editors: Francesco Bonchi, Foster Provost, Tina Eliassi-Rad, Wei Wang, Ciro Cattuto, Rayid Ghani
Publisher: Institute of Electrical and Electronics Engineers Inc.
Pages: 109-119
Number of pages: 11
ISBN (Electronic): 9781538650905
DOIs
State: Published - 2 Jul 2018
Event: 5th IEEE International Conference on Data Science and Advanced Analytics, DSAA 2018 - Turin, Italy
Duration: 1 Oct 2018 to 4 Oct 2018

Publication series

Name: Proceedings - 2018 IEEE 5th International Conference on Data Science and Advanced Analytics, DSAA 2018

Conference

Conference: 5th IEEE International Conference on Data Science and Advanced Analytics, DSAA 2018
Country/Territory: Italy
City: Turin
Period: 1/10/18 to 4/10/18

Bibliographical note

Publisher Copyright:
© 2018 IEEE.

Keywords

  • Generalized Bayesian Factor Analysis
  • High Dimensional Data
  • Integrative Analysis
  • Integrative Clustering
  • Markov Random Field (MRF)
  • NCI60
  • Network Information
  • Omics Data
  • Spike and Slab Lasso (SSL)
  • Structural Information
  • Variational EM Algorithm
