An efficient multivariate feature ranking method for gene selection in high-dimensional microarray data

Junghye Lee, In Young Choi, Chi Hyuck Jun

Research output: Contribution to journal › Article › peer-review

61 Scopus citations

Abstract

Classification of microarray data plays a significant role in the diagnosis and prediction of cancer. However, the high dimensionality of such data (more than tens of thousands of features) relative to the number of observations (fewer than tens of hundreds) may lead to poor classification accuracy. In addition, only a fraction of the genes is truly important for classifying a given cancer, so feature selection is essential in this field. Because processing high-dimensional data imposes a heavy time and memory burden, univariate feature ranking methods are widely used in gene selection. However, most of them are not very accurate because they consider only the relevance of each feature to the target and ignore the redundancy among features. In this study, we propose a novel multivariate feature ranking method to improve the quality of gene selection and, ultimately, the accuracy of microarray data classification. The method can be applied efficiently to high-dimensional microarray data. We embedded the formal definition of relevance into a Markov blanket (MB) to create a new feature ranking method. Using a few microarray datasets, we demonstrated the practicability of MB-based feature ranking, which achieved high accuracy and good efficiency. The method outperformed commonly used univariate ranking methods and, owing to its data efficiency, also yielded better results than the other multivariate feature ranking method it was compared with.
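The difference between relevance-only ranking and redundancy-aware ranking can be illustrated with a small sketch. This is not the MB-based method of the paper: it swaps in a simple greedy "relevance minus redundancy" criterion (in the spirit of mRMR) purely for illustration. The function names, the use of absolute Pearson correlation as the relevance and redundancy measure, and the synthetic three-"gene" dataset are all assumptions made here.

```python
import numpy as np

def abs_corr(a, b):
    """Absolute Pearson correlation between two 1-D arrays."""
    return abs(np.corrcoef(a, b)[0, 1])

def univariate_rank(X, y):
    """Rank features by relevance to y only (most relevant first)."""
    relevance = np.array([abs_corr(X[:, j], y) for j in range(X.shape[1])])
    return np.argsort(-relevance).tolist()

def redundancy_aware_rank(X, y):
    """Greedy ranking: relevance to y minus mean redundancy with features already ranked.

    Illustrative stand-in only; NOT the Markov-blanket criterion of the paper.
    """
    n_features = X.shape[1]
    relevance = [abs_corr(X[:, j], y) for j in range(n_features)]
    ranked, remaining = [], list(range(n_features))
    while remaining:
        def score(j):
            if not ranked:
                return relevance[j]
            redundancy = np.mean([abs_corr(X[:, j], X[:, k]) for k in ranked])
            return relevance[j] - redundancy
        best = max(remaining, key=score)
        ranked.append(best)
        remaining.remove(best)
    return ranked

# Hypothetical toy data: gene 1 is a near-duplicate of gene 0,
# gene 2 is weaker but carries independent information about y.
rng = np.random.default_rng(0)
n = 200
g0 = rng.normal(size=n)
g1 = g0 + 0.01 * rng.normal(size=n)
g2 = rng.normal(size=n)
X = np.column_stack([g0, g1, g2])
y = g0 + 0.5 * g2

print("univariate ranking:      ", univariate_rank(X, y))
print("redundancy-aware ranking:", redundancy_aware_rank(X, y))
```

On this toy data, the univariate ranking places the two near-duplicate genes ahead of the independent one, whereas the redundancy-aware ranking promotes the independent gene to second place once one of the duplicates has been chosen.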

Original language: English
Article number: 113971
Journal: Expert Systems with Applications
Volume: 166
State: Published - 15 Mar 2021

Bibliographical note

Publisher Copyright:
© 2020 The Author(s)

Keywords

  • Classification
  • Gene selection
  • High-dimensional data
  • Markov blanket
  • Microarray data
  • Mixed-type data
  • Multiclass
  • Multivariate feature selection
  • Ranking
