Unstructured borderline self-organizing map: Learning highly imbalanced, high-dimensional datasets for fault detection

Jaeyeon Jang, Chang Ouk Kim

Research output: Contribution to journal › Article › peer-review

13 Scopus citations

Abstract

Fault detection in industrial processes is critical for yield improvement and manufacturing cost reduction. However, most industrial processes produce highly imbalanced, high-dimensional datasets in which the normal data greatly outnumber the fault data and many noninformative features add noise to the data distribution. Addressing class imbalance and high dimensionality is therefore considered key to successful fault detection. In this paper, we propose a novel model, the unstructured borderline self-organizing map (UB-SOM), designed to solve these two problems. UB-SOM not only learns the distribution of the normal samples through a small number of representative nodes but also highlights borderline areas. Because UB-SOM yields a new data distribution that emphasizes borderlines, the distributional change from the normal data to the representative nodes reveals which features are significant in the borderline areas. We select the significant features based on the featurewise distributional change, measured using the Kullback-Leibler divergence. UB-SOM is evaluated on ten publicly available imbalanced benchmark datasets and two semiconductor process datasets. The experimental results show that preprocessing the data with UB-SOM increases the G-mean by 0.441 for the benchmark datasets and by 0.657 for the industrial datasets. As a result, the proposed method outperforms various undersampling methods combined with classifier-based feature selection methods.
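The feature-scoring idea and the evaluation metric mentioned in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the function names (`featurewise_kl`, `g_mean`), the histogram-based density estimate, the bin count, the smoothing constant, and the direction of the divergence (nodes relative to normal data) are all assumptions made for illustration, and the UB-SOM training step that produces the representative nodes is not shown.

```python
import numpy as np


def featurewise_kl(normal, nodes, bins=10, eps=1e-9):
    """Estimate KL(nodes || normal) separately for each feature.

    Assumption: densities are approximated with histograms that share
    the same bin edges; `eps` avoids division by zero in empty bins.
    """
    normal = np.asarray(normal, dtype=float)
    nodes = np.asarray(nodes, dtype=float)
    scores = np.zeros(normal.shape[1])
    for j in range(normal.shape[1]):
        lo = min(normal[:, j].min(), nodes[:, j].min())
        hi = max(normal[:, j].max(), nodes[:, j].max())
        if hi == lo:
            continue  # constant feature: no distributional change
        edges = np.linspace(lo, hi, bins + 1)
        p, _ = np.histogram(nodes[:, j], bins=edges)
        q, _ = np.histogram(normal[:, j], bins=edges)
        p = (p + eps) / (p + eps).sum()
        q = (q + eps) / (q + eps).sum()
        scores[j] = np.sum(p * np.log(p / q))
    return scores


def g_mean(y_true, y_pred):
    """Geometric mean of sensitivity and specificity (fault label = 1)."""
    y_true = np.asarray(y_true)
    y_pred = np.asarray(y_pred)
    sensitivity = np.mean(y_pred[y_true == 1] == 1)
    specificity = np.mean(y_pred[y_true == 0] == 0)
    return float(np.sqrt(sensitivity * specificity))


# Hypothetical data: feature 1 of the nodes is pushed toward the tails,
# mimicking an emphasis on borderline regions; feature 0 is unchanged.
rng = np.random.default_rng(0)
normal_data = rng.normal(size=(2000, 2))
node_data = np.column_stack([
    normal_data[:200, 0],
    np.concatenate([np.full(100, 2.5), np.full(100, -2.5)]),
])
ranking = np.argsort(featurewise_kl(normal_data, node_data))[::-1]
```

In this sketch, features whose distribution changes most between the normal data and the nodes receive the highest scores, so `ranking` lists feature indices from most to least changed. How many top-ranked features to keep is a separate choice that the abstract does not specify.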

Original language: English
Article number: 116028
Journal: Expert Systems with Applications
Volume: 188
State: Published - Feb 2022

Bibliographical note

Publisher Copyright:
© 2021 Elsevier Ltd

Keywords

  • Borderline
  • Fault detection
  • Feature selection
  • Industrial processes
  • Resampling
  • Self-organizing map
