Non-invasive way to diagnose dysphagia by training deep learning model with voice spectrograms

Heekyu Kim, Hae Yeon Park, Do Gyeom Park, Sun Im, Seungchul Lee

Research output: Contribution to journal › Article › peer-review

7 Scopus citations

Abstract

Background and objective: Patients with dysphagia show changes in articulation and voice quality, and recent studies have employed machine learning models to aid classification. This study aimed to apply a novel deep learning method that uses only the patient's voice to distinguish normal controls from patients with dysphagia, and to determine whether this method may provide a rapid and accurate means to supplement existing clinical methods of dysphagia screening and assessment. Methods: Voice samples from 299 healthy controls and 290 patients with post-stroke dysphagia, who performed four simple phonation tasks, were obtained prospectively at a university-affiliated hospital using a smart digital device. Deep learning methods were employed as follows. First, spectrograms are obtained from each sound signal using the short-time Fourier transform (STFT) and Mel-frequency cepstral coefficients (MFCC), respectively. Second, the STFT and MFCC spectrograms obtained for each protocol are fed to each multibranch model. Finally, at test time, the models are ensembled by soft voting to distinguish the normal and dysphagia classes. Results: Five evaluation metrics are used to assess model performance: AUC, sensitivity, specificity, positive predictive value (PPV), and negative predictive value (NPV). Of these, sensitivity and specificity are compared with those of existing diagnostic tools. The ensemble model incorporating all four tasks showed an AUC of 0.950 ± 0.004, with sensitivity and specificity as high as 94.7% and 77.9%, respectively. Conclusions: The novel deep learning model proposed in this paper shows promising performance in dysphagia classification. Our results show that the ensemble method used in this study may be utilized as a convenient and rapid digital biomarker of dysphagia in a non-invasive and automated manner.
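The pipeline outlined in the abstract (spectrogram extraction, per-task models, soft voting, and screening metrics) can be illustrated with a minimal NumPy sketch. This is not the authors' implementation: the function names, frame length, hop size, and the example probabilities are all hypothetical, the MFCC branch and the multibranch networks are omitted, and the per-task model outputs are simply assumed to be class probabilities.

```python
import numpy as np

def stft_magnitude(signal, frame_len=256, hop=128):
    """Magnitude spectrogram from a Hann-windowed short-time Fourier transform.

    frame_len and hop are illustrative values, not taken from the paper.
    """
    window = np.hanning(frame_len)
    n_frames = 1 + (len(signal) - frame_len) // hop
    frames = np.stack([signal[i * hop: i * hop + frame_len] * window
                       for i in range(n_frames)])
    # rfft over each frame, then transpose to (freq_bins, frames)
    return np.abs(np.fft.rfft(frames, axis=1)).T

def soft_vote(prob_per_model):
    """Soft voting: average class probabilities across models, then argmax.

    prob_per_model has shape (n_models, n_samples, n_classes).
    """
    mean_prob = np.mean(prob_per_model, axis=0)
    return mean_prob, mean_prob.argmax(axis=1)

def screening_metrics(y_true, y_pred):
    """Sensitivity, specificity, PPV, and NPV with class 1 as 'dysphagia'.

    Assumes every denominator is non-zero (both classes present and predicted).
    """
    y_true, y_pred = np.asarray(y_true), np.asarray(y_pred)
    tp = int(np.sum((y_true == 1) & (y_pred == 1)))
    tn = int(np.sum((y_true == 0) & (y_pred == 0)))
    fp = int(np.sum((y_true == 0) & (y_pred == 1)))
    fn = int(np.sum((y_true == 1) & (y_pred == 0)))
    return {
        "sensitivity": tp / (tp + fn),
        "specificity": tn / (tn + fp),
        "ppv": tp / (tp + fp),
        "npv": tn / (tn + fn),
    }

# Hypothetical outputs of four per-task models for four speakers,
# given as [P(normal), P(dysphagia)] per speaker.
task_probs = np.array([
    [[0.9, 0.1], [0.4, 0.6], [0.2, 0.8], [0.6, 0.4]],
    [[0.8, 0.2], [0.3, 0.7], [0.4, 0.6], [0.7, 0.3]],
    [[0.7, 0.3], [0.6, 0.4], [0.1, 0.9], [0.4, 0.6]],
    [[0.6, 0.4], [0.5, 0.5], [0.3, 0.7], [0.5, 0.5]],
])
mean_prob, y_pred = soft_vote(task_probs)
metrics = screening_metrics([0, 1, 1, 1], y_pred)
```

Soft voting averages probabilities rather than counting hard votes, so a confident model can outweigh several uncertain ones; whether the paper weights the four tasks equally is not stated in the abstract, and equal weighting here is an assumption.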

Original language: English
Article number: 105259
Journal: Biomedical Signal Processing and Control
Volume: 86
State: Published - Sep 2023

Bibliographical note

Publisher Copyright:
© 2023 Elsevier Ltd

Keywords

  • Acoustic data
  • Deep learning
  • Disease detection
  • Dysphagia
  • MFCC
  • STFT
