TY - JOUR
T1 - Deep convolutional neural network for classification of thyroid nodules on ultrasound
T2 - Comparison of the diagnostic performance with that of radiologists
AU - Kim, Yeon Jae
AU - Choi, Yangsean
AU - Hur, Su Jin
AU - Park, Ki Sun
AU - Kim, Hyun Jin
AU - Seo, Minkook
AU - Lee, Min Kyoung
AU - Jung, So Lyung
AU - Jung, Chan Kwon
N1 - Funding Information:
This research was supported by the Basic Science Research Program through the National Research Foundation of Korea, funded by the Ministry of Education (2021R1I1A1A01040285).
Publisher Copyright:
© 2022 Elsevier B.V.
PY - 2022/7
Y1 - 2022/7
AB - Purpose: This study aimed to train and validate deep learning (DL) models for differentiating malignant from benign thyroid nodules on ultrasound (US) images and to compare their performance with that of radiologists. Methods: Images of thyroid nodules in patients who underwent US-guided fine-needle aspiration biopsy at our institution between January 2010 and March 2020 were retrospectively reviewed. Four radiologists independently classified the images. Three image classification DL models (VGG16, VGG19, and ResNet) were trained on the thyroid nodule images. The diagnostic performance of the DL models was calculated for the internal and external datasets and compared with the diagnoses of the four radiologists. Pairwise comparisons of the AUCs between the radiologists and DL models were made using bootstrap-based tests. Results: In total, 15,409 images from 7,321 patients (mean age, 60 ± 13 years; malignant nodules, 20.7%) were randomly grouped into training (n = 12,327) and validation (n = 3,082) sets. Independent internal (n = 432; 197 patients) and external (n = 168; 59 patients) test sets were also acquired. The DL models demonstrated higher diagnostic performance than the radiologists in the internal test set (AUC, 0.83–0.86 vs. 0.71–0.76; P < 0.05), but not in the external test set. The VGG16 model demonstrated the highest diagnostic performance in the internal (AUC, 0.86; sensitivity, 91.8%; specificity, 73.2%) and external (AUC, 0.83; sensitivity, 78.6%; specificity, 76.8%) test sets. However, no statistically significant differences were found in the AUCs among the DL models. Conclusions: The DL models demonstrated diagnostic performance comparable to that of radiologists in distinguishing benign from malignant thyroid nodules on US images and may play a role in augmenting radiologists’ diagnosis of thyroid nodules.
KW - Biopsy, Fine-Needle
KW - Deep Learning
KW - Sensitivity and Specificity
KW - Thyroid Nodule
UR - http://www.scopus.com/inward/record.url?scp=85129498227&partnerID=8YFLogxK
U2 - 10.1016/j.ejrad.2022.110335
DO - 10.1016/j.ejrad.2022.110335
M3 - Article
C2 - 35512512
AN - SCOPUS:85129498227
SN - 0720-048X
VL - 152
JO - European Journal of Radiology
JF - European Journal of Radiology
M1 - 110335
ER -