Combining EfficientNet with ML-Decoder classification head for multi-label retinal disease classification

SİVAZ, ORHAN; AYKUT, MURAT

doi:10.1007/s00521-024-09820-w

SİVAZ O., AYKUT M.

Neural Computing and Applications, vol. 36, no. 23, pp. 14251-14261, 2024 (SCI-Expanded)

Publication Type: Article / Full Article
Volume: 36 Issue: 23
Publication Date: 2024
DOI: 10.1007/s00521-024-09820-w
Journal Name: Neural Computing and Applications
Journal Indexed In: Science Citation Index Expanded (SCI-EXPANDED), Scopus, Academic Search Premier, PASCAL, Applied Science & Technology Source, Biotechnology Research Abstracts, Compendex, Computer & Applied Sciences, Index Islamicus, INSPEC, zbMATH
Pages: pp. 14251-14261
Keywords: EfficientNet, ML-Decoder, Multi-label classification, Retinal disease detection, SAM optimizer
Karadeniz Technical University Affiliation: Yes

Abstract

Retinal diseases that are not treated in time can cause irreversible damage, including blindness. Although a patient may suffer from more than one retinal disease at the same time, most studies focus on diagnosing a single disease. In this study, we therefore developed an end-to-end deep learning architecture that combines an EfficientNet backbone with an ML-Decoder classification head to detect multiple retinal diseases (multi-label classification) from color fundus images. EfficientNet provides powerful feature extraction with fewer parameters through compound scaling, while ML-Decoder further improves efficiency and flexibility by reducing the dependence of its cost on the number of queries from quadratic to linear and by using a group-decoding scheme. Training with the sharpness-aware minimization (SAM) optimizer, which minimizes the loss value and the loss sharpness simultaneously, further increased accuracy. In addition, using image transformations and concatenation together yields a significant increase in EfficientNet performance. Applying the image transformations randomly during training increases image diversity and makes the model more robust, while fusing the fundus images of the left and right eyes at the pixel level allows useful information about the relationship between the two eyes to be extracted. The final model was evaluated on the publicly available Ocular Disease Intelligent Recognition (ODIR) dataset of 10,000 fundus images and outperformed state-of-the-art methods in all test set scenarios and on all performance metrics. In the threefold cross-validation scenario, the best kappa, F1, and AUC scores were 68.96%, 92.48%, and 94.80%, respectively. Moreover, the model can be considered attractive in terms of floating-point operations (FLOPs) and number of parameters.
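As a rough illustration of the compound scaling mentioned in the abstract: EfficientNet ties depth, width, and input resolution to a single coefficient φ, scaling them as α^φ, β^φ, and γ^φ under the constraint α·β²·γ² ≈ 2, so that total FLOPs grow by roughly 2^φ. The sketch below assumes the constants α = 1.2, β = 1.1, γ = 1.15 reported in the original EfficientNet paper and a 224-pixel EfficientNet-B0 input. The helper name `compound_scale` is hypothetical, the abstract does not say which EfficientNet variant was used in this study, and the published B1 to B7 configurations do not follow this formula exactly.

```python
# Sketch of EfficientNet compound scaling (illustrative only; compound_scale is a hypothetical helper).
ALPHA, BETA, GAMMA = 1.2, 1.1, 1.15  # assumed: constants reported in the EfficientNet paper
BASE_RESOLUTION = 224  # assumed: EfficientNet-B0 input size in pixels


def compound_scale(phi: float) -> dict:
    """Return depth, width, and resolution multipliers plus the approximate FLOPs factor."""
    depth = ALPHA ** phi
    width = BETA ** phi
    resolution = GAMMA ** phi
    # FLOPs grow linearly with depth and quadratically with width and resolution.
    flops = depth * width ** 2 * resolution ** 2
    return {"depth": depth, "width": width, "resolution": resolution, "flops": flops}


if __name__ == "__main__":
    for phi in range(4):
        s = compound_scale(phi)
        print(
            f"phi={phi}: depth x{s['depth']:.2f}, width x{s['width']:.2f}, "
            f"input ~{BASE_RESOLUTION * s['resolution']:.0f} px, FLOPs x{s['flops']:.2f}"
        )
```

Because α·β²·γ² ≈ 1.92, each unit increase in φ roughly doubles the compute budget, which is how a single coefficient trades accuracy against cost.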
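The efficiency argument for the ML-Decoder head can be sketched as follows. Assuming a backbone that outputs N spatial feature tokens of dimension D, K query vectors attend to those tokens through cross-attention only; dropping the self-attention block of a standard transformer decoder removes the O(K²) interaction among queries, so the cost becomes linear in K. Under group decoding, each query's output is mapped by its own linear layer to logits for a group of ceil(C/K) classes, so K can be smaller than the number of classes C. This is a simplified, pure-Python sketch, not the published ML-Decoder: it omits the key/value projections, multiple heads, feed-forward layer, and normalization, uses random rather than trained weights, and all function names and sizes (8 labels, matching ODIR's eight categories; 4 queries; 7×7 tokens; D = 8) are illustrative assumptions.

```python
import math
import random

# Simplified ML-Decoder-style head (hypothetical names; omits projections, heads, FFN, and norms).


def softmax(xs):
    m = max(xs)  # subtract the max for numerical stability
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]


def cross_attend(query, tokens):
    """Single-head cross-attention of one query over N spatial tokens (keys = values = tokens)."""
    d = len(query)
    scores = [sum(q * t for q, t in zip(query, tok)) / math.sqrt(d) for tok in tokens]
    weights = softmax(scores)
    return [sum(w * tok[j] for w, tok in zip(weights, tokens)) for j in range(d)]


def group_decode(tokens, queries, group_weights, num_classes):
    """K queries with no self-attention among them: cost is O(K * N * D), linear in K.
    Each query's own linear layer (group FC) emits logits for ceil(C / K) classes."""
    logits = []
    for query, weight_rows in zip(queries, group_weights):
        z = cross_attend(query, tokens)
        logits.extend(sum(w * zi for w, zi in zip(row, z)) for row in weight_rows)
    return logits[:num_classes]  # drop surplus outputs when K does not divide C


if __name__ == "__main__":
    rng = random.Random(0)
    D, N, C, K = 8, 49, 8, 4  # assumed: feature dim, 7x7 tokens, 8 labels, 4 group queries
    G = math.ceil(C / K)  # classes decoded per query
    tokens = [[rng.gauss(0, 1) for _ in range(D)] for _ in range(N)]
    queries = [[rng.gauss(0, 1) for _ in range(D)] for _ in range(K)]
    group_weights = [[[rng.gauss(0, 0.1) for _ in range(D)] for _ in range(G)] for _ in range(K)]
    logits = group_decode(tokens, queries, group_weights, C)
    # Multi-label output: an independent sigmoid per class rather than a softmax over classes.
    probs = [1 / (1 + math.exp(-x)) for x in logits]
    print([round(p, 3) for p in probs])
```

The independent sigmoid at the end is what makes the output multi-label: a fundus pair can score high for several diseases at once.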
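The two-step update behind the SAM optimizer can be illustrated on a toy problem. At weights w with gradient g, SAM first moves to the approximate worst-case point within a radius ρ, w + ε with ε = ρ·g/‖g‖, and then updates the original w using the gradient evaluated at that perturbed point, so each step costs two gradient evaluations. The sketch below applies this first-order rule to a hypothetical two-parameter quadratic loss with hand-written gradients; the loss, learning rate, and ρ are assumptions for illustration only, since the abstract does not report the base optimizer or hyperparameters used in this study.

```python
import math

# Hypothetical toy loss: an ill-conditioned quadratic L(w) = 0.5 * (10 * w0^2 + w1^2).


def loss(w):
    return 0.5 * (10.0 * w[0] ** 2 + w[1] ** 2)


def grad(w):
    return [10.0 * w[0], w[1]]


def sam_step(w, lr=0.05, rho=0.05):
    """One SAM step using the first-order approximation of the worst-case perturbation.
    lr and rho are illustrative values, not the ones used in the study."""
    g = grad(w)
    norm = math.sqrt(sum(gi * gi for gi in g)) + 1e-12  # guard against division by zero
    # 1) Ascent: perturb the weights toward higher loss within an L2 ball of radius rho.
    eps = [rho * gi / norm for gi in g]
    w_adv = [wi + ei for wi, ei in zip(w, eps)]
    # 2) Descent: update the ORIGINAL weights with the gradient taken at the perturbed point.
    g_adv = grad(w_adv)
    return [wi - lr * gi for wi, gi in zip(w, g_adv)]


if __name__ == "__main__":
    w = [1.0, 1.0]
    for _ in range(50):
        w = sam_step(w)
    print("weights:", w, "loss:", loss(w))
```

In a deep-learning framework the same two phases wrap a base optimizer such as SGD: one forward/backward pass to compute ε, a second at the perturbed weights, then the base optimizer's step on the restored weights.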
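The training-time augmentation and left/right fusion described in the abstract could look roughly like the following. The abstract does not specify which transformations, probabilities, or concatenation axis were used, so this sketch assumes random horizontal and vertical flips, each applied independently with probability 0.5 to each eye before fusion, followed by side-by-side concatenation along the width. Images are represented as single-channel lists of rows to keep the example dependency-free, and all function names are hypothetical.

```python
import random
from typing import List

Image = List[List[float]]  # assumed: one channel, stored as a list of pixel rows


def hflip(img: Image) -> Image:
    """Mirror the image left to right."""
    return [row[::-1] for row in img]


def vflip(img: Image) -> Image:
    """Mirror the image top to bottom."""
    return [row[:] for row in img[::-1]]


def random_augment(img: Image, rng: random.Random, p: float = 0.5) -> Image:
    """Training-time only: apply each (assumed) transform independently with probability p."""
    if rng.random() < p:
        img = hflip(img)
    if rng.random() < p:
        img = vflip(img)
    return img


def fuse_pair(left: Image, right: Image) -> Image:
    """Assumed pixel-level fusion: place the left- and right-eye images side by side."""
    if len(left) != len(right):
        raise ValueError("left and right images must have the same height")
    return [l_row + r_row for l_row, r_row in zip(left, right)]


if __name__ == "__main__":
    rng = random.Random(0)
    left_eye = [[1, 2], [3, 4]]
    right_eye = [[5, 6], [7, 8]]
    fused = fuse_pair(random_augment(left_eye, rng), random_augment(right_eye, rng))
    print(fused)  # a 2 x 4 image whose left half comes from the left eye
```

Concatenating along the channel axis would be an equally plausible reading of pixel-level fusion; the abstract does not say which was used.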