A Hybrid Swin Transformer and MLP-Mixer Approach for Automated Ear Disease Diagnosis from Otoscopic Images


Demircan F., Comert Z., Karadeniz A. T., Ekinci M.

33rd IEEE Conference on Signal Processing and Communications Applications, SIU 2025, İstanbul, Türkiye, 25-28 June 2025 (Full-Text Paper)

  • Publication Type: Conference Paper / Full-Text Paper
  • DOI: 10.1109/siu66497.2025.11111872
  • City of Publication: İstanbul
  • Country of Publication: Türkiye
  • Keywords: Deep Learning, Ear Disease Classification, Medical Image Analysis, MLP-Mixer, Swin Transformer
  • Affiliated with Karadeniz Technical University: Yes

Abstract

Accurate classification of ear diseases is crucial for early diagnosis and effective treatment. Traditional diagnosis relies on visual inspection, which is subjective. Recent advances in deep learning have enabled automated diagnostic models. In this study, we propose a hybrid deep learning model that combines the Swin Transformer's hierarchical feature extraction with the MLP-Mixer's token and channel mixing. The model was trained and evaluated on the Ear Imagery dataset. Experimental results indicate that the hybrid architecture classifies more accurately than traditional CNNs and standalone Vision Transformer models. It achieved an accuracy of 99.62%, an improvement of 3.79 percentage points over the standalone Swin Transformer, which attained 95.83%.
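
The abstract does not say how the two components are connected, so the following is only a minimal sketch of the general idea. It assumes the MLP-Mixer operates on the token sequence that a Swin Transformer backbone would output. The backbone itself is replaced by random stand-in tokens, and every name, size, and weight here (`mixer_block`, `classify`, the hidden width, the number of classes) is hypothetical rather than taken from the paper. Plain Python is used so the example runs without any framework. Biases and the learned scale and shift of layer normalisation are omitted for brevity.

```python
import math
import random

def matmul(a, b):
    """Multiply two matrices given as lists of rows."""
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*b)] for row in a]

def transpose(m):
    return [list(col) for col in zip(*m)]

def add(a, b):
    """Element-wise sum of two equally shaped matrices (residual connection)."""
    return [[x + y for x, y in zip(ra, rb)] for ra, rb in zip(a, b)]

def gelu(x):
    # tanh approximation of the GELU activation
    return 0.5 * x * (1.0 + math.tanh(math.sqrt(2.0 / math.pi) * (x + 0.044715 * x ** 3)))

def layer_norm(tokens, eps=1e-6):
    """Normalise each token (row) across its channels."""
    out = []
    for row in tokens:
        mean = sum(row) / len(row)
        var = sum((v - mean) ** 2 for v in row) / len(row)
        out.append([(v - mean) / math.sqrt(var + eps) for v in row])
    return out

def mlp(x, w1, w2):
    """Two-layer MLP applied independently to every row of x."""
    hidden = [[gelu(v) for v in row] for row in matmul(x, w1)]
    return matmul(hidden, w2)

def mixer_block(tokens, params):
    """One MLP-Mixer block on a (num_tokens x num_channels) matrix."""
    # Token mixing: transpose so the MLP mixes information across tokens,
    # separately for each channel, then transpose back.
    y = transpose(layer_norm(tokens))
    y = transpose(mlp(y, params["tok_w1"], params["tok_w2"]))
    tokens = add(tokens, y)
    # Channel mixing: the MLP mixes information across channels,
    # separately for each token.
    z = mlp(layer_norm(tokens), params["ch_w1"], params["ch_w2"])
    return add(tokens, z)

def random_matrix(rows, cols, rng, scale=0.1):
    return [[rng.uniform(-scale, scale) for _ in range(cols)] for _ in range(rows)]

def init_params(num_tokens, num_channels, hidden, rng):
    """Untrained random weights, for illustration only."""
    return {
        "tok_w1": random_matrix(num_tokens, hidden, rng),
        "tok_w2": random_matrix(hidden, num_tokens, rng),
        "ch_w1": random_matrix(num_channels, hidden, rng),
        "ch_w2": random_matrix(hidden, num_channels, rng),
    }

def classify(tokens, params, head_w):
    """Mixer block, global average pooling over tokens, linear head, softmax."""
    mixed = mixer_block(tokens, params)
    pooled = [sum(col) / len(col) for col in zip(*mixed)]
    logits = matmul([pooled], head_w)[0]
    peak = max(logits)
    exps = [math.exp(v - peak) for v in logits]
    total = sum(exps)
    return [e / total for e in exps]

if __name__ == "__main__":
    rng = random.Random(0)
    num_tokens, num_channels, hidden, num_classes = 4, 6, 8, 3  # all assumed sizes
    # Stand-in for the token features a Swin backbone would produce.
    swin_tokens = random_matrix(num_tokens, num_channels, rng, scale=1.0)
    params = init_params(num_tokens, num_channels, hidden, rng)
    head_w = random_matrix(num_channels, num_classes, rng)
    print(classify(swin_tokens, params, head_w))
```

The block keeps the token matrix the same shape on input and output, so several such blocks could be stacked after a backbone before the pooling and classification head. Whether the paper stacks blocks, and where it attaches them, is not stated in the abstract.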