33rd IEEE Conference on Signal Processing and Communications Applications, SIU 2025, İstanbul, Türkiye, 25-28 June 2025 (Full-Text Paper)
Accurate classification of ear diseases is crucial for early diagnosis and effective treatment, yet traditional diagnostic methods rely on subjective visual inspection. Recent advances in deep learning have made automated diagnostic models feasible. In this study, we propose a hybrid deep learning model that combines the hierarchical feature extraction of the Swin Transformer with the token- and channel-mixing of the MLP-Mixer. The model was trained and evaluated on the Ear Imagery dataset. Experimental results indicate that the hybrid architecture outperforms traditional CNNs and standalone Vision Transformer models in classification. It achieved an accuracy of 99.62%, an improvement of 3.79 percentage points over the standalone Swin Transformer model, which attained 95.83%.
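The abstract does not specify how the two components are connected, so the following is only an illustrative sketch of the token- and channel-mixing step that a generic MLP-Mixer block performs on a grid of feature tokens (for example, tokens produced by a Swin Transformer stage). It is not the authors' implementation. All function names, weight shapes, and sizes are hypothetical; biases and learnable normalization parameters are omitted, and plain Python lists are used so the example has no dependencies.

```python
import math
import random

def matmul(a, b):
    # (n x k) times (k x m) -> (n x m)
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*b)] for row in a]

def transpose(m):
    return [list(col) for col in zip(*m)]

def add(a, b):
    return [[x + y for x, y in zip(ra, rb)] for ra, rb in zip(a, b)]

def gelu(m):
    # Exact GELU, applied elementwise.
    return [[0.5 * x * (1.0 + math.erf(x / math.sqrt(2.0))) for x in row] for row in m]

def layer_norm(m, eps=1e-6):
    # Normalize each token (row) across its channels; no learnable scale/shift.
    out = []
    for row in m:
        mean = sum(row) / len(row)
        var = sum((x - mean) ** 2 for x in row) / len(row)
        out.append([(x - mean) / math.sqrt(var + eps) for x in row])
    return out

def mlp(m, w1, w2):
    # Two-layer perceptron applied to every row of m (biases omitted).
    return matmul(gelu(matmul(m, w1)), w2)

def mixer_block(x, token_w1, token_w2, chan_w1, chan_w2):
    """x has shape (tokens x channels); the output has the same shape."""
    # Token mixing: transpose so each row holds one channel across all tokens,
    # letting the MLP exchange information between spatial positions.
    mixed = transpose(mlp(transpose(layer_norm(x)), token_w1, token_w2))
    x = add(x, mixed)  # residual connection
    # Channel mixing: the MLP acts on each token's channel vector independently.
    return add(x, mlp(layer_norm(x), chan_w1, chan_w2))

def random_matrix(rows, cols, rng, scale=0.1):
    return [[rng.gauss(0.0, scale) for _ in range(cols)] for _ in range(rows)]

# Hypothetical sizes: 4 tokens with 8 channels each.
rng = random.Random(0)
tokens, channels = 4, 8
x = random_matrix(tokens, channels, rng, scale=1.0)
out = mixer_block(
    x,
    random_matrix(tokens, 16, rng), random_matrix(16, tokens, rng),      # token-mixing weights
    random_matrix(channels, 32, rng), random_matrix(32, channels, rng),  # channel-mixing weights
)
print(len(out), len(out[0]))  # same shape as the input: 4 8
```

Because both mixing steps are residual and shape-preserving, such a block can be stacked on a backbone's token output without changing the size of the feature map; a classification head would then pool over tokens.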