Gumushane Universitesi Fen Bilimleri Dergisi, vol.15, no.2, pp.545-562, 2025 (Scopus)
In recent years, deep learning-based computer vision methods have made significant progress in the automatic classification of food images. However, most studies in this field focus on Western and Far Eastern cuisines, while rich and visually complex local cuisines such as Turkish cuisine remain underrepresented. This study therefore evaluates the effectiveness of transformer-based deep learning models in classifying images of Turkish dishes. Six current models, namely ViT, Swin Transformer (V1, V2), ConvNeXt, BEiT, and DeiT, are compared on four datasets: Food4, Food15, Food24, and Turkish Food-102. Standardized hyperparameters and an 80%-20% train-test split were used for training, and model performance was evaluated with metrics such as accuracy, loss, F1-score, and Cohen's Kappa. The ConvNeXt model achieved the highest accuracy on all datasets. The Swin Transformer models performed similarly, while ViT and BEiT performed less well. Data cleaning produced an average increase of 2-5% in model accuracy, demonstrating the critical importance of data quality. These findings show that transformer-based models are very promising for the automatic classification of Turkish dishes and underline how strongly dataset quality affects model success. This study is expected to make a significant contribution to food categorization specific to Turkish cuisine and to provide guidance for future research. The results can form a potential basis for restaurant recommendation systems and similar applications.
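As an illustration of the evaluation metrics named in the abstract, the following is a minimal sketch of how accuracy, F1-score, and Cohen's Kappa can be computed from true and predicted labels. It is not the authors' code: the function name `evaluate`, the dish labels, and the choice of macro averaging for the F1-score are assumptions made here for illustration, since the abstract does not state which averaging was used.

```python
from collections import Counter


def evaluate(y_true, y_pred):
    """Return accuracy, macro-averaged F1, and Cohen's kappa.

    Hypothetical helper for illustration; macro averaging is an assumption.
    """
    if len(y_true) != len(y_pred) or not y_true:
        raise ValueError("y_true and y_pred must be non-empty and of equal length")

    n = len(y_true)
    labels = sorted(set(y_true) | set(y_pred))
    true_counts = Counter(y_true)   # per class: TP + FN
    pred_counts = Counter(y_pred)   # per class: TP + FP
    correct = Counter(t for t, p in zip(y_true, y_pred) if t == p)  # per class: TP

    # Observed agreement between predictions and ground truth.
    accuracy = sum(correct.values()) / n

    # Per-class F1 = 2TP / (2TP + FP + FN), then an unweighted mean over classes.
    f1_scores = []
    for label in labels:
        denom = true_counts[label] + pred_counts[label]
        f1_scores.append(2 * correct[label] / denom if denom else 0.0)
    macro_f1 = sum(f1_scores) / len(labels)

    # Cohen's kappa = (p_o - p_e) / (1 - p_e), where p_e is the agreement
    # expected by chance from the two marginal label distributions.
    p_e = sum(true_counts[l] * pred_counts[l] for l in labels) / (n * n)
    kappa = (accuracy - p_e) / (1 - p_e) if p_e < 1 else 0.0

    return {"accuracy": accuracy, "macro_f1": macro_f1, "kappa": kappa}


# Toy example with made-up labels (not data from the study).
y_true = ["kebab", "kebab", "baklava", "baklava"]
y_pred = ["kebab", "baklava", "baklava", "baklava"]
print(evaluate(y_true, y_pred))
```

In this toy example three of four predictions are correct, so accuracy is 0.75, while Cohen's Kappa is 0.5 because half of the agreement would be expected by chance. This correction for chance agreement is why Kappa is a useful complement to accuracy when classes are imbalanced.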