Spoofed Audio Detection Using a Fusion of Transformer Based Architectures
2025 18th International Conference on Information Security and Cryptology (ISCTürkiye), Ankara, Türkiye, 22-23 October 2025, pp. 1-6 (Full Text Paper)
- Publication Type: Conference Paper / Full Text Paper
- DOI: 10.1109/isctrkiye68593.2025.11224850
- City of Publication: Ankara
- Country of Publication: Türkiye
- Page Numbers: pp. 1-6
- Karadeniz Teknik Üniversitesi Affiliation: Yes
Abstract
With the rapid evolution of deepfake audio generation and, more importantly, its simplified access through easy-to-use tools, synthetic speech generation and its abuse have become a considerable threat over the years. Detecting spoofed audio is therefore likely to become a growing concern, and robust, reliable methods that achieve high detection rates are needed. To address this issue, we proposed a method that effectively distinguishes spoofed audio from genuine audio. We used cochleagram images, which are the closest representation to the biology of the human ear, for feature extraction, and the ViT and XCiT architectures for classification. Finally, so that each architecture compensates for the deficiencies of the other, we adopted late score fusion, achieving 6.94% EER and 0.11
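The late score fusion and the EER metric named in the abstract can be illustrated with a short sketch. This is not the authors' implementation: it assumes that each model (for example ViT and XCiT) outputs one score per utterance on a comparable scale, with higher meaning more likely bona fide, and that fusion is a simple weighted average with a hypothetical weight `w`. The function names and the toy scores are illustrative only and do not come from the paper.

```python
from typing import List, Sequence, Tuple


def late_score_fusion(scores_a: Sequence[float], scores_b: Sequence[float],
                      w: float = 0.5) -> List[float]:
    """Fuse per-utterance scores from two classifiers by weighted averaging.

    Assumption (not from the paper): both models emit scores on a comparable
    scale, e.g. the probability of the bona fide class; w is a hypothetical
    fusion weight given to the first model.
    """
    if len(scores_a) != len(scores_b):
        raise ValueError("score lists must have equal length")
    return [w * a + (1.0 - w) * b for a, b in zip(scores_a, scores_b)]


def equal_error_rate(scores: Sequence[float],
                     labels: Sequence[int]) -> Tuple[float, float]:
    """Return (EER, threshold); labels use 1 = bona fide, 0 = spoofed.

    An utterance is accepted as bona fide when score >= threshold. The EER is
    the point where the false acceptance rate (spoofs accepted) equals the
    false rejection rate (bona fide rejected). On a finite trial list the two
    rarely coincide, so we pick the candidate threshold that minimizes
    |FAR - FRR| and report the mean of the two rates there.
    """
    n_pos = sum(1 for y in labels if y == 1)
    n_neg = len(labels) - n_pos
    if n_pos == 0 or n_neg == 0:
        raise ValueError("need both bona fide and spoofed trials")
    best = None  # (gap, eer, threshold)
    for t in sorted(set(scores)):
        far = sum(1 for s, y in zip(scores, labels) if y == 0 and s >= t) / n_neg
        frr = sum(1 for s, y in zip(scores, labels) if y == 1 and s < t) / n_pos
        gap = abs(far - frr)
        if best is None or gap < best[0]:
            best = (gap, (far + frr) / 2.0, t)
    return best[1], best[2]


if __name__ == "__main__":
    # Toy scores for four utterances: two bona fide (1), two spoofed (0).
    vit_scores = [0.9, 0.8, 0.3, 0.2]
    xcit_scores = [0.7, 0.9, 0.4, 0.1]
    labels = [1, 1, 0, 0]
    fused = late_score_fusion(vit_scores, xcit_scores, w=0.5)
    eer, threshold = equal_error_rate(fused, labels)
    print(f"EER = {eer:.2%} at threshold {threshold:.3f}")
```

Weighted averaging is only one form of late fusion; a common practice is to tune the weight on a development set, and to normalize scores first if the two models are not calibrated alike. The paper's actual fusion rule and weighting are not stated in this abstract.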