Hubert-Derived SSL Features and ECAPA-TDNN Matching for Robust Audio Deepfake Detection


TAHAOĞLU G.

35th IEEE International Workshop on Machine Learning for Signal Processing, MLSP 2025, Istanbul, Türkiye, 31 August - 3 September 2025 (Full-Text Paper)

  • Publication Type: Conference Paper / Full-Text Paper
  • DOI: 10.1109/mlsp62443.2025.11204297
  • City of Publication: Istanbul
  • Country of Publication: Türkiye
  • Keywords: audio spoofing detection, deepfake audio, ECAPA-TDNN, HuBERT-based features
  • Affiliated with Karadeniz Teknik Üniversitesi: Yes

Abstract

The rapid advancement and growing accessibility of deepfake audio technologies have raised substantial concerns, particularly in domains such as politics and media, about whether authentic and manipulated audio recordings can be reliably distinguished. This study proposes a robust deepfake audio detection framework that combines self-supervised learning (SSL) features extracted with HuBERT models and an ECAPA-TDNN classifier trained with One-Class Softmax (OC-Softmax). Three HuBERT variants (Base, Large, and XLarge) were assessed, along with various fusion strategies. Experiments on the ASVspoof 2019 LA dataset demonstrated that the proposed system significantly outperforms existing state-of-the-art approaches. The best configuration, based on score-level fusion of HuBERT-Large and HuBERT-XLarge with ECAPA-TDNN, achieved an EER of 0.20% and a minimum t-DCF of 0.006. Available at: https://github.com/gultahaoglu/Hubertderiveredfeatures_deepfakeaudiodetection
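
The abstract reports results for score-level fusion and measures them with the equal error rate (EER). The sketch below illustrates both ideas in general terms and is not taken from the paper or its repository. The function names `fuse_scores` and `compute_eer`, the weighted-average fusion rule, and the convention that a higher score means "more likely bona fide" are all assumptions made for illustration.

```python
import numpy as np


def fuse_scores(scores_a, scores_b, weight=0.5):
    """Hypothetical score-level fusion: weighted average of two systems' scores.

    Assumes both systems score the same trials in the same order and that
    higher scores indicate bona fide speech.
    """
    a = np.asarray(scores_a, dtype=float)
    b = np.asarray(scores_b, dtype=float)
    return weight * a + (1.0 - weight) * b


def compute_eer(bonafide_scores, spoof_scores):
    """Approximate the EER by sweeping thresholds over all observed scores.

    FRR is the fraction of bona fide trials scored below the threshold;
    FAR is the fraction of spoofed trials scored at or above it.
    The EER is taken where the two rates are closest.
    """
    bona = np.asarray(bonafide_scores, dtype=float)
    spoof = np.asarray(spoof_scores, dtype=float)
    thresholds = np.sort(np.concatenate([bona, spoof]))
    frr = np.array([(bona < t).mean() for t in thresholds])
    far = np.array([(spoof >= t).mean() for t in thresholds])
    idx = np.argmin(np.abs(far - frr))
    return (far[idx] + frr[idx]) / 2.0


if __name__ == "__main__":
    # Toy scores for two hypothetical systems on the same four trials
    # (first two trials bona fide, last two spoofed).
    system_large = [0.9, 0.6, 0.3, 0.1]
    system_xlarge = [0.8, 0.7, 0.2, 0.4]
    fused = fuse_scores(system_large, system_xlarge)
    print("fused scores:", fused)
    print("EER:", compute_eer(fused[:2], fused[2:]))
```

An equal-weight average is only one possible fusion rule; in practice the weight would be tuned on a development set, and the abstract does not state which rule the authors used.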