Detecting audio splicing forgery: A noise-robust approach with Swin Transformer and cochleagram


GÜLSOY T., KANCA GÜLSOY E., Ustubioglu A., ÜSTÜBİOĞLU B., BAYKAL KABLAN E., AYAS S., et al.

Journal of Information Security and Applications, vol. 93, 2025 (SCI-Expanded)

  • Publication Type: Article / Full Article
  • Volume: 93
  • Publication Date: 2025
  • DOI: 10.1016/j.jisa.2025.104130
  • Journal Name: Journal of Information Security and Applications
  • Journal Indexes: Science Citation Index Expanded (SCI-EXPANDED), Scopus
  • Keywords: Artificial intelligence, Audio splicing forgery, Cochleagram, Security, Swin Transformer
  • Affiliated with Karadeniz Teknik Üniversitesi: Yes

Abstract

Audio splicing forgery involves cutting specific parts of an audio recording and inserting or combining them into another recording. This manipulation technique is often used to create misleading or fake audio content, particularly in digital media environments, so its detection is of great importance in forensic analysis, security applications, and media verification processes. In this paper, we present a novel noise-robust method for detecting audio splicing forgery. The proposed method converts audio signals into cochleagram images, which are then fed into a Swin Transformer model for training. After training, the model classifies test audio files as either original or fake. In the experiments, the method is evaluated on datasets of varying durations. The results demonstrate high performance across different datasets, both with and without Gaussian noise, as well as under real-world environmental noise attacks at varying audio durations. For example, under a 30 dB noise condition on 2-second segments, the model achieves an accuracy of 94.33%, a precision of 96.46%, a recall of 92.90%, and an F1-score of 94.65%. Under the rain-noise condition, the proposed method achieves its highest accuracy of 93.26%, precision of 99.83%, and F1-score of 95.48%.
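
The pipeline the abstract describes (cochleagram extraction followed by Swin Transformer classification) can be sketched in a few lines. The Python snippet below is a minimal illustration, not the authors' implementation: the gammatone filterbank parameters (64 channels, 20 ms frames, cube-root compression), the choice of torchvision's `swin_t`, the input resizing, and the label convention are all assumptions made for demonstration.

```python
# Minimal sketch (assumed parameters, not the paper's code): build a cochleagram
# from a gammatone filterbank, then classify it with a Swin Transformer.
import numpy as np
import torch
import torch.nn.functional as F
from scipy.signal import fftconvolve
from torchvision.models import swin_t

def gammatone_ir(fc, fs, dur=0.05, order=4):
    """4th-order gammatone impulse response at center frequency fc (Hz)."""
    t = np.arange(int(dur * fs)) / fs
    erb = 24.7 * (4.37 * fc / 1000.0 + 1.0)   # ERB bandwidth (Glasberg & Moore)
    b = 1.019 * erb
    return t ** (order - 1) * np.exp(-2 * np.pi * b * t) * np.cos(2 * np.pi * fc * t)

def cochleagram(x, fs, n_channels=64, f_lo=50.0, frame_ms=20):
    """Framewise gammatone channel energies -> (n_channels, n_frames) image."""
    # ERB-rate-spaced center frequencies between f_lo and Nyquist (Slaney's formula).
    ear_q, min_bw = 9.26449, 24.7
    f_hi = fs / 2.0
    k = np.arange(1, n_channels + 1)
    cfs = -(ear_q * min_bw) + np.exp(
        k * (np.log(f_lo + ear_q * min_bw) - np.log(f_hi + ear_q * min_bw)) / n_channels
    ) * (f_hi + ear_q * min_bw)
    hop = int(fs * frame_ms / 1000)
    n_frames = len(x) // hop
    C = np.empty((n_channels, n_frames))
    for i, fc in enumerate(cfs):
        y = fftconvolve(x, gammatone_ir(fc, fs), mode="same")
        frames = y[: n_frames * hop].reshape(n_frames, hop)
        C[i] = np.mean(frames ** 2, axis=1) ** (1 / 3)  # cube-root compression
    return C

# Hypothetical usage on a 2-second clip at 16 kHz; the random signal is a
# stand-in for a real recording.
fs = 16000
audio = np.random.randn(2 * fs)
img = torch.tensor(cochleagram(audio, fs), dtype=torch.float32)
img = (img - img.mean()) / (img.std() + 1e-8)            # per-image normalization
img = F.interpolate(img[None, None], size=(224, 224), mode="bilinear")
img = img.repeat(1, 3, 1, 1)                             # Swin expects 3-channel input

model = swin_t(num_classes=2)   # assumed labels: 0 = original, 1 = spliced
model.eval()
with torch.no_grad():
    logits = model(img)
print("predicted class:", logits.argmax(dim=1).item())
```

In practice the model would first be trained (or fine-tuned from ImageNet weights) on cochleagram images of original and spliced recordings; the untrained network here only demonstrates the input/output shapes of the pipeline.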