Detecting audio splicing forgery: A noise-robust approach with Swin Transformer and cochleagram


GÜLSOY T., KANCA GÜLSOY E., Ustubioglu A., ÜSTÜBİOĞLU B., BAYKAL KABLAN E., AYAS S., et al.

JOURNAL OF INFORMATION SECURITY AND APPLICATIONS, vol.93, 2025 (SCI-Expanded, Scopus)

  • Publication Type: Article
  • Volume: 93
  • Publication Date: 2025
  • Doi Number: 10.1016/j.jisa.2025.104130
  • Journal Name: JOURNAL OF INFORMATION SECURITY AND APPLICATIONS
  • Journal Indexes: Science Citation Index Expanded (SCI-EXPANDED), Scopus
  • Keywords: Artificial intelligence, Audio splicing forgery, Cochleagram, Security, Swin Transformer
  • Karadeniz Technical University Affiliated: Yes

Abstract

Audio splicing forgery involves cutting segments from one audio recording and inserting them into another. This manipulation is often used to create misleading or fake audio content, particularly in digital media environments, so detecting it is important in forensic analysis, security applications, and media verification. In this paper, we present a novel noise-robust method for detecting audio splicing forgery. The proposed method converts audio signals into cochleagram images, which are then fed to a Swin Transformer model for training. After training, the model classifies test audio files as either original or fake. In the experiments, the method is tested on datasets with varying audio durations. The results demonstrate high performance across these datasets, both with and without Gaussian noise, as well as under real-world environmental noise attacks. For example, under the 30 dB noise condition on 2-second segments, the model achieved an accuracy of 94.33%, precision of 96.46%, recall of 92.90%, and an F1-score of 94.65%. Under the rain noise condition, the proposed method achieves the highest accuracy of 93.26%, precision of 99.83%, and F1-score of 95.48%.
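
The noise condition and the reported metrics can be illustrated with a minimal sketch. It is not the authors' code. It assumes that "30 dB" refers to a signal-to-noise ratio, which the abstract does not state explicitly, and it uses a synthetic 440 Hz tone sampled at 16 kHz as a hypothetical stand-in for a 2-second audio segment. The function names are illustrative only. The last line checks that the reported F1-score follows from the reported precision and recall via F1 = 2PR / (P + R).

```python
import math
import random


def add_gaussian_noise(signal, snr_db, seed=0):
    """Return a copy of `signal` with white Gaussian noise at the target SNR (dB).

    Assumption: the noise level is specified as a signal-to-noise ratio.
    """
    rng = random.Random(seed)
    signal_power = sum(x * x for x in signal) / len(signal)
    # SNR_dB = 10 * log10(P_signal / P_noise)  =>  P_noise = P_signal / 10^(SNR_dB / 10)
    noise_power = signal_power / (10 ** (snr_db / 10))
    sigma = math.sqrt(noise_power)
    return [x + rng.gauss(0.0, sigma) for x in signal]


def measured_snr_db(clean, noisy):
    """Estimate the SNR (dB) of `noisy` relative to the known `clean` signal."""
    signal_power = sum(c * c for c in clean) / len(clean)
    noise_power = sum((n - c) ** 2 for c, n in zip(clean, noisy)) / len(clean)
    return 10 * math.log10(signal_power / noise_power)


def f1_score(precision, recall):
    """Harmonic mean of precision and recall (both on the same scale)."""
    return 2 * precision * recall / (precision + recall)


# Hypothetical 2-second test tone: 440 Hz sine at a 16 kHz sampling rate.
sample_rate = 16000
clean = [math.sin(2 * math.pi * 440 * n / sample_rate)
         for n in range(2 * sample_rate)]
noisy = add_gaussian_noise(clean, snr_db=30)

print(f"measured SNR: {measured_snr_db(clean, noisy):.2f} dB")
# Precision and recall as reported in the abstract for the 30 dB, 2-second case.
print(f"F1 from reported precision/recall: {f1_score(96.46, 92.90):.2f}%")
```

The computed F1 rounds to 94.65%, matching the value reported for the 30 dB, 2-second case.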