ArCapsNet for Audio Splicing Forgery Detection


Üstübioğlu B., Dincer S., Ustubioglu A., Ulutaş G.

2024 47th International Conference on Telecommunications and Signal Processing (TSP), 10 July 2024, pp. 298-301

  • Publication Type: Conference Paper / Full-Text Paper
  • DOI: 10.1109/tsp63128.2024.10605934
  • Pages: pp. 298-301
  • Karadeniz Teknik Üniversitesi Affiliated: Yes

Abstract

Alongside complex forgeries such as deepfakes, which are very popular today, very simple but effective manipulation techniques that do not require deep networks such as GANs are still in use. Audio splicing forgery, which combines multiple speech segments from different recordings of a person to alter the content of a speech recording, is one of these manipulation techniques and presents a great challenge to audio forgery detection. In this paper, a novel audio splicing detection method based on the ArCapsNet architecture is proposed. The proposed method consists of two stages. In the first stage, the audio file given as input is converted into a cochleagram image. In the second stage, features are extracted from the cochleagram images with EfficientNet, and the ArCapsNet is trained with these features. As a result of the training, the audio files given as a test are labelled as forged/original. The proposed method is tested on a database we created from the TIMIT corpus. Our model achieves precision, recall, and F1 scores of 98.59%, 97.99%, and 98.29% on this dataset, respectively.
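
For illustration, below is a minimal Python sketch of the first stage only, assuming the standard gammatone-filterbank construction of a cochleagram. The band count, frame length, hop size, and filter parameters are illustrative choices, not values from the paper, and the second stage (EfficientNet feature extraction and ArCapsNet classification) is only indicated in comments.

    # Minimal cochleagram sketch (stage 1 of the described pipeline).
    # All parameters here are illustrative assumptions, not the paper's settings.
    import numpy as np
    from scipy.signal import fftconvolve

    def gammatone_ir(fc, fs, duration=0.05, order=4):
        """4th-order gammatone impulse response centred at fc Hz."""
        t = np.arange(0, duration, 1.0 / fs)
        erb = 24.7 * (4.37 * fc / 1000.0 + 1.0)  # equivalent rectangular bandwidth
        b = 1.019 * erb
        return t ** (order - 1) * np.exp(-2 * np.pi * b * t) * np.cos(2 * np.pi * fc * t)

    def cochleagram(x, fs, n_bands=64, fmin=50.0, frame=0.025, hop=0.010):
        """Map a waveform to an (n_bands x n_frames) cochleagram image."""
        # Centre frequencies spaced on the ERB scale between fmin and ~Nyquist.
        emin = 21.4 * np.log10(1 + 0.00437 * fmin)
        emax = 21.4 * np.log10(1 + 0.00437 * (0.9 * fs / 2))
        fcs = (10 ** (np.linspace(emin, emax, n_bands) / 21.4) - 1) / 0.00437
        flen, fhop = int(frame * fs), int(hop * fs)
        n_frames = 1 + (len(x) - flen) // fhop
        coch = np.empty((n_bands, n_frames))
        for i, fc in enumerate(fcs):
            # Filter the signal through one cochlear channel, take the envelope,
            # and pool its energy over short frames.
            env = np.abs(fftconvolve(x, gammatone_ir(fc, fs), mode="same"))
            for j in range(n_frames):
                coch[i, j] = np.sqrt(np.mean(env[j * fhop : j * fhop + flen] ** 2))
        return np.log1p(coch)  # log compression before feeding a CNN

    if __name__ == "__main__":
        fs = 16000
        t = np.arange(0, 2.0, 1.0 / fs)
        # Synthetic "spliced" signal: two tones joined at 1 s stand in for
        # speech segments taken from different recordings.
        x = np.where(t < 1.0, np.sin(2 * np.pi * 220 * t), np.sin(2 * np.pi * 880 * t))
        img = cochleagram(x, fs)
        print(img.shape)  # (64, 198); in the full method this image would go
        # to EfficientNet for feature extraction, then to the ArCapsNet classifier.

The frequency-dependent bandwidths of the gammatone channels mimic the human cochlea, which is why cochleagrams are often preferred over ordinary spectrograms for exposing splicing discontinuities.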