Mel spectrogram-based audio forgery detection using CNN

Ustubioglu, ARDA; ÜSTÜBİOĞLU, BESTE; ULUTAŞ, GÜZİN

doi:10.1007/s11760-022-02436-4

Ustubioglu A., ÜSTÜBİOĞLU B., ULUTAŞ G.

SIGNAL IMAGE AND VIDEO PROCESSING, vol.17, no.5, pp.2211-2219, 2023 (SCI-Expanded, Scopus)

Publication Type: Article / Full Article
Volume: 17 Issue: 5
Publication Date: 2023
DOI Number: 10.1007/s11760-022-02436-4
Journal Name: SIGNAL IMAGE AND VIDEO PROCESSING
Journal Indexes: Science Citation Index Expanded (SCI-EXPANDED), Scopus, Compendex, INSPEC, zbMATH
Page Numbers: pp.2211-2219
Keywords: Copy-move forgery detection, Audio forgery, Audio forensic, Spectrogram-CNN, COPY-MOVE DETECTION, PITCH
Affiliated with Karadeniz Teknik Üniversitesi: Yes

Abstract

In the current era of technology, digital speech can be created and falsified with a wide variety of hardware and software tools. Audio copy-move forgery is a technique that aims to create forged audio by hiding undesired words or repeating desired words within the same speech recording. Audio authentication has therefore become a necessity. In this study, an effective spectral image-based approach to audio copy-move forgery detection using convolutional neural networks (CNN) with data augmentation is proposed. Only a few methods, all based on handcrafted features, exist for detecting audio copy-move forgery. None of the existing works on audio copy-move forgery detection has proposed deep feature learning from the Mel spectrogram of speech recordings. This is the first method to employ deep learning with Mel spectrograms of audio for the detection of audio copy-move forgery. The proposed CNN architecture classifies suspicious Mel spectrogram images into two classes: original and forged. The proposed CNN system is successfully trained on features extracted from these Mel spectrogram images. The proposed algorithm has been tested on our datasets generated from the Arabic Speech Corpus and the TIMIT speech database. The results show the effectiveness of the proposed approach, its robustness against post-processing operations, and its high accuracy compared to other studies.
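
The two ingredients named in the abstract, a copy-move forged recording and its Mel spectrogram, can be illustrated with a minimal NumPy sketch. It is not the authors' implementation: the abstract gives no frame length, hop size, number of Mel bands, CNN architecture, or augmentation scheme, so every parameter and function name below (`copy_move`, `log_mel_spectrogram`, `n_fft=512`, `hop=128`, `n_mels=64`) is an assumption chosen for illustration, and white noise stands in for a speech recording.

```python
import numpy as np

def hz_to_mel(f):
    return 2595.0 * np.log10(1.0 + np.asarray(f, dtype=float) / 700.0)

def mel_to_hz(m):
    return 700.0 * (10.0 ** (np.asarray(m, dtype=float) / 2595.0) - 1.0)

def mel_filterbank(sr, n_fft, n_mels):
    """Triangular filters evenly spaced on the Mel scale, shape (n_mels, n_fft//2 + 1)."""
    fft_freqs = np.linspace(0.0, sr / 2.0, n_fft // 2 + 1)
    hz_points = mel_to_hz(np.linspace(hz_to_mel(0.0), hz_to_mel(sr / 2.0), n_mels + 2))
    fb = np.zeros((n_mels, fft_freqs.size))
    for i in range(n_mels):
        lo, mid, hi = hz_points[i], hz_points[i + 1], hz_points[i + 2]
        rising = (fft_freqs - lo) / (mid - lo)
        falling = (hi - fft_freqs) / (hi - mid)
        fb[i] = np.maximum(0.0, np.minimum(rising, falling))
    return fb

def log_mel_spectrogram(signal, sr, n_fft=512, hop=128, n_mels=64):
    """Log-power Mel spectrogram in dB, shape (n_mels, n_frames)."""
    window = np.hanning(n_fft)
    n_frames = 1 + (len(signal) - n_fft) // hop
    frames = np.stack([signal[i * hop:i * hop + n_fft] * window for i in range(n_frames)])
    power = np.abs(np.fft.rfft(frames, axis=1)) ** 2      # (n_frames, n_fft//2 + 1)
    mel = mel_filterbank(sr, n_fft, n_mels) @ power.T     # (n_mels, n_frames)
    return 10.0 * np.log10(mel + 1e-10)                   # small offset avoids log(0)

def copy_move(signal, src_start, length, dst_start):
    """Paste signal[src_start:src_start+length] over the segment starting at dst_start."""
    forged = signal.copy()
    forged[dst_start:dst_start + length] = signal[src_start:src_start + length]
    return forged

# Synthetic stand-in for a one-second, 16 kHz speech recording (assumption).
sr = 16000
rng = np.random.default_rng(0)
original = rng.standard_normal(sr)

# Forge the recording by duplicating a 3000-sample segment later in the signal.
forged = copy_move(original, src_start=2000, length=3000, dst_start=9000)

spec_original = log_mel_spectrogram(original, sr)
spec_forged = log_mel_spectrogram(forged, sr)
```

In a pipeline like the one the abstract describes, arrays such as `spec_original` and `spec_forged` would be rendered as images and passed to a two-class CNN (original versus forged); that classifier is omitted here because the abstract does not describe its layers.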