Audio forgery detection and localization with super-resolution spectrogram and keypoint-based clustering approach


ÜSTÜBİOĞLU B., TAHAOĞLU G., ULUTAŞ G., Ustubioglu A., KILIÇ M.

JOURNAL OF SUPERCOMPUTING, vol.80, no.1, pp.486-518, 2024 (SCI-Expanded)

  • Publication Type: Article
  • Volume: 80 Issue: 1
  • Publication Date: 2024
  • Doi Number: 10.1007/s11227-023-05504-9
  • Journal Name: JOURNAL OF SUPERCOMPUTING
  • Journal Indexes: Science Citation Index Expanded (SCI-EXPANDED), Scopus, Academic Search Premier, Applied Science & Technology Source, Compendex, Computer & Applied Sciences, INSPEC, zbMATH
  • Page Numbers: pp.486-518
  • Keywords: Audio copy-move-forgery detection, Audio forensic, Audio forgery, BRIEF feature, Clustering-based matching, High-frequency spectrogram
  • Karadeniz Technical University Affiliated: Yes

Abstract

Malicious individuals can use advanced audio editing software to modify speech recordings and create forged audio. The most common forgery method, audio copy-move forgery, copies part of a recording and pastes it elsewhere in the same recording, either duplicating a segment or overwriting one. Because speech recordings are used as evidence in court, it is important to determine whether a recording has been forged. To this end, we present an effective and robust method based on BRIEF and OPTICS to detect and locate audio copy-move forgeries. The proposed method uses super-resolution spectrogram images of the input audio to detect forged parts in suspicious recordings. First, keypoints and their feature descriptors are extracted from the spectrogram image using the BRIEF method. The approach then uses OPTICS (Ordering Points To Identify the Clustering Structure) to match corresponding descriptors. To eliminate false matches, the approach evaluates the correctness of each match. Finally, the method marks the corresponding forged segments in the audio file based on the locations of the keypoints in the resulting clusters. The performance results show that the proposed method is highly robust to the post-processing attacks reported in the literature, such as noise addition, filtering, and especially compression.
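The pipeline described in the abstract (binary descriptors at keypoints, descriptor matching, rejection of false matches, mapping keypoints back to time) can be illustrated with a minimal, standard-library-only sketch. This is not the authors' implementation, and everything in it is an assumption made for illustration: the toy random grid stands in for a super-resolution spectrogram, keypoints are placed on a fixed grid instead of being detected, a BRIEF-style descriptor is hand-rolled from random pixel-pair comparisons, and a simple majority vote on the column shift replaces both OPTICS clustering and the paper's match verification. The names `make_test_pairs`, `match_keypoints`, `dominant_shift`, and `column_to_seconds`, as well as the hop length and sample rate, are hypothetical.

```python
import random
from collections import Counter

def make_test_pairs(n_bits=64, radius=3, seed=0):
    """Draw random pixel-pair offsets inside a square patch (BRIEF-style test pattern)."""
    rng = random.Random(seed)
    def offset():
        return rng.randint(-radius, radius), rng.randint(-radius, radius)
    return [(offset(), offset()) for _ in range(n_bits)]

def brief_descriptor(image, row, col, pairs):
    """Pack one bit per pair: 1 when the first pixel is smaller than the second."""
    bits = 0
    for (r1, c1), (r2, c2) in pairs:
        smaller = image[row + r1][col + c1] < image[row + r2][col + c2]
        bits = (bits << 1) | int(smaller)
    return bits

def hamming(a, b):
    """Number of differing bits between two integer-packed descriptors."""
    return bin(a ^ b).count("1")

def match_keypoints(image, keypoints, pairs, max_dist=0, min_shift=8):
    """Brute-force matching (a stand-in for OPTICS): keep keypoint pairs whose
    descriptors are within max_dist bits and that are at least min_shift columns apart.
    Keypoints are assumed to be sorted by column."""
    described = [(kp, brief_descriptor(image, kp[0], kp[1], pairs)) for kp in keypoints]
    matches = []
    for i, (kp_a, d_a) in enumerate(described):
        for kp_b, d_b in described[i + 1:]:
            if kp_b[1] - kp_a[1] >= min_shift and hamming(d_a, d_b) <= max_dist:
                matches.append((kp_a, kp_b))
    return matches

def dominant_shift(matches):
    """Most frequent column shift among matches (a simple false-match filter)."""
    shifts = Counter(b[1] - a[1] for a, b in matches)
    return shifts.most_common(1)[0][0]

def column_to_seconds(col, hop_length=256, sample_rate=16000):
    """Convert a spectrogram column index to time; both defaults are assumed values."""
    return col * hop_length / sample_rate

# Toy "spectrogram": random values with columns 5..20 copied onto columns 40..55.
rng = random.Random(1)
ROWS, COLS = 16, 64
spec = [[rng.random() for _ in range(COLS)] for _ in range(ROWS)]
for row in spec:
    row[40:56] = row[5:21]  # simulated copy-move forgery

pairs = make_test_pairs()
keypoints = [(8, c) for c in range(3, COLS - 3)]  # fixed grid along the middle row
matches = match_keypoints(spec, keypoints, pairs)
shift = dominant_shift(matches)
forged = [(a, b) for a, b in matches if b[1] - a[1] == shift]

for src, dst in forged:
    print(f"{column_to_seconds(src[1]):.3f} s  <->  {column_to_seconds(dst[1]):.3f} s")
```

In this toy setup, keypoints whose whole patch lies inside the copied block produce identical descriptors at a shift of 35 columns, so the vote settles on that shift and the printed pairs mark the duplicated time ranges. Requiring an exact descriptor match (`max_dist=0`) works only because the copy is untouched; handling noise, filtering, or compression would need a looser threshold and a more careful grouping step, which is the role the abstract assigns to OPTICS and match verification.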