Detection of audio copy-move-forgery with novel feature matching on Mel spectrogram


EXPERT SYSTEMS WITH APPLICATIONS, vol.213, 2023 (SCI-Expanded)

  • Publication Type: Article
  • Volume: 213
  • Publication Date: 2023
  • Doi Number: 10.1016/j.eswa.2022.118963
  • Journal Indexes: Science Citation Index Expanded (SCI-EXPANDED), Scopus, Academic Search Premier, PASCAL, Aerospace Database, Applied Science & Technology Source, Communication Abstracts, Computer & Applied Sciences, INSPEC, Metadex, Public Affairs Index, Civil Engineering Abstracts
  • Keywords: Audio copy-move-forgery detection, Audio forgery, Audio forensic, Mel spectrogram, SIFT keypoints, Forgery localization, PITCH
  • Karadeniz Technical University Affiliated: Yes


Audio copy-move forgery, created by copying one or more segments of an audio file and pasting them at a different position within the same file, is one of the most widely used forgery methods encountered in audio forensics. This type of forgery is easy to apply but difficult to detect when post-processing operations are applied to the forged speech to hide its traces. This paper proposes a robust method for detecting and localizing audio copy-move forgery using a keypoint-based approach on the Mel spectrogram representation of the audio. In the proposed method, the Mel spectrogram image is first created from the input audio. Then, SIFT keypoints are extracted from each RGB color channel of this image. The keypoints obtained from each channel are matched via their feature vectors to reveal clues to the forged regions, and the image sub-blocks centered on the matched keypoints are labeled as forged blocks. The blocks neighboring the forged blocks are then examined to determine whether they are also forged. The proposed post-processing stage completes the determination of the forged regions: it eliminates possible false positives and marks the forged areas in the spectrogram image. The forged segments are then marked in the audio file using the positions of the forged regions in the spectrogram image. Experimental studies are carried out on two pitch-based datasets, using TIMIT and the Arabic Speech Corpus. The paper presents detailed performance results of popular reference studies on these datasets. The results show that the proposed method is more robust against common post-processing operations such as noise addition, filtering, and especially compression.
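
The matching and localization steps described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the function names, the ratio-test threshold, the minimum spatial shift, the hop length, and the sample rate are all assumptions chosen for illustration. It assumes that keypoint positions and descriptors (for example, from a SIFT detector run on one color channel) are already available, and that each spectrogram image column corresponds to one analysis frame.

```python
import math


def match_within_image(keypoints, descriptors, ratio=0.6, min_shift=10.0):
    """Pair keypoints from the SAME image whose descriptors nearly coincide.

    keypoints   -- list of (x, y) positions in the spectrogram image
    descriptors -- list of equal-length float lists, one per keypoint
    ratio       -- assumed nearest/second-nearest ratio-test threshold
    min_shift   -- assumed minimum pixel distance, so that a keypoint is
                   not paired with a point in its own neighborhood
    """
    pairs = set()
    for i, d_i in enumerate(descriptors):
        # Distances from descriptor i to every other descriptor, ascending.
        dists = sorted(
            (math.dist(d_i, d_j), j)
            for j, d_j in enumerate(descriptors)
            if j != i
        )
        if len(dists) < 2:
            continue
        (best, j), (second, _) = dists[0], dists[1]
        # Keep the match only if the nearest descriptor is clearly closer
        # than the second nearest and the two keypoints are far apart.
        far_apart = math.dist(keypoints[i], keypoints[j]) >= min_shift
        if best < ratio * second and far_apart:
            pairs.add((min(i, j), max(i, j)))
    return sorted(pairs)


def columns_to_seconds(col_start, col_end, hop_length=512, sample_rate=16000):
    """Map an inclusive range of spectrogram columns to (start, end) seconds.

    Assumes one image column per frame; hop_length and sample_rate are
    illustrative defaults, not values taken from the paper.
    """
    start = col_start * hop_length / sample_rate
    end = (col_end + 1) * hop_length / sample_rate
    return start, end


# Toy data: keypoints 0 and 1 have nearly identical descriptors but lie
# far apart, which is the signature of a copied-and-pasted region.
toy_keypoints = [(10, 20), (200, 20), (100, 50)]
toy_descriptors = [[1, 0, 0, 0], [1, 0.01, 0, 0], [0, 1, 0, 0]]
toy_matches = match_within_image(toy_keypoints, toy_descriptors)
toy_segment = columns_to_seconds(100, 149, hop_length=160, sample_rate=16000)
```

In this toy example, only the first two keypoints are paired, and columns 100 to 149 map to the interval from 1.0 s to 1.5 s. A full pipeline would additionally need the block labeling, neighborhood search, and false-positive elimination stages that the abstract mentions, which are not sketched here.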