An Attack-Independent Audio Forgery Detection Technique Based On Cochleagram Images Of Segments with Dynamic Threshold


IEEE Access, vol.12, pp.82660-82675, 2024 (SCI-Expanded)

  • Publication Type: Article
  • Volume: 12
  • Publication Date: 2024
  • DOI Number: 10.1109/access.2024.3409543
  • Journal Name: IEEE Access
  • Journal Indexes: Science Citation Index Expanded (SCI-EXPANDED), Scopus, Compendex, INSPEC, Directory of Open Access Journals
  • Page Numbers: pp.82660-82675
  • Keywords: Cochleagram, copy-move forgery, forgery detection, SSIM
  • Karadeniz Technical University Affiliated: Yes


Advanced audio editing software makes it very easy to tamper with speech recordings. When speech recordings are used as forensic evidence, splicing recordings together, cutting them, or altering their content is legally unacceptable and constitutes a crime. Copy-move forgery is the most common type of audio forgery used to change the content of speech: a segment of the audio is copied and pasted elsewhere in the same recording. In this study, a new and robust method based on cochleagram images is proposed to detect audio copy-move forgery. The method searches for forgery clues in the cochleagram images of the voiced parts of the input audio file. To this end, the audio file is first split into voiced segments using a pitch-based Voice Activity Detection (VAD) method, and each segment is then converted into a cochleagram image. The structural similarity index measure (SSIM) is used to compute the similarity between pairs of cochleagram images. The proposed forgery localization algorithm then operates on these SSIM values: it first sorts them in descending order and, to decide which of the ranked pairs are actually duplicated segments, computes the length ratio of the two segments in each pair. If this ratio exceeds a specified percentage, the pair is marked as forged. Finally, the proposed method is evaluated against state-of-the-art approaches on two Copy-Move Forgery Detection (CMFD) databases and on forged databases created from the TIMIT and Arabic Speech Corpus databases. The experimental results show that the proposed method is significantly more robust against post-processing operations than the methods it is compared with.
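The localization step summarized above (rank segment pairs by SSIM, then accept a pair only if the two segments' length ratio exceeds a percentage threshold) can be illustrated with a short sketch. This is not the authors' implementation. It assumes the cochleagram images have already been computed and resized to a common shape, uses a single global SSIM instead of the usual windowed SSIM to stay dependency-light, and introduces the hypothetical parameters `ratio_threshold` and `min_ssim`. The dynamic threshold named in the title is not modeled here.

```python
import numpy as np


def global_ssim(x: np.ndarray, y: np.ndarray, data_range: float = 1.0) -> float:
    """Single-window SSIM over the whole image.

    Simplification (assumption): standard SSIM averages over local windows.
    """
    x = np.asarray(x, dtype=np.float64)
    y = np.asarray(y, dtype=np.float64)
    c1 = (0.01 * data_range) ** 2  # usual stabilizing constants, K1 = 0.01
    c2 = (0.03 * data_range) ** 2  # and K2 = 0.03
    mx, my = x.mean(), y.mean()
    vx, vy = x.var(), y.var()
    cov = ((x - mx) * (y - my)).mean()
    num = (2 * mx * my + c1) * (2 * cov + c2)
    den = (mx ** 2 + my ** 2 + c1) * (vx + vy + c2)
    return float(num / den)


def localize_copy_move(images, lengths, ratio_threshold=0.9, min_ssim=0.9):
    """Rank segment pairs by SSIM and keep those whose length ratio passes.

    images: equally sized 2-D arrays (cochleagrams, assumed pre-resized).
    lengths: duration of each voiced segment, in any consistent unit.
    ratio_threshold, min_ssim: hypothetical values chosen for illustration.
    Returns a list of (i, j, ssim) tuples flagged as forged.
    """
    n = len(images)
    scored = [
        (global_ssim(images[i], images[j]), i, j)
        for i in range(n)
        for j in range(i + 1, n)
    ]
    scored.sort(key=lambda t: t[0], reverse=True)  # descending SSIM

    forged = []
    for score, i, j in scored:
        if score < min_ssim:
            break  # list is sorted, so no later pair can qualify
        # Ratio of shorter to longer segment: 1.0 means equal durations.
        ratio = min(lengths[i], lengths[j]) / max(lengths[i], lengths[j])
        if ratio >= ratio_threshold:
            forged.append((i, j, score))
    return forged


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    a, b = rng.random((64, 64)), rng.random((64, 64))
    images = [a, b, b.copy(), a.copy()]  # segment 3 copies 0; segment 2 copies 1
    lengths = [1.0, 0.8, 1.5, 1.0]       # pair (1, 2) differs strongly in length
    print(localize_copy_move(images, lengths))
```

In this toy run both duplicated pairs reach an SSIM of about 1, but only pair (0, 3) is reported: the length ratio of pair (1, 2) is about 0.53, below the assumed 0.9 threshold. This shows how the length-ratio test filters the SSIM ranking. For real cochleagrams, a windowed SSIM such as scikit-image's `structural_similarity` would be the more usual choice.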