2024 47th International Conference on Telecommunications and Signal Processing (TSP), 10 July 2024, pp. 298-301
Alongside complex forgeries such as deepfakes, which are very popular today, very simple yet effective manipulation techniques whose production does not require deep networks such as GANs are still in use. Audio splicing forgery, which combines multiple speech segments from different recordings of a person to alter the content of a speech recording, is one of these manipulation techniques and poses a great challenge to audio forgery detection.
In this paper, a novel audio splicing detection method based on the ArCapsNet architecture is proposed. The proposed method consists of two stages. In the first stage, the audio file given as input is converted into a cochleagram image.
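As an illustration of this stage, the sketch below builds a cochleagram by filtering the waveform with a fourth-order gammatone filterbank at ERB-spaced centre frequencies and log-compressing framed band energies. This is a minimal sketch of the standard construction, not the paper's exact implementation; the channel count, frame and hop lengths, and the soundfile I/O library are assumptions.

```python
# Minimal cochleagram sketch: gammatone filterbank + framed log energies.
# Parameters (64 channels, 20 ms frames) are illustrative assumptions.
import numpy as np
import scipy.signal
import soundfile as sf  # assumed I/O library, not specified in the paper

def erb_space(low_hz, high_hz, n_channels):
    """Centre frequencies equally spaced on the ERB-rate scale."""
    erb = lambda f: 21.4 * np.log10(4.37e-3 * f + 1.0)
    inv = lambda e: (10.0 ** (e / 21.4) - 1.0) / 4.37e-3
    return inv(np.linspace(erb(low_hz), erb(high_hz), n_channels))

def cochleagram(x, sr, n_channels=64, frame_ms=20, hop_ms=10):
    """Filter x with gammatone FIRs, then frame-average band energy."""
    if x.ndim > 1:
        x = x.mean(axis=1)                         # mix down to mono
    t = np.arange(int(0.03 * sr)) / sr             # 30 ms impulse responses
    frame, hop = int(frame_ms * sr / 1000), int(hop_ms * sr / 1000)
    rows = []
    for fc in erb_space(50.0, 0.9 * sr / 2, n_channels):
        b = 1.019 * 24.7 * (4.37 * fc / 1000.0 + 1.0)   # ERB bandwidth
        ir = t ** 3 * np.exp(-2 * np.pi * b * t) * np.cos(2 * np.pi * fc * t)
        y = scipy.signal.fftconvolve(x, ir / np.abs(ir).sum(), mode="same")
        rows.append([np.mean(y[i:i + frame] ** 2)
                     for i in range(0, len(y) - frame, hop)])
    return np.log(np.asarray(rows) + 1e-10)        # log-compressed image

x, sr = sf.read("speech.wav")                      # hypothetical input file
img = cochleagram(x, sr)                           # shape: (channels, frames)
```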
In the second stage, features are extracted from the cochleagram images with EfficientNet, and ArCapsNet is trained on these features. After training, the test audio files are labelled as forged or original.
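A minimal sketch of this stage, assuming PyTorch and torchvision, is given below. EfficientNet-B0 acts as a frozen feature extractor; CapsuleHead is a hypothetical, simplified capsule-style stand-in (squash nonlinearity, no dynamic routing) for the actual ArCapsNet, which is defined in the paper.

```python
# Second-stage sketch: frozen EfficientNet features feeding a
# placeholder capsule head. CapsuleHead is NOT the paper's ArCapsNet.
import torch
import torch.nn as nn
from torchvision.models import efficientnet_b0, EfficientNet_B0_Weights

backbone = efficientnet_b0(weights=EfficientNet_B0_Weights.DEFAULT)
backbone.classifier = nn.Identity()    # expose the 1280-d pooled features
backbone.eval()
for p in backbone.parameters():
    p.requires_grad = False            # use the backbone as a fixed extractor

class CapsuleHead(nn.Module):
    """Simplified capsule-style head: primary capsules are squashed, and
    class-capsule lengths score forged vs. original."""
    def __init__(self, in_dim=1280, n_caps=32, caps_dim=8, n_classes=2):
        super().__init__()
        self.n_caps, self.caps_dim, self.n_classes = n_caps, caps_dim, n_classes
        self.primary = nn.Linear(in_dim, n_caps * caps_dim)
        self.route = nn.Linear(n_caps * caps_dim, n_classes * 16)

    @staticmethod
    def squash(v):
        norm = v.norm(dim=-1, keepdim=True)
        return (norm ** 2 / (1.0 + norm ** 2)) * v / (norm + 1e-8)

    def forward(self, feats):
        u = self.squash(self.primary(feats).view(-1, self.n_caps, self.caps_dim))
        v = self.route(u.flatten(1)).view(-1, self.n_classes, 16)
        return self.squash(v).norm(dim=-1)     # capsule lengths as class scores

head = CapsuleHead()
with torch.no_grad():                          # cochleagram images resized
    feats = backbone(torch.randn(4, 3, 224, 224))  # to EfficientNet's input
scores = head(feats)                           # (4, 2): forged / original
```

Training then proceeds as usual, e.g. with a margin or cross-entropy loss over the capsule lengths; only the head's parameters require gradients here.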
The proposed method is tested on a dataset we constructed from the TIMIT database. On this dataset, our model achieves a precision of 98.59%, a recall of 97.99%, and an F1 score of 98.29%.
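For completeness, the reported figures follow the standard precision, recall, and F1 definitions; the sketch below shows their computation with scikit-learn on toy labels, which are illustrative only.

```python
# Standard metric computation with scikit-learn; labels are toy values,
# not the paper's predictions.
from sklearn.metrics import precision_score, recall_score, f1_score

y_true = [1, 1, 0, 1, 0, 0]   # 1 = forged, 0 = original
y_pred = [1, 1, 0, 1, 0, 1]
print(precision_score(y_true, y_pred),
      recall_score(y_true, y_pred),
      f1_score(y_true, y_pred))
```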