Yüzüncü Yıl Üniversitesi Fen Bilimleri Enstitüsü Dergisi, vol. 29, no. 1, pp. 353-402, 2024 (Peer-Reviewed Journal)
Besides facilitating access to audio content on the Internet, developments in deep
learning methods have made it possible to produce deep fake audio. Automatic
Speaker Verification systems, considered a security step to authenticate the
speaker, are vulnerable to deep fake spoofing attacks. It is crucial today
that expert systems can detect such fraud. Deep fake audio spoofing clones a
target speaker's voice to produce audio files in which the speaker appears to
say something they never said. Various methods have been proposed in the
literature to detect this type of forgery. There
are freely accessible datasets used for performance evaluation in the
literature, which makes it possible to compare results across studies. The planned
research aims to reduce or eliminate noise that may be present in the input
audio signal by passing it through a preprocessing stage. This paper examines
the methods and datasets in the literature and highlights the advantages and
disadvantages of the methods on these datasets.