IEEE Access, vol. 12, pp. 117523-117540, 2024 (SCI-Expanded)
Automatic speaker verification (ASV) systems are vulnerable to voice spoofing attacks such as replay, voice conversion (VC), and speech synthesis. Using advanced audio manipulation techniques, malicious users can carry out actions such as gaining control of someone's bank account or taking over a smart home. This study presents a novel multi-pattern feature-based spoofing detection scheme that uses a modified ResNet architecture and an OC-Softmax layer to detect various logical access (LA) and physical access (PA) spoofing attacks. The scheme contains three branches, each evaluating a different pattern on the Mel spectrogram of the audio file. This is the first work to address audio spoofing detection using three different pattern representations of the Mel spectrogram with a modified ResNet architecture and an OC-Softmax layer. The proposed network extracts pattern images from the Mel spectrogram and feeds each of them into a modified ResNet. At the last step of each branch, OC-Softmax produces a score for the current pattern image, and the method then fuses the three scores to label the input audio. Experimental results on the ASVspoof 2019 and ASVspoof 2021 corpora show that the proposed method outperforms state-of-the-art methods on the ASVspoof 2019 challenges. For example, in the logical access scenario, our model improves the tandem decision cost function and equal error rate scores by 0.06% and 2.14%, respectively, compared with state-of-the-art methods. Additionally, the experiments show that the proposed fused decision improves the performance of the system.
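
The per-branch scoring and score fusion described in the abstract can be illustrated with a minimal Python sketch. It rests on several assumptions not stated in the abstract: the OC-Softmax score is taken as the cosine similarity between a learned bona fide direction and a branch embedding, the margins and scale (`m_real=0.9`, `m_fake=0.2`, `alpha=20.0`) are values commonly used with OC-Softmax, and the fusion rule (a plain average) and the decision threshold are hypothetical, since the abstract does not say how the three scores are combined. All function names are illustrative.

```python
import math


def cosine_score(weight, embedding):
    """Cosine similarity between a bona fide direction and a branch embedding."""
    dot = sum(w * x for w, x in zip(weight, embedding))
    norm_w = math.sqrt(sum(w * w for w in weight))
    norm_x = math.sqrt(sum(x * x for x in embedding))
    return dot / (norm_w * norm_x)


def oc_softmax_loss(scores, labels, m_real=0.9, m_fake=0.2, alpha=20.0):
    """One-class softmax loss over cosine scores.

    labels: 0 = bona fide, 1 = spoof. Margins and scale are assumed values.
    Bona fide scores are pushed above m_real, spoof scores below m_fake.
    """
    total = 0.0
    for s, y in zip(scores, labels):
        margin_term = (m_real - s) if y == 0 else (s - m_fake)
        z = alpha * margin_term
        # Numerically stable softplus: log(1 + exp(z)).
        total += max(z, 0.0) + math.log1p(math.exp(-abs(z)))
    return total / len(scores)


def fuse_scores(branch_scores):
    """Assumed fusion rule: plain average of the three branch scores."""
    return sum(branch_scores) / len(branch_scores)


def label_audio(branch_scores, threshold=0.5):
    """Label the utterance from the fused score (threshold is hypothetical)."""
    return "bonafide" if fuse_scores(branch_scores) >= threshold else "spoof"
```

In this sketch, each of the three pattern branches would contribute one cosine score per utterance, and `label_audio` would be called with those three values. A weighted or learned fusion could replace the average without changing the rest of the flow.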