American Sign Language Recognition Model Using Complex Zernike Moments and Complex-Valued Deep Neural Networks

Bayrak, SELDA; Nabiyev, VASİF; Atalar, CELAL

doi:10.1109/access.2024.3461572

Bayrak S., Nabiyev V., Atalar C.

IEEE Access, vol.12, pp.193001-193013, 2024 (SCI-Expanded)

Publication Type: Article / Full Article
Volume: 12
Publication Date: 2024
DOI: 10.1109/access.2024.3461572
Journal Name: IEEE Access
Journal Indexes: Science Citation Index Expanded (SCI-EXPANDED), Scopus, Compendex, INSPEC, Directory of Open Access Journals
Page Numbers: pp.193001-193013
Keywords: Complex valued deep neural network, complex Zernike moments, feature extraction, sign language recognition model
Affiliated with Karadeniz Technical University: Yes

Abstract

Technological advances play a significant role in integrating deaf and mute individuals into society, so improvements in sign language recognition systems are of great importance. Many studies on sign languages have been conducted using real numbers. This paper presents a new approach that uses complex numbers for both feature extraction from images and recognition of the sign language alphabet, and develops a model for recognizing American Sign Language. In the developed model, complex Zernike moments are used to obtain the feature vector of character images. A complex-valued deep neural network (CVDNN) that can process this complex-valued feature vector across its layers is also developed. CVDNNs are a powerful method that can address the complex optimization issues of traditional deep neural networks more efficiently. CVDNNs, which take complex numbers as input and use complex activation functions in each layer, are expected to deliver superior performance in fields such as robotic systems, biometric technologies, disease diagnosis, and telecommunications. Without any preprocessing, the model achieves recognition rates of 89.01% on the Sign Language MNIST dataset, and of 98.67% (holdout) and 81.22% (leave-one-subject-out) on the Massey University dataset. Compared separately with many studies that use the same datasets, our model shows the best performance when the two datasets are considered together. Working with complex numbers improved performance by approximately 20% compared with configuring the same model, with its structure kept intact, to work with real numbers.
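As a rough illustration of the two ideas named in the abstract, the following minimal sketch computes complex Zernike moments of a grayscale image using the standard textbook definition, then passes the resulting complex vector through a single complex-valued dense layer. It is not the authors' implementation. The function names (`radial_poly`, `zernike_moments`, `complex_dense`), the maximum order, the pixel-to-unit-disk mapping, and the split ReLU activation are all assumptions made for illustration; the abstract does not specify any of them.

```python
import numpy as np
from math import factorial


def radial_poly(n, m, rho):
    """Zernike radial polynomial R_n^m(rho) for n - |m| even (hypothetical helper)."""
    m = abs(m)
    out = np.zeros_like(rho, dtype=float)
    for k in range((n - m) // 2 + 1):
        coeff = ((-1) ** k * factorial(n - k)
                 / (factorial(k)
                    * factorial((n + m) // 2 - k)
                    * factorial((n - m) // 2 - k)))
        out += coeff * rho ** (n - 2 * k)
    return out


def zernike_moments(img, max_order):
    """Complex Zernike moments Z_nm for 0 <= m <= n <= max_order, n - m even.

    Assumption: pixel centres are mapped linearly onto [-1, 1] x [-1, 1] and
    pixels outside the unit disk are ignored.
    """
    img = np.asarray(img, dtype=float)
    h, w = img.shape
    rows, cols = np.mgrid[0:h, 0:w]
    x = (2 * cols + 1 - w) / w           # pixel centres in (-1, 1)
    y = (2 * rows + 1 - h) / h
    rho = np.hypot(x, y)
    theta = np.arctan2(y, x)
    inside = rho <= 1.0
    pixel_area = 4.0 / (w * h)           # area of one pixel in the mapped plane

    feats = []
    for n in range(max_order + 1):
        for m in range(n % 2, n + 1, 2):
            # Conjugate basis function V*_nm = R_nm(rho) * exp(-i m theta)
            basis = radial_poly(n, m, rho) * np.exp(-1j * m * theta)
            z = (n + 1) / np.pi * np.sum(img[inside] * basis[inside]) * pixel_area
            feats.append(z)
    return np.array(feats, dtype=np.complex128)


def complex_dense(z, weights, bias):
    """One complex-valued dense layer with a split ReLU.

    Assumption: the activation applies ReLU separately to the real and
    imaginary parts; the paper may use a different complex activation.
    """
    pre = weights @ z + bias
    return np.maximum(pre.real, 0.0) + 1j * np.maximum(pre.imag, 0.0)


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    image = rng.random((28, 28))         # stand-in for a 28x28 sign image
    features = zernike_moments(image, max_order=4)
    W = rng.normal(size=(5, features.size)) + 1j * rng.normal(size=(5, features.size))
    b = np.zeros(5, dtype=np.complex128)
    print(features.shape, complex_dense(features, W, b).shape)
```

The moments are kept as complex numbers instead of being reduced to magnitudes, so a complex-valued layer can use both the amplitude and the phase of each moment.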