Multi-Class Classification of Turkish Texts with Machine Learning Algorithms

GÜRCAN F.

2nd International Symposium on Multidisciplinary Studies and Innovative Technologies (ISMSIT), Kizilcahamam, Türkiye, 19-21 October 2018, pp. 294-298 (Full Text Paper)

Publication Type: Conference Paper / Full Text Paper
Volume Number:
City of Publication: Kizilcahamam
Country of Publication: Türkiye
Pages: pp. 294-298
Affiliated with Karadeniz Technical University: Yes

Abstract

Text classification is the supervised assignment of text documents to one or more predefined categories or classes, based on their content, using natural language processing methods. It is actively used in fields such as the categorization of social interactions, web pages, and news texts, search engine optimization, information extraction, and automatic e-mail processing. This study aims to classify Turkish texts using supervised machine learning methods and analyzes the classification success of supervised learning models on Turkish texts under different parameters. The models were tested on the classification of news texts into five predefined classes (economy, politics, sport, health, and technology), with the system trained on different numbers of training documents. The classification performances of the Multinomial Naive Bayes, Bernoulli Naive Bayes, Support Vector Machine, K-Nearest Neighbor, and Decision Tree algorithms on Turkish news texts are compared and interpreted in light of the results obtained with different parameters. The best-performing method was the Multinomial Naive Bayes algorithm, with a classification success of about 90%. These results show that the Naive Bayes probability model can serve as an effective classifier for Turkish texts compared to the other methods. The proposed methodology could also be applied to Turkish texts on different web platforms (social networks, forums, communication networks, etc.) for different purposes.
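
To make the Multinomial Naive Bayes approach named in the abstract more concrete, the following is a minimal sketch in plain Python, not the paper's implementation. The abstract does not describe preprocessing, features, or parameters, so everything here is an assumption for illustration: the class name ToyMultinomialNB, the whitespace tokenizer, the Laplace smoothing value alpha=1.0, and the ten invented keyword "documents" standing in for real news texts.

```python
import math
from collections import Counter, defaultdict


def tokenize(text):
    # Assumption: plain whitespace tokenization. Real Turkish text would
    # need locale-aware lowercasing (dotted/dotless I) and likely stemming.
    return text.lower().split()


class ToyMultinomialNB:
    """Hypothetical minimal multinomial Naive Bayes with Laplace smoothing."""

    def __init__(self, alpha=1.0):
        self.alpha = alpha  # smoothing constant (assumed value)

    def fit(self, docs, labels):
        self.class_doc_counts = Counter(labels)   # documents per class
        self.word_counts = defaultdict(Counter)   # token counts per class
        self.vocab = set()
        for doc, label in zip(docs, labels):
            tokens = tokenize(doc)
            self.word_counts[label].update(tokens)
            self.vocab.update(tokens)
        self.total_docs = len(labels)
        return self

    def log_scores(self, doc):
        vocab_size = len(self.vocab)
        tokens = [t for t in tokenize(doc) if t in self.vocab]  # skip unseen words
        scores = {}
        for label, n_docs in self.class_doc_counts.items():
            class_total = sum(self.word_counts[label].values())
            # log prior + sum of smoothed log likelihoods
            score = math.log(n_docs / self.total_docs)
            for tok in tokens:
                count = self.word_counts[label][tok]
                score += math.log(
                    (count + self.alpha) / (class_total + self.alpha * vocab_size)
                )
            scores[label] = score
        return scores

    def predict(self, doc):
        scores = self.log_scores(doc)
        return max(scores, key=scores.get)


# Invented toy data covering the five classes named in the abstract.
train = [
    ("borsa faiz dolar enflasyon", "economy"),
    ("merkez bankası faiz kararı", "economy"),
    ("meclis seçim parti oylama", "politics"),
    ("hükümet parti seçim kampanyası", "politics"),
    ("maç gol takım lig", "sport"),
    ("takım teknik direktör maç", "sport"),
    ("hastane doktor tedavi aşı", "health"),
    ("grip aşı doktor uyarısı", "health"),
    ("telefon yazılım uygulama güncelleme", "technology"),
    ("yapay zeka yazılım telefon", "technology"),
]
docs, labels = zip(*train)
model = ToyMultinomialNB(alpha=1.0).fit(docs, labels)

print(model.predict("takım maç kazandı"))
print(model.predict("faiz kararı açıklandı"))
```

Scores are summed in log space to avoid underflow on longer documents, and words unseen in training are skipped rather than penalized. A toy set like this says nothing about the roughly 90% success reported in the study; it only shows the mechanics of the probability model.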