A comparison of textual data mining methods for sex identification in chat conversations


KÖSE C., ÖZYURT Ö., Ikibas C.

4th Asia Information Retrieval Symposium, Harbin, China, 15-18 January 2008, vol. 4993, pp. 638-643

  • Volume: 4993
  • DOI: 10.1007/978-3-540-68636-1_76
  • City of publication: Harbin
  • Country of publication: China
  • Pages: 638-643

Abstract

Mining textual data in chat media is becoming increasingly important because these media contain a vast amount of information that is potentially relevant to a society's current interests, habits, social behaviors, and tendencies, including criminal ones. Here, sex identification is taken as a foundational case study for information mining in chat media. To this end, a simple discrimination function and a semantic analysis method are proposed for sex identification in Turkish chat media. The proposed method is then compared with Support Vector Machine (SVM) and Naive Bayes (NB) classifiers. The results show that the proposed system achieves an accuracy of over 90% in sex identification.
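For readers unfamiliar with the Naive Bayes baseline named in the abstract, the following is a minimal sketch of a multinomial Naive Bayes text classifier with add-one (Laplace) smoothing, written from scratch with the standard library. It is not the authors' discrimination function or their exact NB configuration, neither of which the abstract specifies. The class name `TinyMultinomialNB`, the whitespace tokenizer, the placeholder tokens, and the `F`/`M` labels are hypothetical and are not data or features from the study.

```python
import math
from collections import Counter


def tokenize(text):
    # Hypothetical tokenizer: lowercase and split on whitespace.
    # A real system for Turkish chat text would need more careful preprocessing.
    return text.lower().split()


class TinyMultinomialNB:
    """Multinomial Naive Bayes with add-one smoothing (illustrative sketch)."""

    def fit(self, docs, labels):
        self.classes = sorted(set(labels))
        self.n_docs = len(docs)
        self.doc_counts = Counter(labels)  # number of training documents per class
        self.word_counts = {c: Counter() for c in self.classes}
        for doc, label in zip(docs, labels):
            self.word_counts[label].update(tokenize(doc))
        # Vocabulary is the union of words seen in any class.
        self.vocab = set()
        for c in self.classes:
            self.vocab.update(self.word_counts[c])
        self.total_words = {c: sum(self.word_counts[c].values()) for c in self.classes}
        return self

    def log_posterior(self, doc, c):
        # Unnormalized log P(c | doc) = log P(c) + sum of log P(w | c).
        score = math.log(self.doc_counts[c] / self.n_docs)
        denom = self.total_words[c] + len(self.vocab)
        for w in tokenize(doc):
            if w not in self.vocab:
                continue  # skip words never seen during training
            score += math.log((self.word_counts[c][w] + 1) / denom)
        return score

    def predict(self, doc):
        # Pick the class with the highest log posterior.
        return max(self.classes, key=lambda c: self.log_posterior(doc, c))


# Hypothetical toy data with placeholder tokens, not data from the paper.
train_docs = ["aa bb aa", "aa cc", "dd ee dd", "ee ff"]
train_labels = ["F", "F", "M", "M"]
model = TinyMultinomialNB().fit(train_docs, train_labels)
print(model.predict("aa bb"))  # F
print(model.predict("dd ff"))  # M
```

Working in log space avoids numerical underflow when many word probabilities are multiplied, and add-one smoothing keeps a word unseen in one class from driving that class's probability to zero.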