A comparison of textual data mining methods for sex identification in chat conversations


KÖSE C., ÖZYURT Ö., Ikibas C.

4th Asia Information Retrieval Symposium, Harbin, China, 15 - 18 January 2008, vol.4993, pp.638-643

  • Publication Type: Conference Paper / Full Text
  • Volume: 4993
  • Doi Number: 10.1007/978-3-540-68636-1_76
  • City: Harbin
  • Country: China
  • Page Numbers: pp.638-643
  • Karadeniz Technical University Affiliated: Yes

Abstract

Mining textual data from chat media is becoming more important because these media contain a vast amount of information that is potentially relevant to a society's current interests, habits, social behaviors, criminal tendencies and other tendencies. Here, sex identification is taken as a base study for information mining in chat media. To this end, a simple discrimination function and a semantic analysis method are proposed for sex identification in Turkish chat media. The proposed sex identification method is then compared with the Support Vector Machine (SVM) and Naive Bayes (NB) methods. The results show that the proposed system achieves an accuracy of over 90% in sex identification.
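For readers unfamiliar with the Naive Bayes baseline named in the abstract, the following is a minimal illustrative sketch of a multinomial Naive Bayes text classifier with add-one smoothing. It is not the authors' implementation and does not reproduce their discrimination function or semantic analysis. The function names (`train_nb`, `predict_nb`), the placeholder tokens, and the toy training set are all hypothetical and chosen only for illustration.

```python
import math
from collections import Counter, defaultdict


def train_nb(samples):
    """Collect per-class message counts and per-class token counts.

    samples: list of (tokens, label) pairs, where tokens is a list of strings.
    """
    class_counts = Counter()
    word_counts = defaultdict(Counter)
    vocab = set()
    for tokens, label in samples:
        class_counts[label] += 1
        word_counts[label].update(tokens)
        vocab.update(tokens)
    return class_counts, word_counts, vocab


def predict_nb(model, tokens):
    """Return the label with the highest log-posterior for the given tokens."""
    class_counts, word_counts, vocab = model
    total = sum(class_counts.values())
    best_label, best_score = None, -math.inf
    for label, n in class_counts.items():
        # Log prior: fraction of training messages carrying this label.
        score = math.log(n / total)
        # Add-one (Laplace) smoothing over the training vocabulary.
        denom = sum(word_counts[label].values()) + len(vocab)
        for tok in tokens:
            if tok in vocab:  # tokens never seen in training are ignored
                score += math.log((word_counts[label][tok] + 1) / denom)
        if score > best_score:
            best_label, best_score = label, score
    return best_label


# Hypothetical toy data: placeholder tokens, not real chat messages.
toy = [
    (["tok_a", "tok_b", "tok_c"], "female"),
    (["tok_a", "tok_b"], "female"),
    (["tok_d", "tok_e", "tok_c"], "male"),
    (["tok_d", "tok_e"], "male"),
]
model = train_nb(toy)
```

Calling `predict_nb(model, ["tok_a", "tok_b"])` on this toy model selects the class whose training messages contained those tokens more often. A real comparison would additionally need tokenization suited to Turkish chat text and a held-out test set.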