Exploratory Analysis of Topic Interests and Their Evolution in Bioinformatics Research Using Semantic Text Mining and Probabilistic Topic Modeling

Creative Commons License

GÜRCAN F., Cagiltay N. E.

IEEE ACCESS, vol.10, pp.31480-31493, 2022 (SCI-Expanded)

  • Publication Type: Article
  • Volume: 10
  • Publication Date: 2022
  • Doi Number: 10.1109/access.2022.3160795
  • Journal Name: IEEE ACCESS
  • Journal Indexes: Science Citation Index Expanded (SCI-EXPANDED), Scopus, Compendex, INSPEC, Directory of Open Access Journals
  • Page Numbers: pp.31480-31493
  • Keywords: Bioinformatics, Market research, Biology, Analytical models, Genomics, Proteins, Computational modeling, Bioinformatics corpus, probabilistic topic modeling, textual content analysis, scientometric analysis, bioinformatics topics and trends, TRENDS, FIELD, DYNAMICS, IMPACT, LDA
  • Karadeniz Technical University Affiliated: Yes


Bioinformatics, which has developed rapidly in recent years with the collaborative contributions of the fields of biology and informatics, provides a deeper perspective on the analysis and understanding of complex biological data. In this regard, bioinformatics has an interdisciplinary background and a rich literature in terms of domain-specific studies. Providing a holistic picture of bioinformatics research by analyzing the major topics and their trends and developmental stages is critical for an understanding of the field. From this perspective, this study aimed to analyze the last 50 years of bioinformatics studies (a total of 71,490 articles) by using an automated text-mining methodology based on probabilistic topic modeling to reveal the main topics, trends, and the evolution of the field. As a result, 24 major topics that reflect the focuses and trends of the field were identified. Based on the discovered topics and their temporal tendencies from 1970 to 2020, the development of the field was divided into seven phases, from the "newborn" to the "wisdom" stage. Moreover, the findings indicated a recent increase in the popularity of the topics "Statistical Estimation", "Data Analysis Tools", "Genomic Data", "Gene Expression", and "Prediction". The results of the study revealed that, in bioinformatics studies, interest in innovative computing and data analysis methods based on artificial intelligence and machine learning has gradually increased, thereby marking a significant improvement in contemporary analysis tools and techniques based on prediction.
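
For readers unfamiliar with probabilistic topic modeling, the following is a minimal sketch of latent Dirichlet allocation (LDA, listed among the keywords) fitted by collapsed Gibbs sampling. The abstract does not state which inference procedure, software, or hyperparameters the authors used, so everything here is an assumption made for illustration: the function name `fit_lda`, the values of `alpha`, `beta`, and `iterations`, and the toy corpus, whose tokens are invented and are not drawn from the 71,490 articles analyzed in the study.

```python
import random


def fit_lda(docs, num_topics, alpha=0.1, beta=0.01, iterations=200, seed=0):
    """Fit LDA to tokenized documents with collapsed Gibbs sampling.

    Hypothetical helper for illustration only; not the authors' pipeline.
    Returns (theta, top_words): per-document topic proportions and the
    three highest-count words of each topic.
    """
    rng = random.Random(seed)
    vocab = sorted({w for doc in docs for w in doc})
    word_id = {w: i for i, w in enumerate(vocab)}
    vocab_size = len(vocab)

    # Count tables used by the sampler.
    doc_topic = [[0] * num_topics for _ in docs]               # words of doc d in topic k
    topic_word = [[0] * vocab_size for _ in range(num_topics)]  # word v assigned to topic k
    topic_total = [0] * num_topics                              # all words in topic k
    assignments = []

    # Start from a random topic assignment for every token.
    for d, doc in enumerate(docs):
        z_doc = []
        for w in doc:
            k = rng.randrange(num_topics)
            z_doc.append(k)
            doc_topic[d][k] += 1
            topic_word[k][word_id[w]] += 1
            topic_total[k] += 1
        assignments.append(z_doc)

    for _ in range(iterations):
        for d, doc in enumerate(docs):
            for i, w in enumerate(doc):
                v = word_id[w]
                k = assignments[d][i]
                # Remove the current token from the counts.
                doc_topic[d][k] -= 1
                topic_word[k][v] -= 1
                topic_total[k] -= 1
                # Conditional probability of each topic for this token
                # (unnormalized; random.choices normalizes the weights).
                weights = [
                    (doc_topic[d][t] + alpha)
                    * (topic_word[t][v] + beta)
                    / (topic_total[t] + vocab_size * beta)
                    for t in range(num_topics)
                ]
                k = rng.choices(range(num_topics), weights=weights)[0]
                # Record the newly sampled topic.
                assignments[d][i] = k
                doc_topic[d][k] += 1
                topic_word[k][v] += 1
                topic_total[k] += 1

    # Smoothed topic proportions per document.
    theta = [
        [(doc_topic[d][t] + alpha) / (len(doc) + num_topics * alpha)
         for t in range(num_topics)]
        for d, doc in enumerate(docs)
    ]
    # Highest-count words per topic, as a rough topic label.
    top_words = [
        [vocab[v] for v in sorted(range(vocab_size),
                                  key=lambda v: -topic_word[t][v])[:3]]
        for t in range(num_topics)
    ]
    return theta, top_words


# Toy corpus with invented tokens (assumption, not data from the study).
TOY_DOCS = [
    ["gene", "expression", "microarray", "gene", "cluster"],
    ["protein", "structure", "folding", "protein", "sequence"],
    ["gene", "expression", "rna", "cluster", "microarray"],
    ["protein", "sequence", "alignment", "structure", "folding"],
]

theta, top_words = fit_lda(TOY_DOCS, num_topics=2)
for t, words in enumerate(top_words):
    print(f"topic {t}: {', '.join(words)}")
```

In a study of this kind, each article would then be associated with its highest-probability topic in `theta`, and counting those assignments per publication year would give the temporal tendencies the abstract refers to. On a corpus of realistic size, an established library implementation would be a more practical choice than this pure-Python sampler.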