Detecting Latent Topics and Trends in Software Engineering Research Since 1980 Using Probabilistic Topic Modeling


GÜRCAN F., Dalveren G. G. M., Cagiltay N. E., Soylu A.

IEEE ACCESS, vol.10, pp.74638-74654, 2022 (SCI-Expanded)

  • Publication Type: Article / Full Article
  • Volume: 10
  • Publication Date: 2022
  • DOI: 10.1109/access.2022.3190632
  • Journal Name: IEEE ACCESS
  • Journal Indexes: Science Citation Index Expanded (SCI-EXPANDED), Scopus, Compendex, INSPEC, Directory of Open Access Journals
  • Pages: pp.74638-74654
  • Keywords: Market research, Systematics, Software engineering, Software, Bibliometrics, Text mining, Licenses, Corpus creation, research trends and topics, software engineering, text mining, topic model, SYSTEMATIC LITERATURE-REVIEWS, CITED ARTICLES, EMPIRICAL-RESEARCH, JOURNALS, SCHOLARS
  • Affiliated with Karadeniz Teknik Üniversitesi: Yes

Abstract

The landscape of software engineering research has changed significantly from one year to the next in line with industrial needs and trends. As a result, today's research literature on software engineering is rich and multidisciplinary and includes a large number of studies; however, few of them offer a holistic view of the field. From this perspective, this study aimed to provide a holistic view that reflects topics, trends, and trajectories in software engineering research by analyzing the majority of domain-specific articles published over the last 40 years. The study first presents an objective and systematic method for corpus creation based on the major publication sources in the field. A corpus was then created using this method; it covers 44 domain-specific conferences and journals and 57,174 articles published between 1980 and 2019. Next, this corpus was analyzed using an automated text-mining methodology based on a probabilistic topic-modeling approach. This analysis yielded 24 main topics and revealed topical trends in the field. Finally, three main developmental stages of the field were identified: the programming age, the software development age, and the software optimization age.