Manta: Multi-lingual advanced NMF-based topic analysis


KARAYAĞIZ E., BERBER T.

SoftwareX, vol. 32, 2025 (SCI-Expanded)

  • Publication Type: Article / Full Article
  • Volume: 32
  • Publication Date: 2025
  • DOI: 10.1016/j.softx.2025.102386
  • Journal Name: SoftwareX
  • Journal Indexed In: Science Citation Index Expanded (SCI-EXPANDED), Scopus, Compendex, INSPEC, Directory of Open Access Journals
  • Keywords: Information retrieval, Natural language processing, Non-negative matrix factorization, Python, Topic modeling
  • Karadeniz Teknik Üniversitesi (Karadeniz Technical University) Affiliation: Yes

Abstract

This paper presents MANTA (Multi-lingual Advanced NMF-based Topic Analysis), a novel open-source Python library that addresses key limitations of existing topic modeling workflows. MANTA provides an integrated, easy-to-use pipeline for Non-negative Matrix Factorization (NMF) based topic analysis that uniquely combines corpus-specific subword tokenization (BPE/WordPiece) with advanced term weighting schemes (SMART, BM25) and flexible NMF solver options, including a high-performance Projective NMF method. It natively supports both English and morphologically complex languages such as Turkish. Through a simple one-function interface and a command-line utility, MANTA lowers the technical barrier to sophisticated topic analysis, making it a powerful tool for researchers in computational social science and digital humanities.
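To illustrate the general idea of combining a term weighting scheme with NMF, the following is a minimal, dependency-free sketch. It is not MANTA's API or implementation: the function names `bm25_weight`, `matmul`, `transpose`, and `nmf` are hypothetical, the toy corpus is invented, and the code assumes already-tokenized word counts (no BPE/WordPiece step). It applies Okapi BM25 weighting (using an IDF variant with `+1` inside the logarithm so all weights stay non-negative) and then factors the weighted matrix with standard Lee-Seung multiplicative updates, not the Projective NMF solver described above.

```python
import math
import random


def bm25_weight(counts, k1=1.5, b=0.75):
    """Okapi BM25 weighting of a document-term count matrix (rows = documents)."""
    n_docs, n_terms = len(counts), len(counts[0])
    doc_lens = [sum(row) for row in counts]
    avg_len = sum(doc_lens) / n_docs
    # Document frequency: number of documents containing each term.
    df = [sum(1 for row in counts if row[j] > 0) for j in range(n_terms)]
    # "+1" inside the log keeps every IDF positive, so the matrix stays non-negative for NMF.
    idf = [math.log(1 + (n_docs - d + 0.5) / (d + 0.5)) for d in df]
    weighted = []
    for i, row in enumerate(counts):
        # Length normalization: documents longer than average are damped.
        norm = k1 * (1 - b + b * doc_lens[i] / avg_len)
        weighted.append([idf[j] * tf * (k1 + 1) / (tf + norm) for j, tf in enumerate(row)])
    return weighted


def matmul(a, b):
    """Plain nested-list matrix product a @ b."""
    return [[sum(a[i][k] * b[k][j] for k in range(len(b))) for j in range(len(b[0]))]
            for i in range(len(a))]


def transpose(a):
    return [list(col) for col in zip(*a)]


def nmf(v, rank, iters=200, seed=0, eps=1e-9):
    """Factor V ~ W H with Lee-Seung multiplicative updates (Frobenius loss)."""
    rng = random.Random(seed)
    n, m = len(v), len(v[0])
    w = [[rng.random() for _ in range(rank)] for _ in range(n)]  # document-topic weights
    h = [[rng.random() for _ in range(m)] for _ in range(rank)]  # topic-term weights
    for _ in range(iters):
        # H <- H * (W^T V) / (W^T W H)
        wt = transpose(w)
        num, den = matmul(wt, v), matmul(matmul(wt, w), h)
        h = [[h[r][j] * num[r][j] / (den[r][j] + eps) for j in range(m)] for r in range(rank)]
        # W <- W * (V H^T) / (W H H^T)
        ht = transpose(h)
        num, den = matmul(v, ht), matmul(w, matmul(h, ht))
        w = [[w[i][r] * num[i][r] / (den[i][r] + eps) for r in range(rank)] for i in range(n)]
    return w, h


# Toy corpus: 4 documents over a 6-word vocabulary (invented for illustration).
vocab = ["goal", "match", "team", "vote", "party", "election"]
counts = [
    [3, 2, 4, 0, 0, 0],
    [2, 3, 3, 0, 1, 0],
    [0, 0, 0, 4, 3, 2],
    [0, 1, 0, 3, 4, 3],
]
weighted = bm25_weight(counts)
w, h = nmf(weighted, rank=2)
for r, topic in enumerate(h):
    top = sorted(range(len(vocab)), key=lambda j: topic[j], reverse=True)[:3]
    print(f"topic {r}:", [vocab[j] for j in top])
```

Each row of `h` is a topic expressed as non-negative term weights, and each row of `w` gives a document's mixture over topics. Reweighting before factorization is what lets a scheme such as BM25 down-weight terms that occur in many documents and damp the influence of long documents.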