Manta: Multi-lingual advanced NMF-based topic analysis
SoftwareX, vol. 32, 2025 (SCI-Expanded, Scopus)
- Publication Type: Article / Full Article
- Volume: 32
- Publication Date: 2025
- DOI: 10.1016/j.softx.2025.102386
- Journal Name: SoftwareX
- Journal Indexed In: Science Citation Index Expanded (SCI-EXPANDED), Scopus, Compendex, INSPEC, Directory of Open Access Journals
- Keywords: Information retrieval, Natural language processing, Non-negative matrix factorization, Python, Topic modeling
- Affiliated with Karadeniz Teknik Üniversitesi: Yes
Abstract
This paper presents MANTA (Multi-lingual Advanced NMF-based Topic Analysis), a novel open-source Python library that addresses key limitations in existing topic modeling workflows. MANTA provides an integrated, easy-to-use pipeline for Non-negative Matrix Factorization (NMF) based topic analysis. It combines corpus-specific subword tokenization (BPE/WordPiece) with advanced term weighting schemes (SMART, BM25) and flexible NMF solver options, including a high-performance Projective NMF method. It offers native support for both English and morphologically complex languages such as Turkish. With a simple one-function interface and a command-line utility, MANTA lowers the technical barrier to sophisticated topic analysis for researchers in computational social science and digital humanities.
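The pipeline the abstract describes (tokenize, weight terms, factorize) can be illustrated with a minimal sketch. This is not MANTA's API or implementation. Every function name here (`bm25_matrix`, `nmf`, `top_terms`) is hypothetical, whitespace splitting stands in for BPE/WordPiece tokenization, and the standard Lee-Seung multiplicative updates stand in for the Projective NMF solver. The BM25 parameters `k1` and `b` and the toy corpus are likewise assumptions chosen for illustration.

```python
import numpy as np

def bm25_matrix(docs, k1=1.5, b=0.75):
    """Build a documents x terms BM25-weighted matrix from tokenized documents."""
    vocab = sorted({tok for doc in docs for tok in doc})
    index = {tok: j for j, tok in enumerate(vocab)}
    tf = np.zeros((len(docs), len(vocab)))
    for i, doc in enumerate(docs):
        for tok in doc:
            tf[i, index[tok]] += 1.0
    doc_len = tf.sum(axis=1, keepdims=True)   # tokens per document
    avg_len = doc_len.mean()
    df = (tf > 0).sum(axis=0)                 # document frequency per term
    # IDF variant with +1 inside the log, so weights stay non-negative for NMF.
    idf = np.log(1.0 + (len(docs) - df + 0.5) / (df + 0.5))
    norm = tf + k1 * (1.0 - b + b * doc_len / avg_len)
    return tf * (k1 + 1.0) / norm * idf, vocab

def nmf(X, n_topics, n_iter=200, seed=0, eps=1e-9):
    """Approximate X ~= W @ H with multiplicative updates (Frobenius loss)."""
    rng = np.random.default_rng(seed)
    W = rng.random((X.shape[0], n_topics)) + eps
    H = rng.random((n_topics, X.shape[1])) + eps
    for _ in range(n_iter):
        H *= (W.T @ X) / (W.T @ W @ H + eps)
        W *= (X @ H.T) / (W @ H @ H.T + eps)
    return W, H

def top_terms(H, vocab, n=3):
    """Return the n highest-weighted terms for each topic (row of H)."""
    return [[vocab[j] for j in np.argsort(row)[::-1][:n]] for row in H]

# Toy corpus; whitespace splitting is a stand-in for subword tokenization.
texts = [
    "topic models find themes in text",
    "matrix factorization finds themes in text corpora",
    "the football team won the match",
    "the team lost the football match badly",
]
docs = [t.split() for t in texts]
X, vocab = bm25_matrix(docs)
W, H = nmf(X, n_topics=2)
topics = top_terms(H, vocab)
```

Each row of `H` scores every term for one topic, and each row of `W` gives a document's loading on each topic. Printing `topics` lists the highest-weighted terms per topic.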