Manta: Multi-lingual advanced NMF-based topic analysis


KARAYAĞIZ E., BERBER T.

SoftwareX, vol. 32, 2025 (SCI-Expanded, Scopus)

  • Publication Type: Article
  • Volume: 32
  • Publication Date: 2025
  • Doi Number: 10.1016/j.softx.2025.102386
  • Journal Name: SoftwareX
  • Journal Indexes: Science Citation Index Expanded (SCI-EXPANDED), Scopus, Compendex, INSPEC, Directory of Open Access Journals
  • Keywords: Information retrieval, Natural language processing, Non-negative matrix factorization, Python, Topic modeling
  • Karadeniz Technical University Affiliated: Yes

Abstract

This paper presents MANTA (Multi-lingual Advanced NMF-based Topic Analysis), a novel open-source Python library that addresses key limitations of existing topic modeling workflows. MANTA provides an integrated, easy-to-use pipeline for Non-negative Matrix Factorization (NMF)-based topic analysis. The pipeline uniquely combines corpus-specific subword tokenization (BPE/WordPiece) with advanced term weighting schemes (SMART, BM25) and flexible NMF solver options, including a high-performance Projective NMF method. It offers native support for both English and morphologically complex languages such as Turkish. With a simple one-function interface and a command-line utility, MANTA lowers the technical barrier to sophisticated topic analysis, making it a powerful tool for researchers in computational social science and digital humanities.
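The abstract names two building blocks, BM25 term weighting and Projective NMF, without showing how they fit together. The NumPy sketch below illustrates the general idea under stated assumptions. It is not MANTA's code or API. `bm25_weight` and `projective_nmf` are hypothetical helpers. The BM25 step uses a common non-negative IDF variant with the usual defaults `k1=1.2` and `b=0.75`. The Projective NMF step applies the standard multiplicative update for the objective ||X - W Wᵀ X||, with a spectral-norm rescaling of W after each step for stability.

```python
import numpy as np

def bm25_weight(counts, k1=1.2, b=0.75):
    """Hypothetical helper: BM25-style weighting of a (terms x documents) count matrix."""
    counts = np.asarray(counts, dtype=float)
    n_docs = counts.shape[1]
    doc_len = counts.sum(axis=0)                     # tokens per document
    avg_len = doc_len.mean()
    df = (counts > 0).sum(axis=1)                    # document frequency per term
    # Assumed non-negative IDF variant, so the weighted matrix stays valid NMF input
    idf = np.log(1.0 + (n_docs - df + 0.5) / (df + 0.5))
    norm = k1 * (1.0 - b + b * doc_len / avg_len)    # per-document length normalization
    tf = counts * (k1 + 1.0) / (counts + norm)       # norm broadcasts across columns
    return tf * idf[:, None]

def projective_nmf(X, k, n_iter=200, eps=1e-10, seed=0):
    """Hypothetical helper: Projective NMF, X ~= W @ W.T @ X with W >= 0."""
    rng = np.random.default_rng(seed)
    W = rng.random((X.shape[0], k))                  # positive random initialization
    A = X @ X.T                                      # term-term co-occurrence (T x T)
    for _ in range(n_iter):
        AW = A @ W
        numer = 2.0 * AW
        denom = W @ (W.T @ AW) + AW @ (W.T @ W) + eps
        W *= numer / denom                           # multiplicative update keeps W >= 0
        W /= np.linalg.norm(W, 2)                    # rescale to unit spectral norm
    return W

# Toy corpus (assumed data): two "pet" documents and two "finance" documents
vocab = ["cat", "dog", "pet", "stock", "market", "price"]
counts = np.array([
    [3, 1, 0, 0],
    [1, 3, 0, 0],
    [2, 2, 0, 1],
    [0, 0, 3, 2],
    [0, 0, 2, 3],
    [0, 1, 2, 2],
])
X = bm25_weight(counts)
W = projective_nmf(X, k=2)
for t in range(W.shape[1]):
    top = np.argsort(W[:, t])[::-1][:3]              # three highest-loading terms
    print(f"topic {t}:", [vocab[i] for i in top])
```

Projective NMF learns only the single factor W. Each document's topic weights then follow directly as `W.T @ X`, which is one reason the method is attractive when speed matters. MANTA's own solver may differ in initialization, normalization, and stopping criteria.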