Confidence scoring for deep learning-predicted antibody–antigen complexes: AntiConf as a precision-driven metric


Ünsal S., Holland B., Sardag I., Timucin E.

Briefings in Bioinformatics, vol.27, no.2, 2026 (SCI-Expanded, Scopus)

  • Publication Type: Article
  • Volume: 27 Issue: 2
  • Publication Date: 2026
  • Doi Number: 10.1093/bib/bbag137
  • Journal Name: Briefings in Bioinformatics
  • Journal Indexes: Science Citation Index Expanded (SCI-EXPANDED), Scopus, BIOSIS, Library, Information Science & Technology Abstracts (LISTA), MEDLINE, Directory of Open Access Journals
  • Keywords: AF3-based implementations, Alphafold2, antibody–antigen complexes, model confidence scores, multimer prediction
  • Karadeniz Technical University Affiliated: Yes

Abstract

Accurate determination of antibody–antigen (Ab–Ag) complex structures is critical for therapeutic development. While deep learning-based methods, beginning with AlphaFold2 (AF2), have revolutionized multimer prediction, the optimal strategies for Ab–Ag modeling and the reliability of their confidence scores remain active areas of research. This study evaluates the performance of AF2, Boltz-1, Boltz-1x, Boltz-2, Chai-1, Protenix, Protenix-1, OpenFold3, and ESMFold on a curated dataset of 200 Ab–Ag complexes. Among the nine methods tested, Protenix-1 emerged as the top performer, with Chai-1 consistently ranking second across multiple success metrics, closely followed by AF2. We observed diverse effects of recycling iterations: AF2, Chai-1, and the Protenix variants benefited from increased cycles, whereas the Boltz variants did not. We analyzed various model confidence scores, noting high precision from pDockQ2 and high recall from the predicted Template-Modeling (pTM) score. By integrating these two scores, we developed antibody confidence (AntiConf), a novel metric that achieves superior performance for all methods in terms of precision and recall. These strengths make AntiConf a valuable post score for both computational predictions and downstream experimental workflows, reflecting its potential to improve Ab–Ag complex predictions by AF2 and AF3 architectures. Altogether, this study addresses current limitations in deep learning-based Ab–Ag complex prediction, showcases the potential of AntiConf for future assessment studies, and provides a guideline for improving the accuracy of Ab–Ag complex prediction.