Briefings in Bioinformatics, vol.27, no.2, 2026 (SCI-Expanded, Scopus)
Abstract
Accurate determination of antibody–antigen (Ab–Ag) complex structures is critical for therapeutic development. While deep learning-based methods, beginning with AlphaFold2 (AF2), have revolutionized multimer prediction, the optimal strategies for Ab–Ag modeling and the reliability of their confidence scores remain active areas of research. This study evaluates the performance of AF2, Boltz-1, Boltz-1x, Boltz-2, Chai-1, Protenix, Protenix-1, OpenFold3, and ESMFold on a curated dataset of 200 Ab–Ag complexes. Among the nine methods tested, Protenix-1 emerged as the top performer, with Chai-1 consistently ranking second across multiple success metrics, closely followed by AF2. The effect of recycling iterations varied by method: AF2, Chai-1, and the Protenix variants benefited from additional cycles, whereas the Boltz variants did not. Among the model confidence scores analyzed, pDockQ2 showed high precision and the predicted Template-Modeling (pTM) score showed high recall. By integrating these two scores, we developed antibody confidence (AntiConf), a novel metric that achieves superior precision and recall for all methods. These strengths make AntiConf a valuable post-prediction scoring metric for both computational predictions and downstream experimental workflows, reflecting its potential to improve Ab–Ag complex predictions from both AF2- and AF3-based architectures. Altogether, this study addresses current limitations in deep learning-based Ab–Ag complex prediction, demonstrates the potential of AntiConf for future assessment studies, and provides guidelines for improving the accuracy of Ab–Ag complex prediction.
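To make the precision/recall framing of confidence scores concrete, the sketch below treats a confidence score as a binary classifier: a model is "called" successful when its score clears a threshold, and that call is compared with a success label derived from DockQ (DockQ ≥ 0.23 is the conventional CAPRI "acceptable" cutoff). Everything else here is an assumption for illustration: the `Prediction` record, the score thresholds, the toy values (made up, not data from this study), and especially the `combined` function, which is a hypothetical linear blend of pDockQ2 and pTM and is not the published AntiConf formula.

```python
from dataclasses import dataclass


@dataclass
class Prediction:
    """Hypothetical per-model record: two confidence scores plus the DockQ
    of the predicted complex against its experimental structure."""
    pdockq2: float
    ptm: float
    dockq: float


def is_success(p: Prediction, dockq_cutoff: float = 0.23) -> bool:
    # DockQ >= 0.23 corresponds to CAPRI "acceptable" quality or better.
    return p.dockq >= dockq_cutoff


def precision_recall(preds, score_fn, threshold):
    """Call a prediction successful when score_fn(p) >= threshold and
    compare those calls with the DockQ-based success label."""
    tp = fp = fn = 0
    for p in preds:
        called = score_fn(p) >= threshold
        actual = is_success(p)
        if called and actual:
            tp += 1
        elif called:
            fp += 1
        elif actual:
            fn += 1
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    return precision, recall


def combined(p: Prediction, w: float = 0.5) -> float:
    # Hypothetical linear blend of the two scores; NOT the AntiConf formula.
    return w * p.pdockq2 + (1 - w) * p.ptm


if __name__ == "__main__":
    # Made-up values chosen only to illustrate a precision/recall trade-off.
    toy = [
        Prediction(0.60, 0.85, 0.70),
        Prediction(0.20, 0.80, 0.45),
        Prediction(0.10, 0.75, 0.05),
        Prediction(0.05, 0.40, 0.02),
    ]
    # Thresholds are hypothetical, not values recommended by the study.
    for name, fn, thr in [
        ("pDockQ2", lambda p: p.pdockq2, 0.23),
        ("pTM", lambda p: p.ptm, 0.70),
        ("blend", combined, 0.45),
    ]:
        print(name, precision_recall(toy, fn, thr))
```

On these toy values, the strict pDockQ2 cutoff misses one true success (precision 1.0, recall 0.5), the lenient pTM cutoff admits one false positive (recall 1.0, precision about 0.67), and the blend happens to separate them cleanly. This mirrors the qualitative pattern described above, but it is an illustration of the idea of combining a high-precision score with a high-recall score, not a reproduction of the study's results.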