Radiologie, 2026 (SCI-Expanded, Scopus)
Objectives: To evaluate the performance of large language models (LLMs), including retrieval-augmented generation (RAG)-based approaches, in extracting components from structured coronary computed tomography angiography (CCTA) reports and deriving management recommendations according to the Coronary Artery Disease Reporting and Data System version 2.0 (CAD-RADS 2.0).

Materials and methods: A total of 320 fully structured CCTA reports were analyzed using LLMs. Three models were compared: the standard closed-source ChatGPT-5, NotebookLM (a RAG-based model), and a RAG-adapted ChatGPT-5 model (ChatGPT-5-RAG). Each model extracted the CAD-RADS category, plaque burden, presence of high-risk plaque (HRP), other modifiers, and the full score, and provided management recommendations in accordance with the CAD-RADS 2.0 guidelines. LLM outputs were compared with a reference standard established by two expert cardiovascular radiologists.

Results: ChatGPT-5-RAG achieved the highest accuracy for CAD-RADS category (0.959, 95% CI: 0.932–0.976), plaque burden (0.912, 95% CI: 0.876–0.939), HRP detection (0.988, 95% CI: 0.968–0.995), other modifiers (0.950, 95% CI: 0.920–0.969), and full score (0.828, 95% CI: 0.783–0.866). The standard closed-source ChatGPT-5 performed worst across all components. Statistically significant differences were found among the three models (p < 0.001). Management recommendations were rated qualitatively on a three-point Likert scale; although agreement between models was low, ChatGPT-5-RAG and NotebookLM performed almost perfectly (median 3 points).

Conclusion: This study demonstrates that RAG-enhanced LLMs significantly improve accuracy and reliability in extracting CAD-RADS 2.0 components and generating clinical management recommendations. These findings highlight the potential of RAG-based LLMs as innovative, explainable tools for automated, standardized CCTA reporting in clinical radiology workflows.
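The abstract reports per-component accuracies with 95% confidence intervals over 320 reports and an omnibus comparison of the three models, but it does not name the statistical methods. The sketch below is illustrative only and assumes a Wilson score interval for each accuracy and Cochran's Q test for the paired three-model comparison (every report scored correct or incorrect by every model); both are common choices for this design, not necessarily the authors'. The count of 307 correct out of 320 is a hypothetical back-calculation from the reported 0.959, not a figure stated in the abstract, and the six-report table is invented for illustration.

```python
import math


def wilson_ci(correct: int, total: int, z: float = 1.96) -> tuple[float, float]:
    """Wilson score interval for a binomial proportion such as extraction accuracy."""
    if total <= 0:
        raise ValueError("total must be positive")
    p = correct / total
    denom = 1 + z**2 / total
    center = (p + z**2 / (2 * total)) / denom
    half = z * math.sqrt(p * (1 - p) / total + z**2 / (4 * total**2)) / denom
    return center - half, center + half


def cochran_q(outcomes: list[list[int]]) -> tuple[float, float]:
    """Cochran's Q for paired binary outcomes (rows = reports, columns = models).

    Returns (Q, p). The p-value uses the closed-form chi-square survival
    function for 2 degrees of freedom, so this sketch is limited to 3 models.
    """
    k = len(outcomes[0])
    if k != 3:
        raise ValueError("this sketch only handles exactly three models")
    col_totals = [sum(row[j] for row in outcomes) for j in range(k)]
    row_totals = [sum(row) for row in outcomes]
    n_total = sum(row_totals)
    denominator = k * n_total - sum(r * r for r in row_totals)
    if denominator == 0:
        raise ValueError("Q is undefined when all models agree on every report")
    q = (k - 1) * (k * sum(c * c for c in col_totals) - n_total**2) / denominator
    p = math.exp(-q / 2)  # chi-square survival function for df = k - 1 = 2
    return q, p


# Hypothetical: 307/320 correct, back-calculated from the reported 0.959.
low, high = wilson_ci(307, 320)
print(f"{307 / 320:.3f} (95% CI {low:.3f}-{high:.3f})")  # prints 0.959 (95% CI 0.932-0.976)

# Invented toy table: 6 reports x 3 models (1 = correct, 0 = incorrect).
toy = [[1, 1, 1], [1, 1, 0], [1, 0, 0], [1, 1, 0], [1, 0, 0], [0, 0, 0]]
q, p = cochran_q(toy)
print(f"Q = {q:.2f}, p = {p:.4f}")  # prints Q = 6.00, p = 0.0498
```

Under these assumptions, 307/320 yields an interval that matches the one reported for CAD-RADS category, which suggests (but does not confirm) a Wilson-type interval. With three models Q has 2 degrees of freedom, so the chi-square tail probability reduces to exp(-Q/2), which keeps the sketch free of external dependencies.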