Hong Kong Journal of Emergency Medicine, vol. 32, no. 4, 2025 (SCI-Expanded, Scopus)
Background: Timely decision-making is critical for managing toxicological exposures; however, many physicians lack toxicology-specific experience and resources. Objectives: This study aimed to assess whether GPT-4o, an artificial intelligence (AI)-based decision support system, could provide emergency medicine residents with information for managing toxicology cases, and to analyze its potential effectiveness in guiding treatment decisions compared with toxicologists. Methods: We conducted a prospective observational study involving 30 emergency medicine residents (16 junior residents [JRs] and 14 senior residents [SRs]) and GPT-4o. Each resident was assigned 30 clinical scenarios derived from real-life toxicological exposure cases, and GPT-4o responded to the same 30 scenarios. The responses from the residents and GPT-4o were compared with gold standard (GS) responses determined by a medical toxicologist. Cohen's kappa coefficient was used to evaluate the agreement between each group's responses and the GS. Results: GPT-4o and SRs showed similarly good agreement with the GS in recommending antidotal treatment (GPT-4o: κ = 0.710, p < 0.001; SRs: κ = 0.704, p < 0.001). Agreement was lower among JRs (κ = 0.451, p < 0.001). In recommending enhanced elimination treatment, GPT-4o demonstrated greater agreement with the GS (κ = 0.632, p < 0.001) than SRs (κ = 0.551, p < 0.001) and JRs (κ = 0.293, p < 0.001). Conclusions: Although GPT-4o demonstrated good agreement with the medical toxicologist, the observed risk of incorrect recommendations suggests that AI-based systems should currently be considered supportive tools rather than stand-alone decision makers. Expert toxicological oversight remains important, particularly for high-stakes clinical decisions.
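The agreement statistic named in the Methods, Cohen's kappa, compares the observed proportion of matching responses with the proportion expected by chance from each rater's label frequencies: κ = (p_o − p_e) / (1 − p_e). The following is a minimal Python sketch of that calculation, not the authors' analysis code. The function name `cohen_kappa` and the two ten-item response lists are hypothetical and purely illustrative; they are not data from the study.

```python
from collections import Counter


def cohen_kappa(rater_a, rater_b):
    """Cohen's kappa for two equal-length sequences of categorical labels."""
    if len(rater_a) != len(rater_b) or len(rater_a) == 0:
        raise ValueError("both raters must label the same non-empty set of items")

    n = len(rater_a)

    # Observed agreement: fraction of items on which the two raters match.
    p_observed = sum(a == b for a, b in zip(rater_a, rater_b)) / n

    # Chance agreement: for each label, the product of the two raters'
    # marginal proportions, summed over all labels.
    counts_a = Counter(rater_a)
    counts_b = Counter(rater_b)
    labels = set(counts_a) | set(counts_b)
    p_expected = sum(counts_a[label] * counts_b[label] for label in labels) / (n * n)

    if p_expected == 1.0:
        raise ValueError("kappa is undefined when chance agreement equals 1")

    return (p_observed - p_expected) / (1.0 - p_expected)


# Hypothetical toy data (NOT from the study): 1 = treatment recommended,
# 0 = treatment not recommended, for ten illustrative scenarios.
gold_standard = [1, 1, 1, 0, 0, 0, 1, 0, 1, 0]
rater = [1, 1, 0, 0, 0, 0, 1, 1, 1, 0]

kappa = cohen_kappa(gold_standard, rater)
print(f"kappa = {kappa:.3f}")
```

In this toy case the rater matches the gold standard on 8 of 10 items (p_o = 0.8) while chance agreement is p_e = 0.5, so kappa is about 0.6. The sketch covers only the point estimate; the p-values reported in the abstract require an additional significance test that is not shown here.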