Assessing the precision of artificial intelligence in emergency department triage decisions: Insights from a study with ChatGPT

Paslı, Sinan; Şahin, Abdul; Beşer, Muhammet; Topçuoğlu, Hazal; Yadigaroğlu, Metin; İmamoğlu, Melih

doi:10.1016/j.ajem.2024.01.037

Paslı S., Şahin A. S., Beşer M. F., Topçuoğlu H., Yadigaroğlu M., İmamoğlu M.

American Journal of Emergency Medicine, vol. 78, pp. 170-175, 2024 (SCI-Expanded, Scopus)

Publication Type: Article / Full Article
Volume: 78
Publication Date: 2024
DOI: 10.1016/j.ajem.2024.01.037
Journal Name: American Journal of Emergency Medicine
Journal Indexes: Science Citation Index Expanded (SCI-EXPANDED), Scopus, Biotechnology Research Abstracts, CAB Abstracts, CINAHL, EMBASE, MEDLINE, Veterinary Science Database
Page Numbers: pp. 170-175
Keywords: Artificial intelligence, Chatbot, ChatGPT, Emergency department, Triage
Affiliated with Karadeniz Technical University: Yes

Abstract

Background: The global rise in emergency department presentations poses challenges for efficient patient management, and various strategies aim to expedite that management. The consistent performance and rapid data interpretation of artificial intelligence (AI) extend its healthcare applications, especially in emergencies. A robust AI tool such as ChatGPT, based on GPT-4 developed by OpenAI, can benefit patients and healthcare professionals by improving the speed and accuracy of resource allocation. This study examines ChatGPT's capability to predict triage outcomes based on local emergency department rules.

Methods: This was a single-center prospective observational study. The study population consisted of all patients who presented to the emergency department with any symptoms and agreed to participate. The study was conducted on three non-consecutive days for a total of 72 h. Patients' chief complaints, vital parameters, medical history, and the area to which the triage team directed them in the emergency department were recorded. Concurrently, an emergency medicine physician entered the same data into GPT-4, which had previously been trained according to local rules, and the triage decisions GPT-4 made from these data were recorded. In the same process, an emergency medicine specialist determined where each patient should be directed based on the collected data, and this decision was considered the gold standard. Accuracy rates and reliability of the triage team and GPT-4 in directing patients to specific areas were evaluated using Cohen's kappa test. The accuracy of the triage performed by the triage team and by GPT-4 was also assessed by receiver operating characteristic (ROC) analysis. A value of p < 0.05 was considered statistically significant.

Results: The study was carried out on 758 patients, of whom 416 (54.9%) were male and 342 (45.1%) were female. For the primary endpoint, the agreement of the triage team's decisions and of GPT-4's decisions with the gold standard, we observed almost perfect agreement in both comparisons (Cohen's kappa 0.893 for the triage team and 0.899 for GPT-4; p < 0.001 for each).

Conclusion: Our findings suggest that GPT-4 possesses outstanding predictive skills in triaging patients in an emergency setting. GPT-4 can serve as an effective tool to support the triage process.
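
For readers unfamiliar with the agreement statistic named in the Methods, the following is a minimal sketch of how Cohen's kappa can be computed for two raters over the same patients. It is not the authors' analysis code: the function name `cohens_kappa`, the area labels, and the ten toy decisions are all hypothetical and chosen only for illustration; none of the values come from the study.

```python
from collections import Counter


def cohens_kappa(rater_a, rater_b):
    """Cohen's kappa for two equal-length sequences of categorical labels."""
    if len(rater_a) != len(rater_b) or not rater_a:
        raise ValueError("both raters need the same non-zero number of labels")
    n = len(rater_a)
    # Observed agreement: share of cases where both raters chose the same label.
    p_o = sum(a == b for a, b in zip(rater_a, rater_b)) / n
    # Chance agreement: sum over labels of the product of marginal frequencies.
    freq_a, freq_b = Counter(rater_a), Counter(rater_b)
    p_e = sum(freq_a[label] * freq_b[label] for label in freq_a) / (n * n)
    if p_e == 1.0:
        return 1.0  # both raters used one identical label throughout
    return (p_o - p_e) / (1 - p_e)


# Hypothetical toy data (NOT from the study): triage areas assigned to ten
# patients by a reference rater and by a second rater.
gold = ["red", "red", "yellow", "yellow", "green",
        "green", "green", "yellow", "red", "green"]
model = ["red", "red", "yellow", "green", "green",
         "green", "green", "yellow", "yellow", "green"]

kappa = cohens_kappa(gold, model)
print(f"kappa = {kappa:.3f}")
```

Kappa corrects raw percent agreement for the agreement expected by chance, which is why it is often preferred over simple accuracy when some triage areas are much more common than others. The phrase "almost perfect agreement" matches the wording conventionally attached to kappa values above 0.80 on the Landis and Koch scale; the abstract does not state which interpretation scale the authors used.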