Accuracy and Completeness of Large Language Models about Antibody-Drug Conjugates and Associated Ocular Adverse Effects Journal Article


Authors: Marshall, R.; Xu, H.; Dalvin, L. A.; Mishra, K.; Edalat, C.; Kirupaharan, N.; Francis, J. H.; Berkenstock, M.
Article Title: Accuracy and Completeness of Large Language Models about Antibody-Drug Conjugates and Associated Ocular Adverse Effects
Abstract: Purpose: The purpose of this study was to assess the accuracy and completeness of 3 large language models (LLMs) to generate information about antibody-drug conjugate (ADC)-associated ocular toxicities. Methods: There were 22 questions about ADCs, tisotumab vedotin, and mirvetuximab soravtansine that were developed and input into ChatGPT 4.0, Bard, and LLaMA. Answers were rated by 4 ocular toxicity experts using standardized 6-point Likert scales on accuracy and completeness. ANOVA tests were conducted for comparison between the 3 subgroups, followed by pairwise t-tests. Interrater variability was assessed with Fleiss kappa tests. Results: The mean accuracy score was 4.62 (SD 0.89) for ChatGPT, 4.77 (SD 0.90) for Bard, and 4.41 (SD 1.09) for LLaMA. Both ChatGPT (P = 0.03) and Bard (P = 0.003) scored significantly better for accuracy when compared with LLaMA. The mean completeness score was 4.43 (SD 0.91) for ChatGPT, 4.57 (SD 0.93) for Bard, and 4.42 (SD 0.99) for LLaMA. There were no significant differences in completeness scores between groups. Fleiss kappa assessment for interrater variability was good (0.74) for accuracy and fair (0.31) for completeness. Conclusions: All 3 LLMs had relatively high accuracy and completeness ratings, showing LLMs are able to provide sufficient answers for niche topics of ophthalmology. Our results indicate that ChatGPT and Bard may be slightly better at providing more accurate answers than LLaMA. As further research and treatment plans are developed for ADC-associated ocular toxicities, these LLMs should be reassessed to see if they provide complete and accurate answers that remain in line with current medical knowledge. Copyright © 2024 Wolters Kluwer Health, Inc. All rights reserved.
Keywords: adult; aged; major clinical study; pathogenesis; comparative study; medical information; drug mechanism; artificial intelligence; eye disease; adverse drug reaction; eye toxicity; maytansine; eye diseases; antibody conjugate; immunoconjugates; humans; human; article; mirvetuximab soravtansine; data accuracy; antibody drug conjugates; data completeness; tisotumab vedotin; large language model; large language models; ocular adverse effects
Journal Title: Cornea
Volume: 44
Issue: 7
ISSN: 02773740
Publisher: Unknown  
Date Published: 2025-01-01
Start Page: 851
End Page: 855
Language: English
DOI: 10.1097/ico.0000000000003664
PUBMED: 39110155
PROVIDER: scopus
Notes: Article -- Source: Scopus