ChatGPT fails at diagnosing child medical cases. It's wrong 83 percent of the time.

A new study suggests ChatGPT needs to go back to medical school.
By Chase DiBenedetto
ChatGPT fails at specialized medical diagnoses. Don't ditch those physicians yet. Credit: Bob Al-Greene / Mashable

OpenAI's ChatGPT is no closer to replacing your family physician: the increasingly advanced chatbot failed to accurately diagnose the vast majority of hypothetical pediatric cases it was given.

The findings come from a new study published in JAMA Pediatrics on Jan. 2 by researchers from Cohen Children's Medical Center in New York. The researchers analyzed the bot's responses to requests to diagnose childhood illnesses and found that it had an 83 percent error rate across the tests.

The study used what are known as pediatric case challenges: medical cases, often involving unusual or limited information, that were originally posted to groups of physicians as learning opportunities (or diagnostic challenges). Researchers sampled 100 challenges published in JAMA Pediatrics and the New England Journal of Medicine (NEJM) between 2013 and 2023.

ChatGPT gave incorrect diagnoses for 72 of the 100 cases and generated 11 answers that were deemed "clinically related" to the correct diagnosis but too broad to count as correct. Together, those 83 misses make up the 83 percent error rate.

The researchers attribute part of this failure to the generative AI's inability to recognize relationships between certain conditions and external or preexisting circumstances, connections that clinicians often rely on to diagnose patients. For example, ChatGPT did not connect "neuropsychiatric conditions" (such as autism) to commonly seen cases of vitamin deficiency and other conditions stemming from restrictive diets.

The study concludes that ChatGPT needs continued training and the involvement of medical professionals, so that the AI is fed not from an internet-generated well of information, which often circulates misinformation, but from vetted medical literature and expertise.

AI chatbots built on large language models (LLMs) have previously been studied for their efficacy in diagnosing medical cases and in handling physicians' daily tasks. Last year, researchers tested generative AI's ability to pass the three-part United States Medical Licensing Exam, and it passed.

But while AI is still heavily criticized for its training limits and its potential to exacerbate medical bias, many medical groups, including the American Medical Association, don't view its advance in the field merely as a threat of replacement. Instead, better-trained AIs are considered promising for administrative and communicative tasks, like generating patient-facing text, explaining diagnoses in plain terms, or writing instructions. Clinical uses, like diagnostics, remain a controversial and hard-to-research topic.

To that end, the new report represents the first analysis of a chatbot's diagnostic potential in a purely pediatric setting, a field that requires its own specialized medical training. ChatGPT's current limitations show that even the most advanced chatbot on the public market can't yet match the full range of human expertise.

Chase DiBenedetto
Social Good Reporter

Chase joined Mashable's Social Good team in 2020, covering online stories about digital activism, climate justice, accessibility, and media representation. Her work also touches on how these conversations manifest in politics, popular culture, and fandom. Sometimes she's very funny.