
ABSTRACT
Objectives
Large language models (LLMs) such as ChatGPT and Gemini have potential in nutrition applications, but recent studies suggest they may provide inaccurate dietary advice. The aim of this study was to evaluate the most commonly used LLMs, ChatGPT and Gemini, on dietary recommendations for patients with irritable bowel syndrome (IBS).
Methods
Several validated tools were used to assess the LLMs' responses. A Guideline Compliance Score was created from IBS guidelines. Response quality was assessed using the Global Quality Score (GQS) and the Completeness, Lack of Misinformation, Evidence, Appropriateness, Relevance (CLEAR) tool. Understandability and actionability were assessed using the Patient Education Materials Assessment Tool (PEMAT). Readability of ChatGPT's and Gemini's responses was evaluated using the Flesch Reading Ease (FRE) and Flesch-Kincaid Grade Level (FKGL) scores.
Results
Most responses from ChatGPT (70%) and Gemini (57.5%) were compliant with the guidelines, but there was no significant difference between the two models in guideline compliance, quality, understandability, actionability, or readability scores (p > 0.05). The CLEAR tool showed a moderate positive correlation with PEMAT actionability (r = 0.467, p = 0.038) and understandability (r = 0.568, p = 0.009), and a strong positive correlation with GQS (r = 0.611, p = 0.004). In addition, FRE and FKGL had a strong negative correlation (r = −0.784, p < 0.001), while the Guideline Compliance Score showed a moderate negative correlation with FRE (r = −0.537, p = 0.015).
Conclusions
The study highlights the need for further model improvements before LLMs can be relied on alone in clinical nutrition practice, underscoring the importance of dietitians' recommendations and of collaboration between AI models and healthcare teams.
Journal of Human Nutrition and Dietetics, Volume 38, Issue 6, December 2025.
