AI-generated diet and exercise recommendations for cardiovascular health compared to established cardiology society guidelines
Recommended Citation
Nduka TC, Ndakotsu A, Nriagu VC, et al. AI-Generated Diet and Exercise Recommendations for Cardiovascular Health Compared to Established Cardiology Society Guidelines. Cureus. 2025;17(8):e90968. Published 2025 Aug 25. doi:10.7759/cureus.90968
Abstract
Background and aim: As numerous individuals turn to the internet for initial guidance, health literacy is crucial for establishing sound health practices within the broader society. Although diet and exercise are crucial for prevention, cardiovascular disease (CVD) continues to be a significant source of morbidity and mortality, and artificial intelligence (AI) models can offer accurate health information. The increasing application of AI in healthcare necessitates a thorough evaluation of the efficacy of large language models (LLMs) in providing dependable health recommendations. This study aimed to assess the appropriateness, biases, and clinical relevance of diet and exercise recommendations produced by four prominent language models (ChatGPT {San Francisco, CA: OpenAI}, Claude AI {San Francisco, CA: Anthropic}, DeepSeek AI {Hangzhou, China: DeepSeek}, Google Gemini {Mountain View, CA: Google LLC}) against established guidelines from the American Heart Association/American College of Cardiology (AHA/ACC) and the European Society of Cardiology (ESC).
Methods: A cross-sectional study was conducted using 15 standardized questions (five on physical activity, 10 on diet), with the responses evaluated by a primary care physician and a cardiology fellow. A cardiologist reviewed any discrepancies between the two examiners' evaluations; in such instances, the final grade was the median of the grades assigned by all three examiners. Based on compliance with AHA/ACC and ESC guidelines, responses were rated as appropriate, appropriate but insufficient, partially inappropriate, or entirely inappropriate.
Results: Ninety percent of responses from ChatGPT, Claude AI, and DeepSeek AI met established cardiovascular health standards, the strongest performance among the four language models. All five recommendations for physical activity were deemed appropriate. Google Gemini had a performance level of 80%, whereas 90% of the responses from the other three LLMs were suitable for nutritional guidance. All models struggled to provide precise quantitative guidance, particularly on carbohydrate and added sugar intake. Exercise recommendations showed a slight preference for AHA/ACC guidelines.
Conclusions: While LLMs demonstrate potential as accessible sources of health information, they cannot replace expert medical advice. This study highlights the need for continued interpretation by medical professionals and for tailored healthcare guidance. Future advances should concentrate on making health recommendations more specific and on ensuring a more balanced interpretation of international guidelines.
Type
Article
PubMed ID
41001283
Affiliations
Advocate Illinois Masonic Medical Center