Presentation Notes

Poster presentation at: Pacific Coast Reproductive Society Annual Meeting; March 20, 2026; Palm Springs, CA.

Abstract

Background: The use of artificial intelligence (AI) tools such as ChatGPT has rapidly expanded in healthcare, with many patients now using these platforms to access medical information. Patients undergoing in vitro fertilization (IVF) often have multiple questions regarding treatment protocols, risks, and outcomes. While AI may enhance access to information, inaccurate or partial responses threaten patient care and undermine proper counseling by healthcare providers. The American Society for Reproductive Medicine (ASRM) publishes detailed guidelines and committee opinions on nearly every aspect of assisted reproductive technology. Similarly, the American College of Obstetricians and Gynecologists (ACOG) provides guidance for clinical practice and patient education in reproductive medicine. Comparing ChatGPT's responses to information from these reputable sources is therefore an essential first step in evaluating whether AI can provide safe, accurate, and comprehensible patient information in the fertility setting.

Objective: To evaluate the accuracy and readability of ChatGPT responses to common IVF-related patient questions compared with recommendations from ASRM and ACOG.

Materials and Methods: Twenty-two patient-style IVF questions were developed from online forums and patient education materials. Each question was entered into ChatGPT (GPT-5), and responses were recorded verbatim. Two reviewers evaluated each response for accuracy, with discrepancies resolved by a senior author. Reviewers compared the ChatGPT answers against information in ASRM and/or ACOG sources using a 0-2 scale, with 0 indicating an incorrect answer, 1 a partially correct answer, and 2 a correct answer, similar to the scale used by Shirefraw et al (2024) in assessing ChatGPT's answers to healthcare questions. Readability was assessed using the Flesch-Kincaid Grade Level, a validated measure of reading difficulty.
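The Flesch-Kincaid Grade Level used here is a standard formula over words, sentences, and syllables: 0.39 × (words/sentences) + 11.8 × (syllables/words) − 15.59. A minimal sketch of how such a score can be computed is below; the syllable counter is a rough vowel-group heuristic of our own (dedicated readability libraries use more refined rules), not the tool used in the study.

```python
import re

def count_syllables(word):
    """Rough heuristic: count vowel groups, subtracting a silent final 'e'."""
    word = word.lower()
    groups = re.findall(r"[aeiouy]+", word)
    n = len(groups)
    if word.endswith("e") and not word.endswith(("le", "ee")) and n > 1:
        n -= 1
    return max(n, 1)

def flesch_kincaid_grade(text):
    """Flesch-Kincaid Grade Level:
    0.39 * (words/sentences) + 11.8 * (syllables/words) - 15.59
    """
    sentences = [s for s in re.split(r"[.!?]+", text) if s.strip()]
    words = re.findall(r"[A-Za-z']+", text)
    syllables = sum(count_syllables(w) for w in words)
    return (0.39 * (len(words) / len(sentences))
            + 11.8 * (syllables / len(words))
            - 15.59)
```

A grade level around 10, as reported below, means the text reads at roughly a tenth-grade level, above the sixth-to-eighth-grade level commonly recommended for patient materials.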

Result(s): The mean accuracy score across all responses was 1.72/2, reflecting frequent alignment but incomplete concordance with ASRM and/or ACOG guidelines. Inter-rater reliability, assessed with Cohen's kappa, was 0.57, indicating moderate agreement between reviewers. Readability averaged a 10.2 grade level, suggesting responses may be difficult for patients with lower health literacy to fully comprehend.
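Cohen's kappa measures agreement between two raters beyond what chance alone would produce: κ = (p_o − p_e)/(1 − p_e), where p_o is observed agreement and p_e is chance agreement from each rater's marginal label frequencies. A minimal sketch, with hypothetical scores on the 0-2 accuracy scale (not the study data):

```python
from collections import Counter

def cohens_kappa(rater_a, rater_b):
    """Cohen's kappa for two raters scoring the same items."""
    assert len(rater_a) == len(rater_b) and rater_a
    n = len(rater_a)
    # Observed agreement: fraction of items both raters scored identically.
    p_o = sum(a == b for a, b in zip(rater_a, rater_b)) / n
    # Chance agreement: product of each rater's marginal frequencies per label.
    counts_a, counts_b = Counter(rater_a), Counter(rater_b)
    p_e = sum(counts_a[k] * counts_b[k] for k in counts_a) / (n * n)
    return (p_o - p_e) / (1 - p_e)

# Hypothetical 0-2 accuracy scores from two reviewers (illustrative only)
reviewer_1 = [2, 2, 1, 2, 0, 1, 2, 2]
reviewer_2 = [2, 1, 1, 2, 0, 2, 2, 2]
print(round(cohens_kappa(reviewer_1, reviewer_2), 3))
```

By the conventional Landis and Koch benchmarks, κ between 0.41 and 0.60, as reported above, is interpreted as moderate agreement.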

Conclusion(s): ChatGPT often provided accurate but at times incomplete IVF information, with readability levels exceeding recommended health communication standards. These findings highlight the need for clinician oversight and for integration of guideline-based content into AI health communication platforms [5,7,13].

Financial Support: No financial support was received for this study.

Type

Poster
