Presentation Notes

Poster presentation at: Pacific Coast Reproductive Society Annual Meeting; March 20, 2026; Palm Springs, CA.

Abstract

Background: The use of artificial intelligence (AI) tools such as ChatGPT has rapidly expanded in healthcare, with many patients now using these platforms to access medical information. Patients undergoing in vitro fertilization (IVF) often have multiple questions regarding treatment protocols, risks, and outcomes. While AI may enhance access to information, inaccurate or partial responses threaten patient care and undermine proper counseling by healthcare providers. The American Society for Reproductive Medicine (ASRM) publishes detailed guidelines and committee opinions on nearly every aspect of assisted reproductive technology. Similarly, the American College of Obstetricians and Gynecologists (ACOG) provides guidance for clinical practice and patient education in reproductive medicine. Comparing ChatGPT's responses to information from these reputable sources is therefore an essential first step in evaluating whether AI can provide safe, accurate, and comprehensible patient information in the fertility setting.

Objective: To evaluate the accuracy and readability of ChatGPT responses to common IVF-related patient questions compared with recommendations from ASRM and ACOG.

Materials and Methods: Twenty-two patient-style IVF questions were developed from online forums and patient education materials. Each question was entered into ChatGPT (GPT-5), and responses were recorded verbatim. Two reviewers evaluated each response for accuracy, with discrepancies resolved by a senior author. Reviewers compared the ChatGPT answers against information in ASRM and/or ACOG sources using a 0-2 scale, with 0 indicating an incorrect answer, 1 a partially correct answer, and 2 a correct answer, similar to the scale used by Shirefraw et al (2024) in assessing ChatGPT's answers to healthcare questions. Readability was assessed using the Flesch-Kincaid Grade Level, a validated measure of reading difficulty.
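The Flesch-Kincaid Grade Level used here is a standard formula over words, sentences, and syllables: 0.39 × (words/sentences) + 11.8 × (syllables/words) − 15.59. A minimal sketch of how such a score can be computed is below; the syllable counter is a rough vowel-group heuristic of our own (dedicated readability libraries use more refined rules), not the tool used in the study.

```python
import re

def count_syllables(word):
    """Rough heuristic: count vowel groups, subtracting a silent final 'e'."""
    word = word.lower()
    groups = re.findall(r"[aeiouy]+", word)
    n = len(groups)
    if word.endswith("e") and not word.endswith(("le", "ee")) and n > 1:
        n -= 1
    return max(n, 1)

def flesch_kincaid_grade(text):
    """Flesch-Kincaid Grade Level:
    0.39 * (words/sentences) + 11.8 * (syllables/words) - 15.59
    """
    sentences = [s for s in re.split(r"[.!?]+", text) if s.strip()]
    words = re.findall(r"[A-Za-z']+", text)
    syllables = sum(count_syllables(w) for w in words)
    return (0.39 * (len(words) / len(sentences))
            + 11.8 * (syllables / len(words))
            - 15.59)
```

A grade level around 10, as reported below, means the text reads at roughly a tenth-grade level, above the sixth-to-eighth-grade level commonly recommended for patient materials.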

Result(s): The mean accuracy score across all responses was 1.72/2, reflecting frequent alignment but incomplete concordance with ASRM and/or ACOG guidelines. Inter-rater reliability, assessed with Cohen's kappa, was 0.57, indicating moderate agreement between reviewers. Readability averaged a 10.2 grade level, suggesting responses may be difficult for patients with lower health literacy to fully comprehend.
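Cohen's kappa measures agreement between two raters beyond what chance alone would produce: κ = (p_o − p_e)/(1 − p_e), where p_o is observed agreement and p_e is chance agreement from each rater's marginal label frequencies. A minimal sketch, with hypothetical scores on the 0-2 accuracy scale (not the study data):

```python
from collections import Counter

def cohens_kappa(rater_a, rater_b):
    """Cohen's kappa for two raters scoring the same items."""
    assert len(rater_a) == len(rater_b) and rater_a
    n = len(rater_a)
    # Observed agreement: fraction of items both raters scored identically.
    p_o = sum(a == b for a, b in zip(rater_a, rater_b)) / n
    # Chance agreement: product of each rater's marginal frequencies per label.
    counts_a, counts_b = Counter(rater_a), Counter(rater_b)
    p_e = sum(counts_a[k] * counts_b[k] for k in counts_a) / (n * n)
    return (p_o - p_e) / (1 - p_e)

# Hypothetical 0-2 accuracy scores from two reviewers (illustrative only)
reviewer_1 = [2, 2, 1, 2, 0, 1, 2, 2]
reviewer_2 = [2, 1, 1, 2, 0, 2, 2, 2]
print(round(cohens_kappa(reviewer_1, reviewer_2), 3))
```

By the conventional Landis and Koch benchmarks, κ between 0.41 and 0.60, as reported above, is interpreted as moderate agreement.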

Conclusion(s): ChatGPT often provided accurate but at times incomplete IVF information, with readability levels exceeding recommended health communication standards. These findings highlight the need for clinician oversight and for integration of guideline-based content into AI health communication platforms [5,7,13].

Financial Support: No financial support was received for this study.

Type

Poster
