Recommended Citation

Foy O, Lafazanos Y, Berber L, Ehrenpreis ED. Using Large Language Models to Evaluate the Inflammatory Bowel Disease Questionnaire (IBDQ). Presented at Scientific Day; May 20, 2026; Milwaukee, WI.

Affiliations

Advocate Lutheran General Hospital

Abstract

Background/Significance:

The Inflammatory Bowel Disease Questionnaire (IBDQ), a 32-item survey published in 1989, evaluates health-related quality of life (QOL) in inflammatory bowel disease (IBD) across bowel, systemic, emotional, and social domains. Higher scores indicate better QOL. It remains widely used in clinical practice and trials. However, as therapies and patient expectations evolve, legacy instruments may not capture contemporary domains such as mental health burden, cultural factors, treatment complexity, extraintestinal manifestations, and cumulative therapy impact. Large language models (LLMs), increasingly used in clinical research, offer a systematic method to critique and refine patient-reported outcome (PRO) tools. We aimed to determine whether LLMs identify meaningful gaps in the IBDQ and generate clinically relevant revisions compared with the original instrument.

Purpose:

To evaluate the IBDQ using multiple AI platforms, identify missing or underrepresented domains, generate revisions, and compare outputs through blinded expert review.

Methods:

The complete IBDQ and standardized prompts were provided to seven LLMs. Each generated a 1–10 global quality rating and structured critiques outlining strengths, limitations, redundancies, and proposed missing domains. An IBD-focused academic physician reviewed the seven coded, randomized outputs under blinded conditions, assessing overall quality, clinical appropriateness, and psychometric insight. Two 1–5 scales rated clinical applicability and comparative improvement versus the original IBDQ. Content analysis identified the domains most frequently cited as underrepresented.

Results:

All LLMs rated the IBDQ favorably (7–8/10). Open Evidence retained the original structure. Copilot and Gemini reduced redundancy by consolidating items. ChatGPT expanded social consequence domains. Claude and Perplexity emphasized dietary impact, treatment burden, body image, cognitive function, and extraintestinal manifestations. Expert review found Copilot, Gemini, and Open Evidence revisions useful but inferior to the original. ChatGPT 4o and 5.1 revisions were considered improvements. Claude received the highest evaluation, with revisions viewed as potentially practice-changing.

Conclusion:

LLMs identified domains not fully captured in the IBDQ and proposed meaningful refinements. While outputs varied, many emphasized reducing questionnaire burden and improving clarity. LLMs may serve as scalable tools to modernize legacy PRO instruments, with expert oversight to ensure clinical rigor and relevance.

Presentation Notes

Presented at Scientific Day; May 20, 2026; Milwaukee, WI.

Document Type

Poster

Open Access

Available to all.
