P-525 ChatGPT 4.0: accurate, clear, relevant, and readable responses to frequently asked fertility patient questions
Abstract
Study question
What is the accuracy, clarity, relevance, and readability of ChatGPT’s responses to frequently asked fertility questions?
Summary answer
ChatGPT 4.0 responses were rated highly by specialists for accuracy, clarity, and relevance, while readability was slightly difficult.
What is known already
Infertility is a challenging condition that leads many patients to seek online information about treatments and outcomes. Chatbots like ChatGPT can provide accessible guidance, but their reliability in specialized fields like fertility remains uncertain. Although studies on chatbot quality exist, key aspects—accuracy, clarity, and relevance—remain underexplored. Furthermore, no research has evaluated ChatGPT’s performance in Latin America, where cultural and linguistic nuances may impact its effectiveness. Assessing ChatGPT’s potential in fertility counseling could enhance patient communication and healthcare delivery.
Study design, size, duration
This cross-sectional study analyzed ChatGPT 4.0’s responses to 50 common fertility-related questions. Responses were evaluated by 10 fertility specialists (5 senior, 5 junior) using the Global Quality Scale (GQS) for overall quality and Likert scales for accuracy, clarity, and relevance. Readability was measured with the Spanish adaptation of the Flesch-Kincaid score. Domain-specific differences were statistically analyzed, providing insight into ChatGPT’s ability to support patient counseling.
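The abstract does not specify which Spanish adaptation of the Flesch-Kincaid score was used. As a minimal sketch only, the following applies the standard English Flesch-Kincaid grade-level coefficients with a naive Spanish syllable counter; both choices are assumptions for illustration, not the study's actual tooling:

```python
import re

SPANISH_VOWELS = "aeiouáéíóúü"

def count_syllables_es(word: str) -> int:
    """Naive Spanish syllable estimate: count contiguous vowel groups.

    Ignores hiatus/diphthong subtleties; a real analysis would use a
    proper Spanish syllabifier.
    """
    groups = re.findall(f"[{SPANISH_VOWELS}]+", word.lower())
    return max(1, len(groups))

def fk_grade(text: str) -> float:
    """Flesch-Kincaid grade level with the English coefficients:
    0.39 * (words/sentence) + 11.8 * (syllables/word) - 15.59.
    Applying these coefficients to Spanish text is an approximation.
    """
    sentences = [s for s in re.split(r"[.!?]+", text) if s.strip()]
    words = re.findall(r"[A-Za-zÁÉÍÓÚÜÑáéíóúüñ]+", text)
    syllables = sum(count_syllables_es(w) for w in words)
    return (0.39 * (len(words) / len(sentences))
            + 11.8 * (syllables / len(words)) - 15.59)

# Example on a short Spanish answer fragment
print(round(fk_grade("La fertilidad depende de muchos factores. "
                     "Consulte a su especialista."), 1))
```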
Participants/materials, setting, methods
Fifty frequently asked fertility-related questions were selected from patient forums and blogs. ChatGPT 4.0’s responses, generated using prompts requesting answers “as if ChatGPT were an infertility specialist, using the best available evidence,” were evaluated by 10 fertility specialists. Ratings for accuracy, clarity, relevance, and readability were collected using structured tools, and the data were analyzed to determine performance differences across domains.
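As an illustration of how such responses could be generated programmatically, here is a minimal sketch using the openai Python client. The model identifier and the system-prompt wording are assumptions paraphrasing the abstract, not the study's verbatim setup:

```python
from openai import OpenAI

client = OpenAI()  # reads OPENAI_API_KEY from the environment

# Paraphrase of the abstract's instruction; the exact wording (and
# whether it was phrased in Spanish) is not reported in the abstract.
SYSTEM_PROMPT = ("Answer as if you were an infertility specialist, "
                 "using the best available evidence.")

def ask(question: str) -> str:
    """Send one patient question and return the model's answer."""
    resp = client.chat.completions.create(
        model="gpt-4",  # stand-in for "ChatGPT 4.0"; exact model ID is an assumption
        messages=[
            {"role": "system", "content": SYSTEM_PROMPT},
            {"role": "user", "content": question},
        ],
    )
    return resp.choices[0].message.content

print(ask("¿Cuánto tiempo debo intentar concebir antes de "
          "consultar a un especialista?"))
```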
Main results and the role of chance
All specialists rated ChatGPT as at least “good” in answering fertility-related questions, with 70% (7/10) classifying its general performance as “very good” or “excellent.” Consistently, on the GQS scale (1–5), 94% (47/50) of responses scored ≥3, with an overall mean of 3.6 ± 0.6, reflecting answers generally rated as very good or excellent.
All specialists (10/10) agreed ChatGPT could complement specialist counseling. Ratings across domains were consistently above 3, except for precision in infertility diagnosis. Domain averages (1–5) showed no significant differences by specialist experience (p = NS). Except for infertility diagnosis, which was rated as good, all domains were consistently rated as very good or excellent (overall mean 4.2 ± 0.4):
• Infertility diagnosis: Precision 3.7, clarity 3.9, relevance 3.9.
• Treatment: Precision 4.0, clarity 4.1, relevance 4.2.
• Medication: Precision 4.0, clarity 4.0, relevance 4.0.
• Prognosis: Precision 4.2, clarity 4.2, relevance 4.2.
• Lifestyle: Precision 4.2, clarity 4.2, relevance 4.3.
• Time-related questions: Precision 3.9, clarity 4.1, relevance 4.2.
• Emotional support: Precision 4.4, clarity 4.4, relevance 4.5.
• ART technologies: Precision 4.0, clarity 4.2, relevance 4.3.
Readability scored 19.6 ± 4.2, corresponding to a 10th-grade reading level: slightly difficult but accessible.
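The abstract does not name the test behind “p = NS.” As one plausible reading, a sketch comparing senior and junior per-rater mean scores for a domain with a Mann-Whitney U test (an assumed choice, on hypothetical data, since the raw ratings are not published here) could look like this:

```python
import numpy as np
from scipy.stats import mannwhitneyu

# Hypothetical per-rater mean scores (1-5) for one domain;
# illustrative values only, not the study's data.
senior = np.array([4.1, 4.3, 3.9, 4.2, 4.0])
junior = np.array([4.0, 4.4, 4.1, 3.8, 4.2])

print(f"senior mean {senior.mean():.1f} ± {senior.std(ddof=1):.1f}")
print(f"junior mean {junior.mean():.1f} ± {junior.std(ddof=1):.1f}")

# Non-parametric comparison of the two rater groups
stat, p = mannwhitneyu(senior, junior, alternative="two-sided")
print(f"Mann-Whitney U = {stat:.1f}, p = {p:.3f}")  # p > 0.05 matches "p = NS"
```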
Limitations, reasons for caution
The main limitations include the lack of generalizability to other AI-based chatbots and the dependency on prompt design, which significantly impacts the quality of responses. Additionally, findings may not apply to more controversial topics, highlighting the need for further research in diverse contexts.
Wider implications of the findings
The findings suggest that ChatGPT 4.0 could serve as a valuable tool for initial infertility counseling, given its consistently high ratings for precision, clarity, and relevance. It may complement consultations by providing background information, while the physician validates the content and supplies the essential value of human interaction in patient care.
Trial registration number
No