P-525 ChatGPT 4.0: accurate, clear, relevant, and readable responses to frequently asked fertility patient questions

Abstract

Study question
What is the accuracy, clarity, relevance, and readability of ChatGPT's responses to frequently asked fertility questions?

Summary answer
Specialists rated ChatGPT 4.0's responses highly for accuracy, clarity, and relevance, while the text was slightly difficult to read.

What is known already
Infertility is a challenging condition that leads many patients to seek online information about treatments and outcomes. Chatbots like ChatGPT can provide accessible guidance, but their reliability in specialized fields like fertility remains uncertain. Although studies on chatbot quality exist, key aspects (accuracy, clarity, and relevance) remain underexplored. Furthermore, no research has evaluated ChatGPT's performance in Latin America, where cultural and linguistic nuances may affect its effectiveness. Assessing ChatGPT's potential in fertility counseling could enhance patient communication and healthcare delivery.

Study design, size, duration
This cross-sectional study analyzed ChatGPT 4.0's responses to 50 common fertility-related questions. Responses were evaluated by 10 fertility specialists (5 senior, 5 junior) using the Global Quality Scale (GQS) for an overall quality rating, and Likert scales for accuracy, clarity, and relevance. Readability was measured with the Spanish Flesch-Kincaid score. Differences between domains were analyzed statistically to assess ChatGPT's ability to support patient counseling.

Participants/materials, setting, methods
Fifty frequently asked fertility-related questions were selected from patient forums and blogs. ChatGPT 4.0's responses, generated using prompts requesting answers "as if ChatGPT were an infertility specialist, using the best available evidence", were evaluated by 10 fertility specialists. Ratings for accuracy, clarity, relevance, and readability were collected using structured tools. Data were analyzed to determine performance differences across domains.
Main results and the role of chance
All specialists rated ChatGPT as at least "good" in answering fertility-related questions, with 70% (7/10) classifying its general performance as "very good" or "excellent." In agreement, on the GQS (1–5), 94% (47/50) of responses scored ≥3, with an overall mean of 3.6 ± 0.6, reflecting answers generally rated as very good or excellent. All specialists (10/10) agreed that ChatGPT could complement specialist counseling. Ratings across domains were consistently above 3, except for infertility diagnosis precision. Domain averages (1–5) showed no significant differences based on specialists' experience (p=NS). Except for infertility diagnosis, which was rated as good, the domains were consistently rated as very good or excellent (mean score 4.2 ± 0.4):

• Infertility diagnosis: precision 3.7, clarity 3.9, relevance 3.9
• Treatment: precision 4.0, clarity 4.1, relevance 4.2
• Medication: precision 4.0, clarity 4.0, relevance 4.0
• Prognosis: precision 4.2, clarity 4.2, relevance 4.2
• Lifestyle: precision 4.2, clarity 4.2, relevance 4.3
• Time-related questions: precision 3.9, clarity 4.1, relevance 4.2
• Emotional support: precision 4.4, clarity 4.4, relevance 4.5
• ART technologies: precision 4.0, clarity 4.2, relevance 4.3

Readability was scored at 19.6 ± 4.2, corresponding to a 10th-grade reading level: slightly difficult but accessible.

Limitations, reasons for caution
The main limitations are the lack of generalizability to other AI-based chatbots and the dependency on prompt design, which significantly affects the quality of responses. Additionally, the findings may not apply to more controversial topics, highlighting the need for further research in diverse contexts.

Wider implications of the findings
The findings suggest that ChatGPT 4.0 could serve as a valuable tool for initial infertility counseling, given its consistently high ratings for precision, clarity, and relevance. It may complement consultations by offering background information while the doctor validates the content and provides the essential value of human interaction in patient care.

Trial registration number
No
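As a rough consistency check on the domain scores listed in the results, the sketch below averages the reported precision, clarity, and relevance values per domain and pools the domains other than infertility diagnosis. It is only an illustration: it works from the rounded figures printed in the abstract, not from the underlying ratings, and the names `scores`, `domain_means`, and `pooled_mean` are hypothetical, not the authors' analysis.

```python
from statistics import mean

# Reported domain scores (precision, clarity, relevance), copied from the abstract.
scores = {
    "Infertility diagnosis": (3.7, 3.9, 3.9),
    "Treatment": (4.0, 4.1, 4.2),
    "Medication": (4.0, 4.0, 4.0),
    "Prognosis": (4.2, 4.2, 4.2),
    "Lifestyle": (4.2, 4.2, 4.3),
    "Time-related questions": (3.9, 4.1, 4.2),
    "Emotional support": (4.4, 4.4, 4.5),
    "ART technologies": (4.0, 4.2, 4.3),
}


def domain_means(table):
    """Unweighted mean of the three reported scores for each domain."""
    return {domain: mean(values) for domain, values in table.items()}


def pooled_mean(table, exclude=()):
    """Unweighted mean of all reported scores, skipping excluded domains."""
    values = [v for domain, vals in table.items()
              if domain not in exclude
              for v in vals]
    return mean(values)


means = domain_means(scores)
lowest = min(means, key=means.get)    # domain with the lowest average
highest = max(means, key=means.get)   # domain with the highest average
others = pooled_mean(scores, exclude=("Infertility diagnosis",))

for domain, value in means.items():
    print(f"{domain}: {value:.2f}")
print(f"Lowest: {lowest}; highest: {highest}")
print(f"Pooled mean excluding infertility diagnosis: {others:.1f}")
```

Under these assumptions the pooled value for the remaining domains rounds to 4.2, in line with the mean the abstract reports for them, and infertility diagnosis comes out as the lowest-rated domain.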

Oxford University Press (OUP)

M Schapira, F Di Biase, C Formica Muntaner, M Montiveros, D Glujovsky

Human Reproduction

2025

