P-525 ChatGPT 4.0: accurate, clear, relevant, and readable responses to frequently asked fertility patient questions
Abstract
Study question
What is the accuracy, clarity, relevance, and readability of ChatGPT’s responses to frequently asked fertility questions?
Summary answer
ChatGPT 4.0 responses were rated highly by specialists for accuracy, clarity, and relevance, while readability was slightly difficult.
What is known already
Infertility is a challenging condition that leads many patients to seek online information about treatments and outcomes. Chatbots like ChatGPT can provide accessible guidance, but their reliability in specialized fields like fertility remains uncertain. Although studies on chatbot quality exist, key aspects—accuracy, clarity, and relevance—remain underexplored. Furthermore, no research has evaluated ChatGPT’s performance in Latin America, where cultural and linguistic nuances may impact its effectiveness. Assessing ChatGPT’s potential in fertility counseling could enhance patient communication and healthcare delivery.
Study design, size, duration
This cross-sectional study analyzed ChatGPT 4.0’s responses to 50 common fertility-related questions. Responses were evaluated by 10 fertility specialists (5 senior, 5 junior) using the Global Quality Scale (GQS) for overall quality and Likert scales for accuracy, clarity, and relevance. Readability was measured with the Spanish adaptation of the Flesch-Kincaid score. Domain-specific differences were statistically analyzed, providing insight into ChatGPT’s ability to support patient counseling.
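The abstract does not specify which Spanish adaptation of the Flesch-Kincaid score was used. As a minimal sketch only, the following applies the standard English Flesch-Kincaid grade-level coefficients with a naive Spanish syllable counter; both choices are assumptions for illustration, not the study's actual tooling:

```python
import re

SPANISH_VOWELS = "aeiouáéíóúü"

def count_syllables_es(word: str) -> int:
    """Naive Spanish syllable estimate: count contiguous vowel groups.

    Ignores hiatus/diphthong subtleties; a real analysis would use a
    proper Spanish syllabifier.
    """
    groups = re.findall(f"[{SPANISH_VOWELS}]+", word.lower())
    return max(1, len(groups))

def fk_grade(text: str) -> float:
    """Flesch-Kincaid grade level with the English coefficients:
    0.39 * (words/sentence) + 11.8 * (syllables/word) - 15.59.
    Applying these coefficients to Spanish text is an approximation.
    """
    sentences = [s for s in re.split(r"[.!?]+", text) if s.strip()]
    words = re.findall(r"[A-Za-zÁÉÍÓÚÜÑáéíóúüñ]+", text)
    syllables = sum(count_syllables_es(w) for w in words)
    return (0.39 * (len(words) / len(sentences))
            + 11.8 * (syllables / len(words)) - 15.59)

# Example on a short Spanish answer fragment
print(round(fk_grade("La fertilidad depende de muchos factores. "
                     "Consulte a su especialista."), 1))
```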
Participants/materials, setting, methods
Fifty frequently asked fertility-related questions were selected from patient forums and blogs. ChatGPT 4.0’s responses, generated using prompts requesting answers “as if ChatGPT were an infertility specialist, using the best available evidence,” were evaluated by 10 fertility specialists. Ratings for accuracy, clarity, relevance, and readability were collected using structured tools, and the data were analyzed to determine performance differences across domains.
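As an illustration of how such responses could be generated programmatically, here is a minimal sketch using the openai Python client. The model identifier and the system-prompt wording are assumptions paraphrasing the abstract, not the study's verbatim setup:

```python
from openai import OpenAI

client = OpenAI()  # reads OPENAI_API_KEY from the environment

# Paraphrase of the abstract's instruction; the exact wording (and
# whether it was phrased in Spanish) is not reported in the abstract.
SYSTEM_PROMPT = ("Answer as if you were an infertility specialist, "
                 "using the best available evidence.")

def ask(question: str) -> str:
    """Send one patient question and return the model's answer."""
    resp = client.chat.completions.create(
        model="gpt-4",  # stand-in for "ChatGPT 4.0"; exact model ID is an assumption
        messages=[
            {"role": "system", "content": SYSTEM_PROMPT},
            {"role": "user", "content": question},
        ],
    )
    return resp.choices[0].message.content

print(ask("¿Cuánto tiempo debo intentar concebir antes de "
          "consultar a un especialista?"))
```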
Main results and the role of chance
All specialists rated ChatGPT as at least “good” in answering fertility-related questions, with 70% (7/10) classifying its general performance as “very good” or “excellent.” Consistently, on the GQS scale (1–5), 94% (47/50) of responses scored ≥3, with an overall mean of 3.6 ± 0.6, reflecting answers generally rated as very good or excellent.
All specialists (10/10) agreed ChatGPT could complement specialist counseling. Ratings across domains were consistently above 3, except for precision in infertility diagnosis. Domain averages (1–5) showed no significant differences by specialist experience (p = NS). Except for infertility diagnosis, which was rated as good, all domains were consistently rated as very good or excellent (overall mean 4.2 ± 0.4):
• Infertility diagnosis: Precision 3.7, clarity 3.9, relevance 3.9.
• Treatment: Precision 4.0, clarity 4.1, relevance 4.2.
• Medication: Precision 4.0, clarity 4.0, relevance 4.0.
• Prognosis: Precision 4.2, clarity 4.2, relevance 4.2.
• Lifestyle: Precision 4.2, clarity 4.2, relevance 4.3.
• Time-related questions: Precision 3.9, clarity 4.1, relevance 4.2.
• Emotional support: Precision 4.4, clarity 4.4, relevance 4.5.
• ART technologies: Precision 4.0, clarity 4.2, relevance 4.3.
Readability scored 19.6 ± 4.2, corresponding to a 10th-grade reading level: slightly difficult but accessible.
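The abstract does not name the test behind “p = NS.” As one plausible reading, a sketch comparing senior and junior per-rater mean scores for a domain with a Mann-Whitney U test (an assumed choice, on hypothetical data, since the raw ratings are not published here) could look like this:

```python
import numpy as np
from scipy.stats import mannwhitneyu

# Hypothetical per-rater mean scores (1-5) for one domain;
# illustrative values only, not the study's data.
senior = np.array([4.1, 4.3, 3.9, 4.2, 4.0])
junior = np.array([4.0, 4.4, 4.1, 3.8, 4.2])

print(f"senior mean {senior.mean():.1f} ± {senior.std(ddof=1):.1f}")
print(f"junior mean {junior.mean():.1f} ± {junior.std(ddof=1):.1f}")

# Non-parametric comparison of the two rater groups
stat, p = mannwhitneyu(senior, junior, alternative="two-sided")
print(f"Mann-Whitney U = {stat:.1f}, p = {p:.3f}")  # p > 0.05 matches "p = NS"
```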
Limitations, reasons for caution
The main limitations include the lack of generalizability to other AI-based chatbots and the dependency on prompt design, which significantly impacts the quality of responses. Additionally, findings may not apply to more controversial topics, highlighting the need for further research in diverse contexts.
Wider implications of the findings
The findings suggest that ChatGPT 4.0 could serve as a valuable tool for initial infertility counseling, given its consistently high ratings for precision, clarity, and relevance. It may complement consultations by providing background information, while the physician validates the content and supplies the essential value of human interaction in patient care.
Trial registration number
No