Evaluating the Usability, Technical Performance, and Accuracy of Artificial Intelligence (AI) Scribes for Primary Care: A Competitive Analysis (Preprint)

BACKGROUND Primary care providers (PCPs) face significant burnout due to increasing administrative and documentation demands, contributing to job dissatisfaction and impacting care quality. Artificial intelligence (AI) scribes have emerged as potential solutions to reduce administrative burden by automating clinical documentation of patient encounters. Although AI scribes are gaining popularity in primary care, there is limited information on their usability, effectiveness, and accuracy.

OBJECTIVE This study aimed to develop and apply an evaluation framework to systematically assess the usability, technical performance, and accuracy of various AI scribes used in primary care settings across Canada and the United States.

METHODS We conducted a systematic comparison of a suite of AI scribes using competitive analysis methods. An evaluation framework was developed using expert usability approaches and human factors engineering principles, and comprises 3 domains: usability, effectiveness and technical performance, and accuracy and quality. Audio files from 4 standardized patient encounters were used to generate transcripts and SOAP-format (Subjective, Objective, Assessment, Plan) medical notes from each AI scribe. A verbatim transcript, detailed case notes, and physician-written medical notes for each audio file served as a benchmark for comparison against the AI-generated outputs. Applicable items were rated on a 3-point Likert scale (1 = poor, 2 = good, 3 = excellent). Additional insights were gathered from clinical experts, vendor questionnaires, and public resources to support usability, effectiveness, and quality findings.

RESULTS In total, 6 AI scribes were evaluated, with notable performance differences. Most AI scribes could be accessed via various platforms (n=4) and launched within common electronic medical records (EMRs), though data exchange capabilities were limited. Nearly all AI scribes generated SOAP-format notes in approximately one minute for a 15-minute standardized encounter (n=5), though documentation time increased with encounter length and topic complexity. While all AI scribes produced good to excellent quality medical notes, none were consistently error-free. Common errors included deletion, omission, and SOAP structure errors. Factors such as extraneous conversations and multiple speakers impacted the accuracy of both the transcript and medical note, with some AI scribes producing excellent notes despite minor transcript issues and vice versa. Limitations in usability, technical performance, and accuracy suggest areas for improvement to fully realize AI scribes' potential in reducing administrative burden for PCPs.

CONCLUSIONS This study offers one of the first systematic evaluations of the usability, effectiveness, and accuracy of a suite of AI scribes currently used in primary care, providing benchmark data for further research, policy, and practice. While AI scribes show promise in reducing documentation burdens, improvements and ongoing evaluations are essential to ensure safe and effective use. Future studies should assess AI scribe performance in real-world settings across diverse populations to support equitable and reliable application.

JMIR Publications Inc.

Emily Ha, Isabelle Choon-Kon-Yune, LaShawn Murray, Siying Luan, Enid Montague, Onil Bhattacharyya, Payal Agarwal

2025
