How does DeepSeek-R1 perform on USMLE?

Abstract

DeepSeek, a Chinese artificial intelligence company, released its first free chatbot app based on its DeepSeek-R1 model. DeepSeek provides its models, algorithms, and training details to ensure transparency and reproducibility. The new model is trained with reinforcement learning, allowing it to learn through interactions and feedback rather than relying solely on supervised learning. Reports indicate that DeepSeek’s model performs competitively against established large language models (LLMs) such as Anthropic’s Claude and OpenAI’s GPT-4o on established benchmarks in language understanding, mathematics (AIME 2024), and programming (Codeforces), while being trained at a fraction of the cost. Additionally, inference costs are significantly lower, leading to DeepSeek surpassing ChatGPT as the most downloaded free app on the American iOS App Store. This development contributed to a nearly 17% drop in Nvidia’s share price, resulting in the most significant one-day loss in U.S. history, amounting to nearly $600 billion. The open-source models also bring a significant shift in the healthcare system, allowing cost-efficient medical LLMs to be deployed within hospital networks. To understand its performance in the healthcare sector, we analyse the new DeepSeek-R1 model on the United States Medical Licensing Examination (USMLE) and compare it to ChatGPT.

Cold Spring Harbor Laboratory

Lisle Faray de Paiva, Gijs Luijten, Behrus Puladi, Jan Egger

Related Results

A Survey of DeepSeek Models

Advances in artificial intelligence (AI) rely on systems capable of human-like reasoning, a limitation for conventional Large Language Models (LLMs), which struggle with multi-step...

Factors Associated with Infectious Diseases Fellowship Academic Success

Abstract Background: A multitude of factors are considered in an infectious diseases (ID) training program’s meticulous selection process of ID fellows but their correlatio...

Research on the Value, Risks, and Responses of DeepSeek Empowering Vocational Education

With the rapid development of artificial intelligence technology, the application of DeepSeek big model in higher vocational education is becoming increasingly widespread, promotin...

Evaluation of ChatGPT vs. DeepSeek from a Privacy Perspective

The integration of artificial intelligence in healthcare has revolutionized research, diagnostics, and patient care. In particular, the emergence of ChatGPT and the recent rise of ...

A Comprehensive Study of Depression, Anxiety, and Stress Among USMLE Aspirants: A Cross-Sectional Survey

Abstract Background The United States Medical Licensure Examination (USMLE) represents a critical step for medical licensure in the United States, requiring extensive prep...

A Timely Quick Literature Review on the Deepseek in Chinese Publication

The swift rise of DeepSeek—the Chinese generative artificial intelligence (AI) model that champions open‐source innovation—has ignited scholarly interests across frontiers. This ti...

Is DeepSeek a Metacognition AI?

The relationship between metacognition and DeepSeek models represents a compelling and yet underexplored area of research. Metacognition refers to a system's capacity to monitor an...