Prompt Carefully! ChatGPT Displays Rule-Based Insensitivity to Contingencies
Rule-governed behavior in humans is characterized by relative insensitivity to changes in contingencies, a phenomenon extensively documented in behavior-analytic research. The present study examined whether Large Language Models (LLMs) exhibit analogous patterns of contingency insensitivity. We employed a rock–paper–scissors task in which models repeatedly chose between two opponents (Sam or Alex). In the first block of 40 trials, selecting the optimal opponent (e.g., Sam) produced a 70% probability of winning, a 15% probability of a tie, and a 15% probability of losing. In the second block of 40 trials, these probabilities were reversed (i.e., Alex was the optimal opponent). Four frontier LLMs as of January 2026 (GPT 5.2, Claude Opus 4.5, Grok 4.1 Fast, and Gemini 3 Flash) were evaluated under a 2 × 2 experimental design manipulating (a) the presence or absence of a rule describing the opponent's skill level and (b) the LLM's extended reasoning (present vs. absent). In rule conditions, prompts specified the purported skill of the initial optimal opponent (e.g., "Sam is not very good at this game"). Results indicated that all models exhibited rule-based insensitivity to contingencies, qualitatively resembling human rule-governed behavior. However, the degree of insensitivity varied across models: GPT 5.2 and Grok 4.1 Fast showed the greatest contingency insensitivity, whereas Gemini 3 Flash and Claude Opus 4.5 were comparatively more sensitive to the contingency shift. The effect of extended reasoning varied across LLMs. This study is the first to demonstrate contingency insensitivity in LLMs. These results have important implications for applied LLM contexts, where LLMs' contingency insensitivity might be detrimental.
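The two-block contingency schedule described in the abstract can be sketched in a few lines of Python. This is only an illustrative sketch, not the authors' code. The 70/15/15 outcome probabilities and the 40-trial blocks come from the abstract. The outcome probabilities for the non-optimal opponent are not stated there, so the mirrored values below are an assumption. The function names and the "optimal choice rate" summary are hypothetical and are not claimed to be the study's measure.

```python
import random

# Outcome probabilities when choosing the currently optimal opponent
# (taken from the abstract).
OPTIMAL = {"win": 0.70, "tie": 0.15, "loss": 0.15}

# ASSUMPTION: the abstract does not give the non-optimal opponent's
# probabilities; a mirrored distribution is used here for illustration.
NON_OPTIMAL = {"win": 0.15, "tie": 0.15, "loss": 0.70}

BLOCK_LENGTH = 40  # trials per block (from the abstract)


def optimal_opponent(trial, first_optimal="Sam"):
    """Return the optimal opponent for a 1-indexed trial number."""
    other = "Alex" if first_optimal == "Sam" else "Sam"
    return first_optimal if trial <= BLOCK_LENGTH else other


def play_trial(trial, choice, rng):
    """Sample 'win', 'tie', or 'loss' for choosing `choice` on `trial`."""
    probs = OPTIMAL if choice == optimal_opponent(trial) else NON_OPTIMAL
    outcomes = list(probs)
    weights = [probs[o] for o in outcomes]
    return rng.choices(outcomes, weights=weights, k=1)[0]


def optimal_choice_rate(choices, block):
    """Proportion of choices in block 1 or 2 that target the optimal opponent.

    Hypothetical summary: a low rate in block 2 would indicate
    insensitivity to the contingency reversal.
    """
    start = (block - 1) * BLOCK_LENGTH + 1
    trials = range(start, start + BLOCK_LENGTH)
    hits = sum(choices[t - 1] == optimal_opponent(t) for t in trials)
    return hits / BLOCK_LENGTH
```

As a usage example, an agent that follows the rule "Sam is not very good at this game" on every trial would produce `choices = ["Sam"] * 80`. Under this sketch its optimal choice rate is 1.0 in the first block and 0.0 in the second, which is the extreme case of contingency insensitivity.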

