Prompt Carefully! ChatGPT Displays Rule-Based Insensitivity to Contingencies
Rule-governed behavior in humans is characterized by relative insensitivity to changes in contingencies, a phenomenon extensively documented in behavior-analytic research. The present study examined whether Large Language Models (LLMs) exhibit analogous patterns of contingency insensitivity. We employed a rock–paper–scissors task in which models repeatedly chose between two opponents (Sam or Alex). In the first block of 40 trials, selecting the optimal opponent (e.g., Sam) produced a 70% probability of winning, a 15% probability of a tie, and a 15% probability of losing. In the second block of 40 trials, these probabilities were reversed (i.e., Alex became the optimal opponent). Four frontier LLMs available in January 2026 (GPT 5.2, Claude Opus 4.5, Grok 4.1 Fast, and Gemini 3 Flash) were evaluated under a 2 × 2 experimental design manipulating (a) the presence or absence of a rule describing the opponent's skill level and (b) the presence or absence of extended reasoning. In rule conditions, prompts specified the purported skill of the initially optimal opponent (e.g., "Sam is not very good at this game"). Results indicated that all models exhibited rule-based insensitivity to contingencies, qualitatively resembling human rule-governed behavior. However, the degree of insensitivity varied across models: GPT 5.2 and Grok 4.1 Fast showed the greatest contingency insensitivity, whereas Gemini 3 Flash and Claude Opus 4.5 were comparatively more sensitive to the contingency shift. The effect of extended reasoning varied across LLMs. This study is the first to demonstrate contingency insensitivity in LLMs. These results have important implications for applied LLM contexts, where contingency insensitivity might be detrimental.
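The task contingencies described above can be sketched in a few lines of code. The following is a minimal, hypothetical simulation of the reversal design (the abstract does not publish the authors' implementation; the function names and the fixed 40-trial reversal point are assumptions based on the description): the optimal opponent yields a 70/15/15 win/tie/lose split, the non-optimal opponent the mirror image, and the roles swap after trial 40.

```python
import random

# Outcome probabilities under the described contingencies: choosing the
# currently optimal opponent wins 70% of the time; choosing the other
# opponent reverses the win/lose probabilities.
OPTIMAL = {"win": 0.70, "tie": 0.15, "lose": 0.15}
NON_OPTIMAL = {"win": 0.15, "tie": 0.15, "lose": 0.70}

def play_trial(choice: str, optimal_opponent: str, rng: random.Random) -> str:
    """Return the outcome of one trial against the chosen opponent."""
    probs = OPTIMAL if choice == optimal_opponent else NON_OPTIMAL
    outcomes, weights = zip(*probs.items())
    return rng.choices(outcomes, weights=weights, k=1)[0]

def run_session(choices, rng=None):
    """Score an 80-trial session: Sam is optimal on trials 1-40,
    then the contingencies reverse and Alex is optimal on trials 41-80."""
    rng = rng or random.Random(0)
    results = []
    for i, choice in enumerate(choices):
        optimal = "Sam" if i < 40 else "Alex"  # contingency reversal at trial 41
        results.append(play_trial(choice, optimal, rng))
    return results
```

Under this sketch, a contingency-insensitive agent that keeps choosing Sam after the reversal would see its win rate fall from roughly 70% in the first block to roughly 15% in the second, which is the behavioral signature the study probes for.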