
Prompt Carefully! ChatGPT Displays Rule-Based Insensitivity to Contingencies

Rule-governed behavior in humans is characterized by relative insensitivity to changes in contingencies, a phenomenon extensively documented in behavior-analytic research. The present study examined whether large language models (LLMs) exhibit analogous patterns of contingency insensitivity. We employed a rock–paper–scissors task in which models repeatedly chose between two opponents (Sam or Alex). In the first block of 40 trials, selecting the optimal opponent (e.g., Sam) produced a 70% probability of winning, a 15% probability of a tie, and a 15% probability of losing. In the second block of 40 trials, these probabilities were reversed (i.e., Alex became the optimal opponent). Four frontier LLMs available in January 2026 (GPT 5.2, Claude Opus 4.5, Grok 4.1 Fast, and Gemini 3 Flash) were evaluated under a 2 × 2 experimental design manipulating (a) the presence or absence of a rule describing the opponent's skill level and (b) extended reasoning (present vs. absent). In rule conditions, prompts specified the purported skill of the initially optimal opponent (e.g., "Sam is not very good at this game"). Results indicated that all models exhibited rule-based insensitivity to contingencies, qualitatively resembling human rule-governed behavior. However, the degree of insensitivity varied across models: GPT 5.2 and Grok 4.1 Fast showed the greatest contingency insensitivity, whereas Gemini 3 Flash and Claude Opus 4.5 were comparatively more sensitive to the contingency shift. The effect of extended reasoning also varied across LLMs. This study is the first to demonstrate contingency insensitivity in LLMs. These results have important implications for applied LLM contexts, where contingency insensitivity might be detrimental.
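To make the two-block reinforcement schedule concrete, the following is a minimal Python sketch of the task environment, not the authors' procedure. It assumes that choosing the non-optimal opponent yields mirrored odds (15% win, 15% tie, 70% lose), which the abstract does not state. The agents (`always_sam`, `win_stay_lose_shift`) and the `block2_optimal_rate` index are hypothetical illustrations, not the study's models or its sensitivity measure, and no LLM is involved.

```python
import random
from typing import Optional

# Outcome probabilities when choosing the currently optimal opponent (from the abstract).
OPTIMAL = {"win": 0.70, "tie": 0.15, "lose": 0.15}
# ASSUMPTION: choosing the non-optimal opponent yields the mirrored odds.
SUBOPTIMAL = {"win": 0.15, "tie": 0.15, "lose": 0.70}
TRIALS_PER_BLOCK = 40  # two blocks of 40 trials each


def optimal_opponent(trial: int) -> str:
    """Sam is optimal in block 1 (trials 0-39); Alex in block 2 (trials 40-79)."""
    return "Sam" if trial < TRIALS_PER_BLOCK else "Alex"


def play(trial: int, choice: str, rng: random.Random) -> str:
    """Sample a win/tie/lose outcome for choosing `choice` on a 0-indexed trial."""
    probs = OPTIMAL if choice == optimal_opponent(trial) else SUBOPTIMAL
    outcomes = list(probs)
    return rng.choices(outcomes, weights=[probs[o] for o in outcomes], k=1)[0]


def always_sam(prev_choice: Optional[str], prev_outcome: Optional[str]) -> str:
    """Hypothetical rule-following agent: ignores outcomes and keeps choosing Sam."""
    return "Sam"


def win_stay_lose_shift(prev_choice: Optional[str], prev_outcome: Optional[str]) -> str:
    """Hypothetical contingency-sensitive baseline: switch opponents after a loss."""
    if prev_choice is None:
        return "Sam"
    if prev_outcome == "lose":
        return "Alex" if prev_choice == "Sam" else "Sam"
    return prev_choice  # stay after a win or a tie


def run(agent, seed: int = 0) -> list[str]:
    """Run one 80-trial session and return the agent's choices."""
    rng = random.Random(seed)
    choices: list[str] = []
    choice: Optional[str] = None
    outcome: Optional[str] = None
    for t in range(2 * TRIALS_PER_BLOCK):
        choice = agent(choice, outcome)
        outcome = play(t, choice, rng)
        choices.append(choice)
    return choices


def block2_optimal_rate(choices: list[str]) -> float:
    """Hypothetical index: share of block-2 choices that pick the new optimal opponent."""
    block2 = choices[TRIALS_PER_BLOCK:]
    return sum(c == "Alex" for c in block2) / len(block2)


print(block2_optimal_rate(run(always_sam)))  # 0.0
print(block2_optimal_rate(run(win_stay_lose_shift)))
```

In this toy setup, an agent that keeps following the initial choice never selects the new optimal opponent after the reversal, whereas even a simple win-stay/lose-shift rule drifts toward Alex in block 2. The index is only one possible way to quantify sensitivity to the contingency shift.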

Center for Open Science

Francisco J. Ruiz, Verónica Cardona-Betancourt

2026

