Passage, Sentence, or Proposition? An Empirical Comparison of Retrieval Granularity Effects on LLM Answer Accuracy in Retrieval-Augmented Generation

Retrieval-Augmented Generation (RAG) has become a dominant paradigm for grounding large language model (LLM) outputs in external knowledge. While extensive research has focused on retriever architectures and generation strategies, the choice of retrieval granularity—the textual unit indexed and retrieved—remains insufficiently studied. This paper presents a controlled empirical comparison of four retrieval granularity levels: document, passage (100-word window), sentence, and proposition. Experiments are conducted across three open-domain question answering benchmarks (Natural Questions, TriviaQA, and HotpotQA) using two representative dense retrievers (DPR and Contriever) paired with LLaMA-2-7B-Chat as the reader. Results indicate that finer-grained retrieval units consistently improve retrieval recall, with proposition-level indexing achieving up to 6.8 absolute points higher Recall@20 than passage-level on Natural Questions under DPR. End-to-end answer accuracy follows a similar trend for single-hop factoid questions, where proposition-level retrieval yields the highest Exact Match scores. On multi-hop questions in HotpotQA, this advantage diminishes and passage-level retrieval produces comparable or slightly superior accuracy, suggesting that broader contextual units are beneficial when reasoning across multiple evidence pieces. These findings provide practical guidance for RAG pipeline design: retrieval granularity should be selected in accordance with question complexity, and no single granularity level dominates across all conditions.
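To make the compared units and metrics concrete, the following is a minimal sketch and not the authors' implementation. It assumes that passages are non-overlapping 100-word windows, that sentences are split with a naive punctuation rule, that Recall@k counts a question as a hit when any normalized gold answer string appears in one of the top-k retrieved units, and that Exact Match uses SQuAD-style normalization. All function names are hypothetical. Proposition extraction is omitted because the abstract does not describe how propositions are obtained.

```python
import re
import string


def split_passages(text, window=100):
    """Split text into non-overlapping windows of `window` words (assumed scheme)."""
    words = text.split()
    return [" ".join(words[i:i + window]) for i in range(0, len(words), window)]


def split_sentences(text):
    """Naive sentence splitter: break after '.', '!' or '?' followed by whitespace."""
    return [s.strip() for s in re.split(r"(?<=[.!?])\s+", text) if s.strip()]


def normalize(text):
    """Lowercase, drop punctuation and English articles, collapse whitespace."""
    text = text.lower()
    text = "".join(ch for ch in text if ch not in string.punctuation)
    text = re.sub(r"\b(a|an|the)\b", " ", text)
    return " ".join(text.split())


def recall_at_k(ranked_units, answers, k=20):
    """Return 1.0 if any gold answer string occurs in the top-k units, else 0.0."""
    top = [normalize(u) for u in ranked_units[:k]]
    return float(any(normalize(a) in u for a in answers for u in top))


def exact_match(prediction, answers):
    """Return 1.0 if the normalized prediction equals any normalized gold answer."""
    return float(any(normalize(prediction) == normalize(a) for a in answers))


if __name__ == "__main__":
    doc = "Paris is the capital of France. It lies on the Seine."
    sentences = split_sentences(doc)
    print(sentences)
    print(split_passages(doc, window=5))
    print(recall_at_k(sentences, ["the Seine"], k=1))
    print(recall_at_k(sentences, ["the Seine"], k=2))
    print(exact_match("The Seine.", ["Seine"]))
```

Averaging `recall_at_k` and `exact_match` over all questions in a benchmark gives dataset-level scores of the kind reported above. Under these assumptions, the same document yields more and shorter units at finer granularity, so a fixed k covers less total text per query.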

Journal of Global Engineering Review

Xu Wang, Xuanyi Fu, Danbing Zou

2026

