RAG System Layers

A complete RAG system has three operational layers. Understanding this model is required before using rag-axis.

Pre-Retrieval

Runs offline and asynchronously before any query arrives. Failure here causes silent quality degradation — not user-facing errors. The system keeps running and returning answers. The answers get worse. Components: ingestion, chunking, embedding, indexing, corpus management. Key risk: stale corpus, wrong chunking strategy, embedding model mismatch between index time and query time.

Retrieval Core

The live query pipeline. Runs on every user query. Every millisecond is user-facing. Latency budget: 200ms to 2 seconds. Failure here is immediate and visible. Components: query processing, retrieval, reranking, context assembly, generation, output validation. Key risk: empty retrieval, score collapse, silent context truncation, LLM ignoring retrieved context.

System Layer

Runs continuously in the background. Failure here is invisible in the short term but compounds over time. No direct latency impact. Significant cost and quality impact. Components: caching, observability, evaluation, corpus versioning, guardrails. Key risk: no eval = blind to quality drift. No caching = cost scales linearly with traffic. No corpus versioning = stale answers with no signal.

Why This Model Matters

Each layer has a different operational profile, failure consequence, and latency budget. rag-axis sub-packages are built and consumed in system-building order — pre-retrieval first, retrieval core second, system layer third.

Getting Started

Concepts

Architecture

Packages

Contributing

Rag system layers

RAG System Layers

Pre-Retrieval

Retrieval Core

System Layer

Why This Model Matters

Getting Started

Concepts

Architecture

Packages

Contributing

​RAG System Layers

​Pre-Retrieval

​Retrieval Core

​System Layer

​Why This Model Matters

RAG System Layers

Pre-Retrieval

Retrieval Core

System Layer

Why This Model Matters