Runs offline and asynchronously before any query arrives.
Failure here causes silent quality degradation — not user-facing errors.
The system keeps running and returning answers. The answers get worse.Components: ingestion, chunking, embedding, indexing, corpus management.Key risk: stale corpus, wrong chunking strategy, embedding model mismatch
between index time and query time.
The live query pipeline. Runs on every user query.
Every millisecond is user-facing. Latency budget: 200ms to 2 seconds.
Failure here is immediate and visible.Components: query processing, retrieval, reranking,
context assembly, generation, output validation.Key risk: empty retrieval, score collapse, silent context truncation,
LLM ignoring retrieved context.
Runs continuously in the background.
Failure here is invisible in the short term but compounds over time.
No direct latency impact. Significant cost and quality impact.Components: caching, observability, evaluation, corpus versioning, guardrails.Key risk: no eval = blind to quality drift.
No caching = cost scales linearly with traffic.
No corpus versioning = stale answers with no signal.
Each layer has a different operational profile, failure consequence,
and latency budget. rag-axis sub-packages are built and consumed
in system-building order — pre-retrieval first, retrieval core second,
system layer third.