RAG (retrieval-augmented generation) is the fastest way to give an LLM knowledge of your company without fine-tuning a model. Instead, you build a pipeline: on every request the agent finds relevant fragments in your knowledge base and passes them into the context. Sounds simple, but the distance between “works in a demo” and “handles 10,000 users” is enormous.
Architecture of a minimally viable RAG system
The baseline has four layers: data preparation (chunking, cleanup, metadata), vector storage, retrieval logic with a reranker, and an LLM wrapper that controls answer quality.
1. Document parser: PDF, DOCX, HTML, tickets from Jira/Zendesk, Slack threads
2. Chunker with 10–15% overlap and headings preserved as metadata
3. Embedding model (text-embedding-3-large or open-source bge-m3)
4. Vector DB: Pinecone, Qdrant, or pgvector if you're already on Postgres
5. Reranker (Cohere Rerank or bge-reranker-v2), critical for quality
6. LLM with a system prompt that forbids answering from outside the retrieved context
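To make the layers concrete, here is a minimal sketch of the retrieval half of the pipeline. It assumes sentence-transformers for bge-m3 embeddings and uses an in-memory index as a stand-in for a real vector DB; `handbook.txt` and the sample query are placeholders, and the reranker and LLM steps appear in the sections below.

```python
import numpy as np
from sentence_transformers import SentenceTransformer

# bge-m3 via sentence-transformers; or call text-embedding-3-large through the OpenAI API
embedder = SentenceTransformer("BAAI/bge-m3")

def chunk(text: str, size: int = 800, overlap: int = 100) -> list[str]:
    """Naive fixed-size chunking with overlap; see the structure-aware version below."""
    step = size - overlap
    return [text[i:i + size] for i in range(0, max(len(text) - overlap, 1), step)]

class InMemoryIndex:
    """Stand-in for Pinecone/Qdrant/pgvector: stores vectors, runs cosine top-k."""
    def __init__(self) -> None:
        self.vectors: list[np.ndarray] = []
        self.chunks: list[str] = []

    def add(self, texts: list[str]) -> None:
        self.chunks.extend(texts)
        self.vectors.extend(embedder.encode(texts, normalize_embeddings=True))

    def search(self, query: str, k: int = 50) -> list[str]:
        q = embedder.encode([query], normalize_embeddings=True)[0]
        scores = np.asarray(self.vectors) @ q  # cosine similarity on unit vectors
        return [self.chunks[i] for i in np.argsort(scores)[::-1][:k]]

index = InMemoryIndex()
index.add(chunk(open("handbook.txt").read()))
candidates = index.search("How do I request parental leave?")  # top-50 goes to the reranker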
Five mistakes that kill projects in pilot
1. Bad chunking
Splitting documents by a fixed length is the worst choice for technical docs. The section heading ends up in one chunk and the matching code in another. Use semantic chunking or chunking based on document structure.
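For Markdown or HTML sources, structure-aware chunking can be as simple as splitting on headings so a title never gets separated from its body. A sketch for Markdown input; the regex and the 1,200-character cap are illustrative, not tuned values:

```python
import re

def chunk_by_headings(markdown: str, max_chars: int = 1200) -> list[dict]:
    """Split before each #/##/### heading so sections stay intact,
    and carry the heading along as metadata for later filtering."""
    sections = re.split(r"(?m)^(?=#{1,3} )", markdown)
    chunks = []
    for section in sections:
        if not section.strip():
            continue
        heading = section.splitlines()[0].lstrip("# ").strip()
        # Oversized sections get split further, but every piece keeps its heading.
        for i in range(0, len(section), max_chars):
            chunks.append({"text": section[i:i + max_chars], "heading": heading})
    return chunks
```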
2. No reranker
Embedding similarity gives you a top-50 of relevant fragments, but it's the reranker that picks the 5–7 that actually answer the question. Skip it and answer quality drops by 30–40%, even though adding one is the cheapest improvement in the whole pipeline.
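As a sketch of that step, assuming the bge-reranker-v2-m3 cross-encoder loaded through sentence-transformers (Cohere Rerank is a hosted alternative); `candidates` is the top-50 list from the vector search above:

```python
from sentence_transformers import CrossEncoder

reranker = CrossEncoder("BAAI/bge-reranker-v2-m3")

def rerank(query: str, candidates: list[str], top_n: int = 6) -> list[str]:
    """Score every (query, chunk) pair with a cross-encoder and keep the best few."""
    scores = reranker.predict([(query, c) for c in candidates])
    ranked = sorted(zip(scores, candidates), key=lambda pair: pair[0], reverse=True)
    return [text for _, text in ranked[:top_n]]
```

Only the survivors of this step go into the LLM prompt, which also keeps the context window and the token bill small.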
3. Ignoring metadata
Document version, update date, department, language — without metadata filters the agent will recommend outdated policies. Always store “freshness” and weight it during retrieval.
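One common way to weight freshness is exponential decay on document age. A sketch; the 180-day half-life is an arbitrary placeholder to tune per corpus, and hard filters (department, language, version) should run in the vector DB itself:

```python
from datetime import datetime, timezone

def freshness_weighted(score: float, updated_at: datetime,
                       half_life_days: float = 180.0) -> float:
    """Halve a chunk's retrieval score for every half-life elapsed since its last update."""
    age_days = (datetime.now(timezone.utc) - updated_at).days
    return score * 0.5 ** (age_days / half_life_days)
```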
4. One giant index
Don't dump the whole company into a single index. Separate namespaces for HR, legal, product and customer data — both for security and quality.
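With Pinecone this maps to namespaces, with Qdrant to separate collections. As a generic sketch, reusing the `InMemoryIndex` from the pipeline skeleton above, routing can be a dictionary of per-domain indexes plus an entitlement check:

```python
# One index per domain; only the namespaces the caller is entitled to see get queried.
indexes = {"hr": InMemoryIndex(), "legal": InMemoryIndex(), "product": InMemoryIndex()}

def search_scoped(query: str, user_domains: set[str], k: int = 50) -> list[str]:
    """Query only the domains this user has access to."""
    results: list[str] = []
    for domain in user_domains & indexes.keys():
        results.extend(indexes[domain].search(query, k))
    return results
```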
5. No evaluations
Without a set of 100–200 reference questions, you won't know whether a prompt change actually improved anything. Invest in evals from day one.
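A bare-bones harness is enough to start. A sketch, assuming the reference cases live in a JSONL file (`evals.jsonl` is a placeholder) and using a crude keyword check; in practice you would grade with exact-match rules or an LLM judge:

```python
import json

def run_evals(pipeline, path: str = "evals.jsonl") -> float:
    """Each JSONL line: {"question": "...", "must_mention": ["...", ...]}. Returns pass rate."""
    cases = [json.loads(line) for line in open(path)]
    passed = 0
    for case in cases:
        reply = pipeline(case["question"]).lower()
        if all(term.lower() in reply for term in case["must_mention"]):
            passed += 1
    print(f"{passed}/{len(cases)} cases passed ({passed / len(cases):.0%})")
    return passed / len(cases)
```

Run it before and after every prompt or chunking change; the pass rate, not gut feeling, decides whether the change ships.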
RAG is 80% data engineering and 20% prompt engineering. Teams that think it's the other way around get stuck in pilot.
When RAG is not your answer
- When your company's knowledge lives in people's heads, not documents
- When documents update every hour and you can't run real-time indexing
- When you need numerical accuracy (finance, legal) — structured retrieval with a SQL agent fits better there
Facing a similar challenge in your company?
Tell us about the task — we'll get back within one business day with a call agenda.