RAG PILOT · COSTS

What does a RAG pilot cost? Three tiers: 1k, 10k, 100k documents

A real-world cost breakdown for a RAG pilot in Switzerland: embedding, vector DB, LLM inference, setup effort. Three volume tiers with figures in CHF.

Researched & fact-checked by: DuneDive LLC · As of: 2026-05

What is this about?

A RAG pilot (Retrieval-Augmented Generation, see retrieval-augmented-generation) is the smallest productive way to make your own document knowledge accessible to AI: documents are indexed, a vector database stores the embeddings, and a language model answers with source citations. The question is concrete: what does building it cost, what does monthly operation cost, and how does the cost structure scale from 1,000 to 100,000 documents?

This page calculates three tiers – 1,000 / 10,000 / 100,000 documents – with clear assumptions: average document length 5 pages, around 2,500 tokens per document. We distinguish one-time setup costs (embedding, engineering) from running operational costs (storage, LLM queries, maintenance). Numbers come from May 2026 published prices at OpenAI, Cohere, Qdrant Cloud, Pinecone, Hetzner, plus the fairlane.systems pricing list.

Key insight upfront: a RAG pilot with 10,000 documents costs CHF 5,500 to CHF 8,500 to build (depending on interface complexity) and CHF 30 to CHF 80 per month to operate, plus model calls. At 100,000 documents, setup and storage costs do not rise tenfold: costs scale sub-linearly because embedding is a negligible one-time cost, storage is cheap, and most of the setup effort is engineering work that does not grow in proportion to the document count.

Why the pilot question matters

Three common misunderstandings make RAG projects more expensive than necessary. First, vendor anxiety: on hearing "AI", people expect enterprise prices with five-figure setup fees. In reality, open-source components (Qdrant, LangChain, LlamaIndex, vLLM, Ollama) are free, and the real effort sits in clean data mapping. Second, a wrong picture of scaling: many calculate linearly ("10x documents = 10x cost"). In fact the vector DB scales sub-linearly because storage is cheap and embedding costs are one-time. Third, underestimated engineering effort: the most expensive component is not the cloud bill but the setup time for document ingestion, chunking strategy, and quality validation.

For Swiss SMEs and fiduciaries these numbers drive the go/no-go decision. A CHF 3,500 pilot with a 4-week runtime is a defensible investment. A CHF 35,000 pilot is not. Starting without a cost frame leads to either over-engineering (a custom vector DB platform) or under-engineering (a prototype without an audit trail that cannot go into production).

One more point: running costs depend more on usage behaviour than on document count. At the query profile used in the tiers below (8,000 input and 1,500 output tokens per query), 10 queries per day cost roughly USD 1 to USD 14 per month in LLM calls, depending on the chosen model (the current DeepSeek-V generation at USD 0.30/0.50, Claude Sonnet at USD 3/15 per 1M input/output tokens). That makes routing strategies (see multi-LLM routing strategies) immediately relevant.
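The dependence on usage can be made concrete with a minimal sketch. It assumes the per-token prices quoted above and the 8,000 input / 1,500 output token profile this page uses for its tiers; the names `ModelPrice`, `PRICES` and `monthly_llm_cost_usd` are illustrative, not part of any vendor API.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ModelPrice:
    """Price in USD per 1M tokens, as quoted in this article."""
    input_usd_per_m: float
    output_usd_per_m: float


# Assumption: prices as quoted in the text; check the vendors' current lists.
PRICES = {
    "deepseek-v": ModelPrice(0.30, 0.50),
    "claude-sonnet": ModelPrice(3.00, 15.00),
}


def monthly_llm_cost_usd(price, queries_per_month,
                         input_tokens=8_000, output_tokens=1_500):
    """Monthly LLM spend for a fixed per-query token profile."""
    per_query = (input_tokens * price.input_usd_per_m
                 + output_tokens * price.output_usd_per_m) / 1_000_000
    return queries_per_month * per_query


# 10 queries per day, taken as roughly 300 queries per month.
for name, price in PRICES.items():
    print(f"{name}: USD {monthly_llm_cost_usd(price, 300):.2f}/month")
```

Because the per-query cost differs by more than a factor of ten between the two models, sending only the hard queries to the expensive model changes the monthly bill far more than adding documents does.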

Three tiers, honest numbers

We calculate three volume tiers with the same assumptions: an average of 5 pages per document (about 2,500 tokens) and 200 queries per month, each with 8,000 input and 1,500 output tokens (a typical fiduciary profile). Tier 3 assumes 2,000 queries per month instead.

Tier 1: 1,000 documents (pilot minimum)

Index volume: 2.5M tokens
One-time embedding (OpenAI text-embedding-3-small at USD 0.02/1M tokens): USD 0.05, effectively zero
Vector DB: Qdrant self-host CHF 0 (runs alongside other services on a 2 GB RAM server); Qdrant Cloud Starter USD 25/month; Pinecone Standard USD 70/month
LLM inference at 200 queries/month: about USD 9/month with Claude Sonnet (USD 3/15), about USD 0.63/month with the current DeepSeek-V generation (USD 0.30/0.50)
Engineering setup: 3-5 days at CHF 1,200/day, roughly CHF 3,500 to CHF 6,000 one-time
Total one-time: CHF 3,500 (self-host) to CHF 6,000 (cloud stack)
Total monthly: CHF 0-90

Tier 2: 10,000 documents (small law firm / mid-sized fiduciary)

Index volume: 25M tokens
One-time embedding: USD 0.50, still effectively zero
Vector DB: Qdrant self-host CHF 30-50/month (small Hetzner VPS); Qdrant Cloud USD 35-70/month; Pinecone USD 70-150/month
LLM inference: the same 200 queries/month, so about USD 9 (Claude Sonnet) or about USD 0.63 (DeepSeek)
Engineering setup: 5-8 days, budgeted at CHF 5,500 to CHF 8,500 (slightly below the CHF 1,200 day rate)
Total one-time: CHF 5,500 to CHF 8,500
Total monthly: CHF 30-160

Tier 3: 100,000 documents (mid-cap / large law firm)

Index volume: 250M tokens
One-time embedding: USD 5, still effectively negligible
Vector DB: Qdrant self-host CHF 80-150/month (larger VPS or small dedicated server at Hetzner); Qdrant Cloud USD 200-500/month; Pinecone USD 500-1,200/month
LLM inference at e.g. 2,000 queries/month: about USD 93 (Claude Sonnet) or about USD 6 (DeepSeek)
Engineering setup: 10-15 days (multi-interface ingestion, audit trail, RBAC) = CHF 12,000 to CHF 18,000
Total one-time: CHF 12,000 to CHF 18,000
Total monthly: CHF 100-1,500
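The index volumes and one-time embedding costs of the three tiers follow directly from the stated assumptions. A minimal sketch, assuming 2,500 tokens per document and the quoted text-embedding-3-small price; chunk overlap, which would raise the token count somewhat, is deliberately not modelled, and the function names are illustrative.

```python
TOKENS_PER_DOCUMENT = 2_500        # assumption stated in this article
EMBEDDING_USD_PER_M_TOKENS = 0.02  # text-embedding-3-small, as quoted above


def index_tokens(documents, tokens_per_document=TOKENS_PER_DOCUMENT):
    """Total tokens to embed for a document set (no chunk overlap)."""
    return documents * tokens_per_document


def embedding_cost_usd(documents, usd_per_m_tokens=EMBEDDING_USD_PER_M_TOKENS):
    """One-time embedding cost in USD for the initial index."""
    return index_tokens(documents) / 1_000_000 * usd_per_m_tokens


for docs in (1_000, 10_000, 100_000):
    millions = index_tokens(docs) / 1_000_000
    print(f"{docs:>7,} docs: {millions:>6.1f}M tokens, "
          f"USD {embedding_cost_usd(docs):.2f}")
```

Swapping in a different `usd_per_m_tokens` value shows how little the choice of embedding model moves the total.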

fairlane.systems setup: we budget CHF 3,500 (tier 1) or CHF 5,500-8,500 (tier 2) as a flat fee for a clean fiduciary RAG pilot, including document ingestion, chunking optimisation, test suite, audit trail setup. See rag-eigenes-wissen for service details.

Interesting finding: at 100x more documents, one-time setup costs rise only 3-5x, not 100x, and monthly costs also grow far more slowly than the document count. That makes RAG especially attractive for SMEs: you can start small and scale without an architecture change.

RAG pilot calculation in 6 steps

01 Count documents: how many PDFs, Word files and emails are in storage? Estimate the average length (pages or tokens).
02 Estimate query frequency: 10, 50, 200, 1,000 queries per month? Typically 8,000 input + 1,500 output tokens per query.
03 Pick a model: the current DeepSeek-V generation for cost (USD 0.30/0.50 per 1M), Claude Sonnet for quality (USD 3/15), Mistral Large for EU region (USD 2/6).
04 Pick a vector DB option: Qdrant self-host (CHF 0-150/month) or Qdrant Cloud / Pinecone (USD 25-1,200/month).
05 Calculate setup effort: 3-5 days for 1k docs, 5-8 days for 10k, 10-15 days for 100k, at CHF 1,200/day.
06 Compute ROI: saved research hours times internal rate minus monthly operating cost. Payback in 1-6 months is typical.

When a RAG pilot makes sense

A RAG pilot pays off when (a) the answer to concrete questions lives in your documents, (b) you have this question at least 50 times per month, (c) the answer must be provable (audit, client protection, review), and (d) you have a CHF 3,500-8,500 budget for entry.

Concrete profiles where a RAG pilot makes sense: fiduciary office with 5,000 client correspondence PDFs from the last 5 years. Law firm with 2,000 case notes and 8,000 contract templates. SME with 1,500 manuals and SOPs in multiple languages. Insurance broker with 12,000 policy documents and claim files. Architecture office with 800 codes/standards and 4,000 project documents.

In every case the cost structure is the same: a setup flat fee of CHF 3,500-8,500 and monthly operation under CHF 100. ROI usually sits in saved research time: when an employee spends two hours per day on file search and RAG reduces that to 30 minutes, that is 1.5h x 20 days = 30h/month saved. At an internal rate of CHF 100/h that is CHF 3,000 per month, so the pilot pays back in roughly 1-3 months, depending on the flat fee.
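The payback arithmetic above generalises to other team sizes and fees. A minimal sketch, assuming 20 working days per month and an internal rate of CHF 100/h as in the example; the function and parameter names are illustrative.

```python
def monthly_saving_chf(hours_saved_per_day, employees=1,
                       workdays_per_month=20, internal_rate_chf=100):
    """Value of saved research time per month."""
    return (hours_saved_per_day * employees
            * workdays_per_month * internal_rate_chf)


def payback_months(setup_chf, saving_chf_per_month,
                   operation_chf_per_month=0.0):
    """Months until the one-time setup fee is recovered.

    Returns infinity when savings do not cover monthly operation.
    """
    net = saving_chf_per_month - operation_chf_per_month
    if net <= 0:
        return float("inf")
    return setup_chf / net


saving = monthly_saving_chf(hours_saved_per_day=1.5)  # one employee
for fee in (3_500, 8_500):
    print(f"CHF {fee}: {payback_months(fee, saving):.1f} months")
```

Passing `operation_chf_per_month` makes the estimate slightly more conservative; at the operating costs discussed here it shifts the result only marginally.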

When the pilot does not pay off

A RAG pilot does not pay off when (a) the document count is below 200 and fits in a standard model context window, (b) queries are so rare that setup costs do not amortise, (c) documents are not digital yet and would first need OCR conversion, or (d) the answer should be creative, not from existing sources.

Concretely: a 3-person fiduciary with 80 active clients and 30 PDF queries per month likely will not amortise a RAG pilot. A contract generator or simple PDF search suffices. If all documents are scanned paper files without OCR, setup costs double due to the required OCR pipeline (see AI document recognition) – it can pay off, but is no longer a "pilot", it is a project.

Other bad cases: documents change constantly (e.g. new versions daily) and no re-indexing automation is planned, so RAG answers with stale passages. The client structure requires separation (client A must not see client B) and no multi-tenant concept is planned, so the pilot is unusable. Data mapping is missing (e.g. unstructured storage with identical filenames in 12 folders), so the prep work costs more than the actual RAG.
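To illustrate what re-indexing automation has to decide, here is one common approach, sketched with hypothetical names: store a content hash per document at indexing time and compare it on each run. It only plans the work; the actual embedding and vector DB calls depend on the chosen stack and are left out.

```python
import hashlib


def content_hash(text: str) -> str:
    """Stable fingerprint of a document's text."""
    return hashlib.sha256(text.encode("utf-8")).hexdigest()


def plan_reindex(current_docs: dict, indexed_hashes: dict):
    """Decide which documents need work.

    current_docs:   doc_id -> current text
    indexed_hashes: doc_id -> hash stored when the doc was last indexed
    Returns (ids to embed again, ids to remove from the index).
    """
    to_embed = [
        doc_id for doc_id, text in current_docs.items()
        if indexed_hashes.get(doc_id) != content_hash(text)
    ]
    to_delete = [
        doc_id for doc_id in indexed_hashes
        if doc_id not in current_docs
    ]
    return to_embed, to_delete


# Hypothetical example: one changed, one new, one removed document.
docs = {"a": "unchanged", "b": "new version", "c": "brand new"}
index = {"a": content_hash("unchanged"),
         "b": content_hash("old version"),
         "d": content_hash("deleted since")}
print(plan_reindex(docs, index))
```

Only changed and new documents are embedded again, so a daily run stays cheap even when the archive is large.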

Trade-offs

STRENGTHS

Setup cost predictable: CHF 3,500-8,500 flat, no hidden engineering hours
Costs scale sub-linearly: 100x documents cost only 3-5x more to set up, and monthly costs grow far less than 100x
Cloud stack without hardware lock-in – start immediately, cancel anytime
Payback in 1-6 months typical under realistic usage assumptions

WEAKNESSES

OCR prep not included in pilot: scanned paper files double setup cost
When documents change constantly, re-indexing automation needed – extra 1-3 days
Multi-tenant (client A cannot see B): RBAC surcharge CHF 2,000-4,000 depending on complexity
Cloud embedding and cloud LLMs send data outside Switzerland, which is not freely permitted for especially protected personal data

FAQ

What does a RAG pilot at fairlane.systems actually cost?

CHF 3,500 flat fee for a 4-week pilot with up to 5,000 documents, including data ingestion, chunking optimisation, audit trail setup, test suite, and training. Tier 2 (up to 25,000 docs, multilingual, multi-interface): CHF 8,500. Running operation is separate: typically CHF 50-180/month (cloud stack, depending on query volume).

Embedding cost – really negligible?

Yes. OpenAI text-embedding-3-small costs USD 0.02 per 1M tokens. 100,000 documents at 2,500 tokens = 250M tokens = USD 5. Even Cohere Embed-Multilingual-v3 (USD 0.10/1M) stays under USD 30 for initial indexing. Embedding cost is practically irrelevant for ROI.

Pinecone or Qdrant – what is cheaper?

Qdrant self-host is always cheapest (CHF 0-150/month depending on server class). Qdrant Cloud Starter begins at USD 25/month, and Pinecone Standard starts at USD 70/month for a standard index. At 100,000 documents Pinecone costs USD 500-1,200/month, Qdrant Cloud USD 200-500/month, and self-hosted Qdrant CHF 80-150/month. Recommendation: Qdrant self-host if you already have a Hetzner server, otherwise Qdrant Cloud.

When does the pilot pay back?

Rule of thumb: when the pilot saves at least 1.5 research hours per employee per day and 5 employees use it, that is 150h/month at internal CHF 100/h = CHF 15,000 saved. With a CHF 3,500 pilot flat fee plus CHF 60/month operation, payback is under one month. More realistically 2-3 employees and 0.5-1h/day – then 2-4 months payback.

Sources

OpenAI – Embedding & API Pricing (text-embedding-3-small, text-embedding-3-large) · 2026-05
Qdrant Cloud – Pricing (Starter / Standard / Enterprise tiers) · 2026-05
Pinecone – Serverless & Pod Pricing (Standard, Enterprise) · 2026-05
fairlane.systems – Service Pricing (AI-Audit, RAG-Pilot, n8n-Sprint) · 2026-05
Cohere – Embed v3 Pricing (Multilingual) · 2026-05
