fairlane.systems

CHROMA · TECH

Chroma: the simplest vector database for prototypes and notebooks

Chroma is an Apache-2.0 vector DB with a SQLite backend (DuckDB in releases before 0.4). Python-API first, productive in ten minutes, good for prototypes up to about 1M vectors. Not built for production scale.

As of: 2026-05

What is Chroma?

Chroma (trychroma.com) is an open-source vector database under Apache 2.0, started in 2022 by Anton Troynikov and Jeff Huber. As of May 2026, version 0.5+ is current. Chroma positions itself explicitly as "the vector DB for LLM applications" and follows a Python-API-first approach: a pip install and a few lines of code suffice to create a collection, store documents, and search semantically.

Since release 0.4, local persistence is built on SQLite; earlier releases used DuckDB with Parquet files. Metadata and document text land in a SQLite file inside the persistence directory; the vector index runs as HNSW based on hnswlib and is stored next to it. This choice makes Chroma unusually lightweight: a standalone install needs no cluster, no etcd, no coordinator. One directory on disk is enough.

Chroma can run in three modes: embedded (in-process in the Python process, no server), client/server (separate Chroma server via HTTP, multiple clients), and since 2024 Chroma Cloud (managed variant on AWS and GCP). Embedded is the most common choice in notebooks and prototypes: Chroma is used like SQLite, without a network call.

The API is Python-centric, with client libraries also for JavaScript, Ruby, and Java. A REST API exists but is secondary in documentation. Embedding modules are built in: OpenAI, Cohere, HuggingFace, sentence-transformers – without an external provider, Chroma runs local models like all-MiniLM-L6-v2.

For Swiss fiduciary and SME setups, Chroma is the right choice in a clear profile: rapid prototypes, internal POCs, notebook-driven data analyses, small on-prem installations without scaling ambition. Anyone past 1M vectors or needing multi-tenant filters moves to Qdrant or pgvector.

Why it matters

The most common reason RAG projects in fiduciary offices fail is not the final architecture but the missing first experience. Before an office decides whether to run Qdrant, pgvector, or Weaviate in production, a staff member should have one hour to vectorise 100 client receipts in a notebook and search them semantically. Chroma reduces the entry barrier to the minimum.

Three properties make Chroma valuable for this initial stage. First: setup effort. pip install chromadb suffices; no Docker configuration, no YAML, no port forwarding. A first collection with OpenAI or local MiniLM embeddings takes under 10 minutes. For a fiduciary teaching themselves the vector DB concept, this is the simplest path.

Second: integrated embedding modules. Chroma can accept text directly and convert it to vectors internally, with OpenAI, Cohere, or local sentence-transformers. This convenience saves building a separate embedding pipeline – a clear win for prototypes. In production setups, the value shifts; separation of embedding service and storage is usually better there.

Third: file-based persistence. The whole database is one directory on disk. Backup is one copy command. Migration between machines means copying that directory. Anyone who builds a Chroma prototype and finds it insufficient for production can read embeddings via the get API and move them to Qdrant or pgvector, which is a straight path.

For Swiss fiduciary work with small datasets (e.g. an internal knowledge base of 5,000 receipts or a 500-PDF guideline collection), Chroma works in production too. Only once multi-tenant separation, higher QPS, or recall guarantees are demanded does a switch make sense.

How it works

The Python API is the main surface. A minimal workflow in embedded mode:

    import chromadb

    # Embedded mode: data is persisted in the ./chroma_db directory
    client = chromadb.PersistentClient(path="./chroma_db")
    collection = client.create_collection(
        name="receipts",
        metadata={"hnsw:space": "cosine"},
    )

    # Raw texts are embedded by the collection's embedding function
    collection.add(
        documents=["Invoice Müller CHF 200", "Gas station receipt CHF 80"],
        metadatas=[
            {"client": 42, "date": "2026-03-15"},
            {"client": 42, "date": "2026-04-02"},
        ],
        ids=["doc1", "doc2"],
    )

    # Semantic search restricted to client 42
    results = collection.query(
        query_texts=["Müller receipts"],
        n_results=5,
        where={"client": 42},
    )

The add function accepts either raw texts (Chroma computes embeddings internally) or finished embedding vectors. Metadata is a dict per document; filters on metadata are formulated with the where clause.
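A where clause is a plain Python dict, so it can be assembled by a small helper. The following is a minimal sketch, not part of Chroma itself: the helper names and the date_int field are hypothetical. It assumes that the range operators ($gte, $lte) compare numbers only, which is why the date is stored as an integer such as 20260315 rather than as an ISO string.

```python
from datetime import date


def date_to_int(d: date) -> int:
    """Encode a date as an integer like 20260315 so range operators can compare it."""
    return d.year * 10000 + d.month * 100 + d.day


def receipts_filter(client_id: int, start: date, end: date) -> dict:
    """Build a Chroma-style `where` dict: one client, dates within [start, end]."""
    return {
        "$and": [
            {"client": {"$eq": client_id}},
            {"date_int": {"$gte": date_to_int(start)}},  # hypothetical numeric date field
            {"date_int": {"$lte": date_to_int(end)}},
        ]
    }


where = receipts_filter(42, date(2026, 3, 1), date(2026, 3, 31))
# Intended use (requires a collection whose metadata carries date_int):
# collection.query(query_texts=["Müller receipts"], n_results=5, where=where)
```

Storing the date twice, once as a readable string and once as an integer, keeps the metadata legible while still allowing range filters.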

In client/server mode, Chroma runs as an HTTP server: docker run -p 8000:8000 chromadb/chroma. The Python client switches from PersistentClient to HttpClient(host="chroma-server", port=8000) – the rest of the code remains identical.
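One way to keep the rest of the code identical is to decide the mode in a single place. The sketch below assumes hypothetical environment variable names (CHROMA_HOST, CHROMA_PORT, CHROMA_PATH); only chromadb.PersistentClient and chromadb.HttpClient come from the text above.

```python
import os
from typing import Mapping, Optional


def client_config(env: Mapping) -> tuple:
    """Pick the client mode from environment-style settings.

    Returns ("http", kwargs) when CHROMA_HOST is set, else ("persistent", kwargs).
    The variable names are illustrative assumptions, not Chroma conventions.
    """
    host = env.get("CHROMA_HOST")
    if host:
        return "http", {"host": host, "port": int(env.get("CHROMA_PORT", "8000"))}
    return "persistent", {"path": env.get("CHROMA_PATH", "./chroma_db")}


def make_client(env: Optional[Mapping] = None):
    """Create the matching client; chromadb is imported lazily (pip install chromadb)."""
    import chromadb

    mode, kwargs = client_config(os.environ if env is None else env)
    if mode == "http":
        return chromadb.HttpClient(**kwargs)
    return chromadb.PersistentClient(**kwargs)
```

A notebook would then call make_client() without arguments and run embedded, while a deployed app sets CHROMA_HOST and talks to the server.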

Filters are an important point. Chroma supports where filters on metadata ($eq, $ne, $gt, $gte, $lt, $lte, $in, $nin, $and, $or) and where_document filters on full text ($contains). But: Chroma filters post-k, i.e. the HNSW index returns top-k vectors first, then the filter is applied. With selective filters (e.g. "only client 42 of 200 clients") this leads to recall loss – the top-k list contains too few client-42 hits.
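The recall effect of filtering after the top-k step can be reproduced with a toy example. The sketch below does not use Chroma or HNSW; it is a brute-force stand-in with made-up two-dimensional vectors that only contrasts the two orders of operation.

```python
import math


def cosine(a, b):
    """Cosine similarity of two non-zero vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    return dot / (math.hypot(*a) * math.hypot(*b))


def top_k(query, rows, k):
    """Brute-force stand-in for an ANN index: the k rows most similar to the query."""
    return sorted(rows, key=lambda r: cosine(query, r["vec"]), reverse=True)[:k]


def post_k(query, rows, k, client):
    """Search first, filter afterwards: may return fewer than k hits."""
    return [r["id"] for r in top_k(query, rows, k) if r["client"] == client]


def pre_filter(query, rows, k, client):
    """Filter first, then search only the matching rows."""
    matching = [r for r in rows if r["client"] == client]
    return [r["id"] for r in top_k(query, matching, k)]


rows = [
    {"id": "a1", "client": 7, "vec": (1.0, 0.0)},
    {"id": "a2", "client": 7, "vec": (0.9, 0.1)},
    {"id": "a3", "client": 7, "vec": (0.8, 0.2)},
    {"id": "b1", "client": 42, "vec": (0.7, 0.3)},
    {"id": "b2", "client": 42, "vec": (0.5, 0.5)},
    {"id": "b3", "client": 42, "vec": (0.1, 0.9)},
]
query = (1.0, 0.0)

print(post_k(query, rows, 3, client=42))      # [] : all top-3 hits belong to client 7
print(pre_filter(query, rows, 3, client=42))  # ['b1', 'b2', 'b3']
```

In this toy data the three closest vectors all belong to another client, so filtering after the search leaves nothing, while filtering before it returns all three client-42 documents.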

Workaround: one collection per client. This makes the filter implicit at the collection level and bypasses the post-k issue. At 200 collections, embedded mode still handles it; under higher load in client/server mode, Qdrant becomes the cleaner choice.
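A sketch of the collection-per-client workaround follows. The naming scheme and helper names are hypothetical; the name check is a conservative assumption about Chroma's collection naming rules (3 to 63 characters, starting and ending with a lowercase letter or digit). The only Chroma call used is get_or_create_collection on the client object.

```python
import re

# Conservative naming rule, assumed from Chroma's documentation: 3-63 characters,
# lowercase letter or digit at both ends, dots, dashes and underscores in between.
_NAME_RE = re.compile(r"[a-z0-9][a-z0-9._-]{1,61}[a-z0-9]")


def tenant_collection_name(client_id: int, prefix: str = "receipts") -> str:
    """Derive a stable collection name such as receipts-client-00042."""
    if client_id < 0:
        raise ValueError("client_id must not be negative")
    name = f"{prefix}-client-{client_id:05d}"
    if not _NAME_RE.fullmatch(name) or ".." in name:
        raise ValueError(f"invalid collection name: {name!r}")
    return name


def collection_for(chroma_client, client_id: int):
    """Return the client's own collection, creating it on first use."""
    return chroma_client.get_or_create_collection(
        name=tenant_collection_name(client_id),
        metadata={"hnsw:space": "cosine"},
    )
```

Queries then go to collection_for(client, 42) without a where clause on the client field, because the separation already happened when the collection was chosen.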

Backup in embedded mode is trivial: copy the ./chroma_db directory. In server mode, backup runs over the storage volume; a volume snapshot captures the same directory. Replication and multi-region distribution are not part of the self-hosted setup.
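The directory copy can be scripted with the standard library alone. This is a minimal sketch under the assumption that no process is writing to the directory while it is copied; the function name and the timestamped folder layout are illustrative.

```python
import shutil
from datetime import datetime
from pathlib import Path


def backup_chroma_dir(data_dir: str, backup_root: str) -> Path:
    """Copy the whole persistence directory into a timestamped folder and return its path."""
    src = Path(data_dir)
    if not src.is_dir():
        raise FileNotFoundError(f"no Chroma directory at {src}")
    stamp = datetime.now().strftime("%Y%m%d-%H%M%S")
    dest = Path(backup_root) / f"{src.name}-{stamp}"
    # copytree refuses to write into an existing folder, so earlier backups are never overwritten
    shutil.copytree(src, dest)
    return dest


# Example call, e.g. from a nightly cron job:
# backup_chroma_dir("./chroma_db", "/var/backups/chroma")
```

Recovery is the reverse: stop the application, copy the chosen backup folder back to the original path, and start it again.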

Chroma to production in 5 steps

  1. Choose mode: embedded for notebooks and small apps, client/server for multi-user access. pip install chromadb or docker run chromadb/chroma.
  2. Define the embedding function: local (sentence-transformers/all-MiniLM-L6-v2) or cloud (OpenAI, Cohere). For cloud embeddings, run a third-country TIA.
  3. Plan collections: under 50,000 vectors, one collection suffices; for multi-tenant, one per client because filters run post-k.
  4. Ingestion: collection.add(documents=[...], metadatas=[...], ids=[...]) in batches of 100-500. Define metadata fields clearly (date, client, confidentiality).
  5. Set up backup: in embedded mode via cron copying the directory; in server mode via volume snapshot. Recovery: drop directory back, restart service.
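The batched ingestion from step 4 can be sketched as a small helper. It works with any object that offers an add(documents=..., metadatas=..., ids=...) method, as a Chroma collection does; the helper name and the default batch size of 200 are assumptions within the 100-500 range named above.

```python
def add_in_batches(collection, documents, metadatas, ids, batch_size=200):
    """Call collection.add once per slice of batch_size items; returns the number of batches."""
    if not (len(documents) == len(metadatas) == len(ids)):
        raise ValueError("documents, metadatas and ids must have the same length")
    if len(set(ids)) != len(ids):
        raise ValueError("ids must be unique")
    if batch_size < 1:
        raise ValueError("batch_size must be at least 1")

    batches = 0
    for start in range(0, len(ids), batch_size):
        end = start + batch_size
        collection.add(
            documents=documents[start:end],
            metadatas=metadatas[start:end],
            ids=ids[start:end],
        )
        batches += 1
    return batches
```

Checking lengths and duplicate ids up front avoids a half-finished import when the input lists are inconsistent.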

When to use Chroma

Chroma fits (a) prototypes and POCs below 1M vectors, (b) notebook workflows in data analysis or research, (c) embedded tools where a separate DB server is not wanted, or (d) internal SME knowledge bases with moderate load and no strict multi-tenant requirement.

Concrete cases: a fiduciary wants an internal knowledge base from tax circulars, VAT guidelines, and internal manuals – 5,000 PDF pages, one person searches, no multi-tenancy. Chroma in embedded mode runs on the local machine, in Docker on an internal server, or as an app bundle. Setup: 2 hours including full-text search and a simple web UI.

A second category: research or audit notebooks where a dataset is vectorised once and queried. Chroma can be used in a Jupyter notebook like pandas – no service management, no port configuration. Results land directly in the notebook.
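For display in a notebook, the nested query result is easier to read as one row per hit. The sketch below assumes the result layout of collection.query: a dict whose ids, documents, metadatas, and distances entries each hold one inner list per query text. The helper name is hypothetical.

```python
def rows_for_query(results: dict, query_index: int = 0) -> list:
    """Turn the nested query result into one dict per hit for a single query text."""
    ids = results["ids"][query_index]

    def column(key):
        # Fields that were not requested may be missing or None
        value = results.get(key)
        return value[query_index] if value is not None else [None] * len(ids)

    return [
        {"id": i, "document": doc, "metadata": meta, "distance": dist}
        for i, doc, meta, dist in zip(
            ids, column("documents"), column("metadatas"), column("distances")
        )
    ]


# Hand-written sample in the assumed layout, not real search output
sample = {
    "ids": [["doc1", "doc2"]],
    "documents": [["Invoice Müller CHF 200", "Gas station receipt CHF 80"]],
    "metadatas": [[{"client": 42}, {"client": 42}]],
    "distances": [[0.21, 0.64]],
}
for row in rows_for_query(sample):
    print(row["id"], row["distance"], row["document"])
```

The resulting list of dicts can be handed to pandas.DataFrame directly if a table view is preferred.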

Third category: demos and training. To show staff in 2 hours what embedding and semantic search mean, Chroma reaches the goal without detour.

For pure SaaS pilots without a self-host requirement, Chroma Cloud is available; pricing is usage-based as of May 2026. An EU region is not fully built out yet – for client data, the self-hosted variant is the clean choice.

When not to use

Once data volume exceeds 1-3M vectors or more than 100 concurrent queries are expected, Chroma sits at the edge of its capabilities. The single-node, file-based backend keeps setup simple but is not built for high mixed read-write pressure. At this scale, switch to Qdrant, Weaviate, Milvus, or pgvector with HNSW.

For strict multi-tenant operation, Chroma is the wrong choice. The workaround via collection-per-tenant works up to about 50-100 tenants – beyond that, collection management itself becomes a burden. Qdrant scales here more cleanly because filters are evaluated in the HNSW graph.

If the team is Java- or Go-centric, Chroma's Python focus is a source of friction. JS/Java clients exist but are less mature; bug reports and feature tickets primarily address the Python API.

For production setups with compliance requirements (audit trail, RBAC, fine-grained permissions), Chroma lacks the usual building blocks. Authentication in server mode has existed since 0.5 as an optional token scheme but is not as mature as in Qdrant or Weaviate.

Anyone needing hybrid search (BM25 plus dense) does not find it built into Chroma. The where_document filter with $contains works as a rudimentary full-text filter but does not produce a BM25 score. For hybrid search, Weaviate, Elasticsearch, or pgvector-with-tsvector is the right choice.

Trade-offs

STRENGTHS

  • Simplest entry of all vector DBs – pip install and three lines of code
  • File-based persistence in a single directory, trivial backup
  • Built-in embedding modules for OpenAI, Cohere, sentence-transformers
  • Embedded mode without server process, ideal for notebooks

WEAKNESSES

  • Scales stably only to 1-3M vectors – unsuited to production scale
  • Filters are evaluated post-k, recall loss on selective filters
  • Authentication and RBAC less mature than Qdrant or Weaviate
  • Python-centric, clients in other languages are secondary

FAQ

Up to what data volume does Chroma run stably?

Community experience: up to about 1M vectors with 384-dim embeddings on a 16 GB RAM machine works very well. Past 5M vectors, HNSW build becomes noticeably slow (several hours) and search latency rises. By that point, switching to Qdrant or pgvector with HNSW pays off.
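A back-of-the-envelope calculation shows why 1M vectors fit comfortably into 16 GB. The sketch assumes float32 vectors and roughly 2*M four-byte graph links per element with M = 16, a common rule of thumb for hnswlib-style indexes; documents, metadata, and Python overhead are not included.

```python
def estimate_hnsw_bytes(n_vectors: int, dim: int, m: int = 16) -> int:
    """Rough RAM estimate: float32 vector data plus about 2*M 4-byte links per element."""
    vector_bytes = dim * 4   # 4 bytes per float32 component
    link_bytes = m * 2 * 4   # assumed neighbour links on the base layer
    return n_vectors * (vector_bytes + link_bytes)


for n in (1_000_000, 5_000_000):
    print(f"{n:>9,} vectors x 384 dim: {estimate_hnsw_bytes(n, 384) / 1e9:.2f} GB")
# 1,000,000 vectors x 384 dim: 1.66 GB
# 5,000,000 vectors x 384 dim: 8.32 GB
```

At 5M vectors the index alone takes about half of a 16 GB machine under these assumptions, before documents and metadata are counted.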

Which embedding models run locally inside Chroma?

Chroma's sentence-transformers embedding function accepts any sentence-transformers model ID. The default is all-MiniLM-L6-v2 (384 dim, fast, acceptable quality); a common step up is all-mpnet-base-v2 (768 dim, better, slower). For German, paraphrase-multilingual-MiniLM-L12-v2 is worth trying. With local models, data stays in-process; no third-country transfer takes place.

How do I migrate from Chroma to Qdrant?

Read out all vectors and metadata via collection.get(include=["embeddings", "metadatas", "documents"]). A Python script writes them into Qdrant via upsert. The distance metric must match: Chroma uses squared L2 unless hnsw:space was set when the collection was created (cosine in the example above), so the Qdrant collection must be created with the corresponding metric. Create payload indexes in Qdrant explicitly or filters will be slow. Effort for 500,000 vectors: 2-4 hours including verification.
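The reshaping step of such a migration script can be sketched without either database running. The function below takes the dict returned by collection.get(...) and produces plain dicts with id, vector, and payload fields. Because Qdrant point IDs must be unsigned integers or UUIDs, the sketch derives a deterministic UUID from each Chroma string ID; the payload keys chroma_id and document are hypothetical choices.

```python
import uuid


def chroma_to_qdrant_points(dump: dict) -> list:
    """Convert the dict returned by collection.get(...) into Qdrant-style point dicts."""
    points = []
    for i, chroma_id in enumerate(dump["ids"]):
        payload = dict(dump["metadatas"][i] or {})
        payload["chroma_id"] = chroma_id          # keep the original ID searchable
        payload["document"] = dump["documents"][i]
        points.append(
            {
                # uuid5 is deterministic, so re-running the script yields the same point IDs
                "id": str(uuid.uuid5(uuid.NAMESPACE_URL, chroma_id)),
                "vector": [float(x) for x in dump["embeddings"][i]],
                "payload": payload,
            }
        )
    return points


# Hand-written sample in the shape of a collection.get(...) result
dump = {
    "ids": ["doc1", "doc2"],
    "embeddings": [[0.1, 0.2, 0.3], [0.4, 0.5, 0.6]],
    "metadatas": [{"client": 42}, None],
    "documents": ["Invoice Müller CHF 200", "Gas station receipt CHF 80"],
}
points = chroma_to_qdrant_points(dump)
```

Each resulting dict carries the fields a Qdrant upsert expects for a point; with the qdrant-client package they would typically be wrapped in its point model before the upsert call.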

Related topics

  • Qdrant: production vector database for RAG and semantic search (QDRANT · TECH)
  • Vector databases compared: 10 options for RAG, search, and recommendation (VECTOR DATABASES · COMPARISON)
  • Retrieval-Augmented Generation (RAG): how AI answers from your own documents (RAG · AI CONCEPT)
  • Embeddings and vectors: how language becomes mathematics (EMBEDDINGS · AI CONCEPT)
  • Hybrid search: BM25 plus vectors with reciprocal rank fusion in Elasticsearch, Qdrant, OpenSearch (HYBRID SEARCH · AI CONCEPT)
  • RAG on your own knowledge: answers from your documents, with sources, not made up (RAG ON YOUR OWN KNOWLEDGE · SERVICE)

