OWN MODEL · AI CONCEPT

How to train your own AI model? Fine-tuning, LoRA, QLoRA May 2026

Fine-tuning vs from-scratch vs RAG: for 95% of SMEs fine-tuning with LoRA/QLoRA is the right path. Hardware needs, cost USD 5-50k, May 2026 tools.

Researched & fact-checked by: DuneDive LLC · As of: 2026-05

What does "training your own model" mean?

"Training your own model" is a term with three very different meanings often confused in practice. The differences decide cost, effort and probability of success for an SME.

Variant 1: from-scratch pretraining. Train a completely new language model from scratch. Cost May 2026: USD 50 million to 500 million for frontier models, USD 1-10 million for a 13-billion-parameter model. Staff: 30-100 ML engineers, data engineers, compliance specialists. Time: 6-24 months. Realistic only for tech giants (OpenAI, Anthropic, Google, Meta, Mistral, DeepSeek). Absolutely not for SMEs.

Variant 2: fine-tuning. Train an existing model (open-weight from Llama, Mistral, Qwen, DeepSeek) further on your own data. The model retains its language capability from pretraining and additionally learns the specific styles, vocabularies or task patterns of your domain. Cost May 2026: USD 5,000-50,000 for typical SME use cases. Staff: 1-2 ML engineers or external provider. Time: 2-8 weeks.

Variant 3: RAG (Retrieval-Augmented Generation). No training. The existing model is connected to your documents at answer time. The model does not learn from the documents, it reads them per request. Cost: USD 500-5,000 setup, USD 20-200/month operation. Time: 1-4 weeks. See retrieval-augmented-generation.

For 95% of SMEs the right answer is: first RAG, then fine-tuning, never from-scratch. That is the May 2026 consensus recommendation. Start with RAG (fast, cheap, brings 70-90% of value). When RAG is not enough (style issues, domain-specific language, specific task patterns), add fine-tuning. From-scratch is only for sectors with highly specific language (law in special languages, medicine with research terms) and only if you have a 30-person team and 10 million USD.

May 2026 fine-tuning tools (all open-source, free):

- Axolotl (OpenAccess-AI): YAML-based configuration, very popular for Llama, Mistral, Mixtral. Supports LoRA, QLoRA, full fine-tune, DPO. - Unsloth: speed-optimised, 2-5x faster than standard PyTorch training, 50-70% less VRAM. As of May 2026 the popular choice for single-GPU setups. - llama-factory: web UI, all mainstream models, many training procedures (SFT, RLHF, DPO, PPO). - Hugging Face TRL (Transformer RL): official Hugging Face library, standard for DPO and RLHF. - vLLM and Text Generation Inference (TGI): inference after training, highly optimised.

The tooling landscape is mature in May 2026: what was research code in 2023 is now production-ready with documentation, community support and best practices.

Why own model training matters for SMEs

Three specific occasions make fine-tuning relevant for Swiss SMEs.

Occasion 1: domain-specific language. Swiss fiduciary and legal vocabulary is underrepresented in pretraining. Terms like "Beistand", "Vermögens-Verwaltung", "Berufsvorsorge-Stiftung", "MWST-Quoten-Saldo-Methode" are understood by the standard model – but subtle application patterns (which clause matches which situation, which letter style is industry-customary) are not in the model. Fine-tuning with 500-2,000 examples from your own correspondence conveys these patterns. Result: the model writes cover letters in Swiss fiduciary style without long system prompts.

Occasion 2: brand voice. Whoever maintains a specific tone (e.g. "terse-dry-Swiss" or "warm-familiar-with-client-proximity") can only partially convey it with a system prompt. Fine-tuning on 200-1,000 own emails, reports, client answers internalises the style. Result: the model hits the house style with > 90% accuracy without sending a 500-token style prompt with every request. Saves tokens and improves consistency.

Occasion 3: specific task patterns. When you have a recurring task – e.g. "Create from this invoice a booking entry per our chart of accounts and add VAT code" – you can explain that with a system prompt. But with 300+ different booking patterns the prompt becomes unmaintainable. Fine-tuning on 500-2,000 example pairs (input: invoice + receipt photo, output: desired booking entry) trains the pattern into the model. Result: no long prompts, consistent output quality.

What fine-tuning does NOT solve.

- Cutoff date: fine-tuning on new data does not shift the pretraining cutoff for world knowledge. - Hallucination on factual questions: fine-tuning on 500 examples does not make the model a tax expert covering all cantons in all detail cases. - Insufficient data: whoever does not have 200+ qualitative training examples should collect data first, then fine-tune.

For all three occasions the combination fine-tuning + RAG is in May 2026 the strongest architecture: fine-tuning for style and task patterns, RAG for current facts.

Cost reality May 2026. Typical fine-tuning projects for Swiss SMEs:

- Style fine-tuning for fiduciary correspondence: 500-1,000 examples from own email collection. Llama 3.1 8B base. Hardware rental (RunPod, Vast.ai): USD 50-200 for the training run. Data preparation: 20-40 hours of staff effort. Plus eval suite and production setup. Total: CHF 3,000-8,000. - Task fine-tuning for bookkeeping triage: 2,000-5,000 examples invoice → booking entry. an upcoming Mistral Large generation base. Hardware rental: USD 300-1,000. Data buildup: 40-80 hours. Total: CHF 8,000-20,000. - Multi-task fine-tuning for client assistant: 5,000-15,000 examples from 5-8 tasks (email reply, receipt classification, dunning letter, meeting minutes). DeepSeek V3 or Llama 4 Scout base. Hardware rental: USD 1,000-3,000. Data buildup: 100-200 hours. Total: CHF 25,000-50,000.

These numbers cover engineering effort, not running operation. Inference costs after fine-tuning: comparable to base model.

Fine-tuning in detail

Four procedures dominate SME-relevant fine-tuning in May 2026: LoRA, QLoRA, full fine-tune, DPO.

Procedure 1: LoRA (Low-Rank Adaptation). Proposed 2021 (Hu et al.), default standard in May 2026. Instead of updating all model parameters, a small number of additional parameters (low-rank matrices) is added, and only these are trained. Typically: a 7-billion-parameter model gets 10-50 million LoRA parameters added – 0.15-0.7% of the full model. Advantages: 5-20x less VRAM, 3-10x faster, small LoRA artifact (50-500 MB instead of 14 GB), several LoRAs can be combined. As of May 2026 standard for 80% of all SME fine-tunings.

Procedure 2: QLoRA (Quantised LoRA). Proposed 2023 (Dettmers et al.). LoRA training on a quantised base model (4-bit or 8-bit). VRAM demand drops by another factor 2-4 over LoRA. As of May 2026 QLoRA enables fine-tuning a 13B model on a single A100-80GB GPU or even an RTX 4090 (24GB). Quality loss vs full LoRA: typically 1-3%, often negligible.

Procedure 3: full fine-tune. All model parameters are updated. Maximum learning potential but expensive and VRAM-intensive. For a 7B model: 4-8x A100-80GB or comparable H100 setup. For 13B+: multiple H100. As of May 2026 full fine-tune is used only for special cases – when LoRA quality is not enough or fundamentally different behaviour is to be trained in.

Procedure 4: DPO (Direct Preference Optimization). Instead of learning from input-output examples (that is SFT), DPO learns from "answer A is better than answer B" pairs. Mainly for style tuning and refusal behaviour. As of May 2026 standard procedure after Anthropics Constitutional AI for SME style adaptation. Data demand: 500-3,000 preference pairs.

Hardware needs May 2026. Realistic configurations:

- 7B model, 4-bit QLoRA: 1x RTX 4090 (24GB) or 1x A100-40GB. Training time for 1,000 examples: 2-8 hours. - 13B model, 4-bit QLoRA: 1x A100-80GB or 1x H100-80GB. Training time: 4-16 hours. - 70B model, 4-bit QLoRA: 2x H100-80GB or 4x A100-80GB. Training time: 12-48 hours. - Llama 4 Maverick (400B/17B MoE), QLoRA: 4-8x H100-80GB. Training time: 24-96 hours.

Cloud options for hardware rental May 2026.

- RunPod.io: A100-80GB about USD 1.50-2.50/h, H100-80GB about USD 3.50-5/h. Pay-as-you-go. - Vast.ai: marketplace, often 30-50% cheaper than RunPod but uncertain availability. - Lambda Labs: pro vendor, USD 2-4/h for A100, USD 4-6/h for H100. Persistent storage. - Hetzner Cloud GPU: H100 about EUR 3.50/h, datacenter in Falkenstein/Helsinki. EU/CH compliance relevant. - AWS, GCP, Azure: typically 30-100% more expensive than specialist vendors. Only sensible with an existing enterprise relationship.

Data preparation – the underestimated part. 60-80% of fine-tuning effort is data collection and preparation. Steps:

1. Data inventory: which emails, reports, receipts exist? Check volume. 2. Quality filter: use only production-ready examples, not "half-finished drafts". At least 200-500 high-quality pairs are better than 5,000 mediocre. 3. Format conversion: training data in JSONL with "messages" format (system prompt, user message, assistant message triples). 4. PII removal: client names, IBANs, AHV numbers removed or anonymised – otherwise the model stores them in parameters. 5. Split: 80% training, 10% validation, 10% test. Test set stays untouched for final evaluation. 6. Eval suite: 30-100 typical requests from your domain, manually annotated with ideal answers. With this you check before and after fine-tuning quality.

Hyperparameters May 2026. Standard defaults that almost always work:

- Learning rate: 1e-4 for LoRA, 2e-5 for full fine-tune. - LoRA rank (r): 16-32 for style, 64-128 for complex tasks. - LoRA alpha: typically 2*r. - Epochs: 1-3 for LoRA. More leads to overfitting. - Batch size: as large as VRAM allows. On 1x A100-80GB typically 4-16 for 7-13B models. - Gradient accumulation: raises effective batch size without more VRAM.

Eval and end-weight selection. During training the validation set is evaluated every few steps. The end weights are those with best validation performance – not necessarily the last. Standard procedure: run training, save checkpoint every 200-500 steps, pick the best at the end.

Train your own model in 5 steps

01Clarify variant: RAG (no training), fine-tuning (CHF 5-50k) or from-scratch (USD 50M+, only tech giants). For SMEs almost always RAG plus fine-tuning.
02Gather data: at least 200-2,000 quality example pairs from own practice. 60-80% of project effort is data preparation, not training.
03Select base model: Llama 4 Scout (109B/17B MoE, good DE competence, open-weight), an upcoming Mistral Large generation, DeepSeek V3, Qwen 3 – depending on language, task and hardware.
04Rent hardware: 1x A100-80GB or H100-80GB for 7-13B models (USD 1.50-5/h on RunPod, Hetzner, Vast.ai). 4-bit QLoRA is May 2026 standard.
05Build eval suite: manually annotate 30-100 test requests with ideal answers. With this check before and after fine-tuning quality, iterate.

When fine-tuning is the right choice

Four concrete SME scenarios for fine-tuning.

Scenario 1: style and brand voice. When you maintain a specific tone (Swiss-terse, lawyer-dry, fiduciary-formal, agency-casual) and cannot hit it consistently with system prompts alone, style fine-tuning is the right path. Data demand: 500-1,000 own examples of your correspondence. Result: model hits the style at 90%+ accuracy without long system prompt.

Scenario 2: domain-specific classification or triage. "Classify incoming email into 12 categories of our client workflow." Explaining with system prompts gets fragile with 12+ categories. Fine-tuning on 1,000-3,000 manually classified emails delivers > 95% accuracy without system-prompt overhead. Saves tokens per request and is more robust.

Scenario 3: structured output formats. When you want to extract specific JSON structures from free text (receipt data, contact data, contract clauses) and standard models do not follow the schema reliably, fine-tuning on 500-2,000 input-output pairs is effective. As of May 2026 often no longer needed – modern models (the current top Claude model, the current top GPT model) follow JSON schemas via strict mode reliably. But with very idiosyncratic schemas (e.g. SAP-specific booking fields), fine-tuning remains relevant.

Scenario 4: self-hosting for compliance. When revDSG, EU AI Act or professional secrecy forces you not to send client data to cloud APIs, you need self-hosting. Open-weight models (Llama 4, Mistral, DeepSeek V3, Qwen 3) offer a good base. With fine-tuning on own data you bring the model to sector-relevant level. Hosting on Hetzner GPU in Germany meets EU data residency.

Scenario 5: do not use – too little data. When you do not have 200+ qualitative training examples, fine-tuning is not the first step. First gather data (structure client correspondence, annotate receipts, have example pairs created), then fine-tune. Below 200 examples fine-tuning often produces worse results than the base model – overfitting on the few examples.

Strategic consequence. Fine-tuning May 2026 is affordable (CHF 3-50k) and technically mature. But it is no replacement for RAG. Best architecture for SMEs: fine-tuning for style and recurring task patterns, RAG for current facts, tool use for world access. This combination is the state of the art in May 2026.

When own training is not the right thing

Three cases against fine-tuning.

First: RAG already suffices. When your application needs factual knowledge from own documents (administrative rules, contracts, manuals), RAG is the faster, cheaper, lower-maintenance solution. Fine-tuning is superfluous here. Check: can your use case be covered with RAG? If yes, stop here.

Second: fact-update frequency is high. Fine-tuning fixes knowledge at training time. When your data changes monthly or yearly (tax rates, ordinances, price lists), fine-tuning is the wrong lever. Re-training on every update is expensive and slow. RAG with updated knowledge base is the right solution.

Third: too little qualitative data. Whoever does not have 200+ high-quality training examples builds an overfit model. Result: model answers training examples perfectly but off or confused on new requests. Below 200 examples, better not fine-tune.

Trap "we train our own model from scratch". Already explained: USD 50-500 million for frontier models, USD 1-10 million even for "small" 13B models. Staff: 30-100 specialists. Time 6-24 months. For SMEs absolutely unrealistic. Whoever says "we build our own" practically always means fine-tuning – if not, reality should be checked.

Trap "fine-tuning solves hallucination". No. Fine-tuning on 1,000 examples conveys style and task patterns but not factual fidelity for all tax detail questions. Hallucination is addressed via RAG (source binding), refusal prompt and citation checks, not via fine-tuning.

Trap "fine-tuning consumes no tokens". Yes it does – at inference. Fine-tuning changes nothing in token billing at later model use. Whoever self-hosts a 13B model saves API costs but has hardware operating costs. Whoever uses a fine-tuned model via cloud API (e.g. OpenAI fine-tuning) typically pays 2-3x more per token than the base model.

Trap "we do it once and are done". Fine-tuning is an iterative process. First version typically after 4-8 weeks, then 2-4 iteration rounds over 3-6 months until production maturity. Whoever plans a "weekend project" massively underestimates the effort.

Trade-offs

STRENGTHS

Style and task patterns are internalised – no long system prompt needed
For recurring tasks saves tokens per request
Self-hosting possible for compliance-critical Swiss applications
LoRA/QLoRA in May 2026 affordable (CHF 5-50k) and technically mature

WEAKNESSES

Data preparation is 60-80% of effort, often underestimated
Fact updates need re-training or RAG complement
Catastrophic forgetting possible under aggressive training
Below 200 examples often worse than the base model

FAQ

What does fine-tuning realistically cost for an SME?

CHF 5,000-50,000 in May 2026 depending on task and model size. Style fine-tuning for fiduciary correspondence (Llama 3.1 8B + 1,000 examples): CHF 3-8k. Task fine-tuning for bookkeeping triage (an upcoming Mistral Large generation + 3,000 examples): CHF 8-20k. Multi-task for client assistant (DeepSeek V3 + 10,000 examples): CHF 25-50k. Covers engineering, hardware rental, data buildup and eval. From-scratch training for frontier models stays out of reach for SMEs (USD 50-500 million).

How many training examples do I need minimum?

Rule of thumb: 200+ for first try, 500-2,000 for production quality, 5,000+ for multi-task models. Data quality matters more than quantity – 200 very clean examples often yield better results than 5,000 mediocre ones. Below 200 one should not fine-tune; overfitting risk (model memorises examples instead of generalising) is too high. Source: Hu et al. LoRA paper (2021), confirmed by May 2026 community practice.

Which base model should I choose May 2026?

For DE competence and self-hosting: Llama 4 Scout (109B/17B MoE), an upcoming Mistral Large generation (closed-weight, API fine-tuning), Qwen 3 (open-weight, very strong multilingual). For smaller hardware: Llama 3.1 8B or Mistral 7B. For code tasks: Codestral, Qwen 2.5 Coder or DeepSeek Coder. For pure cloud API fine-tuning (no self-hosting needed): OpenAI fine-tuning on GPT-4o-mini or the current top GPT model-mini is the fastest convenience option in May 2026.

Does fine-tuning make the model lose its original capabilities?

Partially – called "catastrophic forgetting". Under aggressive fine-tuning (too many epochs, too high learning rate) the model loses general capabilities in favour of the training task. May 2026 standard procedure minimises this: low learning rate (1e-4 for LoRA), few epochs (1-3), small LoRA share of total model (0.15-0.7%). LoRA solves the problem structurally – original weights stay unchanged, only LoRA adapters are trained. If you switch off the adapter, the original model is back.

Sources

FITS YOUR STACK?

What this looks like in your business – a 30-minute intro call.

Book a call