TOOL USE · AI CONCEPT

What is tool use and function calling? LLMs invoking external APIs May 2026

Tool use turns a language model into an agent: the model structurally invokes external APIs – calculator, database, web search. May 2026 standard with MCP protocol.

Researched & fact-checked by: DuneDive LLC · As of: 2026-05

What is tool use?

Tool use, also called function calling, is a language model capability to structurally invoke external functions. Instead of solving every task itself (mathematics, database query, weather lookup, web search), the model recognises from the user prompt: "Here I need a tool" – and returns a structured function call (typically in JSON format): "Call the function `lookup_invoice` with parameters customer_id=12345 and month=2026-04". The application runs the call (e.g. database query), returns the result to the model, and the model formulates the final answer to the user with the real data.

This is the basis of modern AI agents (see was-ist-ai-agent). Without tool use a language model is just a text generator. With tool use it becomes an employee querying databases, creating documents, scheduling appointments and running calculations. As of May 2026 this capability is standard at all serious LLMs:

- OpenAI Tools (the current top GPT model, o3): parallel tool calls (multiple tools simultaneously), streaming tool calls (tool-call definition becomes incrementally visible), strict mode (guaranteed JSON-schema compliance). - Anthropic tool_use (the current top Claude model): similarly structured, in May 2026 with parallel tool calls and Computer Use extension (Claude controls screen). - Google Gemini function_calling: functions are declared in the generation config block, very close to OpenAI format. - Mistral function_calling (an upcoming Mistral Large generation, Codestral): JSON mode plus function-schema description. - Open-source (Llama 4, Qwen 3, DeepSeek): function-calling capability built in, operable with library wrapper (LangChain, LlamaIndex).

A new standard in May 2026 is MCP (Model Context Protocol), introduced by Anthropic in 2024 and adopted by OpenAI, Mistral and several IDEs since early 2026. MCP is an open protocol: an MCP server exposes a collection of tools (e.g. "Bexio bookkeeping", "Stripe payment", "Confluence documentation"), every MCP-capable client (Claude Desktop, OpenAI Apps, Cursor IDE) can use them. This avoids vendor-specific tool definitions and makes tool integrations reusable.

For SME users the most important consequence is: tool use is the bridge between LLM and real business data. Whoever builds a bookkeeping agent that creates documents via Bexio API and checks payment status via Stripe API needs tool use. Without this capability the LLM remains a "clever wordsmith without world access".

Why tool use matters for SMEs

Tool use touches SME workflows in four concrete areas.

First: reliable arithmetic instead of LLM maths. Language models do not calculate reliably (see wie-funktioniert-ein-llm). On VAT calculation, multi-year discounting, compound interest they err in 5-15% of cases. With tool-use-bound calculator (Python eval, Wolfram Alpha, custom function `calculate_vat`) this is solved: the model does not write the calculation into output but calls the calculator. Result: 100% correct calculation, provided the tool definition is clean. Fiduciary applications in which clients request tax-plan calculations should in May 2026 mandatorily use tool use.

Second: realtime database queries. Whoever builds a client chatbot to answer "What is the current bookkeeping status of Müller AG?" needs a connection to the Bexio/Abacus/Sage API. That does NOT work without tool use: the model recognises the request, calls `get_account_status(client_id="müller_ag")`, gets the result, answers with it. Before 2023 this was cumbersome to implement in own code – in May 2026 the function is realised in 50 lines of Python (OpenAI SDK plus own function).

Third: multi-step workflows without human in the middle. Whoever wants to send a dunning letter needs typically 4-6 steps: identify client, check outstanding invoice, determine dunning level, generate letter, send email, note in CRM. With tool use an agent can run these in a connected call pattern – typically 8-15 seconds total latency. The human sees only the end result ("dunning sent") and can intervene when the model signals uncertainty. See ai-mahnwesen-automation.

Fourth: web search overcomes the pretraining cutoff. When the model has cutoff January 2026 and you ask about current tax changes in 2026, it knows nothing. With web-search tool (Anthropic Brave Search Integration, OpenAI Browse Tool, Tavily API, Perplexity API) the model can fetch current web pages and integrate them into the answer. That shifts the cutoff problem effectively to "always current" – provided the search sources are trustworthy.

Cost May 2026. Tool use itself is free of charge – the model bills no extra token fee. But every tool call has its own cost: API calls to Bexio (typically CHF 0.005-0.05 per call depending on plan), web search (Tavily USD 0.005, Perplexity USD 0.005-0.02), database query internal (negligible at own DB). Plus the tokens for the tool-definition block in the prompt (typically 100-500 tokens per declared function).

Strategic consequence. Tool use is not "nice to have" but the bridge between LLM and real SME work. Whoever plans an LLM workflow in May 2026 that needs real data access must plan with tool use from the start. Architecture without tool use is architecture without business connection.

Tool use in detail

A tool-use interaction breaks into 5 steps and involves three parties: user, LLM, application code with tool implementations.

Step 1: tool definition. In the API request the application passes a list of available tools. Every tool has: name, description (natural language: "Fetches the balance of a Bexio account"), input schema (typically JSON Schema with required fields and types). Example:

```json { "name": "get_bexio_account_status", "description": "Fetches the current balance of a Bexio bookkeeping account", "parameters": { "type": "object", "properties": { "client_id": { "type": "string", "description": "Bexio client ID" }, "account_number": { "type": "string", "description": "Account no (e.g. 1000)" } }, "required": ["client_id", "account_number"] } } ```

Step 2: model decides. The model reads the user prompt and the tool list. It decides whether one or more tools make sense. If yes, it returns in the output instead of text a structured tool-call block – typically JSON or a special XML form depending on vendor. If no, it answers directly in text.

Step 3: application executes. The application parses the tool call, validates arguments, calls the real function. That is NOT an LLM call – it is normal application code (Python, Node, Go). Important: the application must authenticate, authorise, validate – the model is not in the path here, hence not in the security check.

Step 4: result back to model. The tool result (typically JSON) is returned to the model as a "tool_result" message. The model now sees: "You asked for Bexio, here is the answer: {balance: 12450.50, currency: CHF, last_updated: 2026-05-22}".

Step 5: final answer. The model phrases the final answer to the user with the real data: "Account 1000 currently has CHF 12,450.50 as of 22 May 2026". In multi-step workflows the model can make several tool calls one after another or in parallel before producing the final answer.

Parallel vs sequential tool calls. May 2026 standard at OpenAI and Anthropic: the model can request several tool calls in parallel when they are independent. Example: "Check Bexio account and Stripe payments" → the model calls both at once, the application runs in parallel, results back. Saves latency markedly (often factor 2-4 with multiple tools).

MCP (Model Context Protocol) May 2026. Anthropic standard from 2024, broadly adopted. Instead of implementing every tool in application code, an MCP server declares a tool collection. An MCP-capable client (Claude Desktop, OpenAI Apps, Cursor IDE, custom apps) connects to the server and uses the tools immediately. Advantages: reusable tool collections, no vendor-specific adapters, unified auth model. As of May 2026 there are over 1,500 public MCP servers: Bexio, Stripe, Notion, Confluence, GitHub, Slack, Postgres DB, Wolfram, Brave Search, Sentry, and many more. See was-ist-mcp.

JSON-Schema strictness. Earlier (2023) tool calls were error-prone – the model produced JSON with typos or missing fields. May 2026 standard "strict mode" (OpenAI), guaranteed JSON-schema compliance. Anthropic the current top Claude model has a comparable feature via constructive tool-schema description. Practically: tool-call error rate dropped from about 5% (2023) to < 0.5% (May 2026).

Error handling. What happens when a tool call fails (DB down, API auth error, timeout)? The application returns an error string as tool_result. The model sees the error and decides: try again, take another tool, or inform the user "Bexio is currently unreachable". Clean error paths are mandatory in any production tool-use application.

Understand tool use in 5 steps

01Understand the principle: the LLM calls functions instead of knowing everything itself. Data and actions come from real APIs.
02Define tools cleanly: name, description, JSON schema for parameters. Clarity of description is the main factor for call accuracy.
03Check MCP servers: as of May 2026 1,500+ public servers are available (Bexio, Stripe, Notion, GitHub, Postgres). Do not build yourself where already there.
04Build a secure validation path: every tool is an API gateway. Auth, validation, rate limit in application code.
05Test tool calls with real requests: run 20-50 typical SME requests across all tools, check tool-call accuracy and error paths.

When to use tool use

Five concrete SME scenarios for tool use.

Scenario 1: database-driven client requests. Client asks "What is my outstanding amount?". Answer needs live DB data – tool use with get_open_amount(client_id) is the clean solution. Without tool use the application would have to embed data into the prompt every time (RAG-like pattern), or the client gets stale answers.

Scenario 2: calculations with safety. VAT rates, multi-year discounting, payroll detail, compound interest. Tool use with own calculate function (Python code) or Wolfram Alpha binding gives 100% correct results. Whoever wants to safeguard fiduciary advice against LLM hallucination MUST use tool use here.

Scenario 3: multi-step booking workflows. "Create an invoice for Müller AG over CHF 4500 for work done last month and send it via email." The model calls sequentially: create_invoice() → get_email(client_id="müller_ag") → send_email(to, body), each step with real system effect. Latency typically 8-20 seconds, comparable to human processing – but 0 manual clicks.

Scenario 4: web search for currency. "What are the current Swiss tax changes 2026?" Pretraining knows nothing after Jan 2026 (the current top Claude model). Tool use with Tavily or Brave Search fetches live pages, the model synthesises. Important: source listing in output so the user can verify.

Scenario 5: MCP server for reusable integrations. When you have 5 different apps all speaking with Bexio – no need to reimplement 5x. Instead: one "Bexio" MCP server, all apps consume it. As of May 2026 there are 1,500+ public MCP servers for common SaaS tools. Own MCP servers for own backends are written in 50-200 lines of code.

Scenario 6: Computer Use for legacy software. Anthropic the current top Claude model has Computer Use since 2024 – the model sees the screen via screenshot, clicks mouse, types keyboard. For legacy software without API (old bookkeeping programs) this enables tool use via UI control. Latency and error rate are higher than API tool use, but for workflows without API alternative it is the only option in May 2026.

When tool use is not the right thing

Three cases against tool use.

First: pure language tasks without world access. Email reply generation, contract clause check, content summary, meeting-minute formatting. These tasks need no external API – the model has all data in the prompt. Tool use here would be overcomplex.

Second: applications needing high security guarantees without a clean validation path. A tool call is only as secure as the application running it. Whoever declares "delete_invoice(id)" as a tool without permission check in application code builds a security gap. Tool use requires engineering discipline: every tool is an API gateway and must be secured accordingly. Whoever cannot deliver that should not build a tool-use application.

Third: realtime requirements below 1 second. Tool use raises latency: the model must generate the tool call (300ms-2s), the application runs it (variable), results back (300ms-2s extra), final answer generation. Several tool calls in a row often 5-15 seconds total. Too slow for sub-second voicebots – only well usable with realtime streaming API.

Trap "tool use makes the model intelligent". Tool use makes the model access-capable, not more intelligent. A stupid model connected to Bexio API only gives "stupid answers with real data". Tool use solves data-access problems, not understanding problems.

Trap "we build all tools ourselves". As of May 2026 with MCP there are 1,500+ pre-built tool collections. Whoever builds a Bexio MCP server themselves because "surely quick" typically wastes 2-4 weeks of engineering – better: check existing MCP servers, possibly extend one, instead of building from scratch.

Trap "tool definition is detail work, the intern does that". Wrong – the tool definition (description, parameters, schema) determines how reliably the model hits the tool call. A vague description ("fetches account data") leads to wrong calls. Clear description with examples ("Fetches Bexio account balance for client_id `müller_ag` and account_no `1000`. Returns balance in CHF and last_updated as ISO date") leads to correct calls. Tool definitions should be documented with the same care as an API contract.

Trade-offs

STRENGTHS

Bridge between LLM and real business data (DB, API, web)
Improves accuracy dramatically on calculations and data queries
As of May 2026 standardised with MCP – 1,500+ reusable tool servers
Parallel tool calls reduce latency with multiple tools

WEAKNESSES

Security overhead: every tool is an API gateway with auth requirement
Latency rises: 5-15 seconds in multi-tool workflows
Tool definition quality determines call accuracy – vague description = errors
Vendor APIs differ – abstraction via LangChain/SDK needed

FAQ

What is the difference between tool use and function calling?

Synonyms. OpenAI calls it function calling (tools in the API schema are "functions"), Anthropic calls it tool use ("tool_use" message type). Conceptually the same: the LLM structurally invokes external functions. Mistral, Google Gemini, DeepSeek, Qwen mostly use "function_calling". As of May 2026 the APIs are not 100% compatible – LangChain, LlamaIndex and Vercel AI SDK abstract the differences.

What is MCP and why is it important in May 2026?

Model Context Protocol, introduced by Anthropic in late 2024. An open protocol: an MCP server exposes tools (Bexio, Stripe, Postgres, Confluence), every MCP client (Claude Desktop, OpenAI Apps, Cursor IDE) uses them. May 2026 standard, adopted by OpenAI and Mistral. Advantages: reusable tool collections without vendor-specific adapters. 1,500+ public MCP servers available. See was-ist-mcp for details.

How secure are tool calls?

Only as secure as the application running them. The LLM only produces the tool call (typically JSON); the application MUST implement auth, permissions, input validation and rate limits itself. Whoever declares "delete_invoice(id)" as a tool without auth builds a security gap. Best practices: whitelist tools (no "exec_arbitrary_code"), JSON-schema validation before execution, per-user permissions, audit log of all tool calls.

Is an own MCP server worth it?

Yes if you want to make own backends (CRM, ERP, internal DB) accessible to LLMs. Effort: typically 50-300 lines of TypeScript or Python per tool collection. Advantages: implemented once, all apps and IDEs (Claude Desktop, Cursor, own web apps) can use it. If you have only a single application, the wrapping often does not pay off – simple tool definition directly in the API call suffices.

Sources

OpenAI – Function Calling Guide and Strict Mode Reference · 2026-05
Anthropic – Tool Use with das aktuelle Claude-Spitzenmodell · 2026-05
Anthropic – Introducing the Model Context Protocol (MCP) · 2024-11
Google – Gemini Function Calling Documentation · 2026-04
MCP Hub – Public Server Directory · 2026-05

FITS YOUR STACK?

What this looks like in your business – a 30-minute intro call.

Book a call