AI AGENT · AI CONCEPT
What is an AI agent? ReAct, tool use and production patterns May 2026
An AI agent is an LLM system that calls tools itself, plans, and acts in multiple steps. Patterns May 2026: tool use, ReAct, LangGraph.
Researched & fact-checked by: DuneDive LLC · As of: 2026-05
What is an AI agent?
An AI agent is a software system in which a language model controls the flow. The model decides not only what it answers – it decides which tools to call, in what order, when enough information is gathered, and when a task is finished. Classical software is the reverse: code controls, model answers. In an agent the model controls, code provides tools.
The term has been established since mid-2023, with the ReAct paper (Yao et al., 2022) as the theoretical foundation. ReAct stands for "Reasoning + Acting" – the model alternates between thinking (reasoning) and acting (via tool calls). In 2024 the pattern became production-ready through native tool-use support in the big providers: Anthropic Claude Tool Use (May 2024), OpenAI Function Calling and Assistants API, Google Gemini Function Calling, Mistral Tool Use.
In May 2026 the agent landscape is structured: simple tool-use flows with the provider SDKs directly, more complex multi-step flows with frameworks like LangGraph (LangChain), Anthropic Computer Use (controls screen + mouse), OpenAI Agents SDK, AutoGen (Microsoft), CrewAI (multi-agent teams). The question is no longer "can one build agents", but "where do they pay off and where do they become risk".
Why it counts – and why it is delicate
Agents solve three problems that classical software cannot. They also create three new ones that a fiduciary/legal SME must know.
What agents solve. First: variable workflows. A client onboarding has 4-12 steps depending on client type. Classical code must hard-code every branch – with 10 client types quickly unmaintainable. An agent navigates the process dynamically. Second: unstructured input. An email with an attached PDF and a hidden request in paragraph 3 – an agent can triage; classical code cannot. Third: tool composition. An agent can combine Bexio API + email + calendar + knowledge base without the combination being hard-coded.
What agents create as problems. First: uncontrolled behaviour. When the model decides, it can decide wrongly. Loops (the agent calls the same tool 50 times), hallucination-driven actions ("I deleted this client" – the client does not exist), unexpected tool combinations. Second: audit problem. A classical software action is documented in code – it runs because line 247 says so. An agent action is documented in the prompt output – it runs because the model suggested it. For audit-ready bookkeeping (Art. 957a CO), DPIA documentation (Art. 22 revFADP) and EU AI Act Art. 26 deployer logging, the second form is harder to verify. Third: cost explosion. An agent that calls the model three times per request (reasoning, tool call, answer synthesis) costs 3-10x more than a single call. In an endless loop, daily costs of CHF 1000+ in hours are possible.
May 2026 the industry is in the "reality phase". The 2024-2025 hype wave brought many agents into production; several ended in expensive incidents (data deletion, wrong bookings, compliance breaches). Best practices have consolidated: tools with hard limits, human-in-the-loop on critical actions, audit trail of every step, eval suite against regressions.
Mechanics and patterns
A typical agent cycle has four steps that repeat until the goal is reached or an abort criterion triggers.
Step 1 – system prompt with task and tool catalogue. The model receives a clear task ("Answer the client inquiry. You may use these tools: search_documents, fetch_bexio_balance, send_email") and a structured tool catalogue (name, description, input schema, output schema).
Step 2 – reasoning. The model considers: what must I do, which tool fits. In Claude and GPT-4 this is often embedded in the tool call itself; in smaller models you see explicit "thinking" blocks.
Step 3 – tool call. The model calls a tool as a structured JSON call. The surrounding agent code executes the tool (API request, DB query, file read) and returns the result as a message.
Step 4 – iteration or finish. The model decides: another tool needed, or answer ready? When ready, the final answer; otherwise next cycle.
Classical patterns May 2026.
ReAct (Yao et al. 2022). The patriarch. Model thinks in natural language and calls tools. Today mostly implicit through the provider SDKs.
Plan-and-Execute. The model makes a complete plan first, then executes step by step. More robust for long tasks, slower.
Reflexion / self-critique. After each step the model reviews its own result and corrects if needed. Reduces hallucinations but costs tokens.
Multi-agent. Several agents with different roles (researcher, writer, critic) collaborate. CrewAI and AutoGen popularise this. In practice often overkill – a well-built single agent suffices in 80% of SME cases.
LangGraph (LangChain). A graph instead of a loop. Nodes are steps (LLM call, tool call, branch, loop-back); edges are transitions. May 2026 the most advanced open-source framework for production agents.
Anthropic Computer Use (October 2024+). The agent directly drives screen and mouse – it sees the screen, clicks, types, scrolls. In May 2026 still a specialised application with high error rate, but the only option for UI automation without API access.
Production hardening. Token limits per conversation, tool-call limit per conversation (e.g. max 10 tool calls), timeout per tool call, clear refusal instructions ("if the tool fails, say so"), human-in-the-loop for any action with irreversible effect (delete, send, book, pay), structured audit logging of every step into a central audit DB.
Building an agent in 7 steps
- 01Sharpen the task: goal, allowed actions, forbidden actions, success and abort criteria.
- 02Check whether an agent is needed at all: does a simple tool-use call or classical workflow suffice? If yes, do not build an agent.
- 03Define the tool catalogue: each tool with clear input/output schema, hard limits (money amounts, deletion scope), timeout, idempotency where possible.
- 04Safety layer: tool-call limit per conversation, token limit, thresholds for human-in-the-loop (e.g. any payment > CHF 100 must be approved).
- 05Audit logging to a central DB (Postgres, Loki): per step timestamp, model call, tool call, input, output, decision, human approval.
- 06Eval suite: 30-100 real scenarios with expected behaviour. Run before every model switch and every prompt update.
- 07Phased rollout: first shadow mode (agent suggests, human always decides), after 2-4 weeks without incident partial autonomy for non-critical actions.
When an agent fits
Four application profiles where an agent is sensible in May 2026.
Profile 1: triage and routing. A client email arrives; the agent classifies (VAT question, payroll question, contract question), pulls matching sources from the knowledge base, drafts an answer. Sender email, classification decision and drafted answer go to a human for approval. No irreversible step without human confirmation.
Profile 2: data gathering from multiple sources. Before a client meeting the fiduciary needs: current Bexio balance, the last five reminders, the client file from the DMS, the VAT status. An agent calls the four APIs and produces a briefing. Classical code could do this too, but every new data source grows the code linearly; with an agent only the tool inventory grows.
Profile 3: workflow with variable branches. AML onboarding of a new client: by country, sector and business model the duties branch out. An agent navigates dynamically, asks for the needed information, checks it against external databases. For critical actions (creating the client, saving the risk grade) always human-in-the-loop.
Profile 4: unstructured-to-structured conversion. Incoming invoices, contracts or correspondence are read by the agent, classified, entered into the internal schema. With OCR for paper documents (see ai-belegerkennung-ocr). Important: for bookings ALWAYS a human review step – Art. 957a CO and tax risk forbid direct auto-booking.
Where an agent is NOT the answer. When the task is deterministic and stable: classical code. When the task is safety-critical (payout > CHF 5000, contract conclusion, client deletion): at minimum human-in-the-loop, often no agent at all. When the task is rare (1-2x per month): manual handling faster than agent development. When hallucinations are unacceptable: RAG with strict refusal policy, no agent.
When NOT an agent
Three clear cases where agents cause harm.
First: irreversible actions without supervision. Delete, pay, send, post. An agent that moves money or deletes data without human-in-the-loop is a programmed compliance breach. In May 2026 several published incidents show insurance/legal setups falling into five-figure damages through wrong agent actions.
Second: applications under strict determinism. A VAT calculation MUST produce the same output for the same input every time. An agent is non-deterministic – the same input can yield different tool calls and answers. Such tasks belong in classical code (Math.js, an Excel engine, your own rule engine). The language model may at most sit in front or after as a checker.
Third: highest data sensitivity without audit clarity. An agent working with client data must log every step: which prompt, which tool call, which output, which decision. This audit trail must be designed in advance and enforced in the pipeline – not "if the developer forgets, the agent skips it". Without that audit guarantee, professional secrecy (SCC 321) and revFADP DPIA are at risk.
Market observation May 2026. The industry increasingly distinguishes "agentic AI" (real autonomy, multi-step, own plan generation) from "AI with tool use" (one request, one function, one answer). The second covers 80% of SME tasks and is markedly safer. Anyone commissioning or building an "agent" should first ask: do we really need multi-step autonomy, or does a tool-use call suffice? The answer is often the second.
Trade-offs
STRENGTHS
- Variable workflows without hard-coded branches
- Unstructured inputs become processable
- Tool composition changes without code change
- Fast prototyping possible (days instead of weeks)
WEAKNESSES
- Non-deterministic – same input, different actions possible
- Cost explosion with loops or unbounded tool use
- Audit trail must be built explicitly – otherwise compliance risk
- Hallucination-driven actions can cause irreversible damage
FAQ
How many tool calls per agent answer are normal?
In May 2026 typically 2-6 for SME use cases. Triage apps often 1-2 (classify + fetch source). Data-gathering agents 4-8 (querying several APIs). More complex LangGraph flows can reach 10-30 but then need hard limits against endless loops. Rule of thumb: more than 10 tool calls per answer is an architecture question, not a tuning detail.
Do I need LangGraph or is the provider SDK enough?
For 70% of SME apps the provider SDK suffices (Anthropic SDK, OpenAI SDK directly) plus your own loop code with audit logging. LangGraph pays off when you have complex branching logic, parallel sub-agent spawning, persistent conversation state across sessions, or retry-with-correction loops. In May 2026 LangGraph is the standard for production agents in corporate environments; provider SDK plus own loop is the pragmatic SME path.
Which model is best for agents in May 2026?
Claude Sonnet and Opus dominate for complex multi-step agents – Anthropic has invested heavily in tool-use quality and refusal behaviour. GPT-4.1 is close and at times faster. Gemini 2 is price-attractive and strong in Google Cloud settings. Smaller models (Claude 3.5 Haiku, GPT-4o-mini, Llama-3.1-70B) suffice for simple tool-use flows and are 5-10x cheaper. Rule of thumb: start with the big model, downsize after functional verification.