YEAR-END QA · USE CASE

AI-supported quality assurance for the annual financial statement

Detect anomalies in journal entries before the audit: Benford test, balance comparison, accruals. The human decides, the AI flags.

Researched & fact-checked by: DuneDive LLC · As of: 2026-05

What is AI QA in the annual closing?

Quality assurance of the books precedes every audit. A Swiss fiduciary office typically spends 60 to 80 hours per mid-size client between February and April on voucher checks, balance reconciliation, and plausibility. AI-supported QA shortens that work by automating sample selection and flagging anomalies that a human then reviews.

In concrete terms: a script pulls the client general ledger (Abacus, Bexio, Banana, SAP, Sage), computes statistical metrics per account and per period, compares them against the prior year and industry benchmarks, and hands suspicious records to a language-model classifier that drafts an initial explanation. The output is a prioritised list with three columns: entry, anomaly, suggested audit step. This list lands on the fiduciary's desk – not in the outbox, not in the audit report.

The use case is explicitly NOT fully automatic. Language models cannot determine materiality thresholds under PS 320 on their own, and the Swiss Code of Obligations requires the signature of a licensed audit expert on the report. The AI is pre-reviewer, not reviewer.

Why it matters

Three reasons make the case. First: sample selection is often arbitrary today, or follows a fixed quota ("every 50th voucher"). An AI pre-reviewer picks by risk – year-end entries, unusual offset accounts, round-thousand amounts. Sample rate goes down, hit rate goes up.
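As a rough illustration of risk-based selection, the sketch below scores each entry on the three signals named above and takes the highest-scoring entries as the sample. The `Entry` fields, the five-day year-end window and the equal weighting of the signals are assumptions made for this example, not part of any standard.

```python
from dataclasses import dataclass
from datetime import date


@dataclass(frozen=True)
class Entry:
    entry_id: str
    booking_date: date
    debit_account: str
    credit_account: str
    amount_chf: float


def risk_score(entry, usual_pairs, year_end, window_days=5):
    """Count how many of the three risk signals an entry shows (0 to 3)."""
    score = 0
    # Signal 1: booked close to the closing date (window_days is an assumption).
    if abs((entry.booking_date - year_end).days) <= window_days:
        score += 1
    # Signal 2: debit/credit pair not among the client's usual pairs.
    if (entry.debit_account, entry.credit_account) not in usual_pairs:
        score += 1
    # Signal 3: round-thousand amount.
    if entry.amount_chf >= 1000 and entry.amount_chf % 1000 == 0:
        score += 1
    return score


def select_sample(entries, usual_pairs, year_end, size):
    """Return the `size` highest-scoring entries; ties keep ledger order."""
    return sorted(
        entries,
        key=lambda e: risk_score(e, usual_pairs, year_end),
        reverse=True,
    )[:size]


# Illustrative data only.
entries = [
    Entry("A1", date(2025, 6, 15), "6000", "1020", 1234.55),
    Entry("A2", date(2025, 12, 30), "4400", "1100", 42000.0),
]
usual_pairs = {("6000", "1020")}
sample = select_sample(entries, usual_pairs, date(2025, 12, 31), size=1)
```

In practice the usual account pairs would be derived from the prior-year ledger, and the weights would be tuned per mandate.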

Second: Benford tests, balance consistency checks, and accrual plausibility are mathematically well-defined but manually tedious. Tools like ACL and IDEA have offered this for years – but they are expensive and rarely licensed in mid-size fiduciary firms. An open Python script with Pandas and an LLM wrapper delivers 80 percent of the value at a fraction of the cost.

Third: documentation. Every audit step is logged automatically (see Art. 957a CO audit trail). When the auditor later asks why a given entry was not in the sample, there is an answer with timestamp, model used and prompt. That is not just compliance – it is defence against liability claims.

The EXPERTsuisse audit standards (PS 240, auditor responsibility for fraud) have since 2024 explicitly recognised data-analytic procedures as part of "further audit work". Whoever runs AI QA is not outside the norm – they are inside modern audit practice.

How it works

The pipeline has five stations.

Data extract: The client general ledger is pulled via ERP interface – Abacus REST API, Bexio API, Banana XML export, SAP via RFC. We recommend a read-only connection and copying the book into a separate audit database (DuckDB local, PostgreSQL on-prem). Original data stays untouched.
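A minimal sketch of the copy step, assuming the ledger has already been exported as CSV text. The standard-library `sqlite3` module stands in here for DuckDB or PostgreSQL so the example runs without extra dependencies; the table layout and column names are hypothetical.

```python
import csv
import io
import sqlite3

# Hypothetical column layout of the export.
LEDGER_COLUMNS = (
    "entry_id", "booking_date", "debit_account", "credit_account", "amount_chf",
)


def load_audit_copy(csv_text, conn):
    """Copy exported ledger rows into the audit database; return the row count.

    The function only reads the export, so the source system is never written to.
    INSERT OR IGNORE makes repeated loads of the same export idempotent.
    """
    conn.execute(
        "CREATE TABLE IF NOT EXISTS ledger ("
        "entry_id TEXT PRIMARY KEY, booking_date TEXT, "
        "debit_account TEXT, credit_account TEXT, amount_chf REAL)"
    )
    reader = csv.DictReader(io.StringIO(csv_text))
    rows = [tuple(row[col] for col in LEDGER_COLUMNS) for row in reader]
    with conn:  # commits on success, rolls back on error
        conn.executemany("INSERT OR IGNORE INTO ledger VALUES (?, ?, ?, ?, ?)", rows)
    return conn.execute("SELECT COUNT(*) FROM ledger").fetchone()[0]


# Illustrative export with two entries.
export = (
    "entry_id,booking_date,debit_account,credit_account,amount_chf\n"
    "A1,2025-06-15,6000,1020,1234.55\n"
    "A2,2025-12-30,4400,1100,42000.00\n"
)
audit_db = sqlite3.connect(":memory:")
row_count = load_audit_copy(export, audit_db)
```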

Statistical pre-check: Mathematical tests run on the audit copy. The Benford test checks leading-digit distribution in expense accounts; significant deviations indicate possible manipulation. The balance comparison computes prior versus reporting year per account and flags differences above the materiality threshold (typically 1 percent of balance-sheet total or 5 percent of net profit, per PS 320). Accrual checks look for entries on 31 December and 1 January that are unusually large or unusually round.
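The Benford test can be sketched as a chi-square comparison between observed leading digits and the expected frequencies log10(1 + 1/d). The sketch below is one common way to run it, not necessarily the exact variant a given audit tool uses; the 5 percent significance level is an assumption, and small samples make the test unreliable.

```python
import math
from collections import Counter
from typing import Optional

# Expected share of each leading digit 1..9 under Benford's law.
BENFORD = {d: math.log10(1 + 1 / d) for d in range(1, 10)}

# Chi-square critical value for 8 degrees of freedom at the 5 percent level.
CHI2_CRITICAL_DF8_5PCT = 15.507


def leading_digit(amount: float) -> Optional[int]:
    """Return the first significant digit of an amount, or None for zero."""
    value = abs(amount)
    if value == 0:
        return None
    # Scientific notation puts the first significant digit at position 0.
    return int(f"{value:.15e}"[0])


def benford_chi2(amounts) -> float:
    """Chi-square statistic of observed leading digits against Benford's law."""
    digits = [d for d in map(leading_digit, amounts) if d is not None]
    n = len(digits)
    if n == 0:
        raise ValueError("no non-zero amounts to test")
    observed = Counter(digits)
    return sum(
        (observed.get(d, 0) - n * p) ** 2 / (n * p) for d, p in BENFORD.items()
    )


def deviates_from_benford(amounts) -> bool:
    """True if the deviation is significant at the assumed 5 percent level."""
    return benford_chi2(amounts) > CHI2_CRITICAL_DF8_5PCT
```

A significant result is only a reason to look closer. Accounts with fixed prices or narrow amount ranges (rent, leasing instalments) deviate from Benford's law for entirely legitimate reasons.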

RAG lookup of industry benchmarks: A vector database holds materiality thresholds, industry benchmarks (Swiss Fiduciary Association, KOF data) and internal guidelines. For each suspicious entry, the matching rule is pulled and supplied to the language model as context.
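To show the shape of the lookup, the toy retriever below ranks rule texts by bag-of-words cosine similarity. This stands in for an embedding model and a vector database, which a real setup would use; the rule texts and IDs are illustrative placeholders, not actual benchmark data.

```python
import math
import re
from collections import Counter


def tokenize(text):
    """Lower-case word counts; a crude stand-in for an embedding."""
    return Counter(re.findall(r"[a-z0-9]+", text.lower()))


def cosine(a, b):
    """Cosine similarity between two word-count vectors."""
    dot = sum(a[t] * b[t] for t in a.keys() & b.keys())
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(
        sum(v * v for v in b.values())
    )
    return dot / norm if norm else 0.0


def retrieve(query, rules, top_k=1):
    """Return the top_k rules most similar to the query."""
    q = tokenize(query)
    return sorted(
        rules, key=lambda r: cosine(q, tokenize(r["text"])), reverse=True
    )[:top_k]


# Placeholder rules for illustration only.
rules = [
    {"id": "materiality", "text": "Flag balance differences above 1 percent of balance-sheet total"},
    {"id": "accrual", "text": "Accrual entries around year-end that are unusually large or round need a voucher"},
]
best = retrieve("round accrual entry at year-end", rules)
```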

LLM classification: The model receives entry plus rule plus benchmark and formulates an initial assessment: "Entry 4400/1100 of CHF 42,000 on Dec 30, prior-year balance 4400 was CHF 8,000. Anomaly score 0.87. Suggestion: check whether this is an accrual; request voucher." We recommend Claude Sonnet or GPT-4.1 for classification, routed via LiteLLM. For mandates under professional secrecy, we recommend running Mistral Large locally.
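The classification step could look roughly like the sketch below. The model call is injected as a plain function (`call_llm`, a hypothetical name) so the logic can be tested without network access; in production it might wrap `litellm.completion`. The prompt wording and the JSON reply format are assumptions for this example.

```python
import json
from typing import Callable

# Assumed prompt; a real prompt would also carry the industry benchmark.
PROMPT_TEMPLATE = (
    "You are a pre-reviewer for a Swiss year-end closing.\n"
    "Entry: {entry}\n"
    "Applicable rule: {rule}\n"
    "Reply with one JSON object with the keys anomaly_score "
    "(number from 0 to 1), reasoning and suggested_step."
)


def classify(entry: dict, rule: str, call_llm: Callable[[str], str]) -> dict:
    """Ask the model for an assessment and validate its reply."""
    prompt = PROMPT_TEMPLATE.format(entry=json.dumps(entry, sort_keys=True), rule=rule)
    reply = json.loads(call_llm(prompt))
    score = float(reply["anomaly_score"])
    if not 0.0 <= score <= 1.0:
        raise ValueError(f"anomaly_score out of range: {score}")
    return {
        "anomaly_score": score,
        "reasoning": str(reply["reasoning"]),
        "suggested_step": str(reply["suggested_step"]),
        "prompt": prompt,  # kept so the audit trail can store it
    }


def fake_llm(prompt: str) -> str:
    """Canned reply used instead of a real model in this example."""
    return json.dumps({
        "anomaly_score": 0.87,
        "reasoning": "Balance far above prior year.",
        "suggested_step": "Check whether this is an accrual; request voucher.",
    })


entry = {"debit": "4400", "credit": "1100", "amount_chf": 42000, "date": "2025-12-30"}
assessment = classify(entry, "Accruals around year-end need a voucher.", fake_llm)
```

Validating the reply before it reaches the list keeps malformed model output away from the reviewer.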

Human decision: The fiduciary receives a list with three columns: entry, anomaly reasoning, suggested audit step. She decides whether to check, request a voucher, or dismiss the flag. Every decision is logged in the audit trail.
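One possible shape for a logged decision is sketched below: one JSON line per decision, restricted to the three allowed outcomes. The field names and the JSON Lines format are assumptions; what matters is that timestamp, model, prompt and decision travel together.

```python
import json
from datetime import datetime, timezone
from enum import Enum


class Decision(str, Enum):
    CHECK = "check"
    REQUEST_VOUCHER = "request_voucher"
    DISMISS = "dismiss"


def audit_record(entry_id, decision, reviewer, model, prompt, now=None):
    """Build one audit-trail record; raises ValueError for unknown decisions."""
    timestamp = (now or datetime.now(timezone.utc)).isoformat()
    return {
        "timestamp": timestamp,
        "entry_id": entry_id,
        "decision": Decision(decision).value,
        "reviewer": reviewer,
        "model": model,
        "prompt": prompt,
    }


def to_jsonl_line(record):
    """Serialise a record as one line for an append-only log file."""
    return json.dumps(record, sort_keys=True) + "\n"


# Illustrative values only.
record = audit_record(
    entry_id="A2",
    decision="dismiss",
    reviewer="reviewer-01",
    model="example-model",
    prompt="Entry: ...",
    now=datetime(2026, 3, 2, 9, 30, tzinfo=timezone.utc),
)
line = to_jsonl_line(record)
```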

NOT fully automatic: the model writes nothing into the audit report, sends nothing to the client, issues no audit opinion.

QA workflow in 6 steps

01 Mandate onboarding: set up read-only ERP access, define chart of accounts and materiality (PS 320), load prior-year general ledger into the audit database.
02 Start statistical pre-check: Benford test on expense accounts, prior-vs-reporting balance comparison, accrual search around year-end.
03 Provide RAG context: index industry benchmarks (KOF, fiduciary chamber), internal guidelines and PS standards.
04 Run LLM classification: for each anomaly produce an assessment with anomaly score, reasoning and audit suggestion. Save audit trail with model, prompt and timestamp.
05 Human review: the fiduciary screens the top anomalies and decides per entry (check, request voucher, dismiss). Each decision is logged.
06 Final report: the list of performed audit steps, findings, and justification for non-tested entries goes into the audit working paper (PS 230).
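Two parts of step 02, the balance comparison and the accrual search, might be sketched as follows. The threshold is passed in because setting materiality stays with the auditor; the balance-sheet total, the "large" cut-off of CHF 10,000 and all amounts are illustrative assumptions.

```python
from datetime import date, timedelta


def flag_balance_changes(prior, current, threshold_chf):
    """Return {account: change} where the year-on-year change exceeds the threshold."""
    flagged = {}
    for account in sorted(set(prior) | set(current)):
        change = current.get(account, 0.0) - prior.get(account, 0.0)
        if abs(change) > threshold_chf:
            flagged[account] = change
    return flagged


def flag_year_end_accruals(entries, year_end, large_chf):
    """Return entries booked on the closing date or the day after
    that are large (>= large_chf) or a round multiple of 1,000."""
    cutoff_days = {year_end, year_end + timedelta(days=1)}
    flagged = []
    for entry in entries:
        amount = abs(entry["amount_chf"])
        is_round = amount > 0 and amount % 1000 == 0
        if entry["booking_date"] in cutoff_days and (amount >= large_chf or is_round):
            flagged.append(entry)
    return flagged


# Illustrative figures: 1 percent of an assumed balance-sheet total of CHF 2,000,000.
threshold = 0.01 * 2_000_000
prior = {"4400": 8000.0, "6000": 120000.0}
current = {"4400": 50000.0, "6000": 125000.0}
balance_flags = flag_balance_changes(prior, current, threshold)

entries = [
    {"entry_id": "B1", "booking_date": date(2025, 12, 31), "amount_chf": 15000.0},
    {"entry_id": "B2", "booking_date": date(2025, 12, 31), "amount_chf": 873.40},
    {"entry_id": "B3", "booking_date": date(2026, 1, 1), "amount_chf": 12345.67},
    {"entry_id": "B4", "booking_date": date(2025, 11, 3), "amount_chf": 20000.0},
]
accrual_flags = flag_year_end_accruals(entries, date(2025, 12, 31), large_chf=10000.0)
```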

When to use

AI QA fits mandates with a high entry count (from roughly 5,000 entries per year), a clearly structured chart of accounts (Swiss SME chart, Swiss GAAP FER) and a stable business model. Trade, hospitality, construction and industry deliver reliable anomaly signals because the industry benchmarks are well documented.

The method also works well for recurring audits: when the prior year was audited, the comparison has weight. First-time audits require more manual preparation because there is no baseline.

Likewise useful for quarter- or half-year closings – the pipeline runs incrementally, each entry block is reviewed once, the sample is not rebuilt from scratch each time.
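The incremental behaviour can be pictured as a set of already-reviewed entry IDs that is carried from run to run. The sketch below assumes each entry has a stable `entry_id`; the function names are hypothetical.

```python
def pending_entries(entries, reviewed_ids):
    """Entries that no earlier run has reviewed yet."""
    return [e for e in entries if e["entry_id"] not in reviewed_ids]


def run_increment(entries, reviewed_ids, review):
    """Review only new entries and return the updated set of reviewed IDs."""
    pending = pending_entries(entries, reviewed_ids)
    for entry in pending:
        review(entry)
    return reviewed_ids | {e["entry_id"] for e in pending}


# First quarter, then half-year: Q1 entries are not reviewed a second time.
seen = []
q1 = [{"entry_id": "A1"}, {"entry_id": "A2"}]
h1 = q1 + [{"entry_id": "A3"}]
reviewed = run_increment(q1, set(), seen.append)
reviewed = run_increment(h1, reviewed, seen.append)
```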

In combination with audit planning (see audit SOP): the AI QA list is input to the risk-assessment phase, not a replacement.

When not to use

Not suitable for first-time mandates with chaotic books – the books must be cleaned up before statistical tests make sense. Holding structures with complex consolidation also defeat standard tests, because entry distribution reflects the consolidation logic, not the operating business.

Do not use for very small books (under 500 entries per year) – the effort for data extract and pipeline configuration is out of proportion to the benefit.

Also do not use it when the mandate is under special data protection (client list at a private law firm, fiduciary mandates protected by professional secrecy under Art. 321 SCC), unless you can run the language model in a vetted on-premise configuration. Without a local hosting option (Mistral, Ollama with Llama 3.1) AI QA is not permitted in this segment.

The AI does NOT replace the audit step itself. Turning an anomaly flag into an audit finding without inspecting the voucher violates PS 500 (audit evidence).

Trade-offs

STRENGTHS

Sample selection becomes risk-weighted instead of arbitrary – hit rate rises significantly
Benford test and balance consistency without ACL/IDEA licence at a fraction of the cost
Audit trail per Art. 957a CO automatic – every audit step timestamped
Scalable: runs incrementally per quarter, not only at year-end

WEAKNESSES

First-time mandates without prior-year baseline yield weak anomaly signals
Holding structures with complex consolidation defeat standard tests
Pipeline maintenance required: ERP interfaces change, industry benchmarks become stale
False-confidence risk: blindly following the anomaly list misses findings outside the statistical procedures

FAQ

Does AI QA replace audit planning?

No. The pipeline provides data-driven anomaly signals. Audit planning under PS 300 remains the auditor's task: client-level risk assessment, understanding of internal control, choice of audit strategy. AI QA is input to risk assessment, not a substitute.

How do we defend the method before the audit oversight authority?

Three points: (1) The AI is pre-reviewer, not reviewer – the human decision per anomaly is logged. (2) The statistical procedures (Benford, balance comparison) have been recognised in audit literature for decades – AI only makes them efficient. (3) Art. 957a CO audit trail: model version, prompt, input data, and decision are kept for 10 years.

What is the cost per mandate?

For an SME mandate with 20,000 entries per year: about CHF 8 to 25 in LLM token costs per run (Claude Sonnet, including RAG context), plus one-time ERP integration and audit database setup (typically 4 to 12 hours of fiduciary work). That is significantly cheaper than ACL or IDEA licences (from CHF 3,500/year).

Which ERP systems are supported?

Swiss standard: Abacus (REST API from version 2023), Bexio (REST API), Banana (XML export), SAP Business One (Service Layer API), Sage 50 (ODBC). For edge cases there is a CSV path: export the general ledger as CSV/Excel and the pipeline loads it. Setting up the CSV adapter typically takes 2 to 4 hours.
