Relay.audit
The Relay blog

How a harness puts AI inside the engagement

/ Shailen Desai | CA(SA)

Imagine hiring a brilliant graduate, or even an experienced audit professional, but giving them no laptop, no login, no engagement file, no prior-year workpapers, and no way to update the workbook. They could be the smartest person in the room and still be unable to touch the work. They can talk about the audit, but they cannot act inside it.

That is roughly where most finance professionals are with AI today. Tools like ChatGPT and Claude are capable, but they sit on the outside of the work — answering questions in isolation, unable to open the file and do anything in it. A harness is what brings the LLM inside, where it can act. That distinction — outside the work versus inside it — is what this piece is about, and it is the difference between an LLM that impresses auditors and one that changes how they work.

What is an AI harness?

Put simply, the LLM is the brain, and the harness is everything that lets it work: access to the file, the context of the engagement, the tools to take action, and the guardrails to do so safely.

There are many opinions on what makes up an AI harness, but for our purposes, we think of it through four simple parts: tools, skills, the loop, and guardrails.

Tools are the things that allow the model to act, instead of just reply. In an audit context, a tool might allow an agent to read Excel files, pull figures from a PDF, search client correspondence, query the firm’s methodology, compare two balances, or draft a workpaper.
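To make the idea of a tool concrete, here is a minimal sketch in Python. It is an illustration under our own assumptions, not how any particular product is built: the Tool class, the TOOLS registry, and the compare_balances function are hypothetical names, and the figures are made up.

```python
from dataclasses import dataclass
from typing import Callable, Dict


@dataclass(frozen=True)
class Tool:
    """A single action the model may request. Hypothetical structure."""
    name: str
    description: str
    run: Callable[..., object]


def compare_balances(current: float, prior: float) -> dict:
    """Return the absolute and percentage movement between two balances."""
    change = current - prior
    pct_change = None if prior == 0 else round(change / prior * 100, 2)
    return {"change": change, "pct_change": pct_change}


# The registry is the menu of actions the harness exposes to the model.
TOOLS: Dict[str, Tool] = {
    "compare_balances": Tool(
        name="compare_balances",
        description="Compare a current-year balance to the prior year.",
        run=compare_balances,
    ),
}

# Illustrative figures only.
result = TOOLS["compare_balances"].run(current=1_250_000.0, prior=1_000_000.0)
print(result)
```

The point of the registry is that the model can only ask for actions that appear in it, which is also where permissions can later be attached.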

Skills are repeatable workflows built from those tools. For example: preparing materiality, inspecting a sample, documenting a walkthrough, performing a planning risk assessment, or responding to a review note. Where a tool is a single action, a skill is the routine that strings several together, the way your methodology would.
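As a rough sketch of a skill, the example below strings two small tools into a materiality routine. Everything in it is an assumption for illustration: the function names are hypothetical, and the 1% of revenue and 75% performance percentages are placeholders, not a recommended methodology.

```python
from typing import Dict


def read_benchmark(trial_balance: Dict[str, float], account: str) -> float:
    """Tool: read one balance, ignoring its debit/credit sign."""
    return abs(trial_balance[account])


def apply_percentage(amount: float, pct: float) -> float:
    """Tool: take a percentage of an amount, rounded to two decimals."""
    return round(amount * pct / 100, 2)


def prepare_materiality(
    trial_balance: Dict[str, float],
    account: str = "Revenue",
    overall_pct: float = 1.0,       # illustrative only, not a methodology
    performance_pct: float = 75.0,  # illustrative only, not a methodology
) -> Dict[str, float]:
    """Skill: a fixed routine that strings the tools above together."""
    benchmark = read_benchmark(trial_balance, account)
    overall = apply_percentage(benchmark, overall_pct)
    performance = apply_percentage(overall, performance_pct)
    return {"benchmark": benchmark, "overall": overall, "performance": performance}


# Made-up trial balance; revenue is shown as a credit (negative) balance.
trial_balance = {"Revenue": -5_000_000.0, "Cost of sales": 3_200_000.0}
materiality = prepare_materiality(trial_balance)
print(materiality)
```

In practice the percentages and the choice of benchmark would come from the firm's methodology and the auditor's judgement, with the skill only carrying out the routine steps.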

The loop is the agent’s working cycle. It understands the task, inspects the available context, chooses a tool, takes an action, observes the result, decides what to do next, and repeats. This cycle is what moves the system from a clever reply to actually working through a problem.
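The cycle can be sketched in a few lines of Python. This is a simplified illustration under stated assumptions: scripted_model is a hypothetical stand-in for the LLM (a real harness would ask the model to choose the next action), and the tools and balances are invented.

```python
from typing import Callable, Dict, List, Optional, Tuple

Action = Tuple[str, dict]  # (tool name, arguments)


def lookup_balance(account: str) -> float:
    """Tool: return a balance from a made-up engagement file."""
    balances = {"Revenue CY": 1_250_000.0, "Revenue PY": 1_000_000.0}
    return balances[account]


def subtract(a: float, b: float) -> float:
    """Tool: return the difference between two amounts."""
    return a - b


TOOLS: Dict[str, Callable[..., float]] = {
    "lookup_balance": lookup_balance,
    "subtract": subtract,
}


def scripted_model(task: str, observations: List[float]) -> Optional[Action]:
    """Stand-in for the LLM: choose the next action from what has been seen."""
    if len(observations) == 0:
        return ("lookup_balance", {"account": "Revenue CY"})
    if len(observations) == 1:
        return ("lookup_balance", {"account": "Revenue PY"})
    if len(observations) == 2:
        return ("subtract", {"a": observations[0], "b": observations[1]})
    return None  # nothing left to do


def run_loop(task: str, decide: Callable, max_steps: int = 10) -> List[float]:
    """Decide, act, observe, repeat, with a hard cap on the number of steps."""
    observations: List[float] = []
    for _ in range(max_steps):
        action = decide(task, observations)
        if action is None:
            break
        name, args = action
        observations.append(TOOLS[name](**args))
    return observations


history = run_loop("Explain the movement in revenue", scripted_model)
print(history)
```

The step cap is a deliberate design choice in this sketch: an agent that can act should also have a limit on how long it can keep acting without a human looking at the result.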

Guardrails are the controls around the agent: permissions, review requirements, audit trails, source citations, restricted actions, and human sign-off before anything is finalised. Without them, AI in audit is a risk. With them, it becomes a controlled assistant that can support the auditor without ever replacing professional judgement.
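A guardrail can be as simple as a check that runs before every action. The sketch below is one possible shape, assuming a hypothetical list of restricted actions and a plain in-memory audit trail; a real system would need proper user identities and durable logging.

```python
from dataclasses import dataclass, field
from typing import List, Optional

# Hypothetical list of actions that always need a named human approver.
RESTRICTED_ACTIONS = {"finalise_workpaper", "sign_off_section"}


@dataclass
class Guardrails:
    audit_trail: List[str] = field(default_factory=list)

    def authorise(self, action: str, approved_by: Optional[str] = None) -> bool:
        """Allow an action unless it is restricted and nobody has approved it."""
        if action in RESTRICTED_ACTIONS and approved_by is None:
            self.audit_trail.append(f"BLOCKED {action}: human sign-off required")
            return False
        approver = approved_by or "no approval needed"
        self.audit_trail.append(f"ALLOWED {action} ({approver})")
        return True


guardrails = Guardrails()
print(guardrails.authorise("draft_workpaper"))
print(guardrails.authorise("finalise_workpaper"))
print(guardrails.authorise("finalise_workpaper", approved_by="engagement manager"))
print(guardrails.audit_trail)
```

Every decision, allowed or blocked, lands in the trail, so a reviewer can see what the agent attempted as well as what it did.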

What would an audit harness look like?

Once you put an LLM inside a harness like the one we just described, you have what we would call an audit agent. An audit agent does not stand outside the engagement answering general questions. It works inside it. To do that, it needs three things: an understanding of the firm, an understanding of the client, and the ability to act on both.

First, the firm’s audit environment: its methodology, documentation standards, review approach, templates, internal procedures, and expectations around evidence and sign-off.

Second, the specific client engagement: the prior-year file, the current-year trial balance, the financial statements, materiality, planning decisions, key risks, client correspondence, uploaded support, and review notes.

Third — and this is where it earns its place — it needs to actually do the work: complete forms, select samples, draft workpapers and perform a first-level review with review notes.

Take a simple example. Ask most chatbots "what is the audit risk in revenue?" and you get a textbook paragraph that could apply to any company in the world. An audit agent should do something different: pull this client’s revenue figure from the trial balance, compare it to the prior year and to the figure in the draft financial statements, find last year’s revenue workpaper and the procedures used, check the uploaded support against the balance, flag the inconsistencies, and draft a workpaper that is ready for a human to review.
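Two of the checks in that example, agreeing the trial balance to the draft financial statements and comparing it to the prior year, can be sketched as follows. The function name, the 10% movement threshold, and the figures are all assumptions for illustration.

```python
from typing import List


def check_revenue(
    trial_balance: float,
    prior_year: float,
    draft_fs: float,
    movement_threshold_pct: float = 10.0,  # hypothetical threshold
) -> List[str]:
    """Return plain-language flags for a reviewer; an empty list means none."""
    flags: List[str] = []

    # Check 1: the trial balance should agree to the draft financial statements.
    difference = trial_balance - draft_fs
    if abs(difference) >= 0.01:
        flags.append(
            f"Trial balance differs from draft financial statements by {difference:,.2f}"
        )

    # Check 2: a large year-on-year movement deserves an explanation.
    if prior_year != 0:
        movement_pct = (trial_balance - prior_year) / prior_year * 100
        if abs(movement_pct) > movement_threshold_pct:
            flags.append(f"Revenue moved {movement_pct:.1f}% against prior year")

    return flags


# Made-up figures for one client.
flags = check_revenue(
    trial_balance=1_250_000.0, prior_year=1_000_000.0, draft_fs=1_245_000.0
)
for flag in flags:
    print(flag)
```

The output is a list of flags rather than a conclusion, which leaves the judgement about what each flag means with the human reviewer.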

Think of it like a supercharged audit supervisor: not replacing the auditor’s judgement, but organising the context, drafting the documentation, suggesting procedures, and doing the first-level checks across the engagement.

Coding agents show what is possible

We are already watching this happen in software development.

Tools like Claude Code, Cursor, GitHub Copilot, and Codex-style agents are useful because they are not just chatbots sitting off to the side. They work inside the developer’s workflow. They can read a codebase, understand its structure, propose changes, edit files, run tests, and iterate. The LLM becomes useful the moment it moves inside the workflow.

The evidence is real.

In a controlled study, GitHub found that developers using Copilot completed a coding task 55.8% faster than developers without it. That is one task in a lab rather than a blanket claim, but a striking one. [1]

Google reports that more than a quarter of its new code is generated by AI and then reviewed and accepted by its engineers. [2]

Anthropic, the maker of Claude, has said that 70-90% of its own code is written by AI. [3]

None of this needed a smarter LLM — only a way to bring it inside the work.

That has not made developers redundant. It has changed where their time goes:

  • less boilerplate
  • more design
  • more review
  • more architecture
  • more judgement

This is the lesson we are taking into audit. The same shift can happen in finance and assurance, but only if it is applied carefully. Audit carries higher requirements around evidence, review, documentation, confidentiality, and professional scepticism. That is exactly why the harness matters even more here.

Where audit goes next

This is what we are building at Relay. We are building it in public because the profession should grow with this technology, not simply watch it arrive.

The future of audit will not be shaped by LLMs alone. It will be shaped by how well firms put them to work inside their methodology, their documentation, and their review process. That means an audit agent that works inside the engagement.

References

  1. Sida Peng, Eirini Kalliamvakou, Peter Cihon, and Mert Demirer, "The Impact of AI on Developer Productivity: Evidence from GitHub Copilot", arXiv, 2023.

  2. Entrepreneur, "AI is taking over coding at Microsoft, Google and Meta".

  3. Fortune, "100% of code at Anthropic and OpenAI is now AI-written, Boris Cherny says", January 2026.