An AI audit is the structured review that produces the evidence package an auditor, a regulator, or an injured party can replay. It is not a generic IT assessment with the word "AI" attached. The discipline borrows from financial and information-security audit, but the artefacts and the cadence are specific to systems that make commitments. This guide gives a working definition, the four artefacts every audit has to produce, the three audit modes and where each fits, and how the practice maps onto NIST AI RMF and ISO/IEC 42001.
Key takeaways
- **An AI audit is an evidence exercise, not a checklist:** The output is a package an external reviewer can replay, not a self-rated maturity score.
- **Four artefacts are non-negotiable:** Decision log, model card, risk register, and refusal log. A package missing any of the four cannot be defended.
- **Three audit modes serve different goals:** Internal audits surface drift, external audits produce regulator-facing assurance, continuous audits catch failures that point-in-time reviews miss.
- **Standards already shape the practice:** NIST AI RMF does not name audit as a function, but its Manage function carries the audit-adjacent obligations; ISO/IEC 42001 is the management-system standard that names audit obligations directly.
- **A refusal is auditable evidence:** A refusal logged with full provenance is, in audit terms, equivalent to a commitment logged with full provenance. Silent refusals fail the audit.
A working definition
An AI audit is a structured review of an AI system's design, deployment, and operating record, against a named standard, that produces evidence an independent reviewer can verify. Three clauses inside that sentence do the load-bearing work. First, "against a named standard": an audit without a referenced standard (NIST AI RMF, ISO/IEC 42001, an internal policy, or a specific regulation) collapses into opinion. Second, "operating record": the audit covers what the system actually did, not only what it was designed to do. Third, "independent reviewer": the audit is only as credible as the reviewer's ability to verify the evidence without relying on the build team's narrative.
The U.S. NIST AI Risk Management Framework does not use the word "audit" as a top-level function, but its Manage function carries the audit-adjacent obligations: documented evidence of how identified risks were handled, by whom, and with what outcome. In the broader system view, this practice sits inside the AI risk management framework as the verification step that closes the Manage loop.
The four artefacts every audit must produce
A reviewer who cannot find these four artefacts will, correctly, refuse to sign off. The artefacts are not optional and they are not interchangeable.
Decision log. The replayable record of commitments the system made, including the input, the model output, the rule that applied, the actor identity, and the timestamp. A decision log that cannot reconstruct a specific past commitment is not a decision log. This is the closest analogue to a financial audit's general ledger.
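As a rough illustration of what "replayable" means in practice, the sketch below models a decision log entry with the five fields named above and a lookup that reconstructs one past commitment. The class names, the `decision_id` key, and the sample values are assumptions made for the example; a production log would sit on durable, tamper-evident storage rather than an in-memory dictionary.

```python
from dataclasses import dataclass
from datetime import datetime, timezone
from typing import Optional


@dataclass(frozen=True)
class DecisionRecord:
    """One commitment, carrying the fields the decision log has to hold."""
    decision_id: str      # hypothetical identifier used for replay lookups
    input_payload: str    # the input the system received
    model_output: str     # what the model returned
    rule_applied: str     # the rule that turned the output into a commitment
    actor_id: str         # who or what made the commitment
    timestamp: datetime   # when the commitment was made (UTC)


class DecisionLog:
    """Append-only, in-memory log (illustrative only)."""

    def __init__(self) -> None:
        self._records: dict[str, DecisionRecord] = {}

    def append(self, record: DecisionRecord) -> None:
        # Existing entries are never overwritten; a duplicate id is an error.
        if record.decision_id in self._records:
            raise ValueError(f"duplicate decision_id: {record.decision_id}")
        self._records[record.decision_id] = record

    def replay(self, decision_id: str) -> Optional[DecisionRecord]:
        """Return the full record for a past commitment, or None if missing."""
        return self._records.get(decision_id)


log = DecisionLog()
log.append(DecisionRecord(
    decision_id="d-001",
    input_payload="refund request, order 1234",   # placeholder data
    model_output="approve",
    rule_applied="refund-policy-v3",
    actor_id="agent:refunds",
    timestamp=datetime(2026, 1, 15, 9, 30, tzinfo=timezone.utc),
))
```

A `replay` call that returns `None` for a commitment the system is known to have made is exactly the gap a reviewer would flag.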
Model card. The provenance and operating envelope of every model in production: training data summary, evaluation results, intended use, known limitations, version history. The cards are versioned and tied to the decision-log entries by model version, so a reviewer can ask "which model committed this decision" and get an answer.
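The tie between model cards and decision-log entries can be checked mechanically. The sketch below assumes each decision carries a `model_version` string and that cards are keyed by the same string; the field names and sample values are hypothetical and chosen only to mirror the list above.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ModelCard:
    model_version: str            # join key shared with the decision log
    training_data_summary: str
    evaluation_results: str
    intended_use: str
    known_limitations: str
    version_history: tuple = ()   # earlier versions, oldest first


def decisions_without_card(decisions, cards):
    """Return ids of decisions whose model version has no matching card.

    `decisions` is an iterable of (decision_id, model_version) pairs.
    """
    known_versions = {card.model_version for card in cards}
    return [d_id for d_id, version in decisions if version not in known_versions]


# Placeholder data: one documented model version, one undocumented.
cards = [
    ModelCard(
        model_version="clf-2.1",
        training_data_summary="support tickets (summary placeholder)",
        evaluation_results="see evaluation report (placeholder)",
        intended_use="refund triage",
        known_limitations="not evaluated on non-English input",
        version_history=("clf-2.0",),
    )
]
decisions = [("d-001", "clf-2.1"), ("d-002", "clf-2.2")]
orphans = decisions_without_card(decisions, cards)
```

Any id in `orphans` is a decision for which the question "which model committed this decision" has no documented answer.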
Risk register. The identified risks, their current treatment, the residual risk, the owner, and the next review date. The register is the working surface where new findings, including red-team findings, are tracked into remediation. ISO/IEC 42001:2023, the AI management-system standard published by ISO and IEC, requires documented AI risk assessment and risk treatment; a risk register is the usual way to hold that record.
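A register entry with these fields lends itself to a simple freshness check. The following is a minimal sketch, assuming a three-level residual-risk scale and an in-memory list; the names `RiskEntry` and `overdue_reviews` and the sample rows are illustrative, not taken from any standard.

```python
from dataclasses import dataclass
from datetime import date


@dataclass
class RiskEntry:
    risk_id: str
    description: str
    treatment: str        # current treatment of the risk
    residual_risk: str    # assumed scale: "low", "medium", "high"
    owner: str
    next_review: date


def overdue_reviews(register, today):
    """Return ids of risks whose next review date has already passed."""
    return [entry.risk_id for entry in register if entry.next_review < today]


# Placeholder entries for illustration only.
register = [
    RiskEntry("R-1", "prompt injection via uploaded files", "input filtering",
              "medium", "security lead", date(2026, 3, 1)),
    RiskEntry("R-2", "stale training data", "quarterly refresh",
              "low", "ml lead", date(2026, 9, 1)),
]
late = overdue_reviews(register, today=date(2026, 6, 1))
```

Passing `today` explicitly keeps the check deterministic, which matters if the check itself is ever part of the evidence package.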
Refusal log. Every refusal the system produced, logged with the same evidence rigour as a commitment. The refusal log is the artefact most often missing from internal audits, because the build team has often treated refusals as silent failures rather than first-class outcomes. A refusal you cannot replay is, in audit terms, no different from a commitment you cannot replay.
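One way to give refusals the same evidence rigour as commitments is to log both in a single record shape and then look for requests that produced neither. The sketch below assumes every inbound request has a `request_id`; the record type, the function name, and the sample data are hypothetical.

```python
from dataclasses import dataclass
from datetime import datetime, timezone


@dataclass(frozen=True)
class OutcomeRecord:
    """Shared shape for commitments and refusals, so both replay the same way."""
    request_id: str
    outcome: str          # "commitment" or "refusal"
    input_payload: str
    model_output: str
    rule_applied: str     # for a refusal, the rule that triggered it
    actor_id: str
    timestamp: datetime


def silent_outcomes(request_ids, records):
    """Return request ids that have no logged commitment or refusal."""
    logged = {record.request_id for record in records}
    return [rid for rid in request_ids if rid not in logged]


now = datetime(2026, 1, 15, 9, 30, tzinfo=timezone.utc)
records = [
    OutcomeRecord("r-1", "commitment", "refund request", "approve",
                  "refund-policy-v3", "agent:refunds", now),
    OutcomeRecord("r-2", "refusal", "legal advice request", "decline",
                  "out-of-scope-v1", "agent:refunds", now),
]
silent = silent_outcomes(["r-1", "r-2", "r-3"], records)
```

Here `r-3` reached the system but left no record either way, which is the silent-refusal case the audit is meant to surface.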
Internal, external, and continuous
The same word covers three audit modes that serve different purposes. Confusing them produces a calendar with one audit and three unmet obligations.
Internal audit. Run by a function inside the organisation that does not report to the build team. Its purpose is drift detection. The internal team knows the system well enough to ask precise questions and is paid to find problems before an external reviewer does. The cadence is typically quarterly for high-stakes systems.
External audit. Run by a third party, often required by regulation, contract, or board directive. Its purpose is independent assurance the organisation can show to outside parties. External audits are slower and more expensive, and they produce the evidence package that becomes load-bearing in regulator or court conversations.
Continuous audit. Automated checks running over the production stream, surfacing anomalies in real time. Continuous monitoring catches drift between point-in-time reviews, and it is the only mode that catches certain failure classes (model drift, data distribution shift, slow degradation) before they cause visible incidents.
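A continuous check for distribution shift can be quite small. The sketch below uses the Population Stability Index (PSI) over categorical outcomes as one possible drift signal; it is a swapped-in illustrative technique, not something the standards prescribe, and the 0.2 alert threshold is an assumption that would need tuning per system.

```python
import math
from collections import Counter


def psi(baseline, current, eps=1e-6):
    """Population Stability Index between two samples of categorical values.

    Sums (c - b) * ln(c / b) over categories, where b and c are the category
    shares in the baseline and current samples. `eps` floors zero shares so
    the logarithm stays defined.
    """
    if not baseline or not current:
        raise ValueError("both samples must be non-empty")
    b_counts, c_counts = Counter(baseline), Counter(current)
    score = 0.0
    for category in set(b_counts) | set(c_counts):
        b = max(b_counts[category] / len(baseline), eps)
        c = max(c_counts[category] / len(current), eps)
        score += (c - b) * math.log(c / b)
    return score


ALERT_THRESHOLD = 0.2  # assumed value, not a standard-mandated limit

# Placeholder outcome streams: the share of refusals rises from 10% to 50%.
baseline_outcomes = ["commit"] * 90 + ["refuse"] * 10
current_outcomes = ["commit"] * 50 + ["refuse"] * 50

drift_score = psi(baseline_outcomes, current_outcomes)
drift_alert = drift_score > ALERT_THRESHOLD
```

Run over a sliding window of the production stream, a check like this raises a flag between point-in-time reviews instead of waiting for the next one.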
A mature programme runs all three. The internal team produces evidence packages on a known cadence. The external review consumes those packages and adds independent testing. The continuous layer raises a flag when the production reality starts to drift from the picture the point-in-time reviews captured.
How standards map to the practice
Two standards do most of the work in 2026. NIST AI RMF is voluntary and framework-shaped; ISO/IEC 42001 is certifiable and management-system shaped. They are complementary, not competing.
NIST AI RMF puts the audit-adjacent activities inside its four functions: Govern (who is accountable), Map (what does the system do and not do), Measure (how do we test it), and Manage (how do we treat identified risks and document evidence of the treatment). The framework does not name "audit" because it is voluntary and imposes no compliance obligation; it specifies the artefacts an audit can verify.
ISO/IEC 42001 takes the management-system shape that audit teams already know from ISO/IEC 27001 and ISO 9001. It names internal audits, management reviews, corrective actions, and a documented risk treatment process as obligations of the management system. An organisation that has certified to ISO/IEC 42001 has a management system designed to be auditable; the certification itself is, by definition, the output of an external audit.
The EU AI Act adds a third layer for organisations inside that jurisdiction. Article 17 of the Act sets quality-management-system obligations for providers of high-risk AI systems that overlap substantially with ISO/IEC 42001's structure. Implementing rules and guidance continue to evolve, and the organisations affected typically build to the more specific of the two when a conflict arises.
Cube A Cloud treats the Audit phase of its engagement work as the formal gate where these standards become operationally binding inside a deployment. The four artefacts above are the deliverables that phase produces.
Where the practice goes wrong
Three failure modes recur. The first is auditing the model and skipping the system around it. A model card alone does not survive a real audit, because the commitment is made by the system, not the model. The second is treating the refusal log as optional. Refusals are evidence of the system operating inside its scope; their absence is evidence of either a system that never refuses (suspicious) or a system that refuses silently (worse). The third is running an annual external review and calling that a programme. Annual cadence catches none of the failure classes that need continuous monitoring, and it produces an evidence package that is already stale on the day of issue.
The version we build into a decision system (see "what a decision system actually is") treats the four artefacts as system outputs, not as deliverables the audit team has to chase the build team for. That distinction is the difference between an audit programme that scales and one that does not.
A practical closing
An audit programme is worth its cost only when the artefacts it produces survive external scrutiny. Organisations that get this right design the system to produce the four artefacts as a side-effect of normal operation, rather than as a quarterly scramble for the auditor. The version we implement runs through the Discover, Screen, Audit, Deploy, and Monitor phases of Cube A Cloud's engagement protocol, with the artefacts owned by the system rather than the audit team.