Most AI governance programmes start with a rule. Often it is the wrong rule, applied uniformly, and enforced by a team that lacks the authority to change it when context makes it harmful. An AI contextual governance framework replaces that uniform rule with a situational logic: the level of human oversight required is determined by the nature of the decision, not by the category of the system making it.
Key takeaways
- Contextual governance: An AI contextual governance framework classifies decisions by risk tier and assigns proportionate oversight to each tier, rather than applying a single policy to every AI output.
- Three primary focuses: AI governance frameworks converge on accountability (who owns each decision), transparency (whether the reasoning is auditable), and control (whether humans can intervene before harm occurs).
- Why rules fail: Rule-based governance fails because uniform constraints either over-burden low-risk uses or leave high-risk decisions under-supervised, depending on how the rule was calibrated.
- Tier assignment is situational: In a contextual model, the same AI system can operate under different oversight requirements depending on who receives the output and what action it triggers.
- HITL is the mechanism, not the framework: Human-in-the-loop is how contextual governance is enforced at each tier; the framework itself determines when HITL is required and what authority the reviewer holds.
Why Rule-Based AI Governance Breaks Down
The appeal of rule-based governance is clarity. Write the rule once, apply it everywhere, audit compliance against it. For a decade, that approach worked reasonably well because AI systems were narrow: a fraud-detection model made one type of decision, a recommendation engine made another, and a single policy per system was defensible.
General-purpose language models broke that assumption. A single model can now summarise a contract, draft a regulatory filing, respond to a customer complaint, and generate internal code, sometimes in the same session. A rule calibrated for the highest-stakes use case imposes unnecessary friction on every lower-stakes use. A rule calibrated for the average use case leaves the high-stakes outputs unprotected. Neither is governance; both are the appearance of governance.
The deeper problem is that rules are written at a point in time, for anticipated situations. AI systems surface unanticipated situations continuously. A rule that says "all customer-facing outputs require human review" does not tell the reviewer what to look for, what authority they have, or what happens when the output is time-sensitive. The rule creates the checkbox; it does not create governance.
What an AI Contextual Governance Framework Is
An AI contextual governance framework is a model for overseeing AI systems that determines the level of human involvement required based on the specific decision context, not the system type. The framework has three components: a context classifier, a tier assignment mechanism, and a set of per-tier oversight requirements.
The context classifier evaluates each AI output against a set of risk dimensions before it is acted upon. The dimensions typically include the sensitivity of the underlying data, the reversibility of the action the output would trigger, the vulnerability of the affected audience, and the confidence of the model. The classifier assigns a risk score, which maps to a tier. The tier determines the oversight requirement.
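The scoring step described above can be sketched as follows. This is a minimal illustration, not a prescribed method: the weighted sum, the weights, the two thresholds, and the names `DecisionContext`, `risk_score`, and `assign_tier` are all assumptions chosen for the example, and a real classifier would calibrate them against the organisation's own risk appetite.

```python
from dataclasses import dataclass

# Hypothetical weights and thresholds, for illustration only.
WEIGHTS = {
    "data_sensitivity": 0.30,
    "irreversibility": 0.30,
    "audience_vulnerability": 0.25,
    "model_uncertainty": 0.15,
}
TIER_2_THRESHOLD = 0.35
TIER_3_THRESHOLD = 0.65


@dataclass(frozen=True)
class DecisionContext:
    """Risk dimensions for one AI output, each scaled to the range 0..1."""
    data_sensitivity: float
    irreversibility: float
    audience_vulnerability: float
    model_confidence: float


def risk_score(ctx: DecisionContext) -> float:
    """Combine the risk dimensions into a single score between 0 and 1."""
    dims = {
        "data_sensitivity": ctx.data_sensitivity,
        "irreversibility": ctx.irreversibility,
        "audience_vulnerability": ctx.audience_vulnerability,
        # Low model confidence raises risk, so invert it.
        "model_uncertainty": 1.0 - ctx.model_confidence,
    }
    for name, value in dims.items():
        if not 0.0 <= value <= 1.0:
            raise ValueError(f"{name} must be in [0, 1], got {value}")
    return sum(WEIGHTS[name] * value for name, value in dims.items())


def assign_tier(ctx: DecisionContext) -> int:
    """Map the risk score to an oversight tier (1, 2, or 3)."""
    score = risk_score(ctx)
    if score >= TIER_3_THRESHOLD:
        return 3
    if score >= TIER_2_THRESHOLD:
        return 2
    return 1
```

Because the tier is computed from the context of each output rather than from the identity of the model, two calls to the same system can land in different tiers.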
This is not a new concept. NIST AI RMF 1.0 describes "context-specific risk" throughout its Manage function, emphasising that risk management actions should be "commensurate with the risk" rather than applied uniformly. ISO/IEC 42001:2023 similarly requires that AI management systems define criteria for human oversight proportionate to the impact of the AI system. What contextual governance adds is an operational mechanism: a tier structure that translates risk scores into specific oversight requirements at inference time.
NIST AI RMF 1.0, MANAGE 2.2: "Mechanisms are in place and applied to sustain the value of deployed AI systems and minimise the negative impact of AI systems over time." The risk-commensurate principle runs through the entire Manage function.
The Three Primary Focuses of AI Governance Frameworks
Across NIST AI RMF, ISO/IEC 42001, and the EU AI Act, three primary focuses appear consistently: accountability, transparency, and control. These are not aspirations; they are design requirements.
Accountability asks who is responsible for each decision an AI system makes. In rule-based governance, accountability is often diffuse: the model produced the output, the team deployed the model, the vendor built the model, and no single person is accountable for the specific decision that caused harm. Contextual governance requires that accountability be assigned at the tier level: for a Tier 3 decision, the reviewer is accountable, and their identity is logged.
Transparency asks whether the reasoning behind a decision is auditable. This does not mean the model must be interpretable in a machine-learning sense. It means there must be a record of what input the model received, what output it produced, which tier it was assigned to, who reviewed it, and what the reviewer decided. Transparency is a record-keeping requirement, not an explainability requirement.
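A record of that kind might look like the sketch below. The field names, the `make_record` helper, and the JSON log format are assumptions made for illustration; the only requirement taken from the text is that input, output, tier, reviewer, and reviewer decision are all captured.

```python
import json
from dataclasses import asdict, dataclass
from datetime import datetime, timezone
from typing import Optional


@dataclass(frozen=True)
class AuditRecord:
    """One auditable entry per AI output (hypothetical schema)."""
    model_input: str
    model_output: str
    tier: int
    reviewer_id: Optional[str]        # None where no individual review applies
    reviewer_decision: Optional[str]  # e.g. "approve", "modify", "reject"
    recorded_at: str                  # ISO 8601 timestamp, UTC


def make_record(model_input: str, model_output: str, tier: int,
                reviewer_id: Optional[str] = None,
                reviewer_decision: Optional[str] = None) -> AuditRecord:
    """Build a record, refusing to log a gated output without a named reviewer."""
    if tier not in (1, 2, 3):
        raise ValueError(f"unknown tier: {tier}")
    if tier >= 2 and (reviewer_id is None or reviewer_decision is None):
        raise ValueError("Tier 2 and Tier 3 records need a reviewer and a decision")
    return AuditRecord(
        model_input=model_input,
        model_output=model_output,
        tier=tier,
        reviewer_id=reviewer_id,
        reviewer_decision=reviewer_decision,
        recorded_at=datetime.now(timezone.utc).isoformat(),
    )


def to_log_line(record: AuditRecord) -> str:
    """Serialise the record as one JSON line for an append-only log."""
    return json.dumps(asdict(record), sort_keys=True)
```

Nothing here explains why the model produced its output; the record only makes the decision path reconstructable, which is the distinction the paragraph above draws.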
Control asks whether humans can intervene before harm occurs, not merely investigate after it. This is the most commonly missed focus. A system that logs everything but cannot be stopped in time provides accountability without control. Contextual governance requires that each tier have defined intervention mechanisms: for Tier 1, automated alerts; for Tier 2, a gate that holds the output until reviewed; for Tier 3, a sequential approval that cannot be bypassed under time pressure.
The Three-Tier Oversight Model
The three-tier model is the operational expression of the contextual governance framework. Tier assignment is determined at inference time by the context classifier. The same AI system can operate at different tiers depending on who receives the output and what action it triggers.
Tier 1 covers outputs where the risk score falls below the first threshold. The requirement is automated monitoring and periodic human audit. No gate is applied to individual outputs. The monitoring catches distribution drift and volume anomalies that would otherwise go undetected.
Tier 2 applies where the risk score indicates that some outputs require a gate before action is taken. The context classifier identifies those outputs and routes them to a reviewer. The reviewer has specific criteria, defined authority (approve, modify, or reject), and a time-bounded review SLA. Outputs that are not reviewed within the SLA are held, not auto-approved.
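The hold-on-timeout behaviour is the part most easily implemented wrongly, so a small sketch may help. The four-hour SLA, the `GateState` names, and the `GatedOutput` class are hypothetical; the point illustrated is only that an expired SLA leads to a held state and never to release.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta
from enum import Enum
from typing import Optional


class GateState(Enum):
    PENDING = "pending"
    APPROVED = "approved"
    MODIFIED = "modified"
    REJECTED = "rejected"
    HELD_SLA_BREACH = "held_sla_breach"


REVIEW_SLA = timedelta(hours=4)  # hypothetical value
REVIEWER_DECISIONS = (GateState.APPROVED, GateState.MODIFIED, GateState.REJECTED)


@dataclass
class GatedOutput:
    """A Tier 2 output waiting at the review gate."""
    output_id: str
    submitted_at: datetime
    decision: Optional[GateState] = None

    def review(self, decision: GateState) -> None:
        """Record the reviewer's decision: approve, modify, or reject."""
        if decision not in REVIEWER_DECISIONS:
            raise ValueError(f"not a reviewer decision: {decision}")
        self.decision = decision

    def state(self, now: datetime) -> GateState:
        """Current gate state; an unreviewed output past its SLA is held."""
        if self.decision is not None:
            return self.decision
        if now - self.submitted_at > REVIEW_SLA:
            return GateState.HELD_SLA_BREACH
        return GateState.PENDING

    def releasable(self, now: datetime) -> bool:
        """Only an explicit approval or modification releases the output."""
        return self.state(now) in (GateState.APPROVED, GateState.MODIFIED)
```

In this sketch an SLA breach is a distinct state rather than a silent timeout, so it can be escalated and counted without the output ever leaving the gate.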
Tier 3 applies to the highest-risk outputs: irreversible decisions, decisions affecting vulnerable populations, or decisions where the model's confidence is below a defined threshold. Every consequential output at Tier 3 requires sequential approval. In regulated contexts, dual sign-off is required: two qualified reviewers must approve independently.
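One way to express sequential approval with distinct reviewers is sketched below. The class name, the role labels, and the choice of exceptions are illustrative assumptions; the sketch enforces only two properties from the description above, namely that approvals happen in a fixed order and that the same person cannot sign twice.

```python
from dataclasses import dataclass, field
from typing import List, Tuple


@dataclass
class SequentialApproval:
    """Ordered sign-off chain for a single Tier 3 output (hypothetical design)."""
    required_roles: Tuple[str, ...]  # order matters, e.g. officer then supervisor
    approvals: List[Tuple[str, str]] = field(default_factory=list)  # (role, reviewer_id)
    rejected: bool = False

    def is_complete(self) -> bool:
        """True once every required role has approved, in order."""
        return not self.rejected and len(self.approvals) == len(self.required_roles)

    def sign(self, role: str, reviewer_id: str, approve: bool) -> None:
        """Record one reviewer's decision, enforcing order and independence."""
        if self.rejected or self.is_complete():
            raise RuntimeError("approval chain is already closed")
        expected = self.required_roles[len(self.approvals)]
        if role != expected:
            raise PermissionError(f"next sign-off must come from {expected}")
        if any(rid == reviewer_id for _, rid in self.approvals):
            raise PermissionError("a reviewer cannot sign more than once")
        if not approve:
            self.rejected = True  # any rejection ends the chain
            return
        self.approvals.append((role, reviewer_id))
```

There is deliberately no override method: if the chain is incomplete, the output is not released, whatever the time pressure.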
Contextual Oversight in Practice
Consider a credit-risk AI operating inside a bank. It produces three categories of output: portfolio-level summaries consumed by internal analysts, individual loan assessments reviewed by lending officers, and final credit decisions that are transmitted to customers. Under rule-based governance, all three might be subject to the same quarterly model-risk review, which tells the governance team almost nothing about the quality of individual decisions.
Under a contextual governance framework, the context classifier assigns each output to a tier based on its decision type. Portfolio summaries are Tier 1: automated monitoring, no individual gate. Loan assessments are Tier 2: a lending officer reviews each one against defined criteria before it advances to a decision stage. Final credit decisions are Tier 3: a supervisor reviews the lending officer's recommendation, the model output, and the supporting evidence before approval.
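The routing in this example can be written down as a small lookup. The output-type strings, role names, and `route` function are hypothetical labels for the bank scenario; sending unknown output types to the highest tier is an added assumption, not something the scenario specifies.

```python
from enum import IntEnum
from typing import Dict, Tuple


class Tier(IntEnum):
    MONITOR = 1              # automated monitoring, no individual gate
    GATE = 2                 # held until a reviewer acts
    SEQUENTIAL_APPROVAL = 3  # ordered sign-off by more than one reviewer


# Hypothetical labels for the three output categories in the bank example.
TIER_BY_OUTPUT_TYPE: Dict[str, Tier] = {
    "portfolio_summary": Tier.MONITOR,
    "loan_assessment": Tier.GATE,
    "final_credit_decision": Tier.SEQUENTIAL_APPROVAL,
}

REQUIRED_REVIEWERS: Dict[Tier, Tuple[str, ...]] = {
    Tier.MONITOR: (),
    Tier.GATE: ("lending_officer",),
    Tier.SEQUENTIAL_APPROVAL: ("lending_officer", "supervisor"),
}


def route(output_type: str) -> Tuple[Tier, Tuple[str, ...]]:
    """Return the tier and the ordered reviewer roles for an output type.

    Unknown types fall back to the highest tier (an assumption of this sketch),
    so a new output category is over-supervised until someone classifies it.
    """
    tier = TIER_BY_OUTPUT_TYPE.get(output_type, Tier.SEQUENTIAL_APPROVAL)
    return tier, REQUIRED_REVIEWERS[tier]
```

For instance, `route("loan_assessment")` yields the gate tier with a single lending-officer review, while the same model's final credit decision requires both roles in sequence.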
The bank's governance team can now answer the three primary focus questions for any specific decision. Who was accountable? The lending officer and the supervisor, both identified by name in the log. Was the decision transparent? Yes: the input data, model output, officer's review notes, and supervisor's approval are all in the record. Was control exercised? Yes: the output was held at the Tier 2 gate until reviewed, and the Tier 3 approval was sequential, not concurrent.
For teams beginning this work, the AI Readiness Assessment covers the governance dimension in detail, including a scored maturity model for HITL checkpoints and refusal conditions. It is a useful starting point for organisations that have informal governance practices and need to make them explicit before implementing a tier structure.
The Connection to Refusal and HITL
Contextual governance does not eliminate the need for refusal conditions; it requires that refusal conditions be tier-specific. At Tier 1, a system may have no individual output-level refusal conditions: it produces outputs and they are monitored in aggregate. At Tier 3, it should have between five and ten enumerated refusal conditions, each written as a rule the system can evaluate at inference time.
Human-in-the-loop is not a separate programme running alongside contextual governance. It is the mechanism through which Tier 2 and Tier 3 oversight requirements are enforced. The governance framework determines when HITL is required; the HITL design determines how it works. An organisation that has a HITL process but no tier structure is applying HITL heuristically, which means inconsistently. An organisation with a tier structure but no HITL mechanism at Tier 3 has a governance framework on paper and a compliance gap in practice.
The relationship between refusal and HITL in contextual governance is direct: refusal is what the system does when it cannot produce an output that meets the tier's quality threshold. HITL is what happens when the system produces an output but the tier requires human judgment before that output is acted upon. Neither substitutes for the other. Together, they constitute the control surface that contextual governance requires.
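The division of labour between refusal and HITL can be sketched as a single dispatch step. The two example conditions, the 0.6 confidence threshold, and the names `Draft`, `REFUSAL_CONDITIONS`, and `dispatch` are assumptions for illustration; a real Tier 3 list would be longer and domain-specific.

```python
from dataclasses import dataclass
from typing import Callable, Dict, Optional, Tuple


@dataclass(frozen=True)
class Draft:
    """A candidate model output before any action is taken on it."""
    text: str
    confidence: float


# A refusal condition returns a reason string when it fires, otherwise None.
RefusalCondition = Callable[[Draft], Optional[str]]


def empty_output(draft: Draft) -> Optional[str]:
    return "output is empty" if not draft.text.strip() else None


def low_confidence(draft: Draft) -> Optional[str]:
    # The 0.6 threshold is hypothetical.
    return "confidence below threshold" if draft.confidence < 0.6 else None


# Tier-specific refusal conditions: none at Tier 1, the most at Tier 3.
REFUSAL_CONDITIONS: Dict[int, Tuple[RefusalCondition, ...]] = {
    1: (),
    2: (empty_output,),
    3: (empty_output, low_confidence),
}


def dispatch(draft: Draft, tier: int) -> Tuple[str, Optional[str]]:
    """Refuse if a condition fires; otherwise gate or release according to tier."""
    for condition in REFUSAL_CONDITIONS[tier]:
        reason = condition(draft)
        if reason is not None:
            return ("refused", reason)         # the system declines to produce output
    if tier >= 2:
        return ("awaiting_human_review", None)  # HITL: output exists but is held
    return ("released", None)                   # Tier 1: monitored in aggregate
```

Refusal is evaluated first, so a reviewer only ever sees outputs that have already cleared the tier's enumerated conditions.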
For a deeper treatment of refusal as a design feature rather than a failure mode, see Why AI Refusal Matters. For the HITL design principles that underpin Tier 2 and Tier 3 oversight, see Human-in-the-Loop AI. For the risk scoring approach that feeds the context classifier, the AI Risk Management Framework article covers the NIST-aligned methodology in detail.