Product Designer

Sentinel

Keep a human in the loop — at the scale agents now run.

TL;DR: The 5-second version

Product Designer / Self-initiated concept / 2026
Key moves
  • 01 Insert the human at the riskiest moments, not all of them
  • 02 Per-agent pause and throttle, not one global stop button
  • 03 Show confidence, and its limits, at the point of decision
Outcome

40 screens on one design system · 3 end-to-end flows · designed against EU AI Act Art. 14

Role: Product Designer
Type: Self-initiated concept
Skills: Product Design · Design Systems · Enterprise UX · AI / Agent UX
Tools: Figma · Claude + Figma MCP
00 In short

Banks are handing underwriting, collections and KYC to AI agents — but from August 2026 the EU AI Act makes a human overseer legally mandatory for high-risk credit decisions. Sentinel is an oversight console where one operator supervises a fleet of financial agents: catch a drifting agent, approve or reject a high-risk action before it executes, and reconstruct any past decision for an auditor. A self-initiated concept, designed end-to-end on a real design system.

01 Art. 14 by design
02 40 screens · one system
03 Every action auditable
01 The problem

High-stakes finance is rapidly delegating decisions (who gets a loan, whose account is frozen for an AML hit, which borrower gets a collections message) to AI agents that act faster than any human can watch. The catch: these are exactly the decisions regulators refuse to leave unattended. The EU AI Act classifies credit scoring as high-risk, and from 2 August 2026 its Art. 14 obligations apply: a human must be able to understand the system, override it, and halt it. The cost of getting it wrong is already concrete: in July 2025 the Massachusetts AG settled with Earnest Operations for $2.5M over an AI lending model that produced disparate impact through a proxy variable. So the real design problem isn't building the agents. It's building the cockpit that lets one person meaningfully supervise many of them, intervene in seconds, and prove afterwards that a human was genuinely in control.

What the research said
01

Accuracy isn't trust

A model can be 99% accurate and still need a human who can say no — the 1% in high-stakes finance is someone's loan, account, or livelihood.

02

Rubber-stamping fails the test

Article 14 isn't satisfied by a human clicking approve. Oversight only counts if the person can actually see why — and is given a real reason to look.

03

The loop is now mandatory

From August 2026, human oversight and auditability for high-risk credit AI are legal requirements, not product nice-to-haves. The market needs this cockpit whether it wants it or not.

[Image: Sentinel overview]
The core flow
01 Agent pauses a high-risk action
02 Operator sees the full context
03 Approve, reject, or send back
04 Decision lands in the audit log

The Article 14 moment — a human decides before anything executes.
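The four steps above can be read as a small state machine. What follows is a minimal sketch under my own assumptions, not how Sentinel is built (it is a design concept with no backend): `ParkedAction`, `Status`, and the method names are hypothetical, and the only rule it illustrates is that nothing executes until a human has approved it.

```python
from enum import Enum


class Status(Enum):
    PENDING = "pending"
    APPROVED = "approved"
    REJECTED = "rejected"
    SENT_BACK = "sent_back"


class ParkedAction:
    """A high-risk action an agent has paused for human review (hypothetical model)."""

    def __init__(self, action_id):
        self.action_id = action_id
        self.status = Status.PENDING
        # Each history item is (actor, event, note), in the order it happened.
        self.history = [("agent", "parked", "")]

    def decide(self, operator, verdict, reason):
        """Record the operator's verdict; a reason is mandatory."""
        if self.status is not Status.PENDING:
            raise RuntimeError("action has already been decided")
        if verdict is Status.PENDING:
            raise ValueError("verdict must be approve, reject, or send back")
        if not reason.strip():
            raise ValueError("a reason is required for every decision")
        self.status = verdict
        self.history.append((operator, verdict.value, reason))

    def execute(self):
        """Run the action only after explicit human approval."""
        if self.status is not Status.APPROVED:
            raise PermissionError("action has not been approved by a human")
        self.history.append(("system", "executed", ""))
        return True
```

Requiring a non-empty reason is one possible guard against rubber-stamping; whether it is the right one is a product question the sketch does not settle.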

02 Key decisions
01

Insert the human at the riskiest moments — not all of them

Oversight only scales if you don't ask the operator to review everything. Sentinel gates on risk: an agent pauses and parks an action only when it crosses a human-gate threshold (loan above a cap, any adverse action, an AML escalation), and the review queue is risk-ranked so the most dangerous item is always on top. Tradeoff: a badly tuned threshold either floods the human (back to rubber-stamping) or lets a risky action slip through — so the thresholds themselves became a first-class, auditable setting, not a hidden constant.
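To make the gating idea concrete, here is a hedged sketch of a human-gate check and a risk-ranked queue. Everything in it is an assumption for illustration: the `GatePolicy` fields, the action kinds, the cap value used in the usage note, and the idea that each action arrives with a `risk_score` between 0 and 1. The policy carries a version string so that a threshold change could itself be logged.

```python
import heapq
from dataclasses import dataclass


@dataclass(frozen=True)
class GatePolicy:
    """Hypothetical, versioned gate thresholds (values are illustrative)."""
    version: str
    loan_cap: float  # loans above this amount need a human


@dataclass(frozen=True)
class Action:
    action_id: str
    kind: str                # "loan_approval", "adverse_action", "aml_escalation"
    amount: float = 0.0
    risk_score: float = 0.0  # assumed 0..1, higher is riskier


def needs_human(action, policy):
    """Return True when the action crosses a human-gate threshold."""
    if action.kind in ("adverse_action", "aml_escalation"):
        return True
    return action.kind == "loan_approval" and action.amount > policy.loan_cap


class ReviewQueue:
    """Risk-ranked queue: the riskiest parked action is always served first."""

    def __init__(self):
        self._heap = []
        self._counter = 0  # tie-breaker, keeps insertion order for equal scores

    def park(self, action):
        # heapq is a min-heap, so the score is negated to pop the highest first.
        heapq.heappush(self._heap, (-action.risk_score, self._counter, action))
        self._counter += 1

    def next(self):
        return heapq.heappop(self._heap)[2] if self._heap else None
```

With a policy such as `GatePolicy("v1", loan_cap=50_000)`, a 60,000 loan is parked and a 10,000 loan passes straight through. Tuning that cap is exactly the tradeoff described above.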

02

Per-agent pause and throttle, not one global stop button

When an agent drifts, halting the entire fleet is its own incident. Sentinel lets the operator throttle or pause the single misbehaving agent while the rest keep running — containing the blast radius without paging engineering. Tradeoff: partial-halt states are confusing, so the fleet view had to make 'what's running vs. what's contained' unmistakable at a glance.
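One way to picture per-agent containment is a small state model where each agent is running, throttled, or paused independently. This is a sketch under my own assumptions: the `Fleet` class, the three states, and the per-minute rate limit are hypothetical names, and the `summary` method only mirrors the "running vs. contained" split the fleet view has to make obvious.

```python
from enum import Enum


class AgentState(Enum):
    RUNNING = "running"
    THROTTLED = "throttled"
    PAUSED = "paused"


class Fleet:
    """Tracks containment per agent, so one drifting agent never stops the rest."""

    def __init__(self, agent_ids):
        self._state = {a: AgentState.RUNNING for a in agent_ids}
        self._rate_limit = {}  # agent_id -> max actions per minute while throttled

    def _require(self, agent_id):
        if agent_id not in self._state:
            raise KeyError(f"unknown agent: {agent_id}")

    def throttle(self, agent_id, max_per_minute):
        self._require(agent_id)
        if max_per_minute <= 0:
            raise ValueError("use pause() to stop an agent completely")
        self._state[agent_id] = AgentState.THROTTLED
        self._rate_limit[agent_id] = max_per_minute

    def pause(self, agent_id):
        self._require(agent_id)
        self._state[agent_id] = AgentState.PAUSED
        self._rate_limit.pop(agent_id, None)

    def resume(self, agent_id):
        self._require(agent_id)
        self._state[agent_id] = AgentState.RUNNING
        self._rate_limit.pop(agent_id, None)

    def state(self, agent_id):
        self._require(agent_id)
        return self._state[agent_id]

    def summary(self):
        """Group agents into the two buckets the fleet view shows at a glance."""
        running = sorted(a for a, s in self._state.items() if s is AgentState.RUNNING)
        contained = sorted(a for a, s in self._state.items() if s is not AgentState.RUNNING)
        return {"running": running, "contained": contained}
```

Pausing `collections` and throttling `kyc` in a three-agent fleet leaves `underwriting` untouched, which is the blast-radius behavior the decision is after.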

03

Show confidence — and its limits — at the point of decision

The AI Act explicitly warns against automation bias: humans nodding through whatever the machine suggests. So every decision surfaces the model's confidence, the policy checks it passed or failed, and a step-by-step reasoning trace — including a fairness check — before the operator commits. Tradeoff: more to read per decision, but a decision a human can't interrogate isn't oversight, it's theater.
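As a rough illustration of "nothing to decide on until the evidence is there", the sketch below models what a decision would carry and lists whatever is missing. The field names, the check name `"fairness"`, and the `missing_evidence` helper are all assumptions made for this example, not a Sentinel schema.

```python
from dataclasses import dataclass
from typing import Tuple


@dataclass(frozen=True)
class PolicyCheck:
    name: str     # e.g. "affordability", "fairness" (illustrative names)
    passed: bool


@dataclass(frozen=True)
class DecisionContext:
    confidence: float                # model-reported, expected in 0..1
    checks: Tuple[PolicyCheck, ...]  # policy checks, passed or failed
    trace: Tuple[str, ...]           # step-by-step reasoning, in order


def missing_evidence(ctx):
    """List what the operator would not be able to see for this decision."""
    gaps = []
    if not 0.0 <= ctx.confidence <= 1.0:
        gaps.append("confidence")
    if not ctx.checks:
        gaps.append("policy checks")
    if not any(c.name == "fairness" for c in ctx.checks):
        gaps.append("fairness check")
    if not ctx.trace:
        gaps.append("reasoning trace")
    return gaps
```

A review screen could refuse to enable the approve button while `missing_evidence` returns anything. Note that a failed check is still evidence: the helper only flags what is absent, not what went badly.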

04

Make the audit trail a feature, not an export

Defensibility is the compliance officer's whole job. Instead of reconstructing events from logs after a regulator asks, Sentinel records a tamper-evident timeline as work happens — every system, agent and human action in order, reconstructable exactly as it stood at decision time, and mapped to the specific regulations it satisfies. Tradeoff: more state and integrity machinery up front, in exchange for an answer that already exists when someone asks 'why did the agent do this?'
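"Tamper-evident" is commonly achieved with a hash chain, where each entry commits to the one before it. The sketch below shows that general technique using Python's standard `hashlib` and `json`. It is an assumption about how such a timeline could work, not a description of a Sentinel implementation, and the `AuditLog` class and its field names are hypothetical. Timestamps are left out to keep the example deterministic.

```python
import hashlib
import json

GENESIS = "0" * 64  # placeholder "previous hash" for the first entry


def _digest(prev_hash, payload):
    """SHA-256 over the previous hash plus a canonical JSON form of the payload."""
    body = json.dumps(payload, sort_keys=True, separators=(",", ":"))
    return hashlib.sha256((prev_hash + body).encode("utf-8")).hexdigest()


class AuditLog:
    """Append-only timeline where each entry is chained to its predecessor."""

    def __init__(self):
        self.entries = []

    def append(self, actor, event, detail):
        prev = self.entries[-1]["hash"] if self.entries else GENESIS
        payload = {"seq": len(self.entries), "actor": actor,
                   "event": event, "detail": detail}
        entry = dict(payload, prev=prev, hash=_digest(prev, payload))
        self.entries.append(entry)
        return entry

    def verify(self):
        """Recompute the chain; any edited, removed, or reordered entry breaks it."""
        prev = GENESIS
        for i, e in enumerate(self.entries):
            payload = {k: e[k] for k in ("seq", "actor", "event", "detail")}
            if e["seq"] != i or e["prev"] != prev or e["hash"] != _digest(prev, payload):
                return False
            prev = e["hash"]
        return True
```

A chain like this only detects tampering if the latest hash is also anchored somewhere the log's owner cannot quietly rewrite, which is part of the "integrity machinery up front" cost named in the tradeoff.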

05

Build on a real design system, not one-off screens

An oversight tool lives or dies on consistency — the same status, read the same way, everywhere. I built primitives and semantic tokens, then domain components (agent card, decision inspector, audit timeline), and assembled every screen from them. Tradeoff: slower to first pixel, but the screens are trivially consistent and the system itself is evidence of how I think.

vs The shift
Autonomous agents
  • Agents act; humans find out later
  • '99% accurate' — but nobody's accountable
  • Reasoning buried in logs
  • Audit means an export after the fact
Sentinel
  • High-risk actions pause for a human
  • Confidence and limits shown at decision time
  • The agent's reasoning trace, step by step
  • A tamper-evident record, built as you go
Selected screens
[Image gallery: Sentinel screens 2 to 7]
03 Outcome

Sentinel is a concept, and I've framed it as one — no fabricated users, deployments, or shipped metrics. What it demonstrates is the harder thing a high-stakes product is judged on: a complete, coherent system — 40 screens spanning fleet monitoring and analytics, the human-in-the-loop review gate, reasoning-replay and fairness investigation, a tamper-evident audit trail, policy governance with versioned diffs, agent configuration and deployment, the full shell (sign-in, settings, command palette, empty/loading/error states) and a responsive mobile set — all assembled from one design system, each surface designed against a named regulatory requirement. The honest success criterion is an expert review: would a risk officer accept that this plausibly supports pre-authorization, intervention, and a contestable, auditable rationale? That's the bar I designed to.

04 What I learned