Why Infrastructure Enforcement Matters for AI Agent Safety
AI agents are getting more capable every month. They can write code, manage files, deploy infrastructure, and interact with APIs on your behalf. But with that capability comes a question most teams aren’t asking yet: what happens when an agent does something you didn’t intend?
The three levels of AI agent safety
Not all safety mechanisms are created equal. There’s a hierarchy, and understanding it is the difference between a guardrail that works and one that gives you false confidence.
Level 1: Prompt rules
The most common approach is to tell the agent what not to do. “Don’t delete files outside the project directory.” “Never run destructive database commands.” These rules live in the system prompt or in a configuration file the agent reads at startup.
The problem? Agents eventually ignore prompt rules. Through context window pressure, conflicting instructions, or simple hallucination, an agent can — and will — act outside the bounds you’ve set in natural language. Prompt rules are suggestions, not constraints.
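To make the limitation concrete, here is Level 1 in miniature, as a hedged sketch (the names are illustrative, not from any particular framework): the “rules” are just more tokens handed to the model alongside everything else.

```python
# Level 1 in miniature: the rules are ordinary text concatenated into the
# prompt. SYSTEM_PROMPT and build_messages are illustrative names only.
SYSTEM_PROMPT = (
    "You are a coding assistant.\n"
    "Rules:\n"
    "- Do not delete files outside the project directory.\n"
    "- Never run destructive database commands.\n"
)

def build_messages(user_request: str) -> list[dict]:
    # Nothing in the runtime inspects the agent's actions against these
    # rules; compliance depends entirely on the model choosing to honor
    # the text, which it can fail to do under context pressure.
    return [
        {"role": "system", "content": SYSTEM_PROMPT},
        {"role": "user", "content": user_request},
    ]
```

Notice that there is no enforcement code here at all, because at Level 1 there is none: the rule and the request occupy the same channel.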
Level 2: Voluntary gates
A step up from prompt rules: the agent framework includes a confirmation step. “Are you sure you want to delete this file?” The agent is expected to pause and check with the user before proceeding.
This is better, but it has a fundamental flaw. The gate is voluntary. The agent’s code calls the confirmation function, which means the agent can also skip it. A sufficiently complex chain of tool calls, an unexpected code path, or a poorly written plugin can bypass the gate entirely. The agent is both the actor and the gatekeeper — a conflict of interest.
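The conflict of interest is easiest to see in code. A hedged sketch (function names are hypothetical): the confirmation lives in a function the agent’s own tooling calls, so any code path that doesn’t call it walks straight past the gate.

```python
# Level 2 in miniature: the gate is a function the tool code itself calls.
# confirm, guarded_delete, and bypassing_delete are illustrative names.
import os

def confirm(action: str, respond) -> bool:
    # 'respond' stands in for asking the human; a real framework would
    # surface this question in its UI.
    return respond(f"Are you sure you want to {action}?")

def guarded_delete(path: str, respond) -> bool:
    # The well-behaved path: pause, ask, then act.
    if confirm(f"delete {path}", respond):
        os.remove(path)
        return True
    return False

def bypassing_delete(path: str) -> None:
    # The flaw: nothing forces deletion through confirm(). A plugin or an
    # unexpected code path can produce the same effect with no gate at all.
    os.remove(path)
```

Both functions have identical power over the filesystem; only convention separates them. That convention is exactly what breaks under a complex tool-call chain.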
Level 3: Infrastructure enforcement
This is where Agent Vigil operates. At Level 3, the safety mechanism exists outside the agent’s control. The agent’s action is mechanically blocked — the HTTP request hangs, the process waits — until a human reviews and approves it.
The agent cannot bypass this gate because it doesn’t control it. The enforcement happens at the infrastructure layer: a pre-action hook sends the request to an external service, and the hook doesn’t return until the human decides. Approve or deny. There is no third option, and there is no way around it.
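A minimal sketch of what such a pre-action hook looks like, assuming a polling design; this is not Agent Vigil’s actual API, and `fetch_decision` stands in for the HTTP call to the external review service. The key property is that the hook simply does not return until a decision exists.

```python
# A sketch of a Level 3 pre-action hook. fetch_decision(action) stands in
# for an HTTP request to an external review service and returns "approve",
# "deny", or None while the request is still pending.
import time

def pre_action_hook(action: dict, fetch_decision, poll_interval: float = 1.0) -> bool:
    """Block the agent's process until the external service decides."""
    while True:
        decision = fetch_decision(action)
        if decision == "approve":
            return True
        if decision == "deny":
            return False
        # Still pending: the agent stays blocked. There is no code path in
        # this function that proceeds without an external decision.
        time.sleep(poll_interval)
```

The loop runs in infrastructure the agent doesn’t control; the agent’s only options are the two return values.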
Why this matters now
As agents take on more autonomous workflows — running CI pipelines, managing cloud resources, editing production databases — the cost of a wrong action goes up dramatically. A misunderstood instruction that deletes a production table isn’t a “prompt engineering problem.” It’s a data loss incident.
Infrastructure enforcement is the same principle that makes firewalls, permission systems, and code review gates effective. The actor doesn’t get to decide whether the safety check applies. The system enforces it unconditionally.
The default-deny contract
Agent Vigil takes this one step further with a default-deny posture. If the human doesn’t respond within five minutes, the action is denied. No response doesn’t mean “proceed” — it means “stop.”
This is the opposite of how most agent frameworks work, where silence means consent. With default-deny, the safest state is the default state. You have to actively choose to let the agent proceed.
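The contract can be sketched in a few lines, assuming the same polling shape as before (the function name and five-minute constant mirror the article; the API is illustrative): the timeout path and the explicit-deny path converge on the same answer.

```python
# A sketch of the default-deny contract: silence is denial.
import time

DEFAULT_TIMEOUT = 300.0  # five minutes

def await_approval(fetch_decision, timeout: float = DEFAULT_TIMEOUT,
                   poll_interval: float = 1.0) -> bool:
    """Return True only on an explicit "approve" within the window."""
    deadline = time.monotonic() + timeout
    while time.monotonic() < deadline:
        decision = fetch_decision()
        if decision in ("approve", "deny"):
            return decision == "approve"
        time.sleep(poll_interval)
    # No response before the deadline: deny. The safe state is the default.
    return False
```

Because the function’s only `True` exit requires an explicit approval, a crashed reviewer dashboard, a lost notification, or a human on vacation all resolve the same way: the action stops.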
Start thinking in layers
If you’re building with AI agents today, ask yourself: which level of safety are you relying on? If the answer is Level 1 or Level 2, consider what happens when those layers fail. Infrastructure enforcement isn’t paranoia — it’s engineering discipline applied to a new class of tool.