Architecture 2026-01-25

The Runbook Engine

How iddio pre-approves common operations so agents can work without interruption. This post covers the runbook YAML schema, pattern matching, tier downgrade mechanics, and max_tier safety caps.

The Problem: Approval Fatigue

If every write operation requires a human to approve it, AI agents become unusable in practice. An agent debugging a production issue might need to restart a deployment, scale a replica set, and check rollout status — all within seconds. Requiring manual approval for each one creates a bottleneck that defeats the purpose of autonomous agents.

But automatically allowing all writes is obviously wrong. The solution is somewhere in between: pre-approve specific, well-understood operations, and escalate everything else.

That’s what runbooks are. They’re named patterns that match operations you’ve already decided are safe for agents to execute without asking.

Runbook YAML Schema

A runbook is a named set of operation patterns. Each pattern specifies HTTP methods, Kubernetes resource types, and optionally specific subresources or API groups.

# ~/.iddio/policy.yaml
runbooks:
  restart-deploy:
    operations:
      - methods: [PATCH]
        resources: [deployments]
        subresources: []

  scale-deploy:
    operations:
      - methods: [PATCH, PUT]
        resources: [deployments/scale]

  check-rollout:
    operations:
      - methods: [GET]
        resources: [deployments]
        subresources: [status]

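Each operation entry is a conjunction: the request must match ALL specified fields. Within a single field, the listed values are alternatives, so methods: [PATCH, PUT] accepts either method. Multiple operations in a runbook are a disjunction: the request must match ANY one of them.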
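One way to picture this schema in code is as a pair of Go structs. The sketch below is an assumption about shape, not iddio's actual types: the Runbook and Operation names and their fields follow the Go matcher shown in the next section, while the yaml struct tags and the exampleRunbooks helper are hypothetical and only mirror the YAML keys above.

```go
package main

import "fmt"

// Operation is one pattern inside a runbook. Every populated field must
// match the request (conjunction). The yaml tags are illustrative only.
type Operation struct {
	Methods      []string `yaml:"methods"`
	Resources    []string `yaml:"resources"`
	Subresources []string `yaml:"subresources,omitempty"`
}

// Runbook is a named list of operations; matching any one of them is
// enough (disjunction).
type Runbook struct {
	Operations []Operation `yaml:"operations"`
}

// exampleRunbooks is a hypothetical helper that mirrors the three
// runbooks from the YAML above as Go literals.
func exampleRunbooks() map[string]Runbook {
	return map[string]Runbook{
		"restart-deploy": {Operations: []Operation{
			{Methods: []string{"PATCH"}, Resources: []string{"deployments"}},
		}},
		"scale-deploy": {Operations: []Operation{
			{Methods: []string{"PATCH", "PUT"}, Resources: []string{"deployments/scale"}},
		}},
		"check-rollout": {Operations: []Operation{
			{
				Methods:      []string{"GET"},
				Resources:    []string{"deployments"},
				Subresources: []string{"status"},
			},
		}},
	}
}

func main() {
	for name, rb := range exampleRunbooks() {
		fmt.Printf("%s: %d operation(s)\n", name, len(rb.Operations))
	}
}
```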

How Pattern Matching Works

When a request arrives that would normally be classified as T1 (OPERATE) or higher, the runbook engine checks whether it matches a runbook assigned to that agent in the current namespace scope.

The matching algorithm:

Method match: the HTTP method must appear in the operation's methods list.
Resource match: the Kubernetes resource type must appear in resources. Entries may use resource/subresource syntax (for example deployments/scale) to name a compound resource.
Subresource match: if subresources lists one or more values, the request's subresource must be one of them. If the list is omitted or empty, as in restart-deploy above, any subresource (or none) is accepted.

import "slices"

// Matches reports whether the request matches any operation in the runbook.
func (e *RunbookEngine) Matches(
    runbook Runbook,
    method, resource, subresource string,
) bool {
    for _, op := range runbook.Operations {
        if !slices.Contains(op.Methods, method) {
            continue
        }
        // Handle "resource/sub" compound syntax. Checking every entry
        // (not just the first) keeps mixed lists working and avoids
        // indexing into an empty Resources slice.
        if subresource != "" &&
            slices.Contains(op.Resources, resource+"/"+subresource) {
            return true
        }
        if !slices.Contains(op.Resources, resource) {
            continue
        }
        // An omitted or empty subresource list accepts any subresource.
        if len(op.Subresources) > 0 &&
            !slices.Contains(op.Subresources, subresource) {
            continue
        }
        return true
    }
    return false
}
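To see how these rules play out on concrete requests, here is a small standalone sketch. It restates the matcher as a free function (matches) with minimal local types so that it compiles on its own; those names are illustrative stand-ins, not iddio's real API.

```go
package main

import (
	"fmt"
	"slices"
)

type Operation struct {
	Methods, Resources, Subresources []string
}

type Runbook struct {
	Operations []Operation
}

// matches restates the rules above: method first, then resource (plain or
// "resource/sub"), then the optional subresource list.
func matches(rb Runbook, method, resource, subresource string) bool {
	for _, op := range rb.Operations {
		if !slices.Contains(op.Methods, method) {
			continue
		}
		// A compound entry such as "deployments/scale" names both parts.
		if subresource != "" && slices.Contains(op.Resources, resource+"/"+subresource) {
			return true
		}
		if !slices.Contains(op.Resources, resource) {
			continue
		}
		// An omitted or empty subresource list accepts any subresource.
		if len(op.Subresources) > 0 && !slices.Contains(op.Subresources, subresource) {
			continue
		}
		return true
	}
	return false
}

func main() {
	scale := Runbook{Operations: []Operation{
		{Methods: []string{"PATCH", "PUT"}, Resources: []string{"deployments/scale"}},
	}}
	rollout := Runbook{Operations: []Operation{
		{Methods: []string{"GET"}, Resources: []string{"deployments"}, Subresources: []string{"status"}},
	}}

	fmt.Println(matches(scale, "PATCH", "deployments", "scale"))  // true
	fmt.Println(matches(scale, "PATCH", "deployments", ""))       // false: plain resource not listed
	fmt.Println(matches(scale, "DELETE", "deployments", "scale")) // false: method not listed
	fmt.Println(matches(rollout, "GET", "deployments", "status")) // true
	fmt.Println(matches(rollout, "GET", "deployments", "scale"))  // false: subresource not listed
}
```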

Tier Downgrade Mechanics

When a runbook matches, it doesn’t bypass the tier system — it downgrades the effective tier. A request that the classifier assigned T2 (MODIFY) gets downgraded to T1 (OPERATE) if a matching runbook exists. The policy engine then evaluates the T1 rule instead of the T2 rule.

This means you can write policies like:

agents:
  claude-code:
    rules:
      - namespaces: ["payments"]
        runbooks: [restart-deploy, scale-deploy]
        tiers:
          0: allow # reads
          1: allow # runbook-matched ops
          2: escalate # non-runbook writes
          3: escalate # sensitive
          4: deny # break-glass

With this policy, kubectl rollout restart in the payments namespace (which kubectl sends as a PATCH on the deployment) matches the restart-deploy runbook, gets downgraded to T1, and is auto-allowed. The same command in a namespace not covered by any rule still gets classified as T2 and requires approval.
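The evaluation order described above can be sketched in a few lines of Go. This is an illustration under assumptions rather than iddio's implementation: the Rule type, effectiveTier, decide, and paymentsRule are hypothetical names, and only the T2 to T1 downgrade described in this section is modeled.

```go
package main

import (
	"fmt"
	"slices"
)

// Rule mirrors one entry under an agent's rules (hypothetical type).
type Rule struct {
	Namespaces []string
	Runbooks   []string
	Tiers      map[int]string // tier -> "allow", "escalate", or "deny"
}

// effectiveTier lowers a T2 request to T1 when it matched a runbook that
// this rule assigns. Other tiers are returned unchanged in this sketch.
func effectiveTier(rule Rule, classified int, matchedRunbook string) int {
	if classified == 2 && slices.Contains(rule.Runbooks, matchedRunbook) {
		return 1
	}
	return classified
}

// decide looks up the decision for the effective tier. ok is false when
// the rule does not cover the namespace or has no entry for the tier.
func decide(rule Rule, namespace string, classified int, matchedRunbook string) (decision string, ok bool) {
	if !slices.Contains(rule.Namespaces, namespace) {
		return "", false
	}
	decision, ok = rule.Tiers[effectiveTier(rule, classified, matchedRunbook)]
	return decision, ok
}

// paymentsRule reproduces the claude-code rule from the policy above.
func paymentsRule() Rule {
	return Rule{
		Namespaces: []string{"payments"},
		Runbooks:   []string{"restart-deploy", "scale-deploy"},
		Tiers: map[int]string{
			0: "allow", 1: "allow", 2: "escalate", 3: "escalate", 4: "deny",
		},
	}
}

func main() {
	rule := paymentsRule()
	cases := []struct {
		namespace  string
		classified int
		runbook    string // name of the matched runbook, "" if none matched
	}{
		{"payments", 2, "restart-deploy"},
		{"payments", 2, ""},
		{"payments", 3, "restart-deploy"},
		{"staging", 2, "restart-deploy"},
	}
	for _, c := range cases {
		decision, ok := decide(rule, c.namespace, c.classified, c.runbook)
		fmt.Printf("ns=%s T%d runbook=%q -> decision=%q covered=%t\n",
			c.namespace, c.classified, c.runbook, decision, ok)
	}
}
```

Under these assumptions the first case is allowed, the second and third escalate, and the fourth is reported as not covered by this rule.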

The max_tier Safety Cap

Runbooks have a safety mechanism: max_tier. This prevents a runbook from downgrading operations that are above a certain risk level.

runbooks:
  restart-deploy:
    max_tier: 2
    operations:
      - methods: [PATCH]
        resources: [deployments]

With max_tier: 2, this runbook can downgrade T2 operations to T1, but it won’t affect T3 or T4 operations. If someone somehow crafts a deployment patch that the classifier rates as T3 (sensitive), the runbook won’t save it — it still requires escalation.

The default max_tier is 2. Most runbooks should leave this at the default. Setting it higher is possible but should be done carefully and for specific, well-understood operations.
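As a rough sketch of how the cap could be applied, consider the function below. It rests on several assumptions that the text above does not confirm: tiers are plain integers 0 to 4, a successful downgrade always lands on T1 (even when max_tier is raised above 2), and a max_tier of 0 stands for "unset". The downgrade function and defaultMaxTier constant are hypothetical names.

```go
package main

import "fmt"

// defaultMaxTier is the default cap stated in the text.
const defaultMaxTier = 2

// downgrade returns the effective tier for a request that matched a runbook.
// Assumptions: a maxTier of 0 means "unset" and falls back to the default,
// and any tier within the cap is lowered to T1.
func downgrade(classified, maxTier int) int {
	if maxTier == 0 {
		maxTier = defaultMaxTier
	}
	if classified <= 1 || classified > maxTier {
		// Already T0/T1, or above the cap: the runbook has no effect.
		return classified
	}
	return 1
}

func main() {
	for tier := 0; tier <= 4; tier++ {
		fmt.Printf("classified T%d, max_tier 2 -> effective T%d\n",
			tier, downgrade(tier, 2))
	}
}
```

With the default cap, only T2 changes (to T1); T3 and T4 pass through untouched and are still evaluated against their own tier rules.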

Audit Trail

Runbook-matched operations include extra fields in the audit log:

{
  "timestamp": "2026-01-25T14:30:22Z",
  "agent": "claude-code",
  "method": "PATCH",
  "resource": "deployments",
  "namespace": "payments",
  "tier": 1,
  "original_tier": 2,
  "runbook": "restart-deploy",
  "decision": "allow",
  "latency_us": 180
}

The original_tier and runbook fields make it clear that this operation was downgraded by a runbook, not natively classified as T1. This is important for compliance: auditors can see exactly which operations were pre-approved and which required human review.
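For anyone post-processing these logs, the sketch below parses the sample entry and checks whether it was downgraded. The AuditRecord struct, the Downgraded method, and parseAudit are hypothetical; the field names come from the JSON above, and the sketch assumes that entries not matched by a runbook simply omit original_tier and runbook.

```go
package main

import (
	"encoding/json"
	"fmt"
)

// AuditRecord mirrors the fields of the sample entry (hypothetical type).
// OriginalTier is a pointer so an absent field can be told apart from T0.
type AuditRecord struct {
	Timestamp    string `json:"timestamp"`
	Agent        string `json:"agent"`
	Method       string `json:"method"`
	Resource     string `json:"resource"`
	Namespace    string `json:"namespace"`
	Tier         int    `json:"tier"`
	OriginalTier *int   `json:"original_tier,omitempty"`
	Runbook      string `json:"runbook,omitempty"`
	Decision     string `json:"decision"`
	LatencyUS    int64  `json:"latency_us"`
}

// Downgraded reports whether a runbook lowered the tier for this operation.
func (r AuditRecord) Downgraded() bool {
	return r.Runbook != "" && r.OriginalTier != nil && *r.OriginalTier > r.Tier
}

// parseAudit decodes one JSON audit entry.
func parseAudit(line []byte) (AuditRecord, error) {
	var rec AuditRecord
	err := json.Unmarshal(line, &rec)
	return rec, err
}

// sample is the audit entry shown above.
const sample = `{
  "timestamp": "2026-01-25T14:30:22Z",
  "agent": "claude-code",
  "method": "PATCH",
  "resource": "deployments",
  "namespace": "payments",
  "tier": 1,
  "original_tier": 2,
  "runbook": "restart-deploy",
  "decision": "allow",
  "latency_us": 180
}`

func main() {
	rec, err := parseAudit([]byte(sample))
	if err != nil {
		panic(err)
	}
	fmt.Printf("runbook=%q tier=%d downgraded=%t\n",
		rec.Runbook, rec.Tier, rec.Downgraded())
}
```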

Design Rationale

The runbook system is deliberately simple. It doesn’t support:

Conditional logic — no “allow if the deployment has fewer than 3 replicas”
Chained operations — no “allow restart only after a health check”
Time windows — no “allow during business hours only”

These features add complexity that makes runbooks harder to audit. A security-conscious operator should be able to read a runbook definition and immediately understand what it permits. The current schema — methods, resources, subresources — is the minimum surface area that covers the most common pre-approval patterns.

For complex conditional policies, use OPA/Rego integration (available with the opa build tag). Runbooks are for the simple cases that cover 80% of agent operations.

Try It Yourself

Iddio is open source. Deploy a zero-trust command proxy for your AI agents in minutes.
