Operations 2026-01-31

Zero-Downtime Hot Reload

How iddio swaps policy and token configuration without dropping a single request. File watching with fsnotify, 500ms debounce, RWMutex-protected atomic swaps, and last-known-good fallback.

The Problem: Config Changes Shouldn’t Drop Traffic

Every proxy that reads config from a file has the same problem: what happens when the file changes? The naive answer — restart the proxy — means dropped connections, interrupted sessions, and unhappy agents. For a security-critical system that proxies live Kubernetes traffic, a restart window is unacceptable.

Iddio watches policy.yaml and tokens.yaml for changes and swaps them in place without dropping a single in-flight request.

File Watching with fsnotify

Iddio uses fsnotify to watch both config files. When a write or create event fires for either file, the watcher schedules a debounced reload:

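func (p *Proxy) watchConfigFiles(ctx context.Context) {
    watcher, err := fsnotify.NewWatcher()
    if err != nil {
        log.Printf("config watcher disabled: %v", err)
        return
    }
    defer watcher.Close()

    // Note: fsnotify's documentation recommends watching the parent directory,
    // because rename-based saves replace the file that is being watched.
    for _, path := range []string{p.policyPath, p.tokensPath} {
        if err := watcher.Add(path); err != nil {
            log.Printf("cannot watch %s: %v", path, err)
        }
    }

    // One timer per file, so a save to one file cannot cancel
    // the other file's pending reload.
    debounce := make(map[string]*time.Timer)

    for {
        select {
        case <-ctx.Done():
            return
        case err, ok := <-watcher.Errors:
            if !ok {
                return
            }
            log.Printf("config watcher error: %v", err)
        case event, ok := <-watcher.Events:
            if !ok {
                return
            }
            if event.Op&(fsnotify.Write|fsnotify.Create) == 0 {
                continue
            }
            // Debounce: editors often emit several events per save
            name := event.Name
            if t := debounce[name]; t != nil {
                t.Stop()
            }
            debounce[name] = time.AfterFunc(500*time.Millisecond, func() {
                p.reloadConfig(name)
            })
        }
    }
}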

The 500ms Debounce

A single save rarely produces a single filesystem event. Vim, for example, writes to a temp file, renames the original, then renames the temp file, generating multiple fsnotify events for one save. Without debouncing, the proxy would attempt a reload after the first event and could read a partially written file.

The 500ms debounce window collapses a burst of events into a single reload attempt: each new event restarts the timer, so the reload runs 500ms after the last event in the burst. This covers the save patterns of common editors and tools (Vim, VS Code, nano, sed -i) while keeping reload latency low enough to feel instant.
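
The restart-the-timer behavior can be tried in isolation. The following is a minimal sketch using only the standard library; the `Debouncer` type and `simulateSaves` function are hypothetical names for illustration, not part of iddio, and the delay is shortened to 50ms so the demo finishes quickly.

```go
package main

import (
	"fmt"
	"sync"
	"time"
)

// Debouncer coalesces bursts of triggers into one callback per key.
// Hypothetical helper for illustration; not iddio's actual type.
type Debouncer struct {
	mu     sync.Mutex
	delay  time.Duration
	timers map[string]*time.Timer
}

func NewDebouncer(delay time.Duration) *Debouncer {
	return &Debouncer{delay: delay, timers: make(map[string]*time.Timer)}
}

// Trigger restarts the quiet-period timer for key. fn runs once the key
// has seen no further triggers for d.delay.
func (d *Debouncer) Trigger(key string, fn func()) {
	d.mu.Lock()
	defer d.mu.Unlock()
	if t, ok := d.timers[key]; ok {
		t.Stop()
	}
	d.timers[key] = time.AfterFunc(d.delay, fn)
}

// simulateSaves fires five rapid events for each of two files and
// returns how many reloads each file ended up receiving.
func simulateSaves() map[string]int {
	var mu sync.Mutex
	reloads := make(map[string]int)
	d := NewDebouncer(50 * time.Millisecond) // shortened from 500ms for the demo

	for i := 0; i < 5; i++ {
		for _, path := range []string{"policy.yaml", "tokens.yaml"} {
			path := path // capture per-iteration value for the closure
			d.Trigger(path, func() {
				mu.Lock()
				reloads[path]++
				mu.Unlock()
			})
		}
		time.Sleep(5 * time.Millisecond)
	}
	time.Sleep(200 * time.Millisecond) // let the quiet period elapse

	mu.Lock()
	defer mu.Unlock()
	out := make(map[string]int, len(reloads))
	for k, v := range reloads {
		out[k] = v
	}
	return out
}

func main() {
	fmt.Println(simulateSaves())
}
```

Each file should end up with exactly one reload despite five events apiece. Keying the timers by path matters: with a single shared timer, an event on tokens.yaml would cancel a pending reload of policy.yaml.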

RWMutex-Protected Atomic Swaps

The core of hot reload is the SwapPolicy and SwapAuth methods. These use sync.RWMutex to swap the active configuration without blocking in-flight requests:

func (p *Proxy) SwapPolicy(newPolicy *Policy) {
    p.mu.Lock()
    defer p.mu.Unlock()
    p.policy = newPolicy
}

func (p *Proxy) SwapAuth(newAuth Authenticator) {
    p.mu.Lock()
    defer p.mu.Unlock()
    p.authenticator = newAuth
}

Request handlers acquire a read lock:

func (p *Proxy) ServeHTTP(w http.ResponseWriter, r *http.Request) {
    p.mu.RLock()
    policy := p.policy
    auth := p.authenticator
    p.mu.RUnlock()

    // Use local copies — no lock held during request processing
    agent, err := auth.Authenticate(r)
    // ...
    decision := policy.Evaluate(agent, tier, namespace)
    // ...
}

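The key insight: the read lock is held only long enough to copy the policy and authenticator references. Request processing then runs without any lock, so a reload waits at most for those brief copies and never for a request to finish. A request already in flight completes under the configuration it started with; the next request sees the new one.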
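
To make the snapshot semantics concrete, here is a small self-contained sketch. The `Policy` struct with a `Version` field and the `snapshot` helper are stand-ins invented for this example; only `SwapPolicy` mirrors the method shown above.

```go
package main

import (
	"fmt"
	"sync"
)

// Policy is a stand-in for iddio's policy type; Version is a hypothetical field.
type Policy struct{ Version int }

type Proxy struct {
	mu     sync.RWMutex
	policy *Policy
}

// SwapPolicy replaces the active policy under the write lock.
func (p *Proxy) SwapPolicy(newPolicy *Policy) {
	p.mu.Lock()
	defer p.mu.Unlock()
	p.policy = newPolicy
}

// snapshot copies the current policy pointer under the read lock.
func (p *Proxy) snapshot() *Policy {
	p.mu.RLock()
	defer p.mu.RUnlock()
	return p.policy
}

// demo simulates a request that is in flight while a reload lands.
func demo() (inFlight, next int) {
	p := &Proxy{policy: &Policy{Version: 1}}

	held := p.snapshot()              // request A starts and takes its snapshot
	p.SwapPolicy(&Policy{Version: 2}) // reload happens mid-request

	// Request A still evaluates against what it captured;
	// a request arriving now captures the new policy.
	return held.Version, p.snapshot().Version
}

func main() {
	inFlight, next := demo()
	fmt.Printf("in-flight request sees v%d, next request sees v%d\n", inFlight, next)
	// prints: in-flight request sees v1, next request sees v2
}
```

This pattern relies on a policy never being mutated after it is swapped in. A reload builds a fresh value and replaces the pointer, so readers holding the old pointer are unaffected.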

Last-Known-Good Fallback

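If the new config file is malformed (invalid YAML, missing required fields, schema violations), the proxy logs the error and keeps the previous configuration. The policy path is shown; a tokens.yaml reload follows the same load, validate, swap sequence but finishes with SwapAuth: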

func (p *Proxy) reloadConfig(path string) {
    newPolicy, err := LoadPolicy(path)
    if err != nil {
        log.Printf("config reload failed for %s: %v (keeping previous config)", path, err)
        return
    }

    if err := newPolicy.Validate(); err != nil {
        log.Printf("config validation failed for %s: %v (keeping previous config)", path, err)
        return
    }

    p.SwapPolicy(newPolicy)
    log.Printf("config reloaded: %s", path)
}

This means a typo in policy.yaml never takes down the proxy. The old policy remains active until a valid replacement is saved.
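
The fallback behavior can be exercised without any files. The sketch below is an illustration under stated assumptions: the `DefaultTier` field, the tier names, and the `reload` method taking a loader function are all hypothetical, chosen so the example runs with the standard library alone.

```go
package main

import (
	"errors"
	"fmt"
	"sync"
)

// Policy is a stand-in; DefaultTier is a hypothetical required field.
type Policy struct{ DefaultTier string }

func (pol *Policy) Validate() error {
	if pol.DefaultTier == "" {
		return errors.New("missing default tier")
	}
	return nil
}

type Proxy struct {
	mu     sync.RWMutex
	policy *Policy
}

func (p *Proxy) current() *Policy {
	p.mu.RLock()
	defer p.mu.RUnlock()
	return p.policy
}

// reload swaps in the loaded policy only if loading and validation both
// succeed. On any error the active policy is left untouched.
func (p *Proxy) reload(load func() (*Policy, error)) error {
	next, err := load()
	if err != nil {
		return fmt.Errorf("load: %w", err)
	}
	if err := next.Validate(); err != nil {
		return fmt.Errorf("validate: %w", err)
	}
	p.mu.Lock()
	p.policy = next
	p.mu.Unlock()
	return nil
}

// demo runs three reload attempts and records the active tier after each.
func demo() []string {
	p := &Proxy{policy: &Policy{DefaultTier: "observe"}}
	attempts := []func() (*Policy, error){
		func() (*Policy, error) { return nil, errors.New("did not find expected key") }, // parse failure
		func() (*Policy, error) { return &Policy{}, nil },                               // parses, fails validation
		func() (*Policy, error) { return &Policy{DefaultTier: "operate"}, nil },         // valid
	}
	var seen []string
	for _, load := range attempts {
		if err := p.reload(load); err != nil {
			fmt.Printf("reload failed: %v (keeping previous config)\n", err)
		}
		seen = append(seen, p.current().DefaultTier)
	}
	return seen
}

func main() {
	fmt.Println(demo())
}
```

The first two attempts leave the original policy active; only the third replaces it. The ordering is the point: the swap is the last step, so nothing unvalidated ever becomes visible to request handlers.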

What Gets Hot-Reloaded

| File | What Changes | Reload Behavior |
| --- | --- | --- |
| `policy.yaml` | Agent rules, tier mappings, runbooks, namespace scopes | Atomic swap via `SwapPolicy()` |
| `tokens.yaml` | Bearer token list | Atomic swap via `SwapAuth()` |
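
Because the two files end in different swap methods, the reload step needs to route each changed path to the right handler. One way to do that, sketched here with hypothetical names (`dispatcher`, `reloadFunc`) and with the swap calls replaced by simple markers, is a lookup keyed on the cleaned path:

```go
package main

import (
	"fmt"
	"path/filepath"
)

// reloadFunc loads, validates, and swaps one kind of config.
type reloadFunc func(path string) error

// dispatcher routes a changed path to its reload function.
// Hypothetical helper for illustration; not iddio's actual type.
type dispatcher struct {
	handlers map[string]reloadFunc
}

func newDispatcher() *dispatcher {
	return &dispatcher{handlers: make(map[string]reloadFunc)}
}

func (d *dispatcher) register(path string, fn reloadFunc) {
	d.handlers[filepath.Clean(path)] = fn
}

// dispatch runs the handler registered for path, or reports that the
// path is not a known config file.
func (d *dispatcher) dispatch(path string) error {
	fn, ok := d.handlers[filepath.Clean(path)]
	if !ok {
		return fmt.Errorf("no reload handler for %s", path)
	}
	return fn(path)
}

// demo records which handler each simulated change event reaches.
func demo() []string {
	var calls []string
	d := newDispatcher()
	d.register("/home/user/.iddio/policy.yaml", func(string) error {
		calls = append(calls, "SwapPolicy") // stands in for load + validate + SwapPolicy
		return nil
	})
	d.register("/home/user/.iddio/tokens.yaml", func(string) error {
		calls = append(calls, "SwapAuth") // stands in for load + validate + SwapAuth
		return nil
	})

	events := []string{
		"/home/user/.iddio/tokens.yaml",
		"/home/user/.iddio/./policy.yaml", // same file, uncleaned path
		"/home/user/.iddio/policy.yaml.swp",
	}
	for _, changed := range events {
		if err := d.dispatch(changed); err != nil {
			calls = append(calls, "ignored")
		}
	}
	return calls
}

func main() {
	fmt.Println(demo())
	// prints: [SwapAuth SwapPolicy ignored]
}
```

Unknown paths, such as an editor's swap file, fall through to an error instead of being parsed as policy.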

What does NOT hot-reload:

TLS certificates — changing the CA or server cert requires a restart (this is intentional: cert changes are rare and security-sensitive)
Cluster URL — changing the upstream cluster requires a restart
Listen address — changing the proxy’s bind address requires a restart

These are startup-time configurations that rarely change. Making them hot-reloadable would add complexity with minimal benefit.

Observability

Every reload event is logged with the file path and outcome:

2026-01-31T10:15:22Z config reloaded: /home/user/.iddio/policy.yaml
2026-01-31T10:15:25Z config reload failed for /home/user/.iddio/tokens.yaml: yaml: line 5: did not find expected key (keeping previous config)
2026-01-31T10:15:30Z config reloaded: /home/user/.iddio/tokens.yaml

For enterprise deployments using the managed control plane, policy reloads are also recorded as audit events, so you can track when policy changed and what the previous version was.

Try It Yourself

Iddio is open source. Deploy a zero-trust command proxy for your AI agents in minutes.
