Zero-Downtime Hot Reload
How iddio swaps policy and token configuration without dropping a single request. File watching with fsnotify, 500ms debounce, RWMutex-protected atomic swaps, and last-known-good fallback.
The Problem: Config Changes Shouldn’t Drop Traffic
Every proxy that reads config from a file has the same problem: what happens when the file changes? The naive answer — restart the proxy — means dropped connections, interrupted sessions, and unhappy agents. For a security-critical system that proxies live Kubernetes traffic, a restart window is unacceptable.
Iddio watches policy.yaml and tokens.yaml for changes and swaps them in-place without dropping a single in-flight request.
File Watching with fsnotify
Iddio uses fsnotify to watch both config files. When a write event fires, the watcher kicks off the reload pipeline:
func (p *Proxy) watchConfigFiles(ctx context.Context) {
    watcher, err := fsnotify.NewWatcher()
    if err != nil {
        log.Printf("config watch disabled: %v", err)
        return
    }
    defer watcher.Close()
    watcher.Add(p.policyPath)
    watcher.Add(p.tokensPath)

    var debounce *time.Timer
    for {
        select {
        case <-ctx.Done():
            return
        case event := <-watcher.Events:
            if event.Op&(fsnotify.Write|fsnotify.Create) == 0 {
                continue
            }
            // Debounce: editors often emit several events per save
            if debounce != nil {
                debounce.Stop()
            }
            debounce = time.AfterFunc(500*time.Millisecond, func() {
                p.reloadConfig(event.Name)
            })
        }
    }
}
The 500ms Debounce
Text editors don’t write files atomically. Vim, for example, writes a temporary file, moves the original aside, then renames the temp file into place — generating multiple fsnotify events for a single save. Without debouncing, the proxy would attempt a reload after the first event, potentially reading a partially written file.
The 500ms debounce window collapses all events within a half-second into a single reload attempt. This covers the write patterns of every major editor (Vim, VS Code, nano, sed, etc.) while keeping reload latency low enough to feel instant.
RWMutex-Protected Atomic Swaps
The core of hot reload is the SwapPolicy and SwapAuth methods. These use sync.RWMutex to swap the active configuration without blocking in-flight requests:
func (p *Proxy) SwapPolicy(newPolicy *Policy) {
    p.mu.Lock()
    defer p.mu.Unlock()
    p.policy = newPolicy
}

func (p *Proxy) SwapAuth(newAuth Authenticator) {
    p.mu.Lock()
    defer p.mu.Unlock()
    p.authenticator = newAuth
}
Request handlers acquire a read lock:
func (p *Proxy) ServeHTTP(w http.ResponseWriter, r *http.Request) {
    p.mu.RLock()
    policy := p.policy
    auth := p.authenticator
    p.mu.RUnlock()
    // Use local copies — no lock held during request processing
    agent, err := auth.Authenticate(r)
    // ...
    decision := policy.Evaluate(agent, tier, namespace)
    // ...
}
The key insight: the read lock is held only long enough to copy the policy and authenticator references. The actual request processing happens without any lock, so hot reload never blocks in-flight requests.
Last-Known-Good Fallback
If the new config file is malformed (invalid YAML, missing required fields, schema violations), the proxy logs the error and keeps the previous configuration:
func (p *Proxy) reloadConfig(path string) {
    newPolicy, err := LoadPolicy(path)
    if err != nil {
        log.Printf("config reload failed for %s: %v (keeping previous config)", path, err)
        return
    }
    if err := newPolicy.Validate(); err != nil {
        log.Printf("config validation failed for %s: %v (keeping previous config)", path, err)
        return
    }
    p.SwapPolicy(newPolicy)
    log.Printf("config reloaded: %s", path)
}
This means a typo in policy.yaml never takes down the proxy. The old policy remains active until a valid replacement is saved.
What Gets Hot-Reloaded
| File | What Changes | Reload Behavior |
|---|---|---|
| policy.yaml | Agent rules, tier mappings, runbooks, namespace scopes | Atomic swap via SwapPolicy() |
| tokens.yaml | Bearer token list | Atomic swap via SwapAuth() |
What does NOT hot-reload:
- TLS certificates — changing the CA or server cert requires a restart (this is intentional: cert changes are rare and security-sensitive)
- Cluster URL — changing the upstream cluster requires a restart
- Listen address — changing the proxy’s bind address requires a restart
These are startup-time configurations that rarely change. Making them hot-reloadable would add complexity with minimal benefit.
Observability
Every reload event is logged with the file path and outcome:
2026-01-31T10:15:22Z config reloaded: /home/user/.iddio/policy.yaml
2026-01-31T10:15:25Z config reload failed for /home/user/.iddio/tokens.yaml: yaml: line 5: did not find expected key (keeping previous config)
2026-01-31T10:15:30Z config reloaded: /home/user/.iddio/tokens.yaml
For enterprise deployments using the managed control plane, policy reloads are also recorded as audit events, so you can track when policy changed and what the previous version was.
Try It Yourself
Iddio is open source. Deploy a zero-trust command proxy for your AI agents in minutes.