mirror of
https://github.com/browseros-ai/BrowserOS.git
synced 2026-05-18 11:06:19 +00:00
ManagedContainer.start was firing the subclass `readinessProbe()` exactly once, the moment containerd reported the container as Up. For OpenClaw this raced the Node.js gateway's HTTP listener bind — containerd flips status as soon as the entrypoint process spawns, but the Express server takes a few hundred ms to start serving /readyz. Single-shot probe → unlucky → state='errored' with "Readiness probe failed after container reached running state". Pre-refactor (dev branch) didn't hit this because openclaw used a two-phase flow: `runtime.startGateway` (no probe) then `service.waitForReady` (polled /readyz for 30s). When the new runtime architecture folded openclaw under ManagedContainer, the polling was lost. Bring it into the base class: `ManagedContainer.start` now polls `readinessProbe()` within `descriptor.readinessProbe.timeoutMs` at `intervalMs` cadence. Deterministic probes (Hermes' `--version` exec) succeed on the first call and exit immediately — no extra latency. HTTP probes get the full budget they need. Also stops misapplying `descriptor.readinessProbe` to the containerd "Up" wait (which only takes ~50ms anyway — defaults are fine).