The two planes
Inference plane
Listens on RELAY_PORT. Stateless and hot-path-optimized. Serves
/v1/* and /healthz, authenticates with relay keys, and routes each
request to a healthy upstream. No vendor branching: dispatch is uniform.
Control plane
Listens on RELAY_CONTROL_PORT. Serves the admin UI, /auth/*, CRUD for
every catalog kind, and operational endpoints. Authenticates with a
session cookie or admin bearer.
The snapshot is the spine
Postgres is the source of truth, but it’s never read on the request path. Instead, every pod holds an immutable in-memory snapshot of the catalog.
- A control-plane write lands in Postgres.
- The write fires a NOTIFY on a Postgres channel.
- Every pod’s listener receives it, rebuilds the snapshot with a copy-on-write reconciler, and atomically swaps it in, debounced to ~1 second.
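The copy-on-write swap described above can be sketched roughly as follows. This is a minimal illustration, not Relay's actual reconciler: the `Snapshot` shape, the `SnapshotHolder` class, the change-dict payload, and `merge_burst` (which stands in for debouncing by folding a burst of notifications into one rebuild, with the timer omitted) are all hypothetical names chosen for the example. A real listener may well re-read the catalog from Postgres instead of applying a payload.

```python
import threading
from dataclasses import dataclass
from types import MappingProxyType
from typing import Dict, Iterable, Mapping, Optional


@dataclass(frozen=True)
class Snapshot:
    """Immutable view of the catalog (hypothetical shape)."""
    version: int
    models: Mapping[str, str]  # model name -> host name


def merge_burst(bursts: Iterable[Mapping[str, Optional[str]]]) -> Dict[str, Optional[str]]:
    """Fold several pending change sets into one; later changes win."""
    merged: Dict[str, Optional[str]] = {}
    for changes in bursts:
        merged.update(changes)
    return merged


class SnapshotHolder:
    """Holds the current snapshot. Readers never take a lock."""

    def __init__(self, initial: Snapshot) -> None:
        self._current = initial
        self._write_lock = threading.Lock()  # serializes rebuilds only

    def get(self) -> Snapshot:
        # One reference read: callers see the old or the new snapshot,
        # never a half-built one (reference rebinding is atomic in CPython).
        return self._current

    def apply(self, changes: Mapping[str, Optional[str]]) -> Snapshot:
        """Copy the old catalog, patch the copy, then swap the reference."""
        with self._write_lock:
            old = self._current
            models = dict(old.models)  # copy; the old snapshot is untouched
            for name, host in changes.items():
                if host is None:
                    models.pop(name, None)  # None marks a deletion
                else:
                    models[name] = host
            new = Snapshot(old.version + 1, MappingProxyType(models))
            self._current = new  # the swap
            return new
```

A request that called `get()` before the swap keeps routing against the snapshot it already holds, which is why the request path needs no lock.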
A request, end to end
Route
The key’s policy confirms it grants the requested model, then the
model resolves to a host binding, which carries the wire adapter
(openai or anthropic) and the upstream model name.
Reserve and draw a key
Rate-limit budget is reserved (one Redis Lua call is the goal), and a
healthy host key is drawn from the pool. Tripped circuit breakers are
skipped; failover happens here, before any bytes flow.
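The Route and key-draw steps above might look something like the sketch below. It assumes the policy is a set of granted model names and the pool is an ordered list; `HostBinding`, `HostKey`, `RouteError`, `route`, and `draw_key` are illustrative names, not Relay's API, and the rate-limit reservation is left out.

```python
from dataclasses import dataclass
from typing import Dict, FrozenSet, List


@dataclass(frozen=True)
class HostBinding:
    host: str
    adapter: str          # "openai" or "anthropic"
    upstream_model: str   # the name the upstream expects


@dataclass(frozen=True)
class HostKey:
    key_id: str
    breaker_tripped: bool


class RouteError(Exception):
    """Raised before any bytes are sent upstream."""


def route(granted: FrozenSet[str], bindings: Dict[str, HostBinding], model: str) -> HostBinding:
    """Check the policy first, then resolve the model to its host binding."""
    if model not in granted:
        raise RouteError(f"policy does not grant {model!r}")
    try:
        return bindings[model]
    except KeyError:
        raise RouteError(f"no host binding for {model!r}") from None


def draw_key(pool: List[HostKey]) -> HostKey:
    """Return the first key whose circuit breaker is not tripped."""
    for key in pool:
        if not key.breaker_tripped:
            return key
    raise RouteError("no healthy host key available")
```

Both functions either return or raise before a connection is opened, which matches the point that failover is settled before any bytes flow.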
Forward and stream
Relay forwards to the upstream and streams the response straight back. If
the inbound and upstream shapes match, bytes pass through verbatim;
otherwise each chunk is translated through Relay’s canonical protocol.
Relay does not fail over mid-stream. All failover across keys and hosts
happens before the first byte reaches the caller. Once bytes flow, an
upstream error is surfaced as-is.
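As a rough illustration of the forwarding rule, the generator below passes chunks through untouched when the shapes match and translates each chunk otherwise. The `translate` callable is a hypothetical stand-in for the conversion through the canonical protocol, and `stream_response` is an assumed name. The function deliberately catches nothing, so an upstream error raised mid-stream reaches the caller after the chunks already delivered.

```python
from typing import Callable, Iterable, Iterator


def stream_response(
    inbound_shape: str,
    upstream_shape: str,
    chunks: Iterable[bytes],
    translate: Callable[[bytes], bytes],
) -> Iterator[bytes]:
    """Yield upstream chunks to the caller as they arrive."""
    if inbound_shape == upstream_shape:
        # Shapes match: hand the bytes over without parsing them.
        yield from chunks
    else:
        # Shapes differ: convert chunk by chunk, never buffering the stream.
        for chunk in chunks:
            yield translate(chunk)
    # No try/except here: once bytes flow, an upstream error propagates
    # as-is and no retry against another key or host is attempted.
```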
Why it’s fast
- No Postgres on the hot path — routing reads the in-memory snapshot only.
- One Redis round-trip — rate-limit reservation is a single Lua call, not three trips.
- Async everything off-path — usage, traces, and payload capture emit on bounded channels with drop-on-full, never blocking the response.
- Byte-for-byte passthrough — when shapes match, Relay copies bytes rather than parsing and re-serializing.
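The drop-on-full behaviour in the list above can be illustrated with a bounded queue from the Python standard library. This is only a sketch of the pattern: `DropOnFullEmitter` and its `dropped` counter are hypothetical, and the source does not say how Relay sizes or drains its channels.

```python
import queue
from typing import List


class DropOnFullEmitter:
    """Bounded, non-blocking event sink for off-path telemetry."""

    def __init__(self, capacity: int) -> None:
        if capacity <= 0:
            # queue.Queue treats maxsize <= 0 as unbounded, which would
            # defeat the purpose of a bounded channel.
            raise ValueError("capacity must be positive")
        self._q: "queue.Queue[dict]" = queue.Queue(maxsize=capacity)
        self.dropped = 0

    def emit(self, event: dict) -> bool:
        """Enqueue without waiting; drop the event if the queue is full."""
        try:
            self._q.put_nowait(event)
            return True
        except queue.Full:
            self.dropped += 1  # the response path never blocks on this
            return False

    def drain(self) -> List[dict]:
        """Take everything currently queued (what a background worker would do)."""
        events: List[dict] = []
        while True:
            try:
                events.append(self._q.get_nowait())
            except queue.Empty:
                return events
```

The trade-off is explicit: under sustained backpressure some usage or trace events are lost, in exchange for response latency that does not depend on the telemetry sink.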
Where state lives
| Store | Holds | On the request path? |
|---|---|---|
| Postgres | Catalog truth: hosts, models, keys, policies | No — only via the snapshot |
| In-memory snapshot | Read-optimized copy of the catalog | Yes — every routing decision |
| Redis / Valkey | Rate-limit counters, per-key circuit breakers | Yes — one Lua call per request |
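To show why a single call is enough for rate-limit reservation, here is an in-memory stand-in where one `reserve()` call performs the window reset, the limit check, and the increment together, which is the role a single server-side script would play. Everything here is an assumption for illustration: the source does not describe the limiting algorithm, so the fixed-window scheme, the `WindowCounterStore` class, and its parameters are hypothetical, and a lock stands in for the atomicity a script would provide.

```python
import threading
from typing import Dict, Tuple


class WindowCounterStore:
    """In-memory stand-in for the counter store; not a Redis client."""

    def __init__(self) -> None:
        self._lock = threading.Lock()
        self._windows: Dict[str, Tuple[float, int]] = {}  # key -> (window start, used)

    def reserve(self, key: str, cost: int, limit: int, window_s: float, now: float) -> bool:
        """Reset, check, and increment in one atomic step."""
        with self._lock:
            start, used = self._windows.get(key, (now, 0))
            if now - start >= window_s:
                start, used = now, 0  # window expired: start a fresh one
            if used + cost > limit:
                self._windows[key] = (start, used)
                return False  # over budget: nothing is reserved
            self._windows[key] = (start, used + cost)
            return True
```

Passing `now` in explicitly keeps the sketch deterministic; a real deployment would rely on the store's own clock.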