Why it exists
A single provider API key has a fixed rate limit and a single point of failure. The moment you run real traffic you end up writing the same glue every time: rotate between keys, retry on429, fall back when one
provider degrades, keep per-model limits straight. Relay is that glue,
extracted into a service and made operable.
Higher effective throughput
Pool many provider keys behind one relay key. Limits add up instead of
capping you at a single key’s ceiling.
Failover by default
Per-key circuit breakers route around dead or throttled keys without
your app noticing.
One wire shape
OpenAI- and Anthropic-compatible endpoints. Keep your existing SDK;
just change the base URL.
Operable
An admin UI and Control API for hosts, keys, and policies — not a config
file you redeploy to change.
The mental model
A handful of catalog nouns carry the whole system. Once these click, the reference pages read straight through.Hosts
Hosts
The upstream endpoints Relay routes to — a provider’s API surface, like
OpenAI or Anthropic. A host defines the wire shape Relay speaks to it.
Models
Models
Catalog entries bound to a host. The
model field in a request resolves
against the catalog; a model is reachable only when it has an enabled
host binding behind it.Host keys
Host keys
Your real upstream provider credentials, held by Relay. Many host keys
for the same host form a pool; Relay spreads traffic across them
and breaks the circuit on any that fail.
Relay keys
Relay keys
The bearer tokens your apps use. A relay key never exposes the
underlying host keys — it’s an indirection you can scope, rate-limit,
and revoke on its own.
Policies
Policies
Rules that decide which models a relay key may reach. Policies are how
you grant one key just
gpt-4o and another the whole catalog.Rate limits
Rate limits
Limits you attach to keys and policies, enforced by Relay before a
request ever leaves for the upstream.
Where to go next
Quickstart
First request through Relay in about two minutes.
Configuration
Every
RELAY_* environment variable and runtime setting.Inference API
Endpoints, wire shapes, streaming, and error codes.
Control API
Manage hosts, keys, policies, and relay keys.