
Local API gateway (serve-api)

Point an Anthropic-SDK app at your subscription for local development — with zero code changes.

claude-coder serve-api starts a loopback HTTP server that speaks the Anthropic Messages API wire format. You set one environment variable (ANTHROPIC_BASE_URL) so the official @anthropic-ai/sdk (or anthropic Python SDK) routes every POST /v1/messages call to this server instead of api.anthropic.com. Each request is answered by a fresh, ephemeral claude session on your subscription. Unset the variable in production and your app hits the real API again.

It is a best-effort local dev shim, not a faithful raw-model proxy: under the hood you are talking to the Claude Code agent, not the bare model. See Non-goals.

Authorization & Anthropic's terms

serve-api is for your own local development against your own subscription. It binds loopback only and is single-user by construction. Do not expose it, share it, or use it as a multi-user/metered API replacement — that is outside what the subscription is licensed for. Auth is always your claude subscription login; there is no API-key path.

claude-coder serve-api

claude-coder serve-api [--port N] [--config <path>]
Flag | Default | Meaning
--port N | 0 (OS-assigned) | TCP port to bind on 127.0.0.1.
--config <path> | default resolution | Profiles config file. Same resolution as the rest of the CLI.

There is deliberately no --host flag — the bind is hard-coded to 127.0.0.1. On startup it prints the URL once:

claude-coder api listening on http://127.0.0.1:8787

Note

serve-api spawns real sessions, so it fails closed at startup exactly like the rest of the CLI: it refuses to start if any credential or auth-redirect variable in the deny set is set in its environment (ANTHROPIC_API_KEY, ANTHROPIC_AUTH_TOKEN, ANTHROPIC_BASE_URL, AWS_BEARER_TOKEN_BEDROCK, CLAUDE_CODE_USE_BEDROCK, CLAUDE_CODE_USE_VERTEX, CLAUDE_CODE_USE_FOUNDRY). This is intentional. Run the server in a shell where those are unset. The restriction applies to the server's environment only: your app's shell keeps its variables, with ANTHROPIC_BASE_URL pointed at the gateway. See security for the authoritative deny set.
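As a quick preflight before launching the server, a check along these lines can report which deny-set variables are present in the current shell. This is a minimal sketch, not part of the CLI: the variable list is copied from this page (the security page remains authoritative), the helper name offending_vars is hypothetical, and treating any present variable (even an empty one) as "set" is an assumption.

```python
import os

# Deny set as listed on this page; the security page is authoritative.
DENY_SET = (
    "ANTHROPIC_API_KEY",
    "ANTHROPIC_AUTH_TOKEN",
    "ANTHROPIC_BASE_URL",
    "AWS_BEARER_TOKEN_BEDROCK",
    "CLAUDE_CODE_USE_BEDROCK",
    "CLAUDE_CODE_USE_VERTEX",
    "CLAUDE_CODE_USE_FOUNDRY",
)


def offending_vars(env):
    """Return the deny-set variable names present in env, in list order.

    Assumption: presence alone counts as "set", even with an empty value.
    """
    return [name for name in DENY_SET if name in env]


if __name__ == "__main__":
    found = offending_vars(os.environ)
    if found:
        print("unset before starting serve-api: " + ", ".join(found))
    else:
        print("environment is clean")
```

Run it in the terminal where you intend to start the server, not the one that runs your app.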

Use it from your app

Start the server in one terminal:

claude-coder serve-api --port 8787

Then point your app's Anthropic SDK at it. No code change — just the base URL:

export ANTHROPIC_BASE_URL=http://127.0.0.1:8787
export ANTHROPIC_API_KEY=sk-ignored   # required by the SDK; ignored by the gateway
python your_app.py

Python:

from anthropic import Anthropic
client = Anthropic()                       # picks up ANTHROPIC_BASE_URL
msg = client.messages.create(
    model="claude-sonnet-4-6",
    max_tokens=256,
    messages=[{"role": "user", "content": "Say hello in one word."}],
)
print(msg.content[0].text)

TypeScript (base URL passed explicitly instead of via the environment):

import Anthropic from '@anthropic-ai/sdk';
const client = new Anthropic({ baseURL: 'http://127.0.0.1:8787', apiKey: 'sk-ignored' });
const msg = await client.messages.create({
  model: 'claude-sonnet-4-6',
  max_tokens: 256,
  messages: [{ role: 'user', content: 'Say hello in one word.' }],
});
console.log(msg.content[0].type === 'text' && msg.content[0].text);

curl:

curl -s http://127.0.0.1:8787/v1/messages \
  -H 'content-type: application/json' \
  -H 'anthropic-version: 2023-06-01' \
  -d '{"model":"claude-sonnet-4-6","max_tokens":64,
       "messages":[{"role":"user","content":"Say hello in one word."}]}'

Streaming works too — stream: true returns Server-Sent Events in the standard event sequence, so the SDK's streaming helpers parse it without changes.

Endpoints

Method & path | Status
POST /v1/messages | Full support: non-streaming JSON and stream: true SSE.
POST /v1/messages/count_tokens | Best-effort heuristic estimate (≈ chars / 4) of the prompt. Uses a count-tokens-specific schema, so max_tokens is not required and the SDK's .countTokens() works.
GET /v1/models | Lists one entry per configured profile (id = profile name); data: [] if none.
anything else | 404 not_found_error.
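To give a feel for how coarse the count_tokens estimate is, here is a minimal sketch of a chars / 4 heuristic. It is not the gateway's code: the function name estimate_tokens is hypothetical, and both the rounding (up) and the choice to count only the system prompt and text content are assumptions.

```python
import math


def estimate_tokens(system, messages):
    """Rough token estimate: prompt characters divided by 4, rounded up.

    Assumptions: only the system prompt and text content are counted,
    and the result is rounded up to a whole token.
    """
    chars = len(system or "")
    for message in messages:
        content = message["content"]
        if isinstance(content, str):
            chars += len(content)
        else:
            # List form: count only the text blocks.
            chars += sum(
                len(block.get("text", ""))
                for block in content
                if block.get("type") == "text"
            )
    return math.ceil(chars / 4)


print(estimate_tokens(None, [{"role": "user", "content": "Say hello in one word."}]))  # 6
```

The 22-character prompt yields 6, which is why the figure should be treated as an order-of-magnitude guide rather than a billing-grade count.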

How a request maps to a session

  1. The incoming messages[] (plus an optional system) are flattened into one prompt.
  2. The request model string selects a profile: an exact profile-name match wins, otherwise a default profile is used. (Create a profile named e.g. claude-sonnet-4-6 to route that model string to a specific config.)
  3. A fresh ephemeral session (api-<uuid>) is created, the prompt runs as one turn, and the session is torn down when the turn completes — mirroring the stateless API (your client re-sends the full history each call).
  4. The turn result becomes a standard Anthropic message object (content, stop_reason, best-effort usage). Sampling parameters (temperature, top_p, stop_sequences) are accepted and ignored.
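Steps 1 and 2 above can be pictured with a small sketch. The exact flattened layout is not documented here, so the role labels and blank-line separators below are invented for illustration, and flatten_prompt, select_profile, and the profile name "default" are hypothetical names, not the gateway's API.

```python
def flatten_prompt(messages, system=None):
    """Collapse an optional system prompt and messages[] into one prompt.

    The "Role: text" layout is an assumption made for this sketch.
    """
    parts = []
    if system:
        parts.append(f"System: {system}")
    for message in messages:
        content = message["content"]
        if not isinstance(content, str):
            # List form: join the text blocks only.
            content = "".join(
                block["text"] for block in content if block.get("type") == "text"
            )
        parts.append(f"{message['role'].capitalize()}: {content}")
    return "\n\n".join(parts)


def select_profile(model, profiles, default="default"):
    """An exact profile-name match wins; otherwise fall back to the default."""
    return model if model in profiles else default


profiles = {"default": {}, "claude-sonnet-4-6": {}}
print(select_profile("claude-sonnet-4-6", profiles))  # claude-sonnet-4-6
print(select_profile("some-other-model", profiles))   # default
```

Because the whole history is flattened on every call, long conversations grow the prompt linearly, just as they would against the stateless API.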

Streaming events

stream: true emits the canonical sequence so the official SDK accumulator reconstructs the message:

message_start → content_block_start → ping
   → content_block_delta (one per text delta) …
   → content_block_stop → message_delta → message_stop
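If you are not using the SDK's accumulator, the sequence above can be reassembled by hand. The following is a minimal sketch that parses a captured SSE body and joins the text deltas. The SAMPLE stream is hand-written to follow the standard Messages event shapes rather than captured from the gateway, the helper names are hypothetical, and the parser assumes "\n" line endings.

```python
import json


def iter_sse_events(raw):
    """Yield (event_name, data) pairs from raw SSE text with "\n" line endings."""
    for frame in raw.split("\n\n"):
        name, data_lines = None, []
        for line in frame.splitlines():
            if line.startswith("event:"):
                name = line[len("event:"):].strip()
            elif line.startswith("data:"):
                data_lines.append(line[len("data:"):].strip())
        if data_lines:
            yield name, json.loads("\n".join(data_lines))


def accumulate_text(raw):
    """Join text deltas and pick up stop_reason from message_delta."""
    chunks, stop_reason = [], None
    for name, data in iter_sse_events(raw):
        if name == "content_block_delta" and data["delta"].get("type") == "text_delta":
            chunks.append(data["delta"]["text"])
        elif name == "message_delta":
            stop_reason = data["delta"].get("stop_reason")
    return "".join(chunks), stop_reason


def frame(name, payload):
    """Build one SSE frame for the hand-written sample below."""
    return f"event: {name}\ndata: {json.dumps(payload)}\n\n"


SAMPLE = "".join([
    frame("message_start", {"type": "message_start", "message": {"role": "assistant", "content": []}}),
    frame("content_block_start", {"type": "content_block_start", "index": 0,
                                  "content_block": {"type": "text", "text": ""}}),
    frame("ping", {"type": "ping"}),
    frame("content_block_delta", {"type": "content_block_delta", "index": 0,
                                  "delta": {"type": "text_delta", "text": "Hel"}}),
    frame("content_block_delta", {"type": "content_block_delta", "index": 0,
                                  "delta": {"type": "text_delta", "text": "lo"}}),
    frame("content_block_stop", {"type": "content_block_stop", "index": 0}),
    frame("message_delta", {"type": "message_delta", "delta": {"stop_reason": "end_turn"}}),
    frame("message_stop", {"type": "message_stop"}),
])

print(accumulate_text(SAMPLE))  # ('Hello', 'end_turn')
```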

Error mapping

Failures come back in the Anthropic error envelope ({ "type": "error", "error": { "type", "message" } }) with a matching HTTP status:

Condition | HTTP | error.type
Malformed/invalid body, unsupported content block | 400 / 422 | invalid_request_error
API key/auth in force at startup | 401 | authentication_error
Non-loopback caller, credential-in-profile | 403 | permission_error
Unknown route | 404 | not_found_error
Body over 4 MiB | 413 | invalid_request_error
At capacity / quota exhausted | 529 | overloaded_error
Other transport faults | 503 | overloaded_error

A subscription rate-limit surfacing as a real 429/529 is the most API-faithful behaviour — your client's existing retry/backoff handles it unchanged.
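For clients that call the endpoint without the SDK, the envelope can be unpacked as in the sketch below. The helper name parse_error is hypothetical, and the set of statuses treated as retryable is an assumed client-side policy chosen for illustration, not something the gateway dictates.

```python
import json

# Assumed client-side retry policy; adjust to match your own backoff rules.
RETRYABLE_STATUSES = {429, 503, 529}


def parse_error(status, body):
    """Return (error_type, message, retryable) from an Anthropic error envelope."""
    envelope = json.loads(body)
    if envelope.get("type") != "error":
        raise ValueError("not an Anthropic error envelope")
    error = envelope["error"]
    return error["type"], error["message"], status in RETRYABLE_STATUSES


body = '{"type": "error", "error": {"type": "overloaded_error", "message": "at capacity"}}'
print(parse_error(529, body))  # ('overloaded_error', 'at capacity', True)
```

The message text here is a placeholder; real messages are whatever safe string the gateway returns.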

Streaming failures

The status codes above apply to non-streaming requests and to streaming failures that occur before the response is committed. Once a stream: true request has emitted its 200 SSE headers, a later failure (for example, session creation or the turn itself failing) can no longer change the HTTP status. It is delivered instead as an SSE error event ({"type":"error","error":{"type":…}}) on the open stream, which the official SDK surfaces as a stream error.
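A hand-rolled stream consumer therefore has to watch for the error event even after a 200. The sketch below works on already-decoded event objects; StreamError and collect_text are hypothetical names, and keeping the partial text on the exception is a design choice of this example, not gateway behaviour.

```python
class StreamError(Exception):
    """Raised when an error event arrives on an already-open stream."""

    def __init__(self, error_type, message, partial_text):
        super().__init__(f"{error_type}: {message}")
        self.error_type = error_type
        self.partial_text = partial_text  # text received before the failure


def collect_text(events):
    """Join text deltas; raise StreamError if an error event shows up mid-stream."""
    chunks = []
    for event in events:
        if event["type"] == "error":
            error = event["error"]
            raise StreamError(error["type"], error.get("message", ""), "".join(chunks))
        if event["type"] == "content_block_delta":
            chunks.append(event["delta"].get("text", ""))
    return "".join(chunks)


events = [
    {"type": "content_block_delta", "delta": {"type": "text_delta", "text": "Hel"}},
    {"type": "error", "error": {"type": "overloaded_error", "message": "at capacity"}},
]
try:
    collect_text(events)
except StreamError as exc:
    print(exc.error_type, repr(exc.partial_text))  # overloaded_error 'Hel'
```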

Security posture

serve-api is built to be safe to leave running on your machine.

  • Loopback only. It binds 127.0.0.1; a non-loopback peer is answered with 403 and its socket destroyed before any body is read. There is no --host.
  • No API-key path. Auth is always your claude subscription login. The inbound x-api-key / Authorization header is never forwarded to the fleet, the child process, its environment, or any log — it is read once and added to the per-request redaction set only, so if a model echoes it back it is scrubbed (both the bare token and a Bearer <token> form).
  • Outbound redaction chokepoint. Every response body and every SSE chunk passes through the secret-scrubbing redactor; error messages are reduced to a curated, allow-listed safe string (no stack traces, cause, class names, or process.env).
  • No leaked sessions. Each request owns one ephemeral session, closed in a finally. A mid-stream client disconnect aborts the request and closes the session; a stalled turn is bounded by a per-request deadline; an at-capacity create fast-fails to 529 rather than hanging.
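The per-request redaction idea in the list above can be illustrated with a short sketch. This is not the gateway's redactor: the function names, the "[REDACTED]" mask, and the assumption that header names arrive lower-cased are all invented for the example.

```python
import re


def secrets_from_headers(headers):
    """Collect inbound credentials (assumes lower-cased header names)."""
    found = set()
    api_key = headers.get("x-api-key")
    if api_key:
        found.add(api_key)
    auth = headers.get("authorization", "")
    if auth.lower().startswith("bearer "):
        found.add(auth[len("bearer "):].strip())
    return found


def redact(text, secrets, mask="[REDACTED]"):
    """Scrub each secret in both "Bearer <token>" and bare-token form."""
    # Longest first, so a secret containing a shorter one is masked whole.
    for secret in sorted((s for s in secrets if s), key=len, reverse=True):
        # Handle the Bearer form first so the scheme word is removed too.
        text = re.sub(r"Bearer\s+" + re.escape(secret), mask, text)
        text = text.replace(secret, mask)
    return text


secrets = secrets_from_headers({"x-api-key": "sk-ignored"})
print(redact("key sk-ignored and Bearer sk-ignored", secrets))
# key [REDACTED] and [REDACTED]
```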

Non-goals

serve-api is a local dev shim. It does not aim to be a drop-in for these:

  • Client-defined tool use. tool_use / tool_result content blocks are rejected (422); Claude Code's own tool prompts are auto-denied. Plain text + system prompts are the supported path. This was evaluated in depth and closed (2026-06, including the in-process-MCP-bridge idea). Three structural ceilings make a bridge impossible without redesigning the gateway:
      (1) The fleet facade's serializable contract has no per-request MCP injection (servers are injected only at session spawn via a profile's mcpConfig).
      (2) The Messages API is fully stateless, so a follow-up tool_result request has no way to address a parked session without an off-contract header that real Anthropic SDK clients never send.
      (3) The wire codec is text-only (no tool_use block/stop_reason path) and the drain loop auto-denies every tool prompt.
    A partial bridge that passes locally but fails against the real API would be worse than the clear 422. If you need tool execution in local dev, wire an MCP server into the profile's mcpConfig: the agent calls it server-side and returns plain text, so no tool_use blocks ever cross the wire.
  • Raw-model fidelity. Responses come from the Claude Code agent, not the bare model; sampling params, exact token usage, and dated model snapshots are not reproduced.
  • Multi-user / shareable / metered API replacement. Loopback-only, single-user.
  • Prompt caching, batch, files, citations, images/documents in messages[].

For the in-process fleet API, see library.