Local API gateway (serve-api)
Point an Anthropic-SDK app at your subscription for local development — with zero code changes.
claude-coder serve-api starts a loopback HTTP server that speaks the Anthropic Messages API wire format. You set one environment variable (ANTHROPIC_BASE_URL) so the official @anthropic-ai/sdk (or anthropic Python SDK) routes every POST /v1/messages call to this server instead of api.anthropic.com. Each request is answered by a fresh, ephemeral claude session on your subscription. Unset the variable in production and your app hits the real API again.
It is a best-effort local dev shim, not a faithful raw-model proxy: under the hood you are talking to the Claude Code agent, not the bare model. See Non-goals.
Authorization & Anthropic's terms
serve-api is for your own local development against your own subscription. It binds loopback only and is single-user by construction. Do not expose it, share it, or use it as a multi-user/metered API replacement — that is outside what the subscription is licensed for. Auth is always your claude subscription login; there is no API-key path.
claude-coder serve-api
| Flag | Default | Meaning |
|---|---|---|
| --port N | 0 (OS-assigned) | TCP port to bind on 127.0.0.1. |
| --config <path> | default resolution | Profiles config file. Same resolution as the rest of the CLI. |
There is deliberately no --host flag — the bind is hard-coded to 127.0.0.1. On startup it prints the URL once:
Note
serve-api spawns real sessions, so it fails closed at startup exactly like the rest of the CLI: if any credential or auth-redirect variable in the deny set is set in its environment (ANTHROPIC_API_KEY, ANTHROPIC_AUTH_TOKEN, ANTHROPIC_BASE_URL, AWS_BEARER_TOKEN_BEDROCK, CLAUDE_CODE_USE_BEDROCK, CLAUDE_CODE_USE_VERTEX, CLAUDE_CODE_USE_FOUNDRY), it refuses to start. This is intentional: run the server in a shell where those are unset. Your app keeps its own environment; the only change there is that the base URL points at this server. See security for the authoritative deny set.
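To check a shell before launching the server, the fail-closed rule can be approximated with a small helper. This is a minimal sketch, not the CLI's actual check: `findDeniedVars` is a hypothetical name, the list is copied from the note above (the security page remains authoritative), and "set" is assumed to mean defined and non-empty.

```typescript
// Deny set as listed in the note above; the security page is authoritative.
const DENY_SET: readonly string[] = [
  "ANTHROPIC_API_KEY",
  "ANTHROPIC_AUTH_TOKEN",
  "ANTHROPIC_BASE_URL",
  "AWS_BEARER_TOKEN_BEDROCK",
  "CLAUDE_CODE_USE_BEDROCK",
  "CLAUDE_CODE_USE_VERTEX",
  "CLAUDE_CODE_USE_FOUNDRY",
];

type Env = Record<string, string | undefined>;

// Hypothetical helper: returns the deny-set variables present in `env`.
// Assumption: "set" means defined and non-empty.
function findDeniedVars(env: Env): string[] {
  return DENY_SET.filter((name) => (env[name] ?? "") !== "");
}

// Example: a shell that still exports an API key would block startup.
const offenders = findDeniedVars({ ANTHROPIC_API_KEY: "sk-example", PATH: "/usr/bin" });
console.log(offenders); // [ 'ANTHROPIC_API_KEY' ]
```

Passing `process.env` instead of the literal object would list the variables to unset in the server's shell.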
Use it from your app
Start the server in one terminal:
Then point your app's Anthropic SDK at it. No code change — just the base URL:
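```typescript
import Anthropic from '@anthropic-ai/sdk';

// baseURL is passed explicitly here; setting ANTHROPIC_BASE_URL in the app's
// environment routes the SDK the same way. The apiKey value is a placeholder.
const client = new Anthropic({ baseURL: 'http://127.0.0.1:8787', apiKey: 'sk-ignored' });

const msg = await client.messages.create({
  model: 'claude-sonnet-4-6',
  max_tokens: 256,
  messages: [{ role: 'user', content: 'Say hello in one word.' }],
});

// Print the first content block if it is a text block.
console.log(msg.content[0].type === 'text' && msg.content[0].text);
```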
Streaming works too — stream: true returns Server-Sent Events in the standard event sequence, so the SDK's streaming helpers parse it without changes.
Endpoints
| Method & path | Status |
|---|---|
| POST /v1/messages | Full support: non-streaming JSON and stream: true SSE. |
| POST /v1/messages/count_tokens | Best-effort heuristic estimate (≈ chars / 4) of the prompt. Uses a count-tokens-specific schema, so max_tokens is not required and the SDK's .countTokens() works. |
| GET /v1/models | Lists one entry per configured profile (id = profile name); data: [] if none. |
| anything else | 404 not_found_error. |
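The count_tokens heuristic is simple enough to reproduce when sanity-checking a number the endpoint returns. The sketch below assumes the estimate is rounded up; the table only says "≈ chars / 4", so the rounding and the name `estimateTokens` are assumptions, not the gateway's code.

```typescript
// Hypothetical helper mirroring the documented heuristic (about chars / 4).
// Assumption: the result is rounded up; the real rounding is not documented.
function estimateTokens(prompt: string): number {
  return Math.ceil(prompt.length / 4);
}

// 22 characters -> 22 / 4 = 5.5 -> 6
const helloEstimate = estimateTokens("Say hello in one word.");
console.log(helloEstimate); // 6
```

Because this is a character-count heuristic, treat the result as a rough budget check rather than a billing-accurate count.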
How a request maps to a session
- The incoming messages[] (plus an optional system) are flattened into one prompt.
- The request model string selects a profile: an exact profile-name match wins, otherwise a default profile is used. (Create a profile named e.g. claude-sonnet-4-6 to route that model string to a specific config.)
- A fresh ephemeral session (api-<uuid>) is created, the prompt runs as one turn, and the session is torn down when the turn completes, mirroring the stateless API (your client re-sends the full history each call).
- The turn result becomes a standard Anthropic message object (content, stop_reason, best-effort usage). Sampling parameters (temperature, top_p, stop_sequences) are accepted and ignored.
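The first two steps above can be sketched as two small functions. This is an illustration under stated assumptions: the role-labelled layout produced by `flattenPrompt` is invented for the example (the actual flattening format is not documented here), and both function names are hypothetical. Only the profile rule (exact name match, otherwise the default) comes from the text.

```typescript
interface ChatMessage {
  role: "user" | "assistant";
  content: string;
}

// Assumption: turns are joined as role-labelled paragraphs. The real
// flattening format used by the gateway is not documented here.
function flattenPrompt(messages: ChatMessage[], system?: string): string {
  const parts: string[] = [];
  if (system) parts.push(`system: ${system}`);
  for (const m of messages) parts.push(`${m.role}: ${m.content}`);
  return parts.join("\n\n");
}

// Exact profile-name match wins; otherwise fall back to the default profile.
function selectProfile(model: string, profiles: string[], fallback: string): string {
  return profiles.indexOf(model) !== -1 ? model : fallback;
}

const demoPrompt = flattenPrompt(
  [
    { role: "user", content: "Hi" },
    { role: "assistant", content: "Hello!" },
    { role: "user", content: "Say hello in one word." },
  ],
  "Be brief.",
);
const demoProfile = selectProfile("claude-sonnet-4-6", ["claude-sonnet-4-6", "fast"], "default");
console.log(demoProfile); // claude-sonnet-4-6
```

Since every call re-sends the full history, the flattened prompt grows with the conversation; the session itself holds no state between requests.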
Streaming events
stream: true emits the canonical sequence so the official SDK accumulator reconstructs the message:
message_start → content_block_start → ping
→ content_block_delta (one per text delta) …
→ content_block_stop → message_delta → message_stop
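To make the sequence concrete, here is a minimal sketch of what an accumulator does with those events. The event shapes are simplified from the Anthropic streaming format (for instance, the real message_start carries a message object that is omitted here), and `accumulate` is a hypothetical stand-in for the SDK's own accumulator, not its implementation.

```typescript
// Simplified event shapes; real events carry more fields than shown.
type StreamEvent =
  | { type: "message_start" }
  | { type: "content_block_start"; index: number }
  | { type: "ping" }
  | { type: "content_block_delta"; index: number; delta: { type: "text_delta"; text: string } }
  | { type: "content_block_stop"; index: number }
  | { type: "message_delta"; delta: { stop_reason: string | null } }
  | { type: "message_stop" };

interface Accumulated {
  text: string;
  stopReason: string | null;
  done: boolean;
}

// Folds the event sequence into the final text and stop reason.
function accumulate(events: StreamEvent[]): Accumulated {
  const acc: Accumulated = { text: "", stopReason: null, done: false };
  for (const ev of events) {
    switch (ev.type) {
      case "content_block_delta":
        acc.text += ev.delta.text; // one delta per text chunk
        break;
      case "message_delta":
        acc.stopReason = ev.delta.stop_reason;
        break;
      case "message_stop":
        acc.done = true;
        break;
      default:
        break; // message_start, ping, block start/stop carry no text here
    }
  }
  return acc;
}

const demoStream: StreamEvent[] = [
  { type: "message_start" },
  { type: "content_block_start", index: 0 },
  { type: "ping" },
  { type: "content_block_delta", index: 0, delta: { type: "text_delta", text: "Hel" } },
  { type: "content_block_delta", index: 0, delta: { type: "text_delta", text: "lo" } },
  { type: "content_block_stop", index: 0 },
  { type: "message_delta", delta: { stop_reason: "end_turn" } },
  { type: "message_stop" },
];
const demoResult = accumulate(demoStream);
console.log(demoResult.text); // Hello
```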
Error mapping
Failures come back in the Anthropic error envelope ({ "type": "error", "error": { "type", "message" } }) with a matching HTTP status:
| Condition | HTTP | error.type |
|---|---|---|
| Malformed/invalid body, unsupported content block | 400 / 422 | invalid_request_error |
| API key/auth in force at startup | 401 | authentication_error |
| Non-loopback caller, credential-in-profile | 403 | permission_error |
| Unknown route | 404 | not_found_error |
| Body over 4 MiB | 413 | invalid_request_error |
| At capacity / quota exhausted | 529 | overloaded_error |
| Other transport faults | 503 | overloaded_error |
A subscription rate-limit surfacing as a real 429/529 is the most API-faithful behaviour — your client's existing retry/backoff handles it unchanged.
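For clients that talk to the gateway without the official SDK, the envelope can be decoded by hand. The sketch below parses the documented envelope shape and classifies which statuses to retry; `parseErrorEnvelope` and `isRetryable` are hypothetical names, and the retry set (429, 529, 503) is an assumption chosen for illustration rather than the SDK's exact policy.

```typescript
interface ApiError {
  type: string;
  message: string;
}

// Parses { "type": "error", "error": { "type", "message" } }.
// Returns null for anything that does not match that envelope.
function parseErrorEnvelope(body: string): ApiError | null {
  let parsed: unknown;
  try {
    parsed = JSON.parse(body);
  } catch {
    return null;
  }
  if (typeof parsed !== "object" || parsed === null) return null;
  const env = parsed as { type?: unknown; error?: { type?: unknown; message?: unknown } };
  if (env.type !== "error" || typeof env.error !== "object" || env.error === null) return null;
  const { type, message } = env.error;
  if (typeof type !== "string" || typeof message !== "string") return null;
  return { type, message };
}

// Assumption: overload and rate-limit statuses are worth retrying with backoff.
function isRetryable(httpStatus: number): boolean {
  return httpStatus === 429 || httpStatus === 529 || httpStatus === 503;
}

const demoError = parseErrorEnvelope(
  '{"type":"error","error":{"type":"overloaded_error","message":"busy"}}',
);
console.log(demoError?.type, isRetryable(529)); // overloaded_error true
```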
Streaming failures
The status codes above apply to non-streaming requests and to streaming failures that occur before the response is committed. Once a stream: true request has emitted its 200 SSE headers, a later failure (e.g. in session creation or during the turn) can no longer change the HTTP status. It is delivered instead as an SSE error event ({"type":"error","error":{"type":…}}) on the open stream, which the official SDK surfaces as a stream error.
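A hand-rolled stream reader therefore has to look for an error payload inside the stream, not just at the HTTP status. The following is a minimal sketch of that check over already-received SSE text; the parser handles only `event:` and `data:` fields with `\n` line endings, and `parseSseFrames` and `findStreamError` are hypothetical helpers, not part of any SDK.

```typescript
interface SseFrame {
  event: string;
  data: string;
}

// Minimal SSE parser: frames are separated by a blank line.
// Assumption: "\n" line endings and only `event:` / `data:` fields.
function parseSseFrames(raw: string): SseFrame[] {
  const out: SseFrame[] = [];
  for (const block of raw.split("\n\n")) {
    let eventName = "message";
    const dataLines: string[] = [];
    for (const line of block.split("\n")) {
      if (line.indexOf("event:") === 0) eventName = line.slice(6).trim();
      else if (line.indexOf("data:") === 0) dataLines.push(line.slice(5).trim());
    }
    if (dataLines.length > 0) out.push({ event: eventName, data: dataLines.join("\n") });
  }
  return out;
}

// Returns the first in-stream error payload, or null if the stream is clean.
function findStreamError(raw: string): { type: string; message: string } | null {
  for (const frame of parseSseFrames(raw)) {
    let payload: unknown;
    try {
      payload = JSON.parse(frame.data);
    } catch {
      continue; // ignore frames whose data is not JSON
    }
    const p = payload as { type?: unknown; error?: { type?: unknown; message?: unknown } } | null;
    if (p !== null && typeof p === "object" && p.type === "error" && p.error) {
      return { type: String(p.error.type), message: String(p.error.message ?? "") };
    }
  }
  return null;
}

const demoSse =
  'event: content_block_delta\ndata: {"type":"content_block_delta","index":0,"delta":{"type":"text_delta","text":"Hi"}}\n\n' +
  'event: error\ndata: {"type":"error","error":{"type":"overloaded_error","message":"busy"}}\n\n';
const demoStreamError = findStreamError(demoSse);
console.log(demoStreamError?.type); // overloaded_error
```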
Security posture
serve-api is built to be safe to leave running on your machine.
- Loopback only. It binds 127.0.0.1; a non-loopback peer is answered with 403 and its socket destroyed before any body is read. There is no --host.
- No API-key path. Auth is always your claude subscription login. The inbound x-api-key / Authorization header is never forwarded to the fleet, the child process, its environment, or any log. It is read once and added to the per-request redaction set only, so if a model echoes it back it is scrubbed (both the bare token and a Bearer <token> form).
- Outbound redaction chokepoint. Every response body and every SSE chunk passes through the secret-scrubbing redactor; error messages are reduced to a curated, allow-listed safe string (no stack traces, cause, class names, or process.env).
- No leaked sessions. Each request owns one ephemeral session, closed in a finally. A mid-stream client disconnect aborts the request and closes the session; a stalled turn is bounded by a per-request deadline; an at-capacity create fast-fails to 529 rather than hanging.
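The per-request redaction described above can be pictured with a short sketch. It is not the gateway's redactor: the `[REDACTED]` placeholder and both function names are assumptions made for the example. What it does reflect from the text is that both the bare token and the Bearer <token> form end up in the redaction set.

```typescript
// Hypothetical placeholder; the real replacement string is not documented.
const REDACTED = "[REDACTED]";

// Builds the per-request redaction set from an inbound header value,
// covering both the `Bearer <token>` form and the bare token.
function redactionSetFor(headerValue: string): string[] {
  const value = headerValue.trim();
  if (value === "") return [];
  const bare = value.replace(/^Bearer\s+/i, "");
  const candidates = [value, bare, `Bearer ${bare}`];
  const unique = candidates.filter((s, i, all) => all.indexOf(s) === i);
  // Longest first so the `Bearer <token>` form is replaced as a whole.
  return unique.sort((a, b) => b.length - a.length);
}

// Replaces every occurrence of every secret in the outbound text.
function scrub(text: string, secrets: string[]): string {
  return secrets.reduce((acc, secret) => acc.split(secret).join(REDACTED), text);
}

const demoSecrets = redactionSetFor("Bearer sk-test-123");
const demoScrubbed = scrub("echo: Bearer sk-test-123 / sk-test-123", demoSecrets);
console.log(demoScrubbed); // echo: [REDACTED] / [REDACTED]
```

Using split and join instead of a regular expression avoids having to escape special characters that may appear in a token.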
Non-goals
serve-api is a local dev shim. It does not aim to be a drop-in for these:
- Client-defined tool use. tool_use / tool_result content blocks are rejected (422); Claude Code's own tool prompts are auto-denied. Plain text + system prompts are the supported path. This was evaluated in depth and closed (2026-06, including the in-process-MCP-bridge idea). Three structural ceilings make a bridge impossible without redesigning the gateway:
  1. The fleet facade's serializable contract has no per-request MCP injection (servers are injected only at session spawn via a profile's mcpConfig).
  2. The Messages API is fully stateless, so a follow-up tool_result request has no way to address a parked session without an off-contract header that real Anthropic SDK clients never send.
  3. The wire codec is text-only (no tool_use block/stop_reason path) and the drain loop auto-denies every tool prompt.

  A partial bridge that passes locally but fails against the real API would be worse than the clear 422. If you need tool execution in local dev, wire an MCP server into the profile's mcpConfig; the agent calls it server-side and returns plain text, so no tool_use blocks ever cross the wire.
- Raw-model fidelity. Responses come from the Claude Code agent, not the bare model; sampling params, exact token usage, and dated model snapshots are not reproduced.
- Multi-user / shareable / metered API replacement. Loopback-only, single-user.
- Prompt caching, batch, files, citations, images/documents in messages[].
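Because only plain text is supported, it can help to check a request on the client side before sending it. The sketch below flags the first content block that is not text. Treating every non-text block as unsupported is an assumption drawn from the list above (tool_use, tool_result, images, documents), and `firstUnsupportedBlock` is a hypothetical helper, not the gateway's validator.

```typescript
type ContentBlock = { type: string; [key: string]: unknown };

interface InboundMessage {
  role: string;
  content: string | ContentBlock[];
}

// Returns the type of the first non-text block, or null if the request
// uses only plain strings and text blocks.
// Assumption: every non-text block type is unsupported by the gateway.
function firstUnsupportedBlock(messages: InboundMessage[]): string | null {
  for (const m of messages) {
    if (typeof m.content === "string") continue;
    for (const block of m.content) {
      if (block.type !== "text") return block.type;
    }
  }
  return null;
}

const demoUnsupported = firstUnsupportedBlock([
  { role: "user", content: "What is the weather?" },
  { role: "assistant", content: [{ type: "tool_use", id: "t1", name: "get_weather", input: {} }] },
]);
console.log(demoUnsupported); // tool_use
```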
For the in-process fleet API, see library.