API Documentation
OpenAI-compatible API protocol: switch providers by changing a single base URL line.
Introduction
Infer Mesh is an OpenAI-compatible AI model aggregation proxy. With one unified API endpoint, you can access models from Anthropic, OpenAI, Google, DeepSeek, and more without applying for separate API keys.
Full Compatibility
Works with the OpenAI SDK with zero code changes
Automatic Failover
Automatically switches to backup nodes when upstream is unstable
Transparent Billing
Usage-based token billing with public pricing
Quick Start
Start in two steps: create an API key, then point requests to our endpoint.
Python (Recommended)
```bash
pip install openai
```

```python
from openai import OpenAI

client = OpenAI(
    base_url="https://infermesh.io/v1",
    api_key="your-api-key",  # create in console
)

response = client.chat.completions.create(
    model="claude-sonnet-4-6",  # or any supported model ID
    messages=[{"role": "user", "content": "Hello!"}],
)
print(response.choices[0].message.content)
```

Node.js / TypeScript
```typescript
import OpenAI from "openai";

const client = new OpenAI({
  baseURL: "https://infermesh.io/v1",
  apiKey: "your-api-key",
});

const response = await client.chat.completions.create({
  model: "gpt-4o",
  messages: [{ role: "user", content: "Hello!" }],
});
console.log(response.choices[0].message.content);
```

cURL
```bash
curl https://infermesh.io/v1/chat/completions \
  -H "Content-Type: application/json" \
  -H "Authorization: Bearer your-api-key" \
  -d '{
    "model": "deepseek-v3",
    "messages": [
      {"role": "user", "content": "Hello!"}
    ]
  }'
```

Authentication
All requests must include an API key in the HTTP headers. After creating one in the Console, add it to your request header:

```
Authorization: Bearer sk-xxxxxxxxxxxxxxxxxxxxxxxx
```

Never commit API keys to public repositories or expose them in frontend code. If a key leaks, revoke and regenerate it immediately in the console.
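A common pattern is to keep the key out of source entirely and read it from an environment variable at startup. A minimal sketch (INFERMESH_API_KEY is an illustrative variable name, not one the console mandates):

```python
import os

# Read the key from the environment; fall back to a placeholder so the
# sketch runs even when the variable is unset. INFERMESH_API_KEY is an
# illustrative name.
api_key = os.environ.get("INFERMESH_API_KEY", "sk-placeholder")

# The header attached to every request:
headers = {"Authorization": f"Bearer {api_key}"}
```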
Chat Completions
/v1/chat/completions

Create a chat completion request compatible with the OpenAI Chat Completions API.
Request Parameters
| Parameter | Type | Required | Description |
|---|---|---|---|
| model | string | Yes | Model ID, e.g. claude-sonnet-4-6 |
| messages | array | Yes | Conversation message array |
| stream | boolean | No | Enable streaming output, default false |
| max_tokens | integer | No | Maximum output token count |
| temperature | number | No | Sampling temperature, 0.0 - 2.0 |
| tools | array | No | Function calling tool definitions |
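As a sketch of what the optional parameters look like on the wire, this builds the JSON body for a POST to /v1/chat/completions (the model and values are illustrative):

```python
import json

# Request body exercising the optional parameters from the table above.
payload = {
    "model": "claude-sonnet-4-6",
    "messages": [{"role": "user", "content": "Summarize SSE in one line."}],
    "max_tokens": 256,       # cap the length of the reply
    "temperature": 0.2,      # low temperature for a focused answer
    "stream": False,         # set to true for SSE streaming (see Streaming)
}
body = json.dumps(payload)
```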
Embeddings
/v1/embeddings

Convert text into vector embeddings for semantic search, similarity matching, and RAG.
Request Parameters
| Parameter | Type | Required | Description |
|---|---|---|---|
| model | string | Yes | Embedding model ID, e.g. text-embedding-3-small |
| input | string \| string[] | Yes | Text to vectorize, supports single or batch input |
| encoding_format | string | No | Response format, commonly float |
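The vectors returned by this endpoint are typically compared with cosine similarity; a dependency-free sketch:

```python
import math

def cosine_similarity(a, b):
    # Cosine similarity between two equal-length embedding vectors:
    # dot product divided by the product of the vector norms.
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)

# Identical directions score 1.0; orthogonal vectors score 0.0.
score = cosine_similarity([0.1, 0.3, 0.5], [0.1, 0.3, 0.5])
```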
```bash
curl https://infermesh.io/v1/embeddings \
  -H "Content-Type: application/json" \
  -H "Authorization: Bearer your-api-key" \
  -d '{
    "model": "text-embedding-3-small",
    "input": "hello, vector search test"
  }'
```

Models List
/v1/models

Get the list of all available models.
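The response follows the OpenAI list shape (an object with a data array). A sketch of extracting just the model IDs from a simplified example payload (real entries carry additional fields such as object, created, and owned_by):

```python
import json

# Simplified /v1/models response; real entries include more fields.
raw = '{"object": "list", "data": [{"id": "gpt-4o"}, {"id": "claude-sonnet-4-6"}]}'
model_ids = [m["id"] for m in json.loads(raw)["data"]]
```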
```bash
curl https://infermesh.io/v1/models \
  -H "Authorization: Bearer your-api-key"
```

Responses
/v1/responses

OpenAI Responses-compatible endpoint supporting both one-shot and streaming output.
Request Parameters
| Parameter | Type | Required | Description |
|---|---|---|---|
| model | string | Yes | Model ID |
| input | string \| object \| array | No | Input content (commonly a string) |
| stream | boolean | No | Whether to return streaming output, default false |
| max_output_tokens | integer | No | Maximum output token count |
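A non-streaming reply contains an output array of items; message items hold output_text parts. A sketch of concatenating the text from a simplified example reply (the real object carries many more fields, such as id, status, and usage):

```python
import json

# Simplified non-streaming /v1/responses reply; real replies include
# ids, usage, status, and other fields omitted here.
raw = json.dumps({
    "output": [
        {"type": "message",
         "content": [{"type": "output_text", "text": "Hi, I am a model."}]}
    ]
})

reply = json.loads(raw)
text = "".join(
    part["text"]
    for item in reply["output"] if item.get("type") == "message"
    for part in item.get("content", []) if part.get("type") == "output_text"
)
```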
```bash
curl https://infermesh.io/v1/responses \
  -H "Content-Type: application/json" \
  -H "Authorization: Bearer your-api-key" \
  -d '{
    "model": "gpt-4o",
    "input": "Introduce yourself in one sentence.",
    "stream": false
  }'
```

Messages (Anthropic)
/v1/messages

Anthropic Messages API-compatible endpoint. Supports Claude Code and other Anthropic-native clients. Use the x-api-key header (or Authorization: Bearer) for authentication.
The x-api-key header takes priority. Authorization: Bearer is also accepted for compatibility.
Request Parameters
| Parameter | Type | Required | Description |
|---|---|---|---|
| model | string | Yes | Model ID, e.g. claude-sonnet-4-6 |
| messages | array | Yes | Conversation messages array (Anthropic format) |
| system | string | No | System prompt (optional) |
| max_tokens | integer | No | Maximum output token count |
| stream | boolean | No | Enable streaming output, default false |
```bash
curl https://infermesh.io/v1/messages \
  -H "Content-Type: application/json" \
  -H "x-api-key: your-api-key" \
  -H "anthropic-version: 2023-06-01" \
  -d '{
    "model": "claude-sonnet-4-6",
    "max_tokens": 1024,
    "messages": [
      {"role": "user", "content": "Hello, Claude!"}
    ]
  }'
```

```python
import anthropic

client = anthropic.Anthropic(
    api_key="your-api-key",
    base_url="https://infermesh.io",
)

message = client.messages.create(
    model="claude-sonnet-4-6",
    max_tokens=1024,
    messages=[{"role": "user", "content": "Hello, Claude!"}],
)
print(message.content[0].text)
```

Streaming
Set stream: true to enable SSE streaming responses for real-time rendering.
```python
response = client.chat.completions.create(
    model="claude-sonnet-4-6",
    messages=[{"role": "user", "content": "Write a short poem about AI."}],
    stream=True,
)
for chunk in response:
    content = chunk.choices[0].delta.content
    if content:
        print(content, end="", flush=True)
```

Function Calling
Supports OpenAI-style Function Calling (Tool Use) for integrating external tools and data sources.
```python
tools = [
    {
        "type": "function",
        "function": {
            "name": "get_current_weather",
            "description": "Get current weather for a specified city",
            "parameters": {
                "type": "object",
                "properties": {
                    "city": {
                        "type": "string",
                        "description": "City name, e.g. 'Seattle'",
                    }
                },
                "required": ["city"],
            },
        },
    }
]

response = client.chat.completions.create(
    model="gpt-4o",
    messages=[{"role": "user", "content": "What's the weather in Beijing today?"}],
    tools=tools,
)
```

Claude Code
Claude Code is Anthropic's official agentic coding tool. You can configure it to use Infer Mesh as the API proxy by setting the following environment variables in your Claude Code settings file (~/.claude/settings.json):
```json
{
  "autoUpdatesChannel": "latest",
  "env": {
    "ANTHROPIC_AUTH_TOKEN": "your-api-key",
    "ANTHROPIC_BASE_URL": "https://infermesh.io",
    "ANTHROPIC_DEFAULT_HAIKU_MODEL": "claude-haiku-4-5",
    "ANTHROPIC_DEFAULT_SONNET_MODEL": "claude-sonnet-4-6",
    "ANTHROPIC_DEFAULT_OPUS_MODEL": "claude-opus-4-6"
  }
}
```

With the above configuration, Claude Code sends all requests through Infer Mesh, enabling access control, usage tracking, and multi-model routing.
Automatic Failover
When upstream providers error or timeout, the system automatically retries on other available nodes to keep service continuity. You can specify primary and fallback models in the request body:
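Note that fallback_models is this proxy's extension, not a standard OpenAI field; with the OpenAI Python SDK such extra fields can be passed via the extra_body argument to create(), which merges them into the JSON request body. A sketch of the resulting payload:

```python
# Standard fields plus the proxy-specific fallback_models extension, merged
# the way the OpenAI SDK's extra_body argument combines them into one body.
base = {
    "model": "claude-opus-4-6",
    "messages": [{"role": "user", "content": "Hello!"}],
}
extra_body = {"fallback_models": ["gpt-4o", "gemini-3-1-pro"]}
payload = {**base, **extra_body}
```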
Set fallback models in the request body (priority order):

```json
{
  "model": "claude-opus-4-6",
  "fallback_models": ["gpt-4o", "gemini-3-1-pro"],
  "messages": [...]
}
```

Load Balancing
For high concurrency, enable load balancing to distribute requests across backend nodes and reduce latency.
Rate Limits
There are no rate limits or account tiers: as long as your account has a balance, you can use the API without restriction.