API Documentation

OpenAI-compatible API protocol. Switch providers by changing a single base URL line.

Introduction

Infer Mesh is an OpenAI-compatible AI model aggregation proxy. With one unified API endpoint, you can access models from Anthropic, OpenAI, Google, DeepSeek, and more without applying for separate API keys.

Full Compatibility

Works with the OpenAI SDK with zero code changes

Automatic Failover

Automatically switches to backup nodes when an upstream provider is unstable

Transparent Billing

Usage-based token billing with public pricing

Quick Start

Start in two steps: create an API key, then point requests to our endpoint.

Python (Recommended)

bash
pip install openai
python
from openai import OpenAI

client = OpenAI(
    base_url="https://infermesh.io/v1",
    api_key="your-api-key",   # create in console
)

response = client.chat.completions.create(
    model="claude-sonnet-4-6",   # or any supported model ID
    messages=[{"role": "user", "content": "Hello!"}],
)

print(response.choices[0].message.content)

Node.js / TypeScript

typescript
import OpenAI from "openai";

const client = new OpenAI({
  baseURL: "https://infermesh.io/v1",
  apiKey: "your-api-key",
});

const response = await client.chat.completions.create({
  model: "gpt-4o",
  messages: [{ role: "user", content: "Hello!" }],
});

console.log(response.choices[0].message.content);

cURL

bash
curl https://infermesh.io/v1/chat/completions \
  -H "Content-Type: application/json" \
  -H "Authorization: Bearer your-api-key" \
  -d '{
    "model": "deepseek-v3",
    "messages": [
      {"role": "user", "content": "Hello!"}
    ]
  }'

Authentication

All requests must include an API key in the HTTP headers. After creating one in the Console, add it to your request header:

bash
Authorization: Bearer sk-xxxxxxxxxxxxxxxxxxxxxxxx

Never commit API keys to public repositories or expose them in frontend code. If leaked, revoke and regenerate immediately in the console.
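One common way to keep keys out of source code is to load them from an environment variable at runtime. A minimal sketch, assuming a variable named INFER_MESH_API_KEY (an arbitrary name chosen for illustration, not one the service requires):

```python
import os

def load_api_key(var: str = "INFER_MESH_API_KEY") -> str:
    """Return the API key from the environment, failing loudly if unset."""
    key = os.environ.get(var)
    if not key:
        raise RuntimeError(f"Set {var} before creating the client")
    return key

# Then construct the client with it instead of a hardcoded string:
# client = OpenAI(base_url="https://infermesh.io/v1", api_key=load_api_key())
```

Failing at startup when the variable is missing is usually preferable to sending an unauthenticated request later.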

Chat Completions

POST /v1/chat/completions

Create a chat completion request compatible with OpenAI Chat Completions API.

Request Parameters

| Parameter | Type | Required | Description |
| --- | --- | --- | --- |
| model | string | Yes | Model ID, e.g. claude-sonnet-4-6 |
| messages | array | Yes | Conversation message array |
| stream | boolean | No | Enable streaming output, default false |
| max_tokens | integer | No | Maximum output token count |
| temperature | number | No | Sampling temperature, 0.0 - 2.0 |
| tools | array | No | Function calling tool definitions |
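Put together, a request body exercising the optional parameters might look like the dict below, which maps one-to-one onto the JSON you would POST (the values are illustrative):

```python
# Request body using the optional parameters; maps 1:1 onto the JSON payload.
payload = {
    "model": "claude-sonnet-4-6",
    "messages": [{"role": "user", "content": "Summarize SSE in two sentences."}],
    "stream": False,        # set True for incremental output
    "max_tokens": 256,      # cap the length of the reply
    "temperature": 0.2,     # low values make output more deterministic
}
```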

Embeddings

POST /v1/embeddings

Convert text into vector embeddings for semantic search, similarity matching, and RAG.

Request Parameters

| Parameter | Type | Required | Description |
| --- | --- | --- | --- |
| model | string | Yes | Embedding model ID, e.g. text-embedding-3-small |
| input | string \| string[] | Yes | Text to vectorize, supports single or batch input |
| encoding_format | string | No | Response format, commonly float |

bash
curl https://infermesh.io/v1/embeddings \
  -H "Content-Type: application/json" \
  -H "Authorization: Bearer your-api-key" \
  -d '{
    "model": "text-embedding-3-small",
    "input": "hello, vector search test"
  }'
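The response follows the OpenAI embeddings shape, with the vector under data[0].embedding. A common next step is comparing two vectors by cosine similarity; a minimal, dependency-free sketch:

```python
import math

def cosine_similarity(a: list[float], b: list[float]) -> float:
    """Cosine similarity between two equal-length embedding vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)

# Toy vectors standing in for real API output:
print(cosine_similarity([1.0, 0.0], [1.0, 0.0]))  # → 1.0
```

Scores near 1.0 mean semantically similar texts; near 0.0, unrelated ones.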

Models List

GET /v1/models

Get the list of all available models.

bash
curl https://infermesh.io/v1/models \
  -H "Authorization: Bearer your-api-key"
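The response follows OpenAI's list format ({"object": "list", "data": [...]}), so extracting the model IDs is a one-liner; the IDs in the sample below are illustrative, not a guaranteed catalog:

```python
def model_ids(models_response: dict) -> list[str]:
    """Pull the model IDs out of an OpenAI-style /v1/models response."""
    return [m["id"] for m in models_response.get("data", [])]

# Sample shaped like the endpoint's response (IDs are illustrative):
sample = {"object": "list", "data": [{"id": "gpt-4o"}, {"id": "claude-sonnet-4-6"}]}
print(model_ids(sample))  # → ['gpt-4o', 'claude-sonnet-4-6']
```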

Responses

POST /v1/responses

Endpoint compatible with the OpenAI Responses API, supporting both one-shot and streaming output.

Request Parameters

| Parameter | Type | Required | Description |
| --- | --- | --- | --- |
| model | string | Yes | Model ID |
| input | string \| object \| array | No | Input content (commonly a string) |
| stream | boolean | No | Whether to return streaming output, default false |
| max_output_tokens | integer | No | Maximum output token count |

bash
curl https://infermesh.io/v1/responses \
  -H "Content-Type: application/json" \
  -H "Authorization: Bearer your-api-key" \
  -d '{
    "model": "gpt-4o",
    "input": "Introduce yourself in one sentence.",
    "stream": false
  }'
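In the Responses format, the generated text sits inside an output array of message items, each carrying content parts of type output_text. Assuming the proxy mirrors that shape, a small helper can flatten it:

```python
def output_text(resp: dict) -> str:
    """Concatenate output_text parts from a Responses-style payload."""
    parts = []
    for item in resp.get("output", []):
        for part in item.get("content", []):
            if part.get("type") == "output_text":
                parts.append(part.get("text", ""))
    return "".join(parts)

# Minimal payload shaped like a Responses reply:
sample = {"output": [{"type": "message",
                      "content": [{"type": "output_text", "text": "Hi!"}]}]}
print(output_text(sample))  # → Hi!
```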

Messages (Anthropic)

POST /v1/messages

Anthropic Messages API-compatible endpoint. Supports Claude Code and other Anthropic-native clients. Use the x-api-key header (or Authorization: Bearer) for authentication.

The x-api-key header takes priority. Authorization: Bearer is also accepted for compatibility.

Request Parameters

| Parameter | Type | Required | Description |
| --- | --- | --- | --- |
| model | string | Yes | Model ID, e.g. claude-sonnet-4-6 |
| messages | array | Yes | Conversation messages array (Anthropic format) |
| system | string | No | System prompt |
| max_tokens | integer | No | Maximum output token count |
| stream | boolean | No | Enable streaming output, default false |

bash
curl https://infermesh.io/v1/messages \
  -H "Content-Type: application/json" \
  -H "x-api-key: your-api-key" \
  -H "anthropic-version: 2023-06-01" \
  -d '{
    "model": "claude-sonnet-4-6",
    "max_tokens": 1024,
    "messages": [
      {"role": "user", "content": "Hello, Claude!"}
    ]
  }'
python
import anthropic

client = anthropic.Anthropic(
    api_key="your-api-key",
    base_url="https://infermesh.io",
)

message = client.messages.create(
    model="claude-sonnet-4-6",
    max_tokens=1024,
    messages=[{"role": "user", "content": "Hello, Claude!"}],
)

print(message.content[0].text)

Streaming

Set stream: true to enable Server-Sent Events (SSE) streaming for real-time rendering.

python
response = client.chat.completions.create(
    model="claude-sonnet-4-6",
    messages=[{"role": "user", "content": "Write a short poem about AI."}],
    stream=True,
)

for chunk in response:
    content = chunk.choices[0].delta.content
    if content:
        print(content, end="", flush=True)

Function Calling

Supports OpenAI-style Function Calling (Tool Use) for integrating external tools and data sources.

python
tools = [
    {
        "type": "function",
        "function": {
            "name": "get_current_weather",
            "description": "Get current weather for a specified city",
            "parameters": {
                "type": "object",
                "properties": {
                    "city": {
                        "type": "string",
                        "description": "City name, e.g. 'Seattle'",
                    }
                },
                "required": ["city"],
            },
        },
    }
]

response = client.chat.completions.create(
    model="gpt-4o",
    messages=[{"role": "user", "content": "What's the weather in Beijing today?"}],
    tools=tools,
)
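When the model opts to use the tool, the reply carries tool_calls with the function name and JSON-encoded arguments; you execute the function yourself and send the result back as a role: "tool" message. A sketch of that dispatch step, with a stand-in get_current_weather implementation (not part of the API):

```python
import json

def get_current_weather(city: str) -> dict:
    # Stand-in implementation; swap in a real weather lookup.
    return {"city": city, "condition": "sunny", "temp_c": 22}

def run_tool_call(name: str, arguments_json: str) -> str:
    """Execute one tool call and return a JSON string for the tool message."""
    args = json.loads(arguments_json)
    if name == "get_current_weather":
        return json.dumps(get_current_weather(**args))
    raise ValueError(f"unknown tool: {name}")

# Each entry in response.choices[0].message.tool_calls exposes
# .function.name and .function.arguments; feed the result back like:
# messages.append({"role": "tool", "tool_call_id": call.id,
#                  "content": run_tool_call(call.function.name,
#                                           call.function.arguments)})
result = run_tool_call("get_current_weather", '{"city": "Beijing"}')
```

A second create() call with the appended tool message lets the model compose its final answer from the tool's output.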

Claude Code

Claude Code is Anthropic's official agentic coding tool. You can configure it to use Infer Mesh as the API proxy by setting the following environment variables in your Claude Code settings file (~/.claude/settings.json):

json
{
  "autoUpdatesChannel": "latest",
  "env": {
    "ANTHROPIC_AUTH_TOKEN": "your-api-key",
    "ANTHROPIC_BASE_URL": "https://infermesh.io",
    "ANTHROPIC_DEFAULT_HAIKU_MODEL": "claude-haiku-4-5",
    "ANTHROPIC_DEFAULT_SONNET_MODEL": "claude-sonnet-4-6",
    "ANTHROPIC_DEFAULT_OPUS_MODEL": "claude-opus-4-6"
  }
}

With the above configuration, Claude Code will send all requests through Infer Mesh, enabling access control, usage tracking, and multi-model routing.

Automatic Failover

When upstream providers error or time out, the system automatically retries on other available nodes to keep service continuity. You can also specify primary and fallback models, tried in priority order, in the request body:

json
{
  "model": "claude-opus-4-6",
  "fallback_models": ["gpt-4o", "gemini-3-1-pro"],
  "messages": [...]
}
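From the Python SDK, fields the client does not model natively, such as fallback_models, can be forwarded via the SDK's extra_body parameter. A sketch (the network call itself is commented out):

```python
# extra_body is the OpenAI SDK's escape hatch for provider-specific fields;
# its contents are merged into the JSON request body.
extra = {"fallback_models": ["gpt-4o", "gemini-3-1-pro"]}

# response = client.chat.completions.create(
#     model="claude-opus-4-6",
#     messages=[{"role": "user", "content": "Hello!"}],
#     extra_body=extra,
# )
```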

Load Balancing

For high concurrency, enable load balancing to distribute requests across backend nodes and reduce latency.

Rate Limits

There are no rate limits or account tier distinctions. As long as your account has a positive balance, you can use the API without restriction.