API Documentation
OpenAI-compatible API protocol: switch providers by changing a single base URL line.
Introduction
Infer Mesh is an OpenAI-compatible AI model aggregation proxy. With one unified API endpoint, you can access models from Anthropic, OpenAI, Google, DeepSeek, and more without applying for separate API keys.
Full Compatibility
Works with the OpenAI SDK with zero code changes
Automatic Failover
Automatically switches to backup nodes when upstream is unstable
Transparent Billing
Usage-based token billing with public pricing
Quick Start
Start in two steps: create an API key, then point requests to our endpoint.
Python (Recommended)
```bash
pip install openai
```

```python
from openai import OpenAI

client = OpenAI(
    base_url="https://infermesh.io/v1",
    api_key="your-api-key",  # create in console
)

response = client.chat.completions.create(
    model="claude-sonnet-4-6",  # or any supported model ID
    messages=[{"role": "user", "content": "Hello!"}],
)
print(response.choices[0].message.content)
```

Node.js / TypeScript
```typescript
import OpenAI from "openai";

const client = new OpenAI({
  baseURL: "https://infermesh.io/v1",
  apiKey: "your-api-key",
});

const response = await client.chat.completions.create({
  model: "gpt-4o",
  messages: [{ role: "user", content: "Hello!" }],
});
console.log(response.choices[0].message.content);
```

cURL
```bash
curl https://infermesh.io/v1/chat/completions \
  -H "Content-Type: application/json" \
  -H "Authorization: Bearer your-api-key" \
  -d '{
    "model": "deepseek-v3",
    "messages": [
      {"role": "user", "content": "Hello!"}
    ]
  }'
```

Authentication
All requests must include an API key in the HTTP headers. After creating one in the Console, add it to your request header:

```
Authorization: Bearer sk-xxxxxxxxxxxxxxxxxxxxxxxx
```

Never commit API keys to public repositories or expose them in frontend code. If a key leaks, revoke and regenerate it immediately in the console.
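A common pattern is to keep the key out of source entirely and read it from an environment variable at startup. A minimal sketch (INFERMESH_API_KEY is an illustrative variable name, not one the console mandates):

```python
import os

# Read the key from the environment; fall back to a placeholder so the
# sketch runs even when the variable is unset. INFERMESH_API_KEY is an
# illustrative name.
api_key = os.environ.get("INFERMESH_API_KEY", "sk-placeholder")

# The header attached to every request:
headers = {"Authorization": f"Bearer {api_key}"}
```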
Chat Completions
/v1/chat/completions

Create a chat completion request compatible with the OpenAI Chat Completions API.
Request Parameters
| Parameter | Type | Required | Description |
|---|---|---|---|
| model | string | Yes | Model ID, e.g. claude-sonnet-4-6 |
| messages | array | Yes | Conversation message array |
| stream | boolean | No | Enable streaming output, default false |
| max_tokens | integer | No | Maximum output token count |
| temperature | number | No | Sampling temperature, 0.0 - 2.0 |
| tools | array | No | Function calling tool definitions |
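As a sketch of what the optional parameters look like on the wire, this builds the JSON body for a POST to /v1/chat/completions (the model and values are illustrative):

```python
import json

# Request body exercising the optional parameters from the table above.
payload = {
    "model": "claude-sonnet-4-6",
    "messages": [{"role": "user", "content": "Summarize SSE in one line."}],
    "max_tokens": 256,       # cap the length of the reply
    "temperature": 0.2,      # low temperature for a focused answer
    "stream": False,         # set to true for SSE streaming (see Streaming)
}
body = json.dumps(payload)
```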
Embeddings
/v1/embeddings

Convert text into vector embeddings for semantic search, similarity matching, and RAG.
Request Parameters
| Parameter | Type | Required | Description |
|---|---|---|---|
| model | string | Yes | Embedding model ID, e.g. text-embedding-3-small |
| input | string \| string[] | Yes | Text to vectorize, supports single or batch input |
| encoding_format | string | No | Response format, commonly float |
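The vectors returned by this endpoint are typically compared with cosine similarity; a dependency-free sketch:

```python
import math

def cosine_similarity(a, b):
    # Cosine similarity between two equal-length embedding vectors:
    # dot product divided by the product of the vector norms.
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)

# Identical directions score 1.0; orthogonal vectors score 0.0.
score = cosine_similarity([0.1, 0.3, 0.5], [0.1, 0.3, 0.5])
```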
```bash
curl https://infermesh.io/v1/embeddings \
  -H "Content-Type: application/json" \
  -H "Authorization: Bearer your-api-key" \
  -d '{
    "model": "text-embedding-3-small",
    "input": "hello, vector search test"
  }'
```

Models List
/v1/models

Get the list of all available models.
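The response follows the OpenAI list shape (an object with a data array). A sketch of extracting just the model IDs from a simplified example payload (real entries carry additional fields such as object, created, and owned_by):

```python
import json

# Simplified /v1/models response; real entries include more fields.
raw = '{"object": "list", "data": [{"id": "gpt-4o"}, {"id": "claude-sonnet-4-6"}]}'
model_ids = [m["id"] for m in json.loads(raw)["data"]]
```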
```bash
curl https://infermesh.io/v1/models \
  -H "Authorization: Bearer your-api-key"
```

Responses
/v1/responses

OpenAI Responses-compatible endpoint supporting both one-shot and streaming output.
Request Parameters
| Parameter | Type | Required | Description |
|---|---|---|---|
| model | string | Yes | Model ID |
| input | string \| object \| array | No | Input content (commonly a string) |
| stream | boolean | No | Whether to return streaming output, default false |
| max_output_tokens | integer | No | Maximum output token count |
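A non-streaming reply contains an output array of items; message items hold output_text parts. A sketch of concatenating the text from a simplified example reply (the real object carries many more fields, such as id, status, and usage):

```python
import json

# Simplified non-streaming /v1/responses reply; real replies include
# ids, usage, status, and other fields omitted here.
raw = json.dumps({
    "output": [
        {"type": "message",
         "content": [{"type": "output_text", "text": "Hi, I am a model."}]}
    ]
})

reply = json.loads(raw)
text = "".join(
    part["text"]
    for item in reply["output"] if item.get("type") == "message"
    for part in item.get("content", []) if part.get("type") == "output_text"
)
```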
```bash
curl https://infermesh.io/v1/responses \
  -H "Content-Type: application/json" \
  -H "Authorization: Bearer your-api-key" \
  -d '{
    "model": "gpt-4o",
    "input": "Introduce yourself in one sentence.",
    "stream": false
  }'
```

Messages (Anthropic)
/v1/messages

Anthropic Messages API-compatible endpoint. Supports Claude Code and other Anthropic-native clients. Use the x-api-key header (or Authorization: Bearer) for authentication.
The x-api-key header takes priority. Authorization: Bearer is also accepted for compatibility.
Request Parameters
| Parameter | Type | Required | Description |
|---|---|---|---|
| model | string | Yes | Model ID, e.g. claude-sonnet-4-6 |
| messages | array | Yes | Conversation messages array (Anthropic format) |
| system | string | No | System prompt (optional) |
| max_tokens | integer | No | Maximum output token count |
| stream | boolean | No | Enable streaming output, default false |
```bash
curl https://infermesh.io/v1/messages \
  -H "Content-Type: application/json" \
  -H "x-api-key: your-api-key" \
  -H "anthropic-version: 2023-06-01" \
  -d '{
    "model": "claude-sonnet-4-6",
    "max_tokens": 1024,
    "messages": [
      {"role": "user", "content": "Hello, Claude!"}
    ]
  }'
```

```python
import anthropic

client = anthropic.Anthropic(
    api_key="your-api-key",
    base_url="https://infermesh.io",
)

message = client.messages.create(
    model="claude-sonnet-4-6",
    max_tokens=1024,
    messages=[{"role": "user", "content": "Hello, Claude!"}],
)
print(message.content[0].text)
```

Streaming
Set stream: true to enable SSE streaming responses for real-time rendering.
```python
response = client.chat.completions.create(
    model="claude-sonnet-4-6",
    messages=[{"role": "user", "content": "Write a short poem about AI."}],
    stream=True,
)
for chunk in response:
    content = chunk.choices[0].delta.content
    if content:
        print(content, end="", flush=True)
```

Function Calling
Supports OpenAI-style Function Calling (Tool Use) for integrating external tools and data sources.
```python
tools = [
    {
        "type": "function",
        "function": {
            "name": "get_current_weather",
            "description": "Get current weather for a specified city",
            "parameters": {
                "type": "object",
                "properties": {
                    "city": {
                        "type": "string",
                        "description": "City name, e.g. 'Seattle'",
                    }
                },
                "required": ["city"],
            },
        },
    }
]

response = client.chat.completions.create(
    model="gpt-4o",
    messages=[{"role": "user", "content": "What's the weather in Beijing today?"}],
    tools=tools,
)
```

Claude Code
Claude Code is Anthropic's official agentic coding tool. You can configure it to use Infer Mesh as the API proxy by setting the following environment variables in your Claude Code settings file (~/.claude/settings.json):
```json
{
  "autoUpdatesChannel": "latest",
  "env": {
    "ANTHROPIC_AUTH_TOKEN": "your-api-key",
    "ANTHROPIC_BASE_URL": "https://infermesh.io",
    "ANTHROPIC_DEFAULT_HAIKU_MODEL": "claude-haiku-4-5",
    "ANTHROPIC_DEFAULT_SONNET_MODEL": "claude-sonnet-4-6",
    "ANTHROPIC_DEFAULT_OPUS_MODEL": "claude-opus-4-6"
  }
}
```

With the above configuration, Claude Code sends all requests through Infer Mesh, enabling access control, usage tracking, and multi-model routing.
Automatic Failover
When upstream providers error or timeout, the system automatically retries on other available nodes to keep service continuity. You can specify primary and fallback models in the request body:
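Note that fallback_models is this proxy's extension, not a standard OpenAI field; with the OpenAI Python SDK such extra fields can be passed via the extra_body argument to create(), which merges them into the JSON request body. A sketch of the resulting payload:

```python
# Standard fields plus the proxy-specific fallback_models extension, merged
# the way the OpenAI SDK's extra_body argument combines them into one body.
base = {
    "model": "claude-opus-4-6",
    "messages": [{"role": "user", "content": "Hello!"}],
}
extra_body = {"fallback_models": ["gpt-4o", "gemini-3-1-pro"]}
payload = {**base, **extra_body}
```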
Set fallback models in the request body (priority order):

```json
{
  "model": "claude-opus-4-6",
  "fallback_models": ["gpt-4o", "gemini-3-1-pro"],
  "messages": [...]
}
```

Load Balancing
For high concurrency, enable load balancing to distribute requests across backend nodes and reduce latency.
Rate Limits
There are no rate limits or account tiers: as long as your account has a balance, you can use the API without restriction.