Rate Limiting
The proxy enforces per-tool rate limits using a token-bucket algorithm. Limits can be keyed by client IP, authenticated user, or MCP session ID, or shared globally across all callers. The key is chosen independently for each tool.
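
To make the token-bucket idea concrete, here is a minimal sketch in Python. It is not the proxy's implementation: the class name `TokenBucket`, its parameters, and the choice to start with a full bucket are all assumptions made for illustration. The bucket refills continuously at `rate` tokens per `interval_s` seconds, up to `capacity`, and each call spends one token.

```python
import time
from typing import Callable


class TokenBucket:
    """Illustrative token bucket (hypothetical, not the proxy's code)."""

    def __init__(
        self,
        rate: float,
        interval_s: float,
        capacity: float,
        clock: Callable[[], float] = time.monotonic,
    ) -> None:
        self.rate = rate              # tokens added per interval
        self.interval_s = interval_s  # refill interval in seconds
        self.capacity = capacity      # most tokens the bucket can hold
        self.clock = clock            # injectable clock, handy for tests
        self.tokens = capacity        # assumption: the bucket starts full
        self.last_refill = clock()

    def _refill(self) -> None:
        """Add tokens for the time elapsed since the last refill."""
        now = self.clock()
        earned = (now - self.last_refill) * self.rate / self.interval_s
        self.tokens = min(self.capacity, self.tokens + earned)
        self.last_refill = now

    def try_acquire(self, cost: float = 1.0) -> bool:
        """Spend `cost` tokens if available; return False when limited."""
        self._refill()
        if self.tokens >= cost:
            self.tokens -= cost
            return True
        return False

    def retry_after_s(self, cost: float = 1.0) -> float:
        """Seconds until `cost` tokens will be available."""
        self._refill()
        missing = max(0.0, cost - self.tokens)
        return missing * self.interval_s / self.rate
```

A limiter built this way would keep one bucket per key (for example, per user and tool) and reject a call whenever `try_acquire()` returns `False`.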
Configuration
Rate limiting is configured per upstream under rate_limit. Each entry targets tools by exact name or by glob pattern and defines the bucket parameters:

    upstreams:
      - name: myapi
        type: http
        tool_prefix: api
        base_url: https://api.example.com
        openapi:
          source: spec.yaml
        rate_limit:
          - tool: "*"        # applies to all tools in this upstream
            rate: 60         # tokens per interval
            interval: 1m     # refill interval
            burst: 10        # maximum burst above steady rate
            source: user     # key by: ip | user | session | global

Limit sources
| Source | Key used | When to use |
|---|---|---|
| ip | Client IP address | Unauthenticated endpoints; coarse throttling |
| user | JWT sub claim or API key identity | Per-user quotas on authenticated endpoints |
| session | MCP session ID | Isolate AI agent sessions from each other |
| global | Shared across all callers | Protect an upstream with a hard cap |
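
The table above can be read as a mapping from a configured source to the identifier that selects a bucket. The sketch below illustrates that mapping under stated assumptions: `RequestContext`, `bucket_key`, the key format, and the decision to raise an error when the required identity is missing are all hypothetical and are not taken from the proxy.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class RequestContext:
    """Hypothetical per-call context; the field names are illustrative."""

    client_ip: str
    user_id: Optional[str] = None     # e.g. JWT sub claim or API key identity
    session_id: Optional[str] = None  # MCP session ID


def bucket_key(source: str, tool: str, ctx: RequestContext) -> str:
    """Build the string that identifies which bucket a call draws from."""
    if source == "ip":
        ident = ctx.client_ip
    elif source == "user":
        if ctx.user_id is None:
            # Assumption: a user-keyed limit needs an authenticated caller.
            raise ValueError("source 'user' requires an authenticated caller")
        ident = ctx.user_id
    elif source == "session":
        if ctx.session_id is None:
            raise ValueError("source 'session' requires an MCP session ID")
        ident = ctx.session_id
    elif source == "global":
        ident = "*"  # one shared bucket for every caller
    else:
        raise ValueError(f"unknown source: {source!r}")
    # Including the tool name keeps each tool's buckets separate.
    return f"{tool}:{source}:{ident}"
```

With `global`, every caller maps to the same key, which is what turns the limit into a hard cap on the upstream.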
Per-tool overrides
Use the tool name (with prefix) or a glob to apply limits selectively. More specific patterns take precedence over wildcards:

    rate_limit:
      - tool: "*"
        rate: 100
        interval: 1m
        source: user
      - tool: "api__export_data"   # tighter limit for expensive operation
        rate: 5
        interval: 1m
        burst: 2
        source: user

Rate limit responses
When a tool call exceeds its limit, the proxy returns an MCP error with code -32029 (rate limit exceeded) and a Retry-After hint in seconds. The MCP client always receives this structured error; the call is never silently dropped.
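
A client can use the error code to decide when to retry. The sketch below shows one way to do that, assuming the error arrives as a standard JSON-RPC 2.0 error object. Where the Retry-After hint lives in the payload is not specified here, so the `data.retry_after` field, the helper name `retry_delay_s`, and the one-second fallback are assumptions for illustration only.

```python
from typing import Any, Mapping, Optional

RATE_LIMIT_EXCEEDED = -32029  # error code documented above


def retry_delay_s(response: Mapping[str, Any]) -> Optional[float]:
    """Return seconds to wait for a rate-limit error, or None otherwise."""
    error = response.get("error")
    if not error or error.get("code") != RATE_LIMIT_EXCEEDED:
        return None
    data = error.get("data") or {}
    # Assumption: the hint is carried as error.data.retry_after (seconds).
    # Fall back to one second if the hint is absent.
    return float(data.get("retry_after", 1.0))


# Hypothetical response shape, for illustration only.
example = {
    "jsonrpc": "2.0",
    "id": 7,
    "error": {
        "code": RATE_LIMIT_EXCEEDED,
        "message": "rate limit exceeded",
        "data": {"retry_after": 12},
    },
}
```

Calling `retry_delay_s(example)` yields the number of seconds to wait before retrying the same tool call; any other response returns `None`.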
Metrics
Rate limit decisions are emitted as OpenTelemetry metrics under the mcp.ratelimit.* namespace. The mcp.ratelimit.rejected counter is labelled by tool name and source key. See OpenTelemetry for the full metrics reference.
See also
- Circuit Breaking: upstream failure protection
- OpenTelemetry: metrics and traces
- Authentication: user identity for user-keyed limits