Rate Limiting

The proxy enforces per-tool rate limits using a token-bucket algorithm. Each limit is keyed by client IP, authenticated user, MCP session ID, or a single global key, and limits are configured independently for each tool.

Configuration

Rate limiting is configured per upstream, under that upstream's rate_limit key. Each entry targets tools by exact name or by glob pattern (a glob can match several tools) and defines the bucket parameters:

upstreams:
  - name: myapi
    type: http
    tool_prefix: api
    base_url: https://api.example.com
    openapi:
      source: spec.yaml
    rate_limit:
      - tool: "*"                 # applies to all tools in this upstream
        rate: 60                  # tokens per interval
        interval: 1m              # refill interval
        burst: 10                 # maximum burst above steady rate
        source: user              # key by: ip | user | session | global
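A minimal Python sketch of the token-bucket arithmetic behind these fields. It assumes the bucket refills continuously at rate / interval tokens per second and that its capacity (the largest burst a caller can spend at once) is passed in explicitly. This page does not state how the proxy derives capacity from rate and burst, so the usage below treats capacity as equal to burst purely for illustration. TokenBucket, its methods, and parse_interval (which assumes s, m, and h suffixes) are hypothetical names, not the proxy's API.

```python
import math

# Assumed duration suffixes; only "1m" appears in the examples on this page.
_UNIT_SECONDS = {"s": 1, "m": 60, "h": 3600}


def parse_interval(text: str) -> float:
    """Convert a duration such as "1m" into seconds (hypothetical helper)."""
    value, unit = text[:-1], text[-1]
    return float(value) * _UNIT_SECONDS[unit]


class TokenBucket:
    """Hypothetical token bucket; `now` is a monotonic timestamp in seconds."""

    def __init__(self, rate: float, interval_s: float, capacity: float, now: float) -> None:
        self.refill_per_s = rate / interval_s  # steady refill rate in tokens per second
        self.capacity = capacity               # assumed maximum burst size
        self.tokens = capacity                 # a new bucket starts full
        self.last = now

    def _refill(self, now: float) -> None:
        # Add tokens for the time elapsed since the last update, capped at capacity.
        elapsed = max(0.0, now - self.last)
        self.tokens = min(self.capacity, self.tokens + elapsed * self.refill_per_s)
        self.last = now

    def try_acquire(self, now: float, cost: float = 1.0) -> bool:
        """Spend `cost` tokens if available; False means the call should be rejected."""
        self._refill(now)
        if self.tokens >= cost:
            self.tokens -= cost
            return True
        return False

    def retry_after(self, now: float, cost: float = 1.0) -> int:
        """Whole seconds until `cost` tokens will be available (0 if already available)."""
        self._refill(now)
        missing = cost - self.tokens
        return 0 if missing <= 0 else math.ceil(missing / self.refill_per_s)


if __name__ == "__main__":
    # rate: 60, interval: 1m, and capacity = burst (10) as an illustrative assumption.
    bucket = TokenBucket(rate=60, interval_s=parse_interval("1m"), capacity=10, now=0.0)
    allowed = sum(bucket.try_acquire(now=0.0) for _ in range(12))
    print(allowed, bucket.retry_after(now=0.0))  # 10 1
```

Under these assumptions, the example entry lets a caller spend 10 calls at once and then roughly one call per second; the retry_after value is the kind of number a Retry-After hint would carry. A real implementation would read time.monotonic() instead of taking `now` as a parameter, which is done here only to keep the sketch deterministic.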

Limit sources

Source    Key used                             When to use
ip        Client IP address                    Unauthenticated endpoints; coarse throttling
user      JWT sub claim or API key identity    Per-user quotas on authenticated endpoints
session   MCP session ID                       Isolate AI agent sessions from each other
global    Shared across all callers            Protect an upstream with a hard cap
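One way to picture the four sources is as four ways of building the key that selects a bucket. The sketch below assumes the proxy keeps one bucket per tool and per caller identity. CallContext and bucket_key are hypothetical names, and raising an error for an unauthenticated call under source: user (or a session-less call under source: session) is an illustrative choice only, since this page does not say how the proxy handles those cases.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class CallContext:
    """Hypothetical per-call data available to the proxy."""
    client_ip: str
    user_id: Optional[str] = None     # JWT sub claim or API key identity
    session_id: Optional[str] = None  # MCP session ID


def bucket_key(tool: str, source: str, ctx: CallContext) -> str:
    """Build the key that selects which bucket a call draws tokens from."""
    if source == "ip":
        identity = ctx.client_ip
    elif source == "user":
        if ctx.user_id is None:
            raise ValueError("source 'user' needs an authenticated caller")
        identity = ctx.user_id
    elif source == "session":
        if ctx.session_id is None:
            raise ValueError("source 'session' needs an MCP session ID")
        identity = ctx.session_id
    elif source == "global":
        identity = "*"  # every caller shares one bucket for this tool
    else:
        raise ValueError(f"unknown source: {source!r}")
    return f"{tool}|{source}|{identity}"
```

With source: user, two users calling the same tool land in separate buckets; with source: global, they share one.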

Per-tool overrides

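Use the full tool name, including the upstream's tool_prefix (as in api__export_data), or a glob to apply limits selectively. When several entries match the same tool, a more specific pattern takes precedence over a wildcard: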

rate_limit:
  - tool: "*"
    rate: 100
    interval: 1m
    source: user
  - tool: "api__export_data"      # tighter limit for expensive operation
    rate: 5
    interval: 1m
    burst: 2
    source: user
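A sketch of how "more specific wins" might be resolved, using Python's fnmatch for glob matching. The ranking is an assumption, not the proxy's documented rule: an exact tool name beats any glob, and among globs the pattern with more literal characters wins, with ties going to the earlier entry. resolve_rule and specificity are hypothetical helpers, and the rule dictionaries simply mirror the YAML above.

```python
from fnmatch import fnmatchcase
from typing import Optional

RULES = [
    {"tool": "*", "rate": 100, "interval": "1m", "source": "user"},
    {"tool": "api__export_data", "rate": 5, "interval": "1m", "burst": 2, "source": "user"},
]


def specificity(pattern: str) -> tuple:
    """Rank a pattern: exact names first, then globs with more literal characters."""
    is_glob = any(ch in pattern for ch in "*?[")
    literal_chars = sum(1 for ch in pattern if ch not in "*?[]")
    return (0 if is_glob else 1, literal_chars)


def resolve_rule(tool: str, rules: list) -> Optional[dict]:
    """Return the most specific rule matching `tool`, or None if none match."""
    matches = [rule for rule in rules if fnmatchcase(tool, rule["tool"])]
    if not matches:
        return None
    # max() keeps the first of equally ranked rules, so earlier entries win ties.
    return max(matches, key=lambda rule: specificity(rule["tool"]))
```

Under this ranking, an intermediate entry such as api__export_* would override "*" for the other export tools, while api__export_data would keep its own tighter limit.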

Rate limit responses

When a tool call exceeds its limit, the proxy returns an MCP error with code -32029 (rate limit exceeded) and a Retry-After hint in seconds. The MCP client always receives this structured error; the call is never silently dropped.
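The shape of such a rejection can be sketched as a JSON-RPC 2.0 error object, the envelope MCP messages use. The code -32029 comes from this page; the message text, the data.retry_after field carrying the Retry-After hint, and rounding the wait up to at least one whole second are all assumptions. rate_limit_error (proxy side) and retry_delay (client side) are hypothetical helpers.

```python
import math
from typing import Any, Optional

RATE_LIMIT_EXCEEDED = -32029  # error code documented on this page


def rate_limit_error(request_id: Any, tool: str, seconds_until_token: float) -> dict:
    """Build a JSON-RPC 2.0 error response for a rejected tool call (sketch)."""
    retry_after = max(1, math.ceil(seconds_until_token))  # whole seconds, at least 1
    return {
        "jsonrpc": "2.0",
        "id": request_id,
        "error": {
            "code": RATE_LIMIT_EXCEEDED,
            "message": f"rate limit exceeded for tool {tool}",
            # Hypothetical location for the Retry-After hint.
            "data": {"retry_after": retry_after},
        },
    }


def retry_delay(response: dict) -> Optional[int]:
    """Return the suggested wait in seconds, or None if this is not a rate-limit error."""
    error = response.get("error")
    if not error or error.get("code") != RATE_LIMIT_EXCEEDED:
        return None
    return error.get("data", {}).get("retry_after", 1)
```

A client could check retry_delay on every response and, when it returns a number, wait that long before retrying the same tool call.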

Metrics

Rate limit decisions are emitted as OpenTelemetry metrics under the mcp.ratelimit.* namespace. The mcp.ratelimit.rejected counter is labelled by tool name and source key. See OpenTelemetry for the full metrics reference.

See also