Rate Limiting

The proxy enforces per-tool rate limits using a token-bucket algorithm. Limits can be keyed by client IP, authenticated user, or MCP session ID, or shared globally across all callers. The key is chosen independently for each tool.

Configuration

Rate limiting is configured per upstream under rate_limit. Each entry targets one or more tools (by exact name or by glob) and defines the bucket parameters:

upstreams:
  - name: myapi
    type: http
    tool_prefix: api
    base_url: https://api.example.com
    openapi:
      source: spec.yaml
    rate_limit:
      - tool: "*"                 # applies to all tools in this upstream
        rate: 60                  # tokens per interval
        interval: 1m              # refill interval
        burst: 10                 # maximum burst above steady rate
        source: user              # key by: ip | user | session | global
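
To make the parameters concrete, here is a minimal token-bucket sketch in Python. It is an illustration, not the proxy's implementation: the class name TokenBucket, the try_acquire method, and the injectable clock are hypothetical. It also assumes that bucket capacity equals rate + burst and that tokens refill continuously rather than once per interval. The configuration above does not specify either detail.

```python
import time


class TokenBucket:
    """Illustrative token bucket refilled at rate / interval tokens per second."""

    def __init__(self, rate, interval_s, burst=0, clock=time.monotonic):
        # Assumption: capacity is the steady rate plus the burst allowance.
        self.capacity = rate + burst
        self.refill_per_s = rate / interval_s
        self.tokens = float(self.capacity)  # start with a full bucket
        self.clock = clock
        self.last = clock()

    def try_acquire(self):
        """Spend one token if available. Returns (allowed, retry_after_seconds)."""
        now = self.clock()
        elapsed = now - self.last
        self.last = now
        # Add the tokens earned since the last call, capped at capacity.
        self.tokens = min(self.capacity, self.tokens + elapsed * self.refill_per_s)
        if self.tokens >= 1:
            self.tokens -= 1
            return True, 0.0
        # Time until one whole token is available again.
        return False, (1 - self.tokens) / self.refill_per_s
```

Under these assumptions, rate: 60 with interval: 1m refills one token per second, and a caller that has been idle can make up to 70 calls back to back before being throttled.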

Limit sources

Source	Key used	When to use
`ip`	Client IP address	Unauthenticated endpoints; coarse throttling
`user`	JWT `sub` claim or API key identity	Per-user quotas on authenticated endpoints
`session`	MCP session ID	Isolate AI agent sessions from each other
`global`	Shared across all callers	Protect an upstream with a hard cap
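
One way to picture how a source selects a bucket is as a function from the call's context to a bucket key. The Python sketch below is hypothetical: CallContext, its field names, the key format, and the choice to raise an error when an identity is missing are all assumptions made for illustration, not documented proxy behavior.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class CallContext:
    """Hypothetical per-call data; the field names are illustrative."""
    client_ip: str
    user_id: Optional[str] = None     # JWT sub claim or API key identity
    session_id: Optional[str] = None  # MCP session ID


def bucket_key(source: str, tool: str, ctx: CallContext) -> str:
    """Build the key that identifies which bucket a call draws from."""
    if source == "global":
        ident = "*"  # every caller shares one bucket
    elif source == "ip":
        ident = ctx.client_ip
    elif source == "user":
        ident = ctx.user_id
    elif source == "session":
        ident = ctx.session_id
    else:
        raise ValueError(f"unknown source: {source!r}")
    if ident is None:
        # Assumption: a missing identity is an error rather than a fallback to ip.
        raise ValueError(f"source {source!r} needs an identity this call lacks")
    # Including the tool name keeps limits independent for each tool.
    return f"{tool}|{source}|{ident}"
```

Two calls draw from the same bucket exactly when their keys are equal, so source: global produces one key per tool regardless of who is calling.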

Per-tool overrides

Use the full tool name (including the upstream's tool_prefix, as in api__export_data) or a glob to apply limits selectively. More specific patterns take precedence over wildcards:

rate_limit:
  - tool: "*"
    rate: 100
    interval: 1m
    source: user
  - tool: "api__export_data"      # tighter limit for expensive operation
    rate: 5
    interval: 1m
    burst: 2
    source: user
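
The precedence rule can be sketched as "collect every matching entry, then keep the most specific one". The Python below is an illustration under stated assumptions: it uses fnmatch-style globs, ranks exact names above globs, and ranks globs by their count of literal characters. The proxy's actual glob syntax and tie-breaking are not specified here, and the tool name api__list_items is made up for the example.

```python
from fnmatch import fnmatchcase

# The two entries from the configuration above, as plain dictionaries.
RULES = [
    {"tool": "*", "rate": 100, "interval": "1m", "source": "user"},
    {"tool": "api__export_data", "rate": 5, "interval": "1m", "burst": 2,
     "source": "user"},
]


def specificity(pattern):
    # Assumption: exact names outrank globs; among globs, the pattern with
    # more literal (non-wildcard) characters is considered more specific.
    is_exact = not any(ch in pattern for ch in "*?[")
    literal_chars = sum(ch not in "*?[]" for ch in pattern)
    return (is_exact, literal_chars)


def select_rule(tool, rules):
    """Return the most specific rule matching the tool, or None if none match."""
    matches = [r for r in rules if fnmatchcase(tool, r["tool"])]
    return max(matches, key=lambda r: specificity(r["tool"]), default=None)
```

With these rules, select_rule("api__export_data", RULES) picks the entry with rate 5, while a hypothetical api__list_items falls through to the wildcard entry with rate 100.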

Rate limit responses

When a tool call exceeds its limit, the proxy returns an MCP error with code -32029 (rate limit exceeded) and a Retry-After hint in seconds. The MCP client always receives a structured error; the call is never silently dropped.
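
A client can use the error code to decide when to retry. The Python sketch below assumes the error arrives as a JSON-RPC error object and that the hint is carried in error.data.retry_after. That field name, the function name retry_delay, and the one-second fallback are hypothetical, so check the actual payload before relying on them.

```python
import json

RATE_LIMIT_CODE = -32029  # error code documented above


def retry_delay(response_text):
    """Return seconds to wait if the response is a rate limit error, else None."""
    msg = json.loads(response_text)
    error = msg.get("error")
    if not error or error.get("code") != RATE_LIMIT_CODE:
        return None
    # Assumption: the hint lives at error.data.retry_after, in seconds.
    data = error.get("data") or {}
    # Assumption: fall back to one second when no hint is present.
    return float(data.get("retry_after", 1))
```

A caller would sleep for the returned number of seconds before reissuing the tool call, and treat None as "not rate limited".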

Metrics