Tool Result Caching
Tool result caching stores upstream responses in memory and serves repeat calls within the TTL window directly from the cache, reducing latency and upstream API load for read-heavy tool sets.
Configuration
Caching is configured per upstream. All tools in the upstream share the same cache store and settings; the TTL can be overridden per tool via an overlay.
upstreams:
  - name: myapi
    type: http
    tool_prefix: api
    base_url: https://api.example.com
    openapi:
      source: spec.yaml
    cache:
      enabled: true
      ttl: 60s          # how long to keep a cached result
      per_user: false   # if true, cache is keyed per authenticated user
      max_size: 1000    # maximum number of entries (LRU eviction)
Per-tool TTL override
Override TTL for individual operations using the x-mcp-cache-ttl extension in an overlay:
# overlay.yaml
- target: "$.paths[\"/ticker\"].get"
  update:
    x-mcp-cache-ttl: 5s   # high-frequency data: cache only briefly
- target: "$.paths[\"/instruments\"].get"
  update:
    x-mcp-cache-ttl: 1h   # slow-changing reference data: cache longer
Disabling cache for a tool
- target: "$.paths[\"/account/balance\"].get"
  update:
    x-mcp-cache-ttl: 0   # 0 disables caching for this operation
Per-user caching
When per_user: true, the cache key includes the authenticated user's identity (JWT sub claim or API key identity), so users never see each other's cached data. This requires inbound auth to be configured. See Authentication.
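One way to picture per-user keying is the sketch below. The actual key format is not documented here, so this is an assumption: the key is a hash of the tool name and its arguments, plus the user identity when per_user is on. The function name `cache_key` and its parameters are hypothetical.

```python
import hashlib
import json


def cache_key(tool_name, arguments, per_user=False, user_id=None):
    """Build a cache key; include the user identity only when per_user is enabled."""
    if per_user and user_id is None:
        # Mirrors the requirement that inbound auth is configured.
        raise ValueError("per_user caching requires an authenticated user identity")
    payload = {"tool": tool_name, "args": arguments}
    if per_user:
        payload["user"] = user_id  # e.g. the JWT sub claim or API key identity
    # Canonical JSON so that argument order does not change the key.
    encoded = json.dumps(payload, sort_keys=True, separators=(",", ":"))
    return hashlib.sha256(encoded.encode("utf-8")).hexdigest()
```

With per_user off, two users calling the same tool with the same arguments map to one shared entry; with it on, each user gets a separate entry.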
Cache invalidation
Entries expire after their TTL. The entire cache for an upstream is flushed whenever the upstream's OpenAPI spec is refreshed (background spec refresh) or when the proxy hot-reloads its config.
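The flush behaviour can be sketched as follows. This is a simplified model, not the proxy's code: it assumes one store per upstream, that a spec refresh clears only the affected upstream, and that a config hot-reload clears every upstream. The class and method names are hypothetical.

```python
class UpstreamCaches:
    """Illustrative registry holding one cache store per upstream."""

    def __init__(self):
        self._stores = {}  # upstream name -> {cache key: cached result}

    def store_for(self, upstream):
        # Create the store lazily on first use.
        return self._stores.setdefault(upstream, {})

    def on_spec_refreshed(self, upstream):
        # Drop every entry for this upstream; other upstreams are untouched.
        self._stores.pop(upstream, None)

    def on_config_reloaded(self):
        # Assumption: a hot-reload flushes the caches of all upstreams.
        self._stores.clear()
```

After either event the next call for an affected tool is a cache miss and goes to the upstream.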
Metrics
Cache hits and misses are tracked by the mcp.cache.hits and mcp.cache.misses OpenTelemetry counters, labelled by tool name. See OpenTelemetry.
See also
- OpenAPI Overlays — per-operation cache TTL
- Hot-Reload & Spec Refresh — triggers cache invalidation
- Token Counting — monitor response sizes