Optimizing Claude MCP Server Usage: Leveraging Prompt Caching
MCP servers are great, but they can get expensive on input tokens when Claude makes multiple tool calls: every tool-call round trip is billed as if you had made another API request.
Let's start with this example request:
{
  "model": "claude-sonnet-4-20250514",
  "max_tokens": 1000,
  "messages": [
    {
      "role": "user",
      "content": [
        {
          "type": "text",
          "text": "List all the Linux kernel arguments."
        }
      ]
    }
  ],
  "mcp_servers": [
    {
      "type": "url",
      "url": "https://mcp.deepwiki.com/sse",
      "name": "DeepWiki"
    }
  ]
}
Request body without caching enabled
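If you want to follow along, here's a minimal Python sketch of sending that request body with the requests library. Two assumptions worth flagging: we're calling the Messages API directly rather than through the SDK, and we pass the MCP connector beta header, which is mcp-client-2025-04-04 as of this writing (check Anthropic's docs for the current value).

import os
import requests

# The request body from above, as a plain dict.
payload = {
    "model": "claude-sonnet-4-20250514",
    "max_tokens": 1000,
    "messages": [
        {
            "role": "user",
            "content": [
                {"type": "text", "text": "List all the Linux kernel arguments."}
            ],
        }
    ],
    "mcp_servers": [
        {"type": "url", "url": "https://mcp.deepwiki.com/sse", "name": "DeepWiki"}
    ],
}

# The MCP connector is a beta feature, so it needs the anthropic-beta header
# (header value correct as of this writing).
data = requests.post(
    "https://api.anthropic.com/v1/messages",
    headers={
        "x-api-key": os.environ["ANTHROPIC_API_KEY"],
        "anthropic-version": "2023-06-01",
        "anthropic-beta": "mcp-client-2025-04-04",
        "content-type": "application/json",
    },
    json=payload,
).json()

Sending the request with Python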
In our response, we can see there are four tool calls happening.
{
  "id": "msg_013MQCyvkhCFrx9Cn5CriahH",
  "type": "message",
  "role": "assistant",
  "model": "claude-sonnet-4-20250514",
  "content": [
    {
      "type": "text",
      "text": "I'll help you... <REMOVED_FOR_BLOG_LENGTH>"
    },
    {
      "type": "mcp_tool_use",
      "id": "mcptoolu_0198sXxscecG6ECQ367Bbftp",
      "name": "ask_question",
      "input": {
        "repoName": "torvalds/linux",
        "question": "What are all... <REMOVED_FOR_BLOG_LENGTH>"
      },
      "server_name": "DeepWiki"
    },
    {
      "type": "mcp_tool_result",
      "tool_use_id": "mcptoolu_0198sXxscecG6ECQ367Bbftp",
      "is_error": false,
      "content": [
        {
          "type": "text",
          "text": "It appears you... <REMOVED_FOR_BLOG_LENGTH>"
        }
      ]
    },
    {
      "type": "text",
      "text": "Let me try a more... <REMOVED_FOR_BLOG_LENGTH>"
    },
    {
      "type": "mcp_tool_use",
      "id": "mcptoolu_01TvGoXPMgRCapDFSReNvcRo",
      "name": "ask_question",
      "input": {
        "repoName": "torvalds/linux",
        "question": "Where is the documentation... <REMOVED_FOR_BLOG_LENGTH>"
      },
      "server_name": "DeepWiki"
    },
    {
      "type": "mcp_tool_result",
      "tool_use_id": "mcptoolu_01TvGoXPMgRCapDFSReNvcRo",
      "is_error": false,
      "content": [
        {
          "type": "text",
          "text": "The documentation... <REMOVED_FOR_BLOG_LENGTH>"
        }
      ]
    },
    {
      "type": "text",
      "text": "Now let me get... <REMOVED_FOR_BLOG_LENGTH>"
    },
    {
      "type": "mcp_tool_use",
      "id": "mcptoolu_011KQQt3Xa2EYHYPveBx9roW",
      "name": "ask_question",
      "input": {
        "repoName": "torvalds/linux",
        "question": "Show... <REMOVED_FOR_BLOG_LENGTH>"
      },
      "server_name": "DeepWiki"
    },
    {
      "type": "mcp_tool_result",
      "tool_use_id": "mcptoolu_011KQQt3Xa2EYHYPveBx9roW",
      "is_error": false,
      "content": [
        {
          "type": "text",
          "text": "I cannot provide... <REMOVED_FOR_BLOG_LENGTH>"
        }
      ]
    },
    {
      "type": "text",
      "text": "Let me try... <REMOVED_FOR_BLOG_LENGTH>"
    },
    {
      "type": "mcp_tool_use",
      "id": "mcptoolu_014RqVvjACG3BDxRzfgMjFY7",
      "name": "ask_question",
      "input": {
        "repoName": "torvalds/linux",
        "question": "What are the main... <REMOVED_FOR_BLOG_LENGTH>"
      },
      "server_name": "DeepWiki"
    },
    {
      "type": "mcp_tool_result",
      "tool_use_id": "mcptoolu_014RqVvjACG3BDxRzfgMjFY7",
      "is_error": false,
      "content": [
        {
          "type": "text",
          "text": "The user is asking about... <REMOVED_FOR_BLOG_LENGTH>"
        }
      ]
    },
    {
      "type": "text",
      "text": "Based on my search of the Linux kernel repository... <REMOVED_FOR_BLOG_LENGTH>"
    }
  ],
  "stop_reason": "end_turn",
  "usage": {
    "input_tokens": 7356,
    "cache_creation_input_tokens": 0,
    "cache_read_input_tokens": 0,
    "output_tokens": 1436
  }
}
Response body without caching enabled
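A quick way to verify the round trips is to count the mcp_tool_use blocks in the response content. Reusing the data dict from the Python sketch above:

# Each mcp_tool_use block is one MCP call Claude made within this single request.
tool_calls = [block for block in data["content"] if block["type"] == "mcp_tool_use"]
print(len(tool_calls))  # 4
print(data["usage"])    # the part we actually care about below

Counting the tool calls in the response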
We don't really care about the actual answer; let's focus on the usage instead:
{
  "input_tokens": 7356,
  "cache_creation_input_tokens": 0,
  "cache_read_input_tokens": 0,
  "output_tokens": 1436
}
Response usage without caching enabled
Ouch, we end up being billed for over 7,000 input tokens at full price, because we aren't caching anything.
While not explicitly documented, MCP servers are really just tools behind the scenes (even for Anthropic models). That means prompt caching works on them just as it does on "normal" tool definitions!
{
  "model": "claude-sonnet-4-20250514",
  "max_tokens": 1000,
  "messages": [
    {
      "role": "user",
      "content": [
        {
          "type": "text",
          "text": "List all the Linux kernel arguments.",
          "cache_control": {
            "type": "ephemeral"
          }
        }
      ]
    }
  ],
  "mcp_servers": [
    {
      "type": "url",
      "url": "https://mcp.deepwiki.com/sse",
      "name": "DeepWiki"
    }
  ]
}
Request body with caching enabled
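In code, the only difference from the earlier sketch is marking the final content block of the payload as cacheable before sending:

# Same payload dict as before; we only attach cache_control to the last content block.
payload["messages"][-1]["content"][-1]["cache_control"] = {"type": "ephemeral"}

Adding cache_control to the existing payload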
How's our usage now?
{
  "input_tokens": 1744,
  "cache_creation_input_tokens": 3218,
  "cache_read_input_tokens": 7489,
  "output_tokens": 1331
}
Response usage with caching enabled
Much better. On just one prompt, we went from paying $0.0221 in input tokens without caching down to $0.0195 with caching!
7356*(3/1000000) + 0*(3.75/1000000) + 0*(0.30/1000000) = $0.0221
Cost without prompt caching
1744*(3/1000000) + 3218*(3.75/1000000) + 7489*(0.30/1000000) = $0.0195
Cost with prompt caching
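If you want to sanity-check those figures, here's a small helper that prices the input side of a usage block at Sonnet 4's rates ($3/MTok for input, $3.75/MTok for cache writes, $0.30/MTok for cache reads; confirm these against the current pricing page). Output tokens are billed the same either way, so we leave them out of the comparison.

def input_cost(usage: dict) -> float:
    """Price the input side of a usage block at Claude Sonnet 4 rates (USD)."""
    return (
        usage["input_tokens"] * 3.00 / 1_000_000
        + usage["cache_creation_input_tokens"] * 3.75 / 1_000_000
        + usage["cache_read_input_tokens"] * 0.30 / 1_000_000
    )

# The two usage blocks from the responses above.
without_caching = {"input_tokens": 7356, "cache_creation_input_tokens": 0,
                   "cache_read_input_tokens": 0, "output_tokens": 1436}
with_caching = {"input_tokens": 1744, "cache_creation_input_tokens": 3218,
                "cache_read_input_tokens": 7489, "output_tokens": 1331}

print(f"${input_cost(without_caching):.4f}")  # $0.0221
print(f"${input_cost(with_caching):.4f}")     # $0.0195

Recomputing the input costs in Python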
In most situations, you should see significant cost savings if you remember to enable prompt caching whenever you use remote MCP servers, even if you're only sending a single API request.
We did notice that enabling caching like this can sometimes result in a high cache_creation_input_tokens count; we'll look into this in a future blog post.