Automatic Anthropic to Vertex AI Failover using Cloudflare AI Gateway
Getting the dreaded overloaded_error when using claude-3-5-sonnet-20241022? Use Cloudflare AI Gateway to automatically fail over to Vertex AI!
Send your requests to the Universal Endpoint provided by Cloudflare AI Gateway, listing Anthropic first and Google Vertex AI second. The gateway tries the providers in order, so any request that fails at Anthropic is rerouted to Vertex AI.
curl -v https://gateway.ai.cloudflare.com/v1/{account_id}/{gateway_id} \
  --header 'Content-Type: application/json' \
  --data "[
    {
      \"provider\": \"anthropic\",
      \"endpoint\": \"v1/messages\",
      \"headers\": {
        \"x-api-key\": \"${ANTHROPIC_API_KEY}\",
        \"Content-Type\": \"application/json\",
        \"anthropic-version\": \"2023-06-01\"
      },
      \"query\": {
        \"model\": \"claude-3-5-sonnet-20241022\",
        \"max_tokens\": 1,
        \"messages\": [
          {
            \"role\": \"user\",
            \"content\": \"Hello\"
          }
        ]
      }
    },
    {
      \"provider\": \"google-vertex-ai\",
      \"endpoint\": \"v1beta1/projects/your-gcp-project-id/locations/us-east5/publishers/anthropic/models/claude-3-5-sonnet-v2@20241022:rawPredict\",
      \"headers\": {
        \"Authorization\": \"Bearer $(gcloud auth application-default print-access-token)\",
        \"Content-Type\": \"application/json\"
      },
      \"query\": {
        \"anthropic_version\": \"vertex-2023-10-16\",
        \"max_tokens\": 1,
        \"messages\": [
          {
            \"role\": \"user\",
            \"content\": \"Hello\"
          }
        ]
      }
    }
  ]"
Example curl request with failover.
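If the escaped quotes in the request above get hard to maintain, one option is to build the same payload with a heredoc and pipe it into curl. This is only a sketch: the function names build_payload and send_with_failover, and the GCP_PROJECT_ID variable, are made up for illustration, and the values you pass in must not contain quotes or backslashes or the JSON will break.

```shell
#!/usr/bin/env bash
# Sketch: same two-provider payload as the curl example, without escaped quotes.
# build_payload, send_with_failover and GCP_PROJECT_ID are hypothetical names.

build_payload() {
  # $1 = Anthropic API key, $2 = Vertex AI bearer token, $3 = GCP project ID
  cat <<EOF
[
  {
    "provider": "anthropic",
    "endpoint": "v1/messages",
    "headers": {
      "x-api-key": "$1",
      "Content-Type": "application/json",
      "anthropic-version": "2023-06-01"
    },
    "query": {
      "model": "claude-3-5-sonnet-20241022",
      "max_tokens": 1,
      "messages": [{ "role": "user", "content": "Hello" }]
    }
  },
  {
    "provider": "google-vertex-ai",
    "endpoint": "v1beta1/projects/$3/locations/us-east5/publishers/anthropic/models/claude-3-5-sonnet-v2@20241022:rawPredict",
    "headers": {
      "Authorization": "Bearer $2",
      "Content-Type": "application/json"
    },
    "query": {
      "anthropic_version": "vertex-2023-10-16",
      "max_tokens": 1,
      "messages": [{ "role": "user", "content": "Hello" }]
    }
  }
]
EOF
}

send_with_failover() {
  # $1 = Cloudflare account ID, $2 = AI Gateway ID
  # Providers are listed in priority order: Anthropic first, Vertex AI second.
  build_payload "$ANTHROPIC_API_KEY" \
    "$(gcloud auth application-default print-access-token)" \
    "$GCP_PROJECT_ID" |
    curl --silent --show-error \
      "https://gateway.ai.cloudflare.com/v1/$1/$2" \
      --header 'Content-Type: application/json' \
      --data @-
}
```

Nothing is sent until you call send_with_failover with your own account and gateway IDs, so you can run build_payload on its own first to inspect the JSON.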
If Anthropic is working, you'll get a response from them as usual.
{
  "id": "msg_01EDZeiPdKagiRFDCqCqm4xw",
  "type": "message",
  "role": "assistant",
  "model": "claude-3-5-sonnet-20241022",
  "content": [
    {
      "type": "text",
      "text": "Hi"
    }
  ],
  "stop_reason": "max_tokens",
  "stop_sequence": null,
  "usage": {
    "input_tokens": 8,
    "output_tokens": 1
  }
}
Example response from Anthropic.
However, whenever the request to Anthropic fails (with an api_error, rate_limit_error, overloaded_error, etc.), the response will come from Vertex AI instead.
{
  "id": "msg_vrtx_01MtqKLXCVA9fXfptstueTcq",
  "type": "message",
  "role": "assistant",
  "model": "claude-3-5-sonnet-v2-20241022",
  "content": [
    {
      "type": "text",
      "text": "Hi"
    }
  ],
  "stop_reason": "max_tokens",
  "stop_sequence": null,
  "usage": {
    "input_tokens": 8,
    "output_tokens": 1
  }
}
Example response from Vertex AI.
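Because the failover is transparent, you may want to log which provider actually answered. Here is a rough sketch that guesses from the response body. It relies only on what the two example responses show, namely that the Vertex AI id starts with msg_vrtx_ while the Anthropic id starts with msg_; that prefix is an observation from these samples, not a documented guarantee, and served_by is a made-up helper name.

```shell
# Sketch: guess the serving provider from a response body.
# Heuristic from the sample responses only: Vertex AI ids start with "msg_vrtx_".
served_by() {
  # $1 = JSON response body as a string
  case "$1" in
    # Check the longer Vertex prefix first, since "msg_" would also match it.
    *'"id": "msg_vrtx_'* | *'"id":"msg_vrtx_'*) echo "vertex-ai" ;;
    *'"id": "msg_'* | *'"id":"msg_'*) echo "anthropic" ;;
    *) echo "unknown" ;;
  esac
}
```

For example, served_by "$(cat response.json)" prints vertex-ai for the second sample response and anthropic for the first. Anything that does not look like a message, such as an error body, comes back as unknown.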
Congrats, you now have transparent and automatic failover for Claude 3.5 Sonnet v2 (20241022) requests!