Automatic Anthropic to Vertex AI Failover using Cloudflare AI Gateway
Getting the dreaded overloaded_error when using claude-3-5-sonnet-20241022? Use Cloudflare AI Gateway to automatically fail over to Vertex AI!
Oct 22, 2024
Send your requests to the Universal Endpoint provided by Cloudflare AI Gateway, with Anthropic listed first and Google Vertex AI second. Any request to Anthropic that fails is automatically rerouted to Vertex AI.
curl -v https://gateway.ai.cloudflare.com/v1/{account_id}/{gateway_id} \
  --header 'Content-Type: application/json' \
  --data "[
  {
    \"provider\": \"anthropic\",
    \"endpoint\": \"v1/messages\",
    \"headers\": {
      \"x-api-key\": \"${ANTHROPIC_API_KEY}\",
      \"Content-Type\": \"application/json\",
      \"anthropic-version\": \"2023-06-01\"
    },
    \"query\": {
      \"model\": \"claude-3-5-sonnet-20241022\",
      \"max_tokens\": 1,
      \"messages\": [
        {
          \"role\": \"user\",
          \"content\": \"Hello\"
        }
      ]
    }
  },
  {
    \"provider\": \"google-vertex-ai\",
    \"endpoint\": \"v1beta1/projects/your-gcp-project-id/locations/us-east5/publishers/anthropic/models/claude-3-5-sonnet-v2@20241022:rawPredict\",
    \"headers\": {
      \"Authorization\": \"Bearer $(gcloud auth application-default print-access-token)\",
      \"Content-Type\": \"application/json\"
    },
    \"query\": {
      \"anthropic_version\": \"vertex-2023-10-16\",
      \"max_tokens\": 1,
      \"messages\": [
        {
          \"role\": \"user\",
          \"content\": \"Hello\"
        }
      ]
    }
  }
]"
Example curl request with failover.
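The backslash-escaped quotes in the request above are easy to get wrong when editing by hand. As a minimal sketch, the same payload can be produced with a here-document, which still expands shell variables but needs no escaping. The function name build_failover_payload and the variables GCP_PROJECT_ID and VERTEX_ACCESS_TOKEN are hypothetical names chosen for this example; the endpoints, headers, and model identifiers are copied from the request above.

```shell
#!/usr/bin/env bash
# Sketch: print the two-step failover payload (Anthropic first, Vertex AI second).
# Assumed environment variables (GCP_PROJECT_ID and VERTEX_ACCESS_TOKEN are
# hypothetical names for this example):
#   ANTHROPIC_API_KEY    - Anthropic API key, as in the curl request above
#   GCP_PROJECT_ID       - replaces the "your-gcp-project-id" placeholder
#   VERTEX_ACCESS_TOKEN  - output of `gcloud auth application-default print-access-token`
build_failover_payload() {
  # Unquoted EOF delimiter: ${...} is expanded, double quotes stay literal.
  cat <<EOF
[
  {
    "provider": "anthropic",
    "endpoint": "v1/messages",
    "headers": {
      "x-api-key": "${ANTHROPIC_API_KEY}",
      "Content-Type": "application/json",
      "anthropic-version": "2023-06-01"
    },
    "query": {
      "model": "claude-3-5-sonnet-20241022",
      "max_tokens": 1,
      "messages": [{ "role": "user", "content": "Hello" }]
    }
  },
  {
    "provider": "google-vertex-ai",
    "endpoint": "v1beta1/projects/${GCP_PROJECT_ID}/locations/us-east5/publishers/anthropic/models/claude-3-5-sonnet-v2@20241022:rawPredict",
    "headers": {
      "Authorization": "Bearer ${VERTEX_ACCESS_TOKEN}",
      "Content-Type": "application/json"
    },
    "query": {
      "anthropic_version": "vertex-2023-10-16",
      "max_tokens": 1,
      "messages": [{ "role": "user", "content": "Hello" }]
    }
  }
]
EOF
}

# Usage (not executed here): pipe the payload to curl, which reads it from stdin.
#   export VERTEX_ACCESS_TOKEN="$(gcloud auth application-default print-access-token)"
#   build_failover_payload | curl https://gateway.ai.cloudflare.com/v1/{account_id}/{gateway_id} \
#     --header 'Content-Type: application/json' --data @-
```

Keeping the payload in a function also makes it straightforward to inspect the JSON before sending it, which helps when a request is rejected for a malformed body.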
If Anthropic is working, you'll get a response from them as usual.
{
  "id": "msg_01EDZeiPdKagiRFDCqCqm4xw",
  "type": "message",
  "role": "assistant",
  "model": "claude-3-5-sonnet-20241022",
  "content": [
    {
      "type": "text",
      "text": "Hi"
    }
  ],
  "stop_reason": "max_tokens",
  "stop_sequence": null,
  "usage": {
    "input_tokens": 8,
    "output_tokens": 1
  }
}
Example response from Anthropic.
However, whenever the Anthropic endpoint returns an error (an api_error, rate_limit_error, overloaded_error, etc.), the response will come from Vertex AI instead.
{
  "id": "msg_vrtx_01MtqKLXCVA9fXfptstueTcq",
  "type": "message",
  "role": "assistant",
  "model": "claude-3-5-sonnet-v2-20241022",
  "content": [
    {
      "type": "text",
      "text": "Hi"
    }
  ],
  "stop_reason": "max_tokens",
  "stop_sequence": null,
  "usage": {
    "input_tokens": 8,
    "output_tokens": 1
  }
}
Example response from Vertex AI.
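Because the failover is transparent, it can be handy to know which provider actually answered, for example to log how often the fallback is used. The two sample responses differ in their id prefix (msg_ versus msg_vrtx_), so a rough check can be sketched as below. This relies on an observation from the sample responses, not on a documented guarantee, and the function name served_by is hypothetical.

```shell
#!/usr/bin/env bash
# Sketch: guess which provider served a response body from its "id" field.
# Assumption: Vertex AI ids start with "msg_vrtx_" and Anthropic ids with "msg_",
# as seen in the sample responses; neither prefix is a documented contract.
served_by() {
  local body="$1"
  # Check the more specific prefix first, since "msg_vrtx_" also starts with "msg_".
  if printf '%s' "$body" | grep -Eq '"id"[[:space:]]*:[[:space:]]*"msg_vrtx_'; then
    echo "vertex"
  elif printf '%s' "$body" | grep -Eq '"id"[[:space:]]*:[[:space:]]*"msg_'; then
    echo "anthropic"
  else
    echo "unknown"
  fi
}

# Usage (not executed here): capture the gateway response, then classify it.
#   response="$(curl -s ... )"
#   echo "served by: $(served_by "$response")"
```

The model field differs between the two sample responses as well (claude-3-5-sonnet-20241022 versus claude-3-5-sonnet-v2-20241022), so it could serve as a second signal if the id prefix ever changes.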
Congrats, you now have transparent and automatic failover for Claude 3.5 Sonnet v2 (20241022) requests!