Collecting Anthropic Server Logs with an Undocumented API

If you read our previous post on collecting Vertex AI logs, a natural follow-up question is: is Claude 3.5 Sonnet on Vertex AI more or less reliable than Anthropic's API in our production stack?

Unfortunately, Anthropic does not (yet) expose a documented API for retrieving server logs. But we can use an undocumented one in the meantime!

First, you'll need to collect and save the logs. You can download them one page at a time by visiting the URL below and saving each response as a .json file. This is the part that sucks.

https://api.console.anthropic.com/api/logs/YOUR_ANTHROPIC_ORGANIZATION_ID_HERE?page_size=100&page_index=0

API Endpoint for Anthropic Server Logs

You'll need to replace YOUR_ANTHROPIC_ORGANIZATION_ID_HERE with your organization ID from Anthropic, which you can find here.

Next, increment page_index and save each page until you have as many recent logs as you'd like to analyze. While it would be nice to script this, Anthropic has an anti-bot WAF in front of this endpoint, and I found it frustrating to deal with.
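The downloading stays manual, but printing the page URLs up front saves some typing. This is a minimal sketch, assuming page_index counts up from 0 and page_size stays at 100 as in the URL above. The helper name page_urls is hypothetical, and the script only builds URLs; it does not fetch anything.

```python
from urllib.parse import urlencode

# Endpoint taken from the URL above; the organization ID is filled in per call.
BASE = "https://api.console.anthropic.com/api/logs/{org_id}"


def page_urls(org_id: str, pages: int, page_size: int = 100) -> list[str]:
    """Build one URL per page, starting at page_index=0 (hypothetical helper)."""
    urls = []
    for page_index in range(pages):
        query = urlencode({"page_size": page_size, "page_index": page_index})
        urls.append(f"{BASE.format(org_id=org_id)}?{query}")
    return urls


# Print the first five page URLs to open in a logged-in browser.
for url in page_urls("YOUR_ANTHROPIC_ORGANIZATION_ID_HERE", pages=5):
    print(url)
```

Open each printed URL in a browser where you are logged in to the console, and save the response into logs/ with a .json extension.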

Once we have all the files saved in a folder called logs/, let's write a script that combines them, removes duplicate entries, and sorts the result from newest to oldest!

#!/usr/bin/env python3.11
# -*- coding: utf-8 -*-
# Author: David Manouchehri

import os
import glob
import json
from datetime import datetime

logs_dir = "logs/"
logs_out_dir = "logs_out/"

# Ensure the output directory exists
os.makedirs(logs_out_dir, exist_ok=True)

# Get all JSON files in the logs directory
log_files = glob.glob(os.path.join(logs_dir, "*.json"))

all_entries = []

# Read and combine entries from all log files
for log_file in log_files:
    with open(log_file, "r") as f:
        data = json.load(f)
    all_entries.extend(data)

# Deduplicate the log entries
log_set = set()
deduped_logs = []
for entry in all_entries:
    # Serialize the entry to a JSON string with sorted keys for consistent hashing
    entry_str = json.dumps(entry, sort_keys=True)
    if entry_str not in log_set:
        log_set.add(entry_str)
        deduped_logs.append(entry)

# Sort the logs by 'request_start_time' from newest to oldest.
# fromisoformat also accepts timestamps that lack fractional seconds.
deduped_logs.sort(
    key=lambda x: datetime.fromisoformat(x["request_start_time"]),
    reverse=True,
)

# Write the deduplicated and sorted logs to the output file
output_file = os.path.join(logs_out_dir, "combined_logs.json")
with open(output_file, "w") as f:
    json.dump(deduped_logs, f, indent=2)

Script to deduplicate and combine logs.
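To see why the script serializes with sort_keys=True before hashing, here is a small standalone sketch. The two sample entries are made up for illustration, and dedupe is a hypothetical helper that mirrors the loop above. Dicts are not hashable, so each entry is turned into a string first, and sorting the keys makes that string independent of key order.

```python
import json


def dedupe(entries):
    """Keep the first occurrence of each entry, comparing by canonical JSON."""
    seen = set()
    unique = []
    for entry in entries:
        # sort_keys=True gives the same string regardless of key order
        key = json.dumps(entry, sort_keys=True)
        if key not in seen:
            seen.add(key)
            unique.append(entry)
    return unique


# Made-up entries: same content, different key order.
a = {"model": "claude-3-5-sonnet-20240620", "model_latency": 1.0}
b = {"model_latency": 1.0, "model": "claude-3-5-sonnet-20240620"}

print(len(dedupe([a, b])))  # 1
```

Duplicates are likely whenever saved pages overlap, for example if new requests arrive while you are still paging through older ones.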

Awesome, we now have a file called combined_logs.json we can analyze! If we want to look at all the requests that had an error for Claude 3.5 Sonnet, we can run the command below. Note that successful requests store the literal string "None" in the error field, not a JSON null.

jq '[.[] | select(.model == "claude-3-5-sonnet-20240620" and .error != "None")]' logs_out/combined_logs.json
[
  {
    "transport_type": "http",
    "model": "claude-3-5-sonnet-20240620",
    "model_latency": 2.0138745307922363,
    "prompt": null,
    "prompt_length": 0,
    "prompt_token_count": 0,
    "completion_length": 0,
    "completion_token_count": 0,
    "response": null,
    "tags": "{}",
    "error": "{\"client_error\":false,\"code\":500,\"detail\":\"Internal server error\"}",
    "request_start_time": "2024-10-09 05:09:38.284508+00:00",
    "request": "{\"max_tokens\":8192,\"metadata\":{\"user_id\":\"ai.moda-dev-5f01134f8c259dada84776ca175ee6e3d2234751642dec0fc48c9f582a253add\"},\"model\":\"claude-3-5-sonnet-20240620\",\"temperature\":0}",
    "workspace_id": "wrkspc_01AQ99UjK71nSprtyvYEvaXa",
    "api_key_uuid": null
  }
]

Example error log.
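Notice that the error and request fields are JSON encoded as strings, so they need a second decoding pass before you can inspect them. A minimal sketch in Python, assuming every entry follows the shape shown above; decode_json_field is a hypothetical helper name.

```python
import json


def decode_json_field(entry, field):
    """Decode a string-encoded JSON field; return None for missing or "None"."""
    raw = entry.get(field)
    if raw is None or raw == "None":
        return None
    try:
        return json.loads(raw)
    except (TypeError, json.JSONDecodeError):
        # Assumption: anything that is not valid JSON is kept as raw text
        return {"raw": raw}


# Trimmed copy of the example error log above.
entry = {
    "error": '{"client_error":false,"code":500,"detail":"Internal server error"}',
    "request": '{"max_tokens":8192,"model":"claude-3-5-sonnet-20240620","temperature":0}',
}

error = decode_json_field(entry, "error")
request = decode_json_field(entry, "request")
print(error["code"], error["detail"], request["max_tokens"])
```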

Neat! This log isn't an entirely fair comparison to Vertex AI, since we were using the new Batches API right as the beta began, so errors here should be expected. Across the past ~5,000 log entries, this was surprisingly the only error!
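To turn that observation into a number you can compare against another provider, you can compute an error rate per model. This is a rough sketch, assuming an entry counts as failed whenever its error field is anything other than the string "None"; the function name and the sample entries are made up for illustration.

```python
def error_rate(entries, model):
    """Fraction of a model's requests whose error field is not "None"."""
    relevant = [e for e in entries if e.get("model") == model]
    if not relevant:
        return None
    # Assumption: a missing error field is treated as "no error"
    errors = sum(1 for e in relevant if e.get("error") not in (None, "None"))
    return errors / len(relevant)


# Made-up sample: one failure out of four Sonnet requests.
sample = [
    {"model": "claude-3-5-sonnet-20240620", "error": "None"},
    {"model": "claude-3-5-sonnet-20240620", "error": "None"},
    {"model": "claude-3-5-sonnet-20240620", "error": "None"},
    {"model": "claude-3-5-sonnet-20240620", "error": '{"code":500}'},
    {"model": "some-other-model", "error": '{"code":500}'},
]

print(error_rate(sample, "claude-3-5-sonnet-20240620"))  # 0.25
```

To run it on real data, load logs_out/combined_logs.json with json.load and pass the resulting list instead of sample.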

If we want to do something like sort by latency, that's totally doable too.

jq '[.[] | select(.model == "claude-3-5-sonnet-20240620")] | sort_by(.model_latency)' logs_out/combined_logs.json

Sort by latency.
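If you would rather have summary numbers than a long sorted list, the same data can be reduced in Python. A minimal sketch, assuming model_latency is measured in seconds (the logs do not say so explicitly); latency_summary and the sample entries are hypothetical.

```python
import statistics


def latency_summary(entries, model):
    """Return count, median, and max model_latency for one model."""
    latencies = [
        e["model_latency"]
        for e in entries
        if e.get("model") == model and e.get("model_latency") is not None
    ]
    if not latencies:
        return None
    return {
        "count": len(latencies),
        "median": statistics.median(latencies),
        "max": max(latencies),
    }


# Made-up sample latencies for illustration.
sample = [
    {"model": "claude-3-5-sonnet-20240620", "model_latency": 2.0},
    {"model": "claude-3-5-sonnet-20240620", "model_latency": 3.0},
    {"model": "claude-3-5-sonnet-20240620", "model_latency": 1.0},
    {"model": "some-other-model", "model_latency": 9.0},
]

print(latency_summary(sample, "claude-3-5-sonnet-20240620"))
# {'count': 3, 'median': 2.0, 'max': 3.0}
```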

Or, to make a fairer comparison, let's look only at requests with tiny prompts (fewer than 10 prompt tokens) and sort those by latency.

jq '[.[] | select(.model == "claude-3-5-sonnet-20240620" and .prompt_token_count < 10)] | sort_by(.model_latency)' logs_out/combined_logs.json

Sort by latency for tiny prompts.

Since sort_by orders results in ascending order, our slowest request is at the bottom of the list.

{
  "transport_type": "http",
  "model": "claude-3-5-sonnet-20240620",
  "model_latency": 3.061051845550537,
  "prompt": null,
  "prompt_length": 23,
  "prompt_token_count": 8,
  "completion_length": 6,
  "completion_token_count": 1,
  "response": null,
  "tags": "{}",
  "error": "None",
  "request_start_time": "2024-10-08 15:54:35.842454+00:00",
  "request": "{\"max_tokens\":1,\"model\":\"claude-3-5-sonnet-20240620\",\"temperature\":0.47343091617109434}",
  "workspace_id": "default",
  "api_key_uuid": null
}

Example slow request.

Not terrible. 🙂