ai.moda's blog

OpenAI Internals: Structured Outputs to System Prompts

Curious how OpenAI converts your schema into a hidden system message? We'll show you how it works!

Optimizing Claude MCP Server Usage: Leveraging Prompt Caching

Multiple MCP server calls are billed like separate API requests, quickly inflating your token count. Discover how prompt caching - an undocumented feature for MCP servers - dramatically reduces these costs with minimal code changes.

Sora Pricing on Azure OpenAI

It's not great.

OpenAI Internals: Remote MCP to System Prompts

Ever wonder how OpenAI handles remote MCP servers? Wonder no more!

Claude Code Internals: Intercepting Requests

Want to see what Claude Code is doing? We'll show you how to check!

Claude Code Internals: Web Search

Ever wonder how Claude Code searches the web? Wonder no more!

Google Vertex AI Model Availability (Updated Hourly)

Live list of all models on Google Cloud Vertex AI.

Amazon Bedrock Model Availability (Updated Hourly)

Live list of all models on Amazon Bedrock.

Where are my Cloudflare Email Workers running?

A simple solution to figure out what PoP your Cloudflare Email Workers are running in.

Automating Aurora DSQL and Cloudflare Hyperdrive Integration with Workers

Learn how to automatically manage rotating credentials and IAM tokens for your PostgreSQL databases using Cloudflare Workers. Perfect for teams using Aurora DSQL or any database that requires periodic credential updates with Hyperdrive.

Automatic Anthropic to Vertex AI Failover using Cloudflare AI Gateway

Getting the dreaded overloaded_error when using claude-3-5-sonnet-20241022? Use Cloudflare AI Gateway to automatically fail over to Vertex AI!

Identifying Anthropic Claude Errors on Amazon Bedrock

Learn how to identify and troubleshoot errors with Claude 3.5 Sonnet on Amazon Bedrock using CloudWatch and CloudTrail.

Chaining OpenAI Models: Better and Faster

This post explores combining AI models for an efficient, accurate, and cost-effective solution. We chain responses from a larger model (GPT-4o) to a smaller one (GPT-4o mini) to convert complex outputs into structured JSON. A Python script demonstrates this step-by-step process.

Optimizing Token Usage in OpenAI's JSON Mode with Stop Sequences

By strategically setting stop conditions, you can cut down on unnecessary tokens, saving 20% of the output tokens in our example. Learn how to implement this technique and even recover full tokens with logprobs.

Peeking Behind the Curtain: How to Gauge GPT's Confidence

Discover how to gauge GPT's confidence using log probabilities. This post walks through a Python script that extracts and analyzes probabilities from GPT-4o's responses, helping you understand the certainty behind AI-generated answers.

Collecting Anthropic Server Logs with an Undocumented API

Learn how to collect and analyze logs for Claude using an undocumented Anthropic API. This guide covers log collection, deduplication, and sorting by error or latency, helping you measure performance and track errors.

Using Regional Google Storage Buckets with S3 APIs

Learn how to use regional GCS buckets with S3 API compatibility for data sovereignty and compliance. This guide covers creating a bucket, setting permissions, and using HMAC for S3 API access. Test your setup with rclone and create presigned URLs for secure file sharing.

Restricting Amazon Bedrock Regions with IAM

Learn how to enhance data sovereignty by using AWS IAM to restrict Amazon Bedrock requests to specific regions. This guide provides a sample policy and crucial tips for managing cross-region inference while maintaining control over your AI processing locations.

Identifying Anthropic Claude Errors on Vertex AI

Discover how to monitor Claude 3.5 Sonnet's performance on Google Vertex AI using Audit Logs and Log Explorer. Learn to identify and analyze errors, distinguish between user and system issues, and calculate accurate error rates to estimate your AI application's reliability.

Leveraging Anthropic's New Message Batches API and Prompt Caching: A Practical Guide

Anthropic's API now offers Message Batches and Prompt Caching. Let's explore how to use these powerful new features with a practical example.

Enforcing Azure AD / Entra ID on Azure OpenAI

A technical guide on enforcing Azure AD / Entra ID authentication for Azure OpenAI, including scripts to disable local auth and verify the changes.

Sending emails via SES with Cloudflare Workers

Learn how to send emails from Cloudflare Workers via Amazon SES. Covers AWS setup, IAM config, Worker code, and security best practices.

Google Gemini Internals: Function Calling / Tools to Prompts

Ever wonder how Google Gemini handles tool / function calling? Wonder no more!

OpenAI Internals: Function Calling / Tools to System Prompts

Ever wonder how OpenAI handles tool / function calling? Wonder no more!

Analysis of CVE-2023-32439

On June 21, 2023, Apple rolled out a security update for Safari, tagging CVE-2023-32439 as a DFG type confusion issue.