Peeking Behind the Curtain: How to Gauge GPT's Confidence

Discover how to gauge GPT's confidence using log probabilities. This post walks through a Python script that extracts and analyzes probabilities from GPT-4o's responses, helping you understand the certainty behind AI-generated answers.

David Manouchehri

17 Oct 2024 • 7 min read

So, why should you care about GPT's confidence? Think about it: wouldn't it be great to know whether GPT is pretty darn sure about its answer or just taking a wild guess? This kind of insight can be extremely helpful, especially if you're using AI for important tasks like fact-checking or making decisions.

In this blog post, we'll explore how to peek behind the curtain and gauge OpenAI GPT-4o's confidence using log probabilities, with a step-by-step walkthrough of a Python script.

Introduction

Language models generate text by predicting the next word (or token) in a sequence. During this process, they assign probabilities to possible next tokens. By monitoring these probabilities, we can gauge the model's confidence in its choices.

In the context of fact-checking or binary decisions (e.g., "true" or "false"), extracting these probabilities can help us understand how confident the model is in its answer. This can be particularly useful when you need to make decisions based on the model's output or when you want to present users with not just an answer but also the confidence level behind it.

Understanding Log Probabilities

Log probabilities are the logarithm of probabilities. Models often work with them because they are more numerically stable: multiplying many small probabilities can underflow to zero, whereas adding their logarithms does not.

Probability (p): A value between 0 and 1.
Log Probability (log(p)): The natural logarithm of the probability.

Converting a log probability back to a probability means exponentiating it: p = exp(log(p)).

By analyzing log probabilities, we can compare the confidence levels between different tokens or responses.
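
As a quick illustration of the conversion described above, here is a minimal sketch that uses only the standard library. The helper name `logprob_to_percent` and the input values are made up for this example; they are not part of the script.

```python
import math


def logprob_to_percent(logprob: float) -> float:
    """Convert a natural-log probability to a percentage between 0 and 100."""
    return math.exp(logprob) * 100


# A log probability of 0 means certainty; more negative values mean less likely.
for lp in (0.0, math.log(0.5), -3.0):
    print(f"logprob={lp:.4f} -> {logprob_to_percent(lp):.2f}%")
```

The same exponentiation is what the script later applies to the log probabilities returned by the API.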

The Script Overview

The script we'll discuss performs the following actions:

Sends OpenAI API requests to gpt-4o-2024-08-06 for fact-checking statements.
Extracts log probabilities to determine the model's confidence in labeling statements as "true" or "false."
Outputs the confidence percentages.

Key Components Explained

Let's walk through the crucial parts of the script.

Importing Modules and Configuration

import asyncio
import logging
import httpx
from openai import AsyncOpenAI
import random
import os
import tiktoken
import numpy as np

# Configure the logger
logger = logging.getLogger(__name__)
logger.setLevel(logging.DEBUG)
c_handler = logging.StreamHandler()
logger.addHandler(c_handler)

# Retrieve OpenAI API credentials
OPENAI_API_KEY = os.getenv("OPENAI_API_KEY")
OPENAI_API_BASE = os.getenv("OPENAI_API_BASE") or "https://api.openai.com/v1"

# Initialize the OpenAI client
client = AsyncOpenAI(
    api_key=OPENAI_API_KEY,
    base_url=OPENAI_API_BASE,
    http_client=httpx.AsyncClient(http2=True),
    max_retries=0,
    timeout=3600,
)

Logging: Sets up logging to track the script's progress.
API Client: Initializes an asynchronous OpenAI API client with HTTP/2 support for better performance.

Token Biasing

# Specify the GPT model to use
model = "gpt-4o-2024-08-06"

# Define the expected JSON responses
preferred_answers = ['{"fact": true}', '{"fact": false}']

# Get the token encoding for the model
encoding = tiktoken.encoding_for_model(model)

# Encode preferred answers to tokens
preferred_tokens = encoding.encode_batch(preferred_answers)

# Extract unique tokens
unique_tokens = set(token for sublist in preferred_tokens for token in sublist)

# Set a positive bias value
logit_bias_value = 50

# Create a token bias dictionary
token_dict = {token: logit_bias_value for token in unique_tokens}

Preferred Answers: Specifies the two target responses we expect from the model.
Token Encoding: Uses tiktoken to encode the preferred answers into tokens.
Logit Bias: Applies a positive bias to the tokens corresponding to the preferred answers to make them more likely.
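
To see why a positive bias makes the preferred tokens more likely, here is a small sketch. It assumes the bias is simply added to each token's logit before the softmax. The logits and the "maybe" token are hypothetical values chosen for illustration, not real model output.

```python
import math


def softmax(logits: dict[str, float]) -> dict[str, float]:
    """Turn raw logits into probabilities that sum to 1."""
    peak = max(logits.values())  # subtract the max for numerical stability
    exps = {tok: math.exp(val - peak) for tok, val in logits.items()}
    total = sum(exps.values())
    return {tok: val / total for tok, val in exps.items()}


# Hypothetical next-token logits (assumption, not real model output).
logits = {"true": 2.0, "false": 1.0, "maybe": 3.0}

# Same bias value as logit_bias_value in the script, applied to both labels.
bias = {"true": 50, "false": 50}
biased = {tok: val + bias.get(tok, 0) for tok, val in logits.items()}

before = softmax(logits)
after = softmax(biased)
print({tok: round(p, 4) for tok, p in before.items()})
print({tok: round(p, 4) for tok, p in after.items()})
```

In this toy setup, "maybe" starts as the most likely token and ends up with a probability of practically zero. Because both labels receive the same bias, the ratio between "true" and "false" is unchanged.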

Making the API Request

# List of statements to be fact-checked
statements_to_check = [
    "SEZC means Special Economic Zone Company.",
    "SEZC means Special Economic Zone Corporation.",
]

# Process each statement
for statement_to_check in statements_to_check:
    logger.debug("--------")
    logger.debug(f"Checking statement: {statement_to_check}")

    # Request a completion from the OpenAI API
    response = await client.chat.completions.create(
        response_format={"type": "json_object"},
        model=model,
        max_tokens=20,
        logprobs=True,
        top_logprobs=20,
        logit_bias=token_dict,
        stop=["rue", "alse", "}"],
        seed=random.randint(1, 0x7FFFFFFF),
        messages=[
            {
                "role": "system",
                "content": 'Fact check the user\'s message and answer in a JSON object that follows {"fact": boolean}.',
            },
            {"role": "user", "content": statement_to_check},
        ],
        temperature=0.0,
        n=1,
    )

Statements: A list of statements we want to fact-check.
API Request: Sends a request to the GPT model with specific parameters:
- logprobs=True: Requests log probabilities.
- top_logprobs=20: Returns the 20 most likely tokens at each position, along with their log probabilities.
- logit_bias: Applies the token bias.

Processing the Response

results = []

# Analyze each choice in the API response
for idx, choice in enumerate(response.choices):
    odds = {"true": None, "false": None}

    # Skip choices without log probabilities
    if not choice.logprobs or not choice.logprobs.content:
        continue

    # Extract log probabilities for "true" and "false" tokens
    for content_logprob in choice.logprobs.content:
        token = content_logprob.token.lower()
        logprob = content_logprob.logprob

        for label in ["true", "false"]:
            if label in token and odds[label] is None:
                odds[label] = logprob

                # Find the probability for the opposite label
                other_label = "false" if label == "true" else "true"
                if odds[other_label] is None:
                    sorted_top_logprobs = sorted(
                        content_logprob.top_logprobs,
                        key=lambda x: x.logprob,
                        reverse=True,
                    )
                    for top_logprob in sorted_top_logprobs:
                        if other_label in top_logprob.token.lower():
                            odds[other_label] = top_logprob.logprob
                            break

    # Ensure both probabilities were found
    if odds["true"] is None or odds["false"] is None:
        continue

    # Convert log probabilities to percentages
    odds_of_true_as_percentage = np.round(np.exp(odds["true"]) * 100, 2)
    odds_of_false_as_percentage = np.round(np.exp(odds["false"]) * 100, 2)

    # Compile results
    results.append(
        {
            "choice_index": idx,
            "odds_of_true": odds["true"],
            "odds_of_false": odds["false"],
            "odds_of_true_as_percentage": odds_of_true_as_percentage,
            "odds_of_false_as_percentage": odds_of_false_as_percentage,
        }
    )

# Log the results
for result in results:
    logger.debug(f"Choice {result['choice_index']}:")
    logger.debug(f"  odds_of_true: {result['odds_of_true']}")
    logger.debug(f"  odds_of_false: {result['odds_of_false']}")
    logger.debug(
        f"  odds_of_true_as_percentage: {result['odds_of_true_as_percentage']}%"
    )
    logger.debug(
        f"  odds_of_false_as_percentage: {result['odds_of_false_as_percentage']}%"
    )

Extracting Log Probabilities: Iterates over the response tokens to find the log probability of the generated label ("true" or "false"), then looks up the opposite label among the top candidates at that same position.
Calculating Percentages: Converts the log probabilities to percentages to represent confidence levels.
Output: Logs the confidence percentages for each choice.
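
The two percentages are read from a list of up to 20 candidate tokens, so they need not add up to exactly 100%. One optional follow-up step, which is not part of the script, is to renormalize over just the two labels. A minimal sketch, with the helper name `normalize_pair` and the inputs chosen purely for illustration:

```python
import math


def normalize_pair(logprob_true: float, logprob_false: float) -> tuple[float, float]:
    """Rescale two log probabilities so the resulting probabilities sum to 1."""
    p_true = math.exp(logprob_true)
    p_false = math.exp(logprob_false)
    total = p_true + p_false
    return p_true / total, p_false / total


# Illustrative inputs: 80% and 10% raw probability, with the rest on other tokens.
p_true, p_false = normalize_pair(math.log(0.8), math.log(0.1))
print(f"true: {p_true:.2%}, false: {p_false:.2%}")  # true: 88.89%, false: 11.11%
```

Whether to report the raw or the renormalized numbers is a design choice. The raw values show how much probability went to other tokens, while the renormalized ones are easier to read as a two-way split.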

Running the Script

Set API Credentials: Export your OpenAI API key as an environment variable:

export OPENAI_API_KEY='your-api-key'

Set Up Environment: Ensure you have Python 3.11 and install the required packages. The http2 extra is needed because the script creates its httpx client with http2=True:

pip3.11 install 'httpx[http2]' openai tiktoken numpy

To run the script:

python3.11 your_script_name.py

Sample Output:

--------
Checking statement: SEZC means Special Economic Zone Company.
Choice 0:
  odds_of_true: -0.05496985
  odds_of_false: -2.9299698
  odds_of_true_as_percentage: 94.65%
  odds_of_false_as_percentage: 5.34%
--------
Checking statement: SEZC means Special Economic Zone Corporation.
Choice 0:
  odds_of_true: -1.8066396
  odds_of_false: -0.18163955
  odds_of_true_as_percentage: 16.42%
  odds_of_false_as_percentage: 83.39%

Perfect! gpt-4o-2024-08-06 correctly fact-checks both of our statements. Now... what if we try the smaller gpt-4o-mini-2024-07-18 model?

--------
Checking statement: SEZC means Special Economic Zone Company.
Choice 0:
  odds_of_true: -11.62527
  odds_of_false: -0.0002702761
  odds_of_true_as_percentage: 0.0%
  odds_of_false_as_percentage: 99.97%
--------
Checking statement: SEZC means Special Economic Zone Corporation.
Choice 0:
  odds_of_true: -10.750117
  odds_of_false: -0.00011760922
  odds_of_true_as_percentage: 0.0%
  odds_of_false_as_percentage: 99.99%

This shows that the smaller model is confidently wrong about the first statement: it labels both statements false with over 99.9% confidence. The larger model, by contrast, is reasonably confident in the correct answer for each.
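
To act on these numbers in an application, one option is to accept an answer only when its probability clears a threshold. The sketch below is an illustration rather than part of the script: the `verdict` helper and the 90% threshold are assumptions, and the log probabilities are copied from the sample output above.

```python
import math


def verdict(logprob_true: float, logprob_false: float, threshold: float = 0.90) -> str:
    """Return "true", "false", or "uncertain" based on the more likely label."""
    label, logprob = max(
        (("true", logprob_true), ("false", logprob_false)),
        key=lambda pair: pair[1],
    )
    return label if math.exp(logprob) >= threshold else "uncertain"


# "Company" statement, gpt-4o-2024-08-06 (94.65% true).
print(verdict(-0.05496985, -2.9299698))  # prints: true
# "Company" statement, gpt-4o-mini-2024-07-18 (99.97% false).
print(verdict(-11.62527, -0.0002702761))  # prints: false
# "Corporation" statement, gpt-4o-2024-08-06 (83.39% false, below the threshold).
print(verdict(-1.8066396, -0.18163955))  # prints: uncertain
```

A threshold filters out low-confidence answers, but as the gpt-4o-mini result shows, it cannot catch an answer that is both confident and wrong.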

Conclusion

By accessing the log probabilities from GPT's response, we can quantify the model's confidence in its outputs. This approach is particularly useful for applications like fact-checking, where understanding the certainty behind an answer is crucial.

Full Script Code

Below is the complete script for reference.

#!/usr/bin/env python3.11
# -*- coding: utf-8 -*-
# Author: David Manouchehri

# Import necessary modules
import asyncio  # For asynchronous programming and coroutine management
import logging  # For structured logging and debugging
import httpx  # For making asynchronous HTTP requests with HTTP/2 support
from openai import AsyncOpenAI  # Asynchronous client for OpenAI API
import random  # For generating random seeds
import os  # For accessing environment variables
import tiktoken  # For tokenizing text according to OpenAI's encoding schemes
import numpy as np  # For numerical computations and array operations

# Configure the logger
logger = logging.getLogger(__name__)  # Create a logger instance for this module
logger.setLevel(logging.DEBUG)  # Set logging level to capture all messages

# Set up console output for logs
c_handler = logging.StreamHandler()  # Create a handler for console output
logger.addHandler(c_handler)  # Attach the handler to the logger

# Retrieve OpenAI API credentials from environment variables
OPENAI_API_KEY = os.getenv("OPENAI_API_KEY")  # API key for authentication
OPENAI_API_BASE = (
    os.getenv("OPENAI_API_BASE") or "https://api.openai.com/v1"
)  # API base URL, defaulting to OpenAI's if not specified

# Initialize the OpenAI client with custom HTTP settings
client = AsyncOpenAI(
    api_key=OPENAI_API_KEY,
    base_url=OPENAI_API_BASE,
    http_client=httpx.AsyncClient(
        http2=True,  # Enable HTTP/2 for improved performance
        limits=httpx.Limits(
            max_connections=None,  # Allow unlimited concurrent connections
            max_keepalive_connections=None,  # Keep all connections alive
            keepalive_expiry=None,  # Connections never expire
        ),
    ),
    max_retries=0,  # Disable automatic retries
    timeout=3600,  # Set a generous timeout of 1 hour
)


async def main():
    # Specify the GPT model to use
    model = "gpt-4o-2024-08-06"

    # Define the expected JSON responses
    preferred_answers = ['{"fact": true}', '{"fact": false}']

    # Get the token encoding specific to the chosen model
    encoding = tiktoken.encoding_for_model(model)

    # Convert preferred answers to token representations
    preferred_tokens = encoding.encode_batch(preferred_answers)

    # Extract unique tokens from all preferred answers
    unique_tokens = set(token for sublist in preferred_tokens for token in sublist)

    # Set a positive bias value to increase likelihood of preferred tokens
    logit_bias_value = 50

    # Create a dictionary mapping each unique token to the bias value
    token_dict = {token: logit_bias_value for token in unique_tokens}

    # List of statements to be fact-checked
    statements_to_check = [
        "SEZC means Special Economic Zone Company.",
        "SEZC means Special Economic Zone Corporation.",
    ]

    # Process each statement
    for statement_to_check in statements_to_check:
        logger.debug("--------")  # Visual separator in logs
        logger.debug(f"Checking statement: {statement_to_check}")

        # Request a completion from the OpenAI API
        response = await client.chat.completions.create(
            response_format={"type": "json_object"},  # Enforce JSON response
            model=model,
            max_tokens=20,  # Limit response length
            logprobs=True,  # Request token probabilities
            top_logprobs=20,  # Get probabilities for top 20 alternative tokens
            logit_bias=token_dict,  # Apply token biases
            stop=["rue", "alse", "}"],  # Early stopping conditions
            seed=random.randint(1, 0x7FFFFFFF),  # Fresh random seed for each request
            messages=[
                {
                    "role": "system",
                    "content": 'Fact check the user\'s message and answer in a JSON object that follows {"fact": boolean}.',
                },
                {"role": "user", "content": statement_to_check},
            ],
            temperature=0.0,  # Use deterministic sampling
            n=1,  # Generate a single completion
        )

        results = []

        # Analyze each choice in the API response
        for idx, choice in enumerate(response.choices):
            odds = {"true": None, "false": None}

            # Skip choices without log probabilities
            if not choice.logprobs or not choice.logprobs.content:
                continue

            # Extract log probabilities for "true" and "false" tokens
            for content_logprob in choice.logprobs.content:
                token = content_logprob.token.lower()
                logprob = content_logprob.logprob

                for label in ["true", "false"]:
                    if label in token and odds[label] is None:
                        odds[label] = logprob

                        # Find the probability for the opposite label
                        other_label = "false" if label == "true" else "true"
                        if odds[other_label] is None:
                            sorted_top_logprobs = sorted(
                                content_logprob.top_logprobs,
                                key=lambda x: x.logprob,
                                reverse=True,
                            )
                            for top_logprob in sorted_top_logprobs:
                                if other_label in top_logprob.token.lower():
                                    odds[other_label] = top_logprob.logprob
                                    break

            # Ensure both probabilities were found
            if odds["true"] is None:
                raise ValueError("odds_of_true is None")
            if odds["false"] is None:
                raise ValueError("odds_of_false is None")

            # Convert log probabilities to percentages
            odds_of_true_as_percentage = np.round(np.exp(odds["true"]) * 100, 2)
            odds_of_false_as_percentage = np.round(np.exp(odds["false"]) * 100, 2)

            # Compile results for this choice
            results.append(
                {
                    "choice_index": idx,
                    "odds_of_true": odds["true"],
                    "odds_of_false": odds["false"],
                    "odds_of_true_as_percentage": odds_of_true_as_percentage,
                    "odds_of_false_as_percentage": odds_of_false_as_percentage,
                }
            )

        # Log the results for each choice
        for result in results:
            logger.debug(f"Choice {result['choice_index']}:")
            logger.debug(f"  odds_of_true: {result['odds_of_true']}")
            logger.debug(f"  odds_of_false: {result['odds_of_false']}")
            logger.debug(
                f"  odds_of_true_as_percentage: {result['odds_of_true_as_percentage']}%"
            )
            logger.debug(
                f"  odds_of_false_as_percentage: {result['odds_of_false_as_percentage']}%"
            )


# Script entry point
if __name__ == "__main__":
    asyncio.run(main())  # Execute the main coroutine