Optimizing Token Usage in OpenAI's JSON Mode with Stop Sequences
David Manouchehri
In our last blog post, you may have noticed we used stop in our API request.
A good question to ask here is: why are we doing this? To understand, let's look at the token usage when we don't use stop=["rue", "alse"].
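To make this concrete, here's a minimal sketch of what such a request might look like with the OpenAI Python SDK. The prompt and the JSON shape are hypothetical stand-ins, not the exact ones from the previous post.

```python
from openai import OpenAI

client = OpenAI()

# JSON mode request *without* stop sequences, so we can see the baseline usage.
response = client.chat.completions.create(
    model="gpt-4o-2024-08-06",
    response_format={"type": "json_object"},
    messages=[
        {
            "role": "user",
            "content": 'Is the sky blue? Reply with JSON shaped like {"answer": true} or {"answer": false}.',
        }
    ],
)

print(response.choices[0].message.content)  # e.g. {"answer": true}
print(response.usage.completion_tokens)     # baseline output token count
```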
Now, let's try adding back stop to our request and see what changes.
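Continuing from the sketch above, the only change is the stop parameter:

```python
# Same request, now with stop sequences. Generation halts as soon as "rue" or
# "alse" is produced, so the returned content ends in a bare "t" or "f",
# which is still enough to tell true from false.
response = client.chat.completions.create(
    model="gpt-4o-2024-08-06",
    response_format={"type": "json_object"},
    stop=["rue", "alse"],
    messages=[
        {
            "role": "user",
            "content": 'Is the sky blue? Reply with JSON shaped like {"answer": true} or {"answer": false}.',
        }
    ],
)

print(response.choices[0].message.content)  # e.g. {"answer": t
print(response.usage.completion_tokens)     # should come back lower than before
```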
Nice, we reduced our total output token usage by 20%, and we can still tell what the correct answer is! This might not seem like a lot for a simple example, but once you start scaling up, it adds up quickly. For example, if your framework is generating 1 billion output tokens per month with gpt-4o-2024-08-06 (priced at $10 per 1M output tokens), your monthly bill would drop from $10,000 to $8,000!
As a bonus, we can actually recover the full last token, simply by enabling logprobs.
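As a rough sketch, still using the same hypothetical request, enabling logprobs looks like this. The entries in logprobs.content are whole tokens, so the full final token is visible there:

```python
# Same request again, this time with logprobs enabled. The entries in
# response.choices[0].logprobs.content are whole tokens, so the full final
# token can be read back even though the stop sequence trimmed message.content.
response = client.chat.completions.create(
    model="gpt-4o-2024-08-06",
    response_format={"type": "json_object"},
    stop=["rue", "alse"],
    logprobs=True,
    messages=[
        {
            "role": "user",
            "content": 'Is the sky blue? Reply with JSON shaped like {"answer": true} or {"answer": false}.',
        }
    ],
)

print(response.choices[0].logprobs.content[-1].token)  # e.g. true
```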
That said, even after OpenAI fixes this bug, we can script our way around it by reconstructing the text from response.choices[0].logprobs.content instead of reading response.choices[0].message.content.
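A minimal sketch of that reconstruction, assuming the response object from the logprobs call above:

```python
def reconstruct_content(response) -> str:
    """Rebuild the completion text by concatenating the logprobs tokens.

    Each entry in logprobs.content is a whole token, so joining them
    recovers the full final token that the stop sequence trimmed out of
    message.content.
    """
    return "".join(entry.token for entry in response.choices[0].logprobs.content)


print(reconstruct_content(response))  # e.g. {"answer": true
```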
Success! We recommend using the reconstruction method, as it should continue to work indefinitely.