OpenAI has introduced Predicted Outputs, a new API feature that lets you send a “prediction” of the expected output along with a request to GPT-4o or GPT-4o mini. Despite some initial confusion, supplying a prediction does not lower the price; its benefit is speed. It can even cost more, because predicted tokens that diverge from the final output are still billed as completion tokens. The speedup comes from speculative decoding at inference time: rather than sampling one token at a time, OpenAI can verify large batches of predicted tokens in parallel. How much extra you pay therefore depends on how accurate the prediction is, and a 100% accurate prediction incurs no additional charges.
https://simonwillison.net/2024/Nov/4/predicted-outputs/
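To make the link between prediction accuracy, speed and cost concrete, here is a toy Python sketch of speculative verification. It is a simplified model, not OpenAI's implementation: tokens are plain strings, `target` stands in for whatever ordinary decoding would emit, the prediction is compared position by position with no re-alignment after an edit, and `k` (how many predicted tokens one forward pass can check) is a hypothetical parameter. The names `speculative_decode`, `billed_completion_tokens` and `SpecStats` are illustrative only.

```python
from dataclasses import dataclass
from typing import Sequence


@dataclass
class SpecStats:
    forward_passes: int  # model calls needed to produce the whole output
    accepted: int        # predicted tokens that ended up in the output
    rejected: int        # predicted tokens that did not (billed anyway)


def speculative_decode(target: Sequence[str], predicted: Sequence[str], k: int = 8) -> SpecStats:
    """Hypothetical toy model of speculative verification, not OpenAI's code.

    `target` is what plain token-by-token decoding would emit; `k` is an
    assumed limit on how many predicted tokens one pass can verify.
    """
    passes = 0
    i = 0  # number of output tokens produced so far
    while i < len(target):
        draft = predicted[i:i + k]  # the prediction's guesses for positions i, i+1, ...
        j = 0
        # One forward pass checks every draft token in parallel; keep the
        # longest prefix that agrees with what the model would have produced.
        while j < len(draft) and i + j < len(target) and draft[j] == target[i + j]:
            j += 1
        passes += 1
        i += j
        if i < len(target):
            i += 1  # the same pass also yields the model's own token at position i
    # Simplification: position-by-position comparison, no re-alignment after edits.
    accepted = sum(1 for p, t in zip(predicted, target) if p == t)
    rejected = len(predicted) - accepted
    return SpecStats(passes, accepted, rejected)


def billed_completion_tokens(output_len: int, stats: SpecStats) -> int:
    # Assumed billing model: divergent (rejected) prediction tokens are
    # charged on top of the tokens actually emitted.
    return output_len + stats.rejected


if __name__ == "__main__":
    target = "the quick brown fox jumps".split()
    for guess in ("the quick brown fox jumps", "the quick red fox jumps", ""):
        stats = speculative_decode(target, guess.split())
        print(stats, billed_completion_tokens(len(target), stats))
    # prints:
    # SpecStats(forward_passes=1, accepted=5, rejected=0) 5
    # SpecStats(forward_passes=2, accepted=4, rejected=1) 6
    # SpecStats(forward_passes=5, accepted=0, rejected=0) 5
```

With a perfect prediction the five-token output comes out of a single pass with nothing extra billed; one wrong word costs a second pass and one additional billed token; with no prediction the model falls back to one pass per token. In the real API the prediction is sent as `prediction={"type": "content", "content": ...}`, and the response's `usage.completion_tokens_details` reports `accepted_prediction_tokens` and `rejected_prediction_tokens`.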