Batch Processing with the New Batch API

The new Batch API from OpenAI allows you to create asynchronous batch jobs at a lower cost (currently a 50% discount compared with synchronous requests) and with higher rate limits. Batches are completed within a 24-hour window, although they may finish sooner depending on global usage.

This API is perfect for tasks such as tagging and enriching content, categorizing support tickets, performing sentiment analysis on large datasets, and much more.

Ideal Use Cases for the Batch API

The Batch API is particularly suited for processing large volumes of data asynchronously. Whether you're categorizing content on a blog, performing sentiment analysis on customer feedback, or generating summaries and translations for large document collections, this API can streamline your workflow and reduce costs.

First Example: Categorizing Movies

In this example, we demonstrate how to use the Batch API to categorize movies using the GPT-4o-mini model. The goal is to extract movie categories and generate a one-sentence summary from movie descriptions.

We ask the model to return the categories and summary as JSON, and the IMDb Top 1000 movies dataset serves as our data source.
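
To make the task format concrete, here is a minimal sketch of one batch task for this example. The request shape (custom_id, method, url, body) follows the documented Batch API format, but the system prompt, the custom_id scheme, the sample description, and the file name are illustrative assumptions rather than the notebook's exact values.

    import json

    # Placeholder description standing in for one row of the IMDb Top 1000 dataset.
    description = "A retired detective takes on one final case that forces him to confront his past."

    # One batch task; the body mirrors a normal Chat Completions request.
    task = {
        "custom_id": "movie-0",  # unique ID used later to match results to inputs
        "method": "POST",
        "url": "/v1/chat/completions",
        "body": {
            "model": "gpt-4o-mini",
            "response_format": {"type": "json_object"},  # request JSON output
            "messages": [
                {
                    "role": "system",
                    "content": (
                        "Extract the movie's categories and write a one-sentence "
                        "summary. Respond in JSON with keys 'categories' and 'summary'."
                    ),
                },
                {"role": "user", "content": description},
            ],
        },
    }

    # The batch input file is JSONL: one task object per line.
    with open("batch_tasks_movies.jsonl", "w") as f:
        f.write(json.dumps(task) + "\n")

In practice you would loop over every row of the dataset and write one task line per movie.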

Second Example: Captioning Images

In the second example, we use the Batch API to generate captions for images of furniture items using GPT-4-turbo's vision capabilities. The captions are short and descriptive, highlighting the most important features of the items depicted.

By leveraging the Batch API, we can efficiently process a large dataset of images in a single asynchronous job and get structured caption results back in one output file.
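
A single captioning task looks much the same; the main difference is that the user message carries an image_url content part, which is how the Chat Completions endpoint accepts vision input. The prompt, IDs, URL, and file name below are placeholders rather than the notebook's exact values.

    import json

    # Placeholder URL standing in for one furniture image in the dataset.
    image_url = "https://example.com/images/armchair.jpg"

    task = {
        "custom_id": "furniture-0",
        "method": "POST",
        "url": "/v1/chat/completions",
        "body": {
            "model": "gpt-4-turbo",
            "messages": [
                {
                    "role": "system",
                    "content": (
                        "Write a short, descriptive caption highlighting the "
                        "item's most important features."
                    ),
                },
                {
                    "role": "user",
                    # Vision input is passed as an image_url content part.
                    "content": [{"type": "image_url", "image_url": {"url": image_url}}],
                },
            ],
        },
    }

    with open("batch_tasks_captions.jsonl", "w") as f:
        f.write(json.dumps(task) + "\n")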

Setting Up and Running Batch Jobs

To get started with the Batch API, you'll need the latest version of the OpenAI SDK. The process involves creating a JSONL file with your batch tasks and uploading it through the API. Once the job is complete, you can retrieve and process the results.

Below are the basic steps:

  1. Install the SDK

    Ensure you have the latest version of the OpenAI SDK by running:

    %pip install openai --upgrade

  2. Prepare Your Tasks

    Create a list of the tasks you want to process, formatted as JSON objects. Each task needs a unique custom_id, the HTTP method and endpoint url, and a body containing the same parameters you would send to the Chat Completions endpoint (model, messages, and so on), as in the sketches above.

  3. Create and Upload the Batch File

    Save your tasks to a .jsonl file (one JSON object per line) and upload it with purpose "batch"; creating the batch job that references the uploaded file's ID is what actually starts the run.

  4. Retrieve and Process the Results

    Poll the batch job until its status is "completed", then download the output file and match each result back to its task via custom_id. The sketch after this list walks through these steps end to end.
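
Putting the steps together, the sketch below uses the current openai Python SDK (v1.x). The file name reuses the illustrative batch_tasks_movies.jsonl from the first example, and the 60-second polling interval is an arbitrary choice.

    import json
    import time
    from openai import OpenAI

    client = OpenAI()  # reads OPENAI_API_KEY from the environment

    # Upload the JSONL file of tasks; the purpose must be "batch".
    batch_file = client.files.create(
        file=open("batch_tasks_movies.jsonl", "rb"),
        purpose="batch",
    )

    # Create the batch job that references the uploaded file.
    batch_job = client.batches.create(
        input_file_id=batch_file.id,
        endpoint="/v1/chat/completions",
        completion_window="24h",
    )

    # Poll until the job reaches a terminal status.
    while batch_job.status not in ("completed", "failed", "expired", "cancelled"):
        time.sleep(60)
        batch_job = client.batches.retrieve(batch_job.id)

    # Each line of the output file holds the response for one task,
    # matched back to its input via custom_id.
    if batch_job.status == "completed":
        output = client.files.content(batch_job.output_file_id).content
        for line in output.decode("utf-8").splitlines():
            result = json.loads(line)
            reply = result["response"]["body"]["choices"][0]["message"]["content"]
            print(result["custom_id"], reply)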

Conclusion

The Batch API is a powerful tool for handling large-scale tasks that can be processed asynchronously. It offers flexibility, cost savings, and the ability to use the same parameters as the Chat Completions endpoint, making it an essential resource for developers working with large datasets.
