Asynchronous API Requests with AsyncOpenAI Client

Using the openai client with Paradigm, you can leverage asynchronous programming in Python to speed up your API requests.

Using asyncio and the AsyncOpenAI client, you can send multiple requests concurrently, significantly reducing the total wait time for bulk operations.

Efficiency and Scalability

Asynchronous requests allow for non-blocking operations, making your application more efficient and scalable, especially when dealing with multiple simultaneous API calls.

Prerequisites

  • Ensure you have the openai library installed with support for asynchronous operations.
  • An API key stored securely, preferably in an environment variable PARADIGM_API_KEY.
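If you want to confirm both prerequisites before running the examples, a quick check along these lines works (a minimal sketch, assuming the library was installed with pip install openai):

import os
import openai

# AsyncOpenAI ships with the openai package from v1.x onward
print("openai version:", openai.__version__)
assert os.getenv("PARADIGM_API_KEY"), "PARADIGM_API_KEY is not set in the environment"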

Asynchronous Programming with asyncio

asyncio is a Python library that provides a framework for writing concurrent code using the async and await syntax. It is particularly useful for I/O-bound and high-level structured network code. You can find more information in the official asyncio documentation.
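As a toy illustration independent of any API, here is the async/await pattern in action: three one-second waits run concurrently, so the whole batch finishes in roughly one second rather than three.

import asyncio
import time

async def wait_one_second(label):
    await asyncio.sleep(1)  # non-blocking: other tasks run during this wait
    return label

async def demo():
    start = time.time()
    # gather schedules all three coroutines concurrently and awaits them together
    results = await asyncio.gather(*(wait_one_second(i) for i in range(3)))
    print(results, f"- finished in {time.time() - start:.2f} seconds")

asyncio.run(demo())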

Setting Up AsyncOpenAI Client

To utilize asynchronous features, use AsyncOpenAI instead of OpenAI. Initialize the client with your API key and the appropriate endpoint.

import asyncio
import os
import time

from openai import AsyncOpenAI

# Initialize the asynchronous client
async_client = AsyncOpenAI(api_key=os.getenv("PARADIGM_API_KEY"), base_url="https://paradigm.lighton.ai/api/v2")

names = ["Alice", "Bob", "Charlie", "David", "Emma", "Frank", "Grace", "Hannah", "Ian", "Jessica", "Kevin", "Linda", "Michael", "Nancy", "Olivia", "Peter", "Quincy", "Rachel", "Samuel", "Tiffany"]

# As an example, we will use this batch of messages
messages_batch = [[{"role": "user", "content": f"Say hello to the new user on Paradigm! Give a one sentence short highly personalized welcome to: {name}"}] for name in names]
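Each element of messages_batch is a single-message conversation in the chat format. For example, the first element looks like this:

print(messages_batch[0])
# [{'role': 'user', 'content': 'Say hello to the new user on Paradigm! Give a one sentence short highly personalized welcome to: Alice'}]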

Asynchronous API Calls

Here's how to implement asynchronous API calls:

  1. Define an Async Function for Sending Messages:
    Create an async function that sends a message to the API and awaits the response.

async def send_message(messages, *args, **kwargs):
    # Await the non-blocking API call and return the completed response
    response = await async_client.chat.completions.create(messages=messages, *args, **kwargs)
    return response
  2. Sending Requests Concurrently:
    Use asyncio.gather to send multiple requests concurrently. This function waits for all the awaitables (asynchronous operations) to complete. A variant that caps how many requests run at once is sketched after this list.

async def main():
    start_time = time.time()
    # Create one task per conversation in the batch
    tasks = [send_message(messages, model="alfred-40b-1123", temperature=0.4) for messages in messages_batch]
    # Run all tasks concurrently and collect the responses in order
    responses = await asyncio.gather(*tasks)
    duration = time.time() - start_time
    print(f"Async execution took {duration} seconds.")
    for response in responses:
        print(response.choices[0].message.content)
    return responses
  3. Running the Asynchronous Main Function:
    Use asyncio.run() to execute the main function, which handles all asynchronous operations.

if __name__ == "__main__":
    asyncio.run(main())
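When batches grow large, firing every request at once can run into provider rate limits. One common pattern for capping concurrency (a sketch only, not a Paradigm requirement; the limit of 5 is an arbitrary placeholder) is to wrap the call in an asyncio.Semaphore:

MAX_CONCURRENT = 5  # assumption: tune this to your own rate limits

semaphore = asyncio.Semaphore(MAX_CONCURRENT)

async def send_message_limited(messages, *args, **kwargs):
    # At most MAX_CONCURRENT requests are in flight at any given moment
    async with semaphore:
        return await async_client.chat.completions.create(messages=messages, *args, **kwargs)

Building tasks from send_message_limited instead of send_message and gathering them works exactly as in step 2.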

Comparison with Synchronous Execution

When comparing asynchronous execution to traditional synchronous (sequential) execution, asynchronous operations generally complete in significantly less time: nearly 3 times faster in this example, with potential for even greater improvements depending on the length of the individual requests. This is particularly true for I/O-bound tasks such as API requests. The efficiency gains stem from the non-blocking nature of asynchronous execution, which allows other tasks to proceed without waiting for I/O operations to finish.

Best Practices
  • Always use await with async functions to avoid runtime errors (see the snippet after this list).
  • Use AsyncOpenAI for asynchronous operations to ensure non-blocking calls.
  • In Jupyter notebooks, run asynchronous code via a separate Python script, executed with !python file_to_execute.py in a cell, to avoid event loop issues.
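For example, calling an async function without await silently creates a coroutine object and never sends the request; a minimal sketch reusing the send_message function from above (valid only inside an async function):

bad = send_message(messages_batch[0], model="alfred-40b-1123", temperature=0.4)          # coroutine object; the request is never sent
good = await send_message(messages_batch[0], model="alfred-40b-1123", temperature=0.4)   # awaited: the actual API response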

By incorporating asynchronous requests into your application, you can achieve greater efficiency and scalability, particularly when dealing with large numbers of API calls.

Full example for comparison

To compare synchronous and asynchronous API calls in a practical scenario, you can use the following snippet. It creates a Python file, speed_test.py, that implements both synchronous and asynchronous API requests. You can then run the script to observe the difference in execution time, demonstrating the efficiency of asynchronous programming for batched requests.

%%writefile speed_test.py
import asyncio
import os
import time

from openai import OpenAI, AsyncOpenAI

# Synchronous client setup
sync_client = OpenAI(api_key=os.getenv("PARADIGM_API_KEY"), base_url="https://paradigm.lighton.ai/api/v2")

# Asynchronous client setup
async_client = AsyncOpenAI(api_key=os.getenv("PARADIGM_API_KEY"), base_url="https://paradigm.lighton.ai/api/v2")

names = ["Alice", "Bob", "Charlie", "David", "Emma", "Frank", "Grace", "Hannah", "Ian", "Jessica", "Kevin", "Linda", "Michael", "Nancy", "Olivia", "Peter", "Quincy", "Rachel", "Samuel", "Tiffany"]

# Synchronous function to send one message
def sync_send_message(name):
    messages = [{"role": "user", "content": f"Say hello to the new user on Paradigm! Give a one sentence short highly personalized welcome to: {name}"}]
    response = sync_client.chat.completions.create(messages=messages, model="alfred-40b-1123", temperature=0.4)
    return response

# Asynchronous function to send one message
async def async_send_message(messages, *args, **kwargs):
    response = await async_client.chat.completions.create(messages=messages, *args, **kwargs)
    return response

def sync_main():
    # Send the requests one after another and time the whole batch
    responses = []
    start_time = time.time()
    for name in names:
        response = sync_send_message(name)
        responses.append(response)
    duration = time.time() - start_time
    print(f"Synchronous execution took {duration} seconds.")

async def async_main():
    # Send all requests concurrently and time the whole batch
    start_time = time.time()
    tasks = [async_send_message([{"role": "user", "content": f"Say hello to the new user on Paradigm! Give a one sentence short highly personalized welcome to: {name}"}], model="alfred-40b-1123", temperature=0.4) for name in names]
    responses = await asyncio.gather(*tasks)
    duration = time.time() - start_time
    print(f"Asynchronous execution took {duration} seconds.")

if __name__ == "__main__":
    async_start_time = time.time()
    asyncio.run(async_main())
    async_duration = time.time() - async_start_time

    sync_start_time = time.time()
    sync_main()
    sync_duration = time.time() - sync_start_time

    improvement_factor = sync_duration / async_duration
    print(f"Improvement factor: {improvement_factor}")

To execute the comparison:

  1. Run the above code snippet in a Jupyter notebook cell to create speed_test.py.

  2. Execute the script within the Jupyter notebook using !python speed_test.py, or from a terminal with python speed_test.py.

This script will first run the asynchronous version, printing its total execution time, and then do the same for the synchronous version. Comparing the two execution times illustrates the efficiency gains achievable with asynchronous API calls.

In our case we got the following output:

Asynchronous execution took 7.048963308334351 seconds.
Synchronous execution took 19.48901128768921 seconds.
Improvement factor: 2.7641254222771914

Conclusion

Leveraging asynchronous API requests via the AsyncOpenAI client can drastically improve application performance and scalability. As demonstrated, asynchronous execution can be nearly 3 times faster than synchronous execution, and this approach is essential for keeping applications responsive when handling high volumes of API calls.