The Chat API lets your apps hold smart, context-aware conversations. It works for both extended back-and-forth chats and quick one-off questions, adapting to the flow of the dialogue.
- Multi-Turn Dialogues: Keeps track of extended chats.
- Single-Turn Responses: Gives straight answers to direct questions.
- Streaming: Offers real-time response delivery for dynamic interaction.
Getting Started
Setup:
First, set up your client with your API key and our endpoint URL. This connects your requests to our service.
from openai import OpenAI as OpenAICompatibleClient
import os
# Get API key from environment
api_key = os.getenv("PARADIGM_API_KEY")
# Our API base URL
base_url = "https://paradigm.lighton.ai/api/v2"
# Configure the OpenAI client
client = OpenAICompatibleClient(api_key=api_key, base_url=base_url)
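To verify the connection, you can list the models the endpoint exposes. This assumes the service implements the OpenAI-compatible models route; check your deployment if the call fails.
# Quick smoke test: print the IDs of the models the endpoint serves
for model in client.models.list():
    print(model.id)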
Crafting Messages:
The core of your request is the `messages` array, where each item has a `role` ("system", "user", or "assistant") and the message content.
- System: Sets the conversation's context or provides instructions to the AI, guiding its responses. Ideal for initializing the chat with necessary background or rules.
- User: Represents input from the human user: the questions, statements, or commands entered into the chat.
- Assistant: Denotes the AI-generated replies that answer or interact with the user, in response to user or system prompts.
messages = [
{"role": "system", "content": "You are a helpfull AI assistant answering to user's question"},
{"role": "user", "content": "Hello, my name is Alfred, how are you?"},
{"role": "assistant", "content": "I'm here to help. What can I do for you?"},
{"role": "user", "content": "Can you remind me of my name?"}
]
Some models, such as `mistral` and `mixtral`, may not support the `system` role; sending one can weaken context understanding and response guidance, or cause outright errors. Always check your model's compatibility with system prompts before relying on them.
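If your chosen model rejects the `system` role, a common workaround is to fold the instructions into the first user message instead. A minimal sketch; whether it preserves the intended guidance depends on the model:
# Fold system instructions into the first user turn for models
# that don't accept a "system" message
system_text = "You are a helpful AI assistant answering the user's questions."
fallback_messages = [
    {"role": "user", "content": f"{system_text}\n\nHello, my name is Alfred, how are you?"},
    {"role": "assistant", "content": "I'm here to help. What can I do for you?"},
    {"role": "user", "content": "Can you remind me of my name?"},
]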
Making Requests:
Send your messages to get a response. Adjust settings like `temperature` for creativity and `max_tokens` for response length.
response = client.chat.completions.create(
model="alfred-40b-1123",
messages=messages,
temperature=0.7,
max_tokens=150,
stream=False
)
Reading Responses:
The API's reply is in the `choices` part of the response. Here's how to get it:
assistant_reply = response.choices[0].message.content
print(assistant_reply)
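To keep a multi-turn dialogue going, append the assistant's reply back onto `messages`, followed by the next user turn, before the next request. The model only knows what the array contains:
# Extend the conversation history, then ask a follow-up question
messages.append({"role": "assistant", "content": assistant_reply})
messages.append({"role": "user", "content": "Thanks! What was my first question?"})
follow_up = client.chat.completions.create(
    model="alfred-40b-1123",
    messages=messages,
    temperature=0.7,
    max_tokens=150,
)
print(follow_up.choices[0].message.content)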
Streaming example
Streaming allows for immediate, incremental delivery of responses, perfect for live interactions. With `stream=True`, the API sends parts of the response as they're generated. The example below prints each part as it arrives.
response_with_streaming = client.chat.completions.create(
model="alfred-40b-1123",
messages=messages,
temperature=0.7,
max_tokens=150,
stream=True
)
for chunk in response_with_streaming:
    delta = chunk.choices[0].delta.content
    if delta is not None:  # the final chunk may carry no content
        print(delta, end="", flush=True)
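Note that a stream can only be consumed once. If you also need the complete reply afterwards, accumulate the chunks as you print them; a variant of the loop above:
# Stream the reply to the console while rebuilding the full text
stream = client.chat.completions.create(
    model="alfred-40b-1123",
    messages=messages,
    temperature=0.7,
    max_tokens=150,
    stream=True,
)
parts = []
for chunk in stream:
    delta = chunk.choices[0].delta.content
    if delta is not None:
        parts.append(delta)  # keep the piece for later
        print(delta, end="", flush=True)  # show it immediately
full_reply = "".join(parts)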
Best Practices
- Keep Context Relevant: Only include messages that help the conversation.
- Use Streaming for Live Chats: Set
stream=True
for ongoing interactions. - Match Model to your infra: The models that you choose should reflect the ones deployed on your infrastructure.
- Balance Temperature and Tokens: Adjust for creativity vs. precision and response length.
- Model Variability: Responses can vary significantly across models for identical prompts.
- Temperature Sensitivity: Elevated settings may introduce unpredictability in responses.
- Response Length: Extended outputs risk being truncated if they exceed the max token limit; see the sketch after this list for one way to detect this.
- Streaming Complexity: While beneficial for real-time interactions, streaming can add complexity to managing responses.
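One way to detect truncation is to inspect `finish_reason` on a non-streaming response: a value of "length" means the reply hit the `max_tokens` limit. This is an OpenAI-style field; verify it on your deployment.
# "length" signals the reply was cut off by max_tokens
if response.choices[0].finish_reason == "length":
    print("Warning: reply truncated; consider raising max_tokens.")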
Conclusion
This guide is your starting point for integrating chat functionalities. With the right settings and understanding, you can craft engaging AI conversations. Happy coding!