The Agent API, powering the new Agent Mode, allows you to engage in
conversations leveraging multi-tool reasoning capabilities. It is now the recommended way to interact with Paradigm tools.
A component of a message within the conversation turn; it can be of type reasoning, tool_call, or text (the final answer to the user).
The parts in an agent message within a turn are structured in the following sequence:
a reasoning part explaining whether the agent will choose to use a tool or return the final answer.
a tool_call part containing information about the tool called as well as the tool’s raw result.
the first two steps repeat until the agent chooses to return the final answer or the reasoning budget is reached.
A set of parts corresponding, within a turn, to either the agent answer or the user query. A turn is thus primarily composed of a list of two messages: the user query and the agent answer.
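As a minimal sketch (assuming a turn already parsed into a Python dict, as in the examples below), the messages/parts structure can be walked like this:

```python
def describe_turn(turn: dict) -> list[str]:
    """List the role and part type of every part in a turn, in order.

    A turn carries two messages (the user query, then the agent answer),
    and each message carries an ordered list of typed parts.
    """
    return [
        f"{message['role']}: {part['type']}"
        for message in turn["messages"]
        for part in message["parts"]
    ]
```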
```python
import os
import json
import urllib.request

# Get API key from environment
api_key = os.getenv("PARADIGM_API_KEY")

# Get base URL from environment (defaults to public instance)
base_url = os.getenv("PARADIGM_BASE_URL", "https://paradigm.lighton.ai/api/v3")

url = f"{base_url}/threads/turns"

payload = {
    "chat_setting_id": 1,
    "ml_model": "alfred-sv5",
    "query": "What is the capital of France?"
}

data = json.dumps(payload).encode("utf-8")

req = urllib.request.Request(
    url,
    data=data,
    headers={
        "Authorization": f"Bearer {api_key}",
        "Content-Type": "application/json"
    },
    method="POST"
)

with urllib.request.urlopen(req) as resp:
    response = json.load(resp)
```
While specifying:
chat_setting_id: the ID of the Chat Settings to use (optional, defaults to the one attached to your company).
ml_model: the name of the ML model to use (optional, defaults to your company’s default model).
query: the user query.
Response example
```json
{
  "id": "9339cf12-8530-421d-8dda-79cd3016a182",
  "object": "turn",
  "thread": "783b089b-0ecc-496b-b0c3-70d0f327d9b8",
  "status": "completed",
  "messages": [
    {
      "id": "2951681d-9477-4e8d-889b-0f63bef890f6",
      "object": "message",
      "role": "user",
      "parts": [
        {
          "type": "text",
          "text": "What is the capital of France?"
        }
      ],
      "created_at": "2025-12-16T16:01:53.806331Z"
    },
    {
      "id": "5ec22b40-ba6c-40ab-b94d-97fc05d5c142",
      "object": "message",
      "role": "assistant",
      "parts": [
        {
          "type": "reasoning",
          "reasoning": "Basic factual question - no tools needed."
        },
        {
          "type": "text",
          "text": "La capitale de la France est Paris."
        }
      ],
      "created_at": "2025-12-16T16:01:56.210968Z"
    }
  ],
  "created_at": "2025-12-16T16:01:53.804779Z"
}
```
In this case, the agent answer on this turn is made of two parts:
a reasoning part explaining the reasoning behind the answer.
a text part containing the final answer.
Note that the last part of the agent answer is always of type text and constitutes the final answer,
and that the agent answer is always the second and last message.
If the turn takes too long to generate, you will receive an HTTP 202 response with the created
thread ID (thread) and the turn ID (turn_id) in the payload. You can then use the GET /api/v3/threads/:id/turns/:turn_id endpoint
to poll for the status until it is in state completed and then retrieve the final answer.
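A polling loop for that endpoint can be sketched as follows. The fetch_turn callable stands in for the authenticated GET request (built like the Quickstart request); interval and max_polls are illustrative local values, not API parameters:

```python
import time

def wait_for_turn(fetch_turn, interval: float = 2.0, max_polls: int = 150) -> dict:
    """Poll until the turn reaches the "completed" status.

    fetch_turn is a callable that performs the authenticated
    GET /api/v3/threads/:id/turns/:turn_id request and returns
    the parsed turn dict.
    """
    for _ in range(max_polls):
        turn = fetch_turn()
        if turn["status"] == "completed":
            return turn
        time.sleep(interval)
    raise TimeoutError("turn did not complete within the polling budget")
```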
Alternatively you can skip the waiting and use Background Mode to generate the turn in the background.
the POST /api/v3/threads/:id/turns endpoint to create a turn in an existing conversation thread, thus benefiting from the context already there if needed.
In these examples we will use the method that creates a new thread, but both endpoints take the same payload and return the same response schema.
It will also be assumed that you parse the API response into a Python dictionary, like done in the Quickstart section.
For single-call usage in workflows, it is recommended to use the first endpoint, creating a fresh thread per turn.
You can scope a query within a list of Workspaces and/or Files (documents) using the workspaces_ids and/or
file_ids parameters in the payload. For instance, using the POST /api/v3/threads/turns endpoint:
```json
{
  "chat_setting_id": 1,
  "ml_model": "alfred-sv5",
  "query": "What is the conclusion of the last quarterly meeting note?",
  "workspaces_ids": [1, 2]
}
```
It is recommended to not force a tool when scoping, as the automatic routing will ensure the optimal tool for
your type of file(s) is used.
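Put together in Python, a scoped payload might look like the following sketch (the IDs are placeholders; file_ids is shown commented out as the alternative or complementary scope):

```python
payload = {
    "chat_setting_id": 1,
    "ml_model": "alfred-sv5",
    "query": "What is the conclusion of the last quarterly meeting note?",
    # scope the query to these workspaces...
    "workspaces_ids": [1, 2],
    # ...and/or to specific documents:
    # "file_ids": [10, 11],
}
```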
You can use the following endpoints to retrieve the list of available workspaces and files:
Call the POST /api/v3/threads/turns endpoint,
while specifying the tool name to use in the payload like this:
```json
{
  "chat_setting_id": 1,
  "ml_model": "alfred-sv5",
  "query": "What is the conclusion of the last quarterly meeting note?",
  "force_tool": "document_search"
}
```
Forcing a tool when working with documents is not recommended; prefer scoping files/workspaces to your query
and let the automatic routing decide which tool is best for your file(s).
Note that the native tools available are:
document_search
document_analysis
code_execution
web_search
Ensure the selected tool is enabled in the Agent Tools section of your Chat Settings.
As seen in the Quickstart section, the agent answer is the last message of the turn. You can then retrieve the final answer like this:
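A minimal sketch, assuming the response has been parsed into a Python dict as in the Quickstart:

```python
def extract_final_answer(turn: dict) -> str:
    """Return the final answer: the last part (always of type "text")
    of the last message (always the agent answer)."""
    final_part = turn["messages"][-1]["parts"][-1]
    assert final_part["type"] == "text"
    return final_part["text"]
```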
In this scenario the first part of the agent answer is a tool_call part. It contains information about
the tool called as well as the tool’s raw result.
Note that since the tool was forced to a specific value, this turn won’t contain a reasoning part.
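To inspect the tool call itself, you can filter the agent answer’s parts by type. Only the "type" discriminator is relied on here; the remaining layout of a tool_call part (tool name, raw result) is best inspected on a live response from your instance:

```python
def tool_call_parts(turn: dict) -> list[dict]:
    """Collect the tool_call parts of the agent answer (the last message)."""
    agent_message = turn["messages"][-1]
    return [part for part in agent_message["parts"] if part["type"] == "tool_call"]
```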
You can extend the system prompt during one query and pass specific instructions using the system_prompt_suffix
parameter of the payload. For instance, using the POST /api/v3/threads/turns endpoint:
```json
{
  "query": "What is the conclusion of the last quarterly meeting note?",
  "force_tool": "document_search",
  "system_prompt_suffix": "Rephrase technical terms into more accessible language."
}
```
Using this method rather than adding more instructions to your query allows you to tune the agent’s behaviour while ensuring
optimal search accuracy for your query in the case of the document_search or document_analysis tools.
You can request structured output from the agent by specifying the response_format parameter in the payload.
For instance, using the POST /api/v3/threads/turns endpoint:
```json
{
  "query": "What is the capital of France?",
  "response_format": {
    "type": "object",
    "properties": {
      "capital": { "type": "string" },
      "country": { "type": "string" }
    },
    "required": ["capital", "country"]
  }
}
```
For more information about response_format, please consult the Guided JSON documentation.
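One way to consume the structured answer, under the assumption that the final text part carries the requested JSON object as a serialized string (verify this against your instance’s behaviour):

```python
import json

def parse_structured_answer(turn: dict) -> dict:
    # Assumption: with response_format set, the final text part of the
    # agent answer contains the requested JSON object, serialized.
    final_text = turn["messages"][-1]["parts"][-1]["text"]
    return json.loads(final_text)
```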
Some tools like code_execution can generate artifacts that can be downloaded afterwards. For instance, when using
the POST /api/v3/threads/turns endpoint:
```json
{
  "query": "Draw me a graph of a sinusoidal function.",
  "force_tool": "code_execution"
}
```
It will result in an agent answer consisting of two parts:
a tool_call part in which you can find the generated artifacts.
a text part containing the final answer.
For heavy queries that need to be handled asynchronously, you can use the POST /api/v3/threads/turns endpoint
with the background parameter set to true. For instance:
```json
{
  "query": "What is the capital of France?",
  "background": true
}
```
You will receive an HTTP 200 response with the turn object, but containing only the user query; notice how the status field
is set to running:
Response example
```json
{
  "id": "6d0d54c3-a87b-4c66-8af2-8ef59418358e",
  "object": "turn",
  "thread": "12df8e86-e8b0-49ee-8634-f5ce8944591c",
  "status": "running",
  "messages": [
    {
      "id": "22461762-afd8-4533-8cf2-2e7d516a38d6",
      "object": "message",
      "role": "user",
      "parts": [
        {
          "type": "text",
          "text": "What is the capital of France?"
        }
      ],
      "created_at": "2025-12-17T14:46:00.703903Z"
    }
  ],
  "created_at": "2025-12-17T14:46:00.702207Z"
}
```
You can then retrieve the thread id in the following way:
```python
thread_id: str = response["thread"]
```
You can now periodically poll the GET /api/v3/threads/:id endpoint
until the status field is set to completed. When the thread is back in its completed status, you can fetch its turns using
the GET /api/v3/threads/:id/turns endpoint.
To only retrieve the last turn, you can set the limit query parameter to 1.
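The whole background flow can be sketched as follows. The get_json callable stands in for an authenticated GET helper (built like the Quickstart request); interval and max_polls are illustrative local values, and limit=1 keeps only the latest turn:

```python
import time

def wait_for_background_turn(get_json, thread_id: str,
                             interval: float = 2.0, max_polls: int = 150):
    """Poll the thread until completion, then fetch its latest turn.

    get_json(path) performs an authenticated GET against the API base URL
    and returns the parsed JSON body.
    """
    for _ in range(max_polls):
        thread = get_json(f"/threads/{thread_id}")
        if thread["status"] == "completed":
            # fetch only the latest turn of the completed thread
            return get_json(f"/threads/{thread_id}/turns?limit=1")
        time.sleep(interval)
    raise TimeoutError("thread did not complete within the polling budget")
```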