The Agent API is available in beta from the Unique Urchin release (December 2025) onward; endpoints may be subject to changes in the future.

Motivation

The Agent API, powering the new Agent Mode, allows you to engage in conversations leveraging multi-tool reasoning capabilities. It is now the recommended way to interact with Paradigm tools.

Terminology

Turn

A single interaction between a user and the model, comprising a user query, multi-step reasoning, tool calls and the model’s final answer.

Thread

A conversation comprising a sequence of turns.

Parts

A component of the conversation turn’s answer; it can be of type reasoning, tool_call or text (the final answer to the user). The parts in an agent message within a turn are structured in the following sequence:
  • a reasoning part explaining whether the agent will choose to use a tool or return the final answer.
  • a tool_call part containing information about the tool called as well as the tool’s raw result.
  • the first two steps repeat until the agent chooses to return the final answer or the reasoning budget is reached.
  • a text part containing the final answer.

Message

A set of parts corresponding, within a turn, to either the agent answer or the user query. A turn is thus primarily composed of a list of two messages: the user query and the agent answer.

Source

A source used by a tool to generate the turn’s final answer; it can be of type either web or document.

Artifact

A file generated by a tool, attached to a turn.

Quickstart

Ensure the Chat Settings in use has the “Agent mode” setting enabled and that the desired Agent tools are enabled.
You can initialize a new conversation thread using the POST /api/v3/threads/turns endpoint.
import os
import json
import urllib.request

# Get API key from environment
api_key = os.getenv("PARADIGM_API_KEY")
# Get base URL from environment (defaults to public instance)
base_url = os.getenv("PARADIGM_BASE_URL", "https://paradigm.lighton.ai/api/v3")

url = f"{base_url}/threads/turns"
payload = {
    "chat_setting_id": 1,
    "ml_model": "alfred-sv5",
    "query": "What is the capital of France?"
}
data = json.dumps(payload).encode("utf-8")

req = urllib.request.Request(
    url,
    data=data,
    headers={
        "Authorization": f"Bearer {api_key}",
        "Content-Type": "application/json"
    },
    method="POST"
)

with urllib.request.urlopen(req) as resp:
    response = json.load(resp)
While specifying:
  • chat_setting_id: the ID of the Chat Settings to use (optional, defaults to the one attached to your company).
  • ml_model: the name of the ML model to use (optional, defaults to your company’s default model).
  • query: the user query.
{
  "id": "9339cf12-8530-421d-8dda-79cd3016a182",
  "object": "turn",
  "thread": "783b089b-0ecc-496b-b0c3-70d0f327d9b8",
  "status": "completed",
  "messages": [
    {
      "id": "2951681d-9477-4e8d-889b-0f63bef890f6",
      "object": "message",
      "role": "user",
      "parts": [
        {
          "type": "text",
          "text": "What is the capital of France?"
        }
      ],
      "created_at": "2025-12-16T16:01:53.806331Z"
    },
    {
      "id": "5ec22b40-ba6c-40ab-b94d-97fc05d5c142",
      "object": "message",
      "role": "assistant",
      "parts": [
        {
          "type": "reasoning",
          "reasoning": "Basic factual question - no tools needed."
        },
        {
          "type": "text",
          "text": "La capitale de la France est Paris."
        }
      ],
      "created_at": "2025-12-16T16:01:56.210968Z"
    }
  ],
  "created_at": "2025-12-16T16:01:53.804779Z"
}
In this case the agent answer on this turn is made of two parts:
  • a reasoning part explaining the reasoning behind the answer.
  • a text part containing the final answer.
Note that the last part of the agent answer is always of type text and constitutes the final answer, and that the agent answer is always the second and last message.
If the turn takes too long to generate you will receive an HTTP 202 response with the created thread ID (thread) and the turn ID (turn_id) in the payload. You can then use the GET /api/v3/threads/:id/turns/:turn_id endpoint to poll for the status until it is in state completed and then retrieve the final answer.
Alternatively you can skip the waiting and use Background Mode to generate the turn in the background.
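The polling flow for a 202 response can be sketched as follows. This is a minimal sketch: `fetch_turn` stands in for a callable that performs the GET /api/v3/threads/:id/turns/:turn_id request and returns the parsed JSON turn object; the helper name and the timeout values are our own, not part of the API.

```python
import time

def wait_for_turn(fetch_turn, poll_interval: float = 2.0, max_polls: int = 60) -> dict:
    """Poll until the turn reaches the 'completed' status, then return it.

    fetch_turn: a zero-argument callable performing the
    GET /api/v3/threads/:id/turns/:turn_id request and returning
    the parsed JSON turn object.
    """
    for _ in range(max_polls):
        turn = fetch_turn()
        if turn["status"] == "completed":
            return turn
        time.sleep(poll_interval)
    raise TimeoutError("turn did not complete in time")

# Stubbed example: the turn completes on the second poll
statuses = iter(["running", "completed"])
turn = wait_for_turn(lambda: {"status": next(statuses)}, poll_interval=0)
```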
You can access the agent final answer like this:
# Minimal structure example to illustrate access pattern
response = {
    "messages": [
        {"parts": []},
        {"parts": [{"type": "text", "text": "Paris"}]}
    ]
}
answer: str = response["messages"][-1]["parts"][-1]["text"]
You can also retrieve the thread_id like this:
response = {"thread": "783b089b-0ecc-496b-b0c3-70d0f327d9b8"}
thread_id: str = response["thread"]
And then you can follow up on the same conversation using the POST /api/v3/threads/:id/turns endpoint to create a new conversation turn.
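A follow-up turn on an existing thread can be built like this; the payload fields mirror the Quickstart example and only the URL changes to include the thread ID. The snippet below only constructs the request without sending it.

```python
import json
import os
import urllib.request

api_key = os.getenv("PARADIGM_API_KEY", "")
base_url = os.getenv("PARADIGM_BASE_URL", "https://paradigm.lighton.ai/api/v3")
thread_id = "783b089b-0ecc-496b-b0c3-70d0f327d9b8"  # from the previous response

payload = {"query": "And what is its population?"}
req = urllib.request.Request(
    f"{base_url}/threads/{thread_id}/turns",
    data=json.dumps(payload).encode("utf-8"),
    headers={
        "Authorization": f"Bearer {api_key}",
        "Content-Type": "application/json",
    },
    method="POST",
)
# Send with: urllib.request.urlopen(req)
```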

Recipes

For the following recipes, you can use either the POST /api/v3/threads/turns endpoint (which creates a new thread) or the POST /api/v3/threads/:id/turns endpoint (which continues an existing thread). In these examples we will use the new-thread endpoint, but both endpoints take the same payload and return the same response schema. It is also assumed that you parse the API response into a Python dictionary, as done in the Quickstart section.
For single-call usage in workflows it is recommended to use the first endpoint, creating a fresh thread per turn.
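The recipes below all follow the same request pattern, which can be wrapped in a small helper; `create_turn` is our own name, not part of the API, and the example call at the bottom is left commented out since it requires a live instance.

```python
import json
import os
import urllib.request

def create_turn(payload: dict) -> dict:
    """POST the payload to /api/v3/threads/turns and return the parsed turn."""
    api_key = os.getenv("PARADIGM_API_KEY", "")
    base_url = os.getenv("PARADIGM_BASE_URL", "https://paradigm.lighton.ai/api/v3")
    req = urllib.request.Request(
        f"{base_url}/threads/turns",
        data=json.dumps(payload).encode("utf-8"),
        headers={
            "Authorization": f"Bearer {api_key}",
            "Content-Type": "application/json",
        },
        method="POST",
    )
    with urllib.request.urlopen(req) as resp:
        return json.load(resp)

# response = create_turn({"query": "What is the capital of France?"})
```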

Scoping a workspace/document

You can scope a query within a list of Workspaces and/or Files (documents) using the workspaces_ids and/or file_ids parameters in the payload. For instance, using the POST /api/v3/threads/turns endpoint:
{
  "chat_setting_id": 1,
  "ml_model": "alfred-sv5",
  "query": "What is the conclusion of the last quarterly meeting note?",
  "workspaces_ids": [1, 2]
}
It is recommended not to force a tool when scoping, as automatic routing will ensure the optimal tool for your type of file(s) is used.
You can use the following endpoints to retrieve the list of available workspaces and files:

Using a specific tool

Call the POST /api/v3/threads/turns endpoint, while specifying the tool name to use in the payload like this:
{
  "chat_setting_id": 1,
  "ml_model": "alfred-sv5",
  "query": "What is the conclusion of the last quarterly meeting note?",
  "force_tool": "document_search"
}
Forcing a tool when working with documents is not recommended; prefer scoping files/workspaces in your query and let the automatic routing decide which tool is best for your file(s).
Note that the native tools available are:
  • document_search
  • document_analysis
  • code_execution
  • web_search
Ensure the selected tool is enabled in the Agent Tools section of your Chat Settings.
As seen in the Quickstart section, the agent answer is the last message of the turn. You can then retrieve the final answer like this:
# Minimal structure example to illustrate access pattern
response = {
    "messages": [
        {"parts": []},
        {"parts": [{"type": "text", "text": "Paris"}]}
    ]
}
answer: str = response["messages"][-1]["parts"][-1]["text"]
In this scenario the first part of the agent answer is a tool_call part. It contains information about the tool called as well as the tool’s raw result.
Note that since the tool was forced to a specific value, this turn won’t contain a reasoning part.
You can retrieve it like this:
tool_call: dict = response["messages"][-1]["parts"][0]
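Putting the access patterns above together, the tool call and its raw result can be inspected as follows. This is a minimal structure sketch: the `tool_call` and `result` keys follow the access paths shown on this page, but the contents of `result` here are placeholders.

```python
# Minimal structure example to illustrate the access pattern; the
# contents of "result" are placeholders, not documented fields
response = {
    "messages": [
        {"parts": []},
        {
            "parts": [
                {"type": "tool_call", "tool_call": {"result": {}}},
                {"type": "text", "text": "..."},
            ]
        },
    ]
}

tool_call: dict = response["messages"][-1]["parts"][0]
raw_result = tool_call["tool_call"]["result"]
```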

Extending the system prompt

You can extend the system prompt during one query and pass specific instructions using the system_prompt_suffix parameter of the payload. For instance, using the POST /api/v3/threads/turns endpoint:
{
  "query": "What is the conclusion of the last quarterly meeting note?",
  "force_tool": "document_search",
  "system_prompt_suffix": "Rephrase technical term into more accessible language."
}
Using this method rather than adding more instructions to your query allows you to tune the agent’s behaviour while ensuring optimal search accuracy for your query when using the document_search or document_analysis tools.

Requesting structured output

You can request structured output from the agent by specifying the response_format parameter in the payload. For instance, using the POST /api/v3/threads/turns endpoint:
{
  "query": "What is the capital of France?",
  "response_format": {
    "type": "object",
    "properties": {
      "capital": {
        "type": "string"
      },
      "country": {
        "type": "string"
      }
    },
    "required": [
      "capital",
      "country"
    ]
  }
}
For more information about response_format, please consult the Guided JSON documentation.
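Since the final answer is returned as a text part, you can parse it back into a Python dictionary with the standard library; this sketch uses a hard-coded sample answer in place of a real API response.

```python
import json

# Sample final answer as the agent might return it under this response_format
answer = '{"capital": "Paris", "country": "France"}'

parsed: dict = json.loads(answer)
print(parsed["capital"])  # → Paris
```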

Generating artifacts

Some tools like code_execution can generate artifacts that can be downloaded afterwards. For instance, when using the POST /api/v3/threads/turns endpoint:
{
  "query": "Draw me a graph of a sinusoidal function.",
  "force_tool": "code_execution"
}
This results in an agent answer consisting of two parts:
  • a tool_call part in which you can find the generated artifacts.
  • a text part containing the final answer.
You can then retrieve the artifacts like this:
artifacts: list[dict] = response["messages"][-1]["parts"][0]["tool_call"]["result"]["file_artifacts"]
Each file artifact contains the following fields:
  • id: the ID of the artifact.
  • thumbnail_base64: a base64 encoded 256x256 webp thumbnail of the artifact if the artifact is an image.
Once you have the artifact ID you can retrieve the artifact content using the GET /api/v3/artifacts/:id/content endpoint.
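A download request for an artifact can be built as follows; the request is only constructed, not sent, and the artifact ID is a placeholder standing in for an `id` taken from `file_artifacts`.

```python
import os
import urllib.request

api_key = os.getenv("PARADIGM_API_KEY", "")
base_url = os.getenv("PARADIGM_BASE_URL", "https://paradigm.lighton.ai/api/v3")
artifact_id = "00000000-0000-0000-0000-000000000000"  # placeholder ID

req = urllib.request.Request(
    f"{base_url}/artifacts/{artifact_id}/content",
    headers={"Authorization": f"Bearer {api_key}"},
)
# Download with:
# with urllib.request.urlopen(req) as resp:
#     content: bytes = resp.read()
```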

Background mode

For heavy queries that need to be handled asynchronously, you can use the POST /api/v3/threads/turns endpoint with the background parameter set to true. For instance:
{
    "query": "What is the capital of France?",
    "background": true
}
You will receive an HTTP 200 response with the turn object containing only the user query; notice how the status field is set to running:
{
  "id": "6d0d54c3-a87b-4c66-8af2-8ef59418358e",
  "object": "turn",
  "thread": "12df8e86-e8b0-49ee-8634-f5ce8944591c",
  "status": "running",
  "messages": [
    {
      "id": "22461762-afd8-4533-8cf2-2e7d516a38d6",
      "object": "message",
      "role": "user",
      "parts": [
        {
          "type": "text",
          "text": "What is the capital of France?"
        }
      ],
      "created_at": "2025-12-17T14:46:00.703903Z"
    }
  ],
  "created_at": "2025-12-17T14:46:00.702207Z"
}
You can then retrieve the thread id in the following way:
thread_id: str = response["thread"]
You can then periodically poll the GET /api/v3/threads/:id endpoint until the status field is set to completed. Once the thread has reached the completed status, you can fetch its turns using the GET /api/v3/threads/:id/turns endpoint.
To only retrieve the last turn, you can set the limit query parameter to 1.
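The whole background flow can be sketched as follows; the HTTP calls are stubbed out so the sketch focuses on the sequence (create the turn with background set to true, poll the thread, then fetch the last turn with limit set to 1). The function names are our own, not part of the API.

```python
import time

def run_background_flow(post_turn, get_thread, get_turns, poll_interval: float = 2.0) -> dict:
    """Create a background turn, wait for completion, return the last turn.

    post_turn, get_thread and get_turns stand in for the POST /threads/turns,
    GET /threads/:id and GET /threads/:id/turns?limit=1 requests respectively.
    """
    response = post_turn({"query": "What is the capital of France?", "background": True})
    thread_id = response["thread"]
    while get_thread(thread_id)["status"] != "completed":
        time.sleep(poll_interval)
    return get_turns(thread_id, limit=1)[0]

# Stubbed example: the thread completes on the second poll
states = iter(["running", "completed"])
last_turn = run_background_flow(
    post_turn=lambda payload: {"thread": "t-1", "status": "running"},
    get_thread=lambda tid: {"status": next(states)},
    get_turns=lambda tid, limit: [{"status": "completed", "messages": []}],
    poll_interval=0,
)
```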