LLM API

Chat completions

The primary endpoint for generating text. It accepts a list of messages and returns the model's reply.

Messages

A request carries a list of messages, each with a role and content. The roles are:

  • system — instructions that steer behaviour. Put it first.
  • user — input from the end user.
  • assistant — the model's prior turns, included to continue a conversation.
  • tool — the result of a tool the model called.

Request body
{
  "model": "qwen3.6-27b",
  "messages": [
    {"role": "system", "content": "You answer in British English."},
    {"role": "user", "content": "Summarise photosynthesis in one sentence."}
  ],
  "max_tokens": 256,
  "temperature": 0.7
}
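
A minimal sketch of assembling that body in Python, assuming the request is sent as JSON. The helper name build_chat_request is hypothetical and not part of any SDK; the endpoint URL and authentication are left out because this section does not specify them.

```python
import json


def build_chat_request(model, user_text, system_text=None, history=None,
                       max_tokens=256, temperature=0.7):
    """Assemble a chat completions request body as a plain dict.

    Hypothetical helper for illustration only.
    """
    messages = []
    if system_text is not None:
        # The system message goes first so it steers every later turn.
        messages.append({"role": "system", "content": system_text})
    # Prior user/assistant turns, oldest first, continue a conversation.
    messages.extend(history or [])
    messages.append({"role": "user", "content": user_text})
    return {
        "model": model,
        "messages": messages,
        "max_tokens": max_tokens,
        "temperature": temperature,
    }


body = build_chat_request(
    "qwen3.6-27b",
    "Summarise photosynthesis in one sentence.",
    system_text="You answer in British English.",
)
print(json.dumps(body, indent=2))
```

Passing earlier turns through history keeps the system message first and the new user input last, which matches the role ordering described under Messages.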

Common parameters

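  • max_tokens — the maximum number of tokens in the reply.
  • temperature (0–2) — higher values make output more varied, lower values make it more deterministic.
  • top_p (0–1) — nucleus sampling: sample only from the smallest set of tokens whose cumulative probability reaches top_p. Usually adjust this or temperature, not both.
  • stop — up to four strings; generation ends when any of them is produced.
  • seed — a fixed value makes sampling repeatable on a best-effort basis; identical output is not guaranteed.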
  • stream — stream the reply token by token (see Streaming & reasoning).
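
The ranges listed above can be checked on the client before a request is sent. This is a sketch only: check_sampling_params is a hypothetical helper, and accepting a single string for stop is an assumption rather than something this section states.

```python
def check_sampling_params(params):
    """Return a list of problems found in a request's sampling parameters.

    Hypothetical client-side check based on the ranges in the list above.
    """
    problems = []
    temperature = params.get("temperature")
    if temperature is not None and not 0 <= temperature <= 2:
        problems.append("temperature must be between 0 and 2")
    top_p = params.get("top_p")
    if top_p is not None and not 0 <= top_p <= 1:
        problems.append("top_p must be between 0 and 1")
    stop = params.get("stop")
    if stop is not None:
        # Assumption: a bare string is treated as a one-element list.
        stops = [stop] if isinstance(stop, str) else list(stop)
        if len(stops) > 4:
            problems.append("stop accepts at most four strings")
    return problems


print(check_sampling_params({"temperature": 0.7, "stop": ["END"]}))
print(check_sampling_params({"temperature": 2.5, "top_p": 1.2}))
```

An empty list means nothing was flagged; the server remains the authority on what it accepts.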

The response

The reply is in choices[0].message.content. Token accounting is in usage, and finish_reason tells you why generation stopped (stop, length, or tool_calls).

Response
{
  "id": "chatcmpl-3031...",
  "object": "chat.completion",
  "model": "qwen3.6-27b",
  "choices": [
    {
      "index": 0,
      "message": { "role": "assistant", "content": "Plants convert..." },
      "finish_reason": "stop"
    }
  ],
  "usage": { "prompt_tokens": 24, "completion_tokens": 31, "total_tokens": 55 }
}
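
As an illustration, the fields named above can be read from a parsed response like this. The sketch works on the sample response as a string, so it makes no network call; in practice the JSON would come from the HTTP response body.

```python
import json

# The sample response from this section, as it would arrive over the wire.
raw = """
{
  "id": "chatcmpl-3031...",
  "object": "chat.completion",
  "model": "qwen3.6-27b",
  "choices": [
    {
      "index": 0,
      "message": { "role": "assistant", "content": "Plants convert..." },
      "finish_reason": "stop"
    }
  ],
  "usage": { "prompt_tokens": 24, "completion_tokens": 31, "total_tokens": 55 }
}
"""

response = json.loads(raw)
choice = response["choices"][0]

reply = choice["message"]["content"]      # the model's text
finish_reason = choice["finish_reason"]   # why generation stopped
usage = response["usage"]                 # token accounting

print(reply)
print(finish_reason, usage["total_tokens"])
```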

Watch finish_reason

A finish_reason of length means the reply was cut off by max_tokens. Raise the limit if you need the full answer.
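
One way to act on a truncated reply is to resend the request with a larger limit. The sketch below assumes a send callable that takes a request dict and returns the parsed response dict; it stands in for whatever HTTP or SDK call you use. Both complete_with_retry and fake_send are hypothetical names, and doubling the limit is an arbitrary choice.

```python
def complete_with_retry(send, body, max_attempts=3, growth=2):
    """Resend a request with a larger max_tokens while the reply is truncated.

    `send` is any callable mapping a request dict to a parsed response dict
    (an assumption; it stands in for a real HTTP or SDK call).
    """
    body = dict(body)  # copy so the caller's dict is not modified
    for _ in range(max_attempts):
        response = send(body)
        if response["choices"][0]["finish_reason"] != "length":
            return response
        body["max_tokens"] = body["max_tokens"] * growth
    return response  # still truncated after the last attempt


def fake_send(body):
    """Stand-in backend: truncates any reply capped below 100 tokens."""
    done = body["max_tokens"] >= 100
    return {"choices": [{
        "message": {"role": "assistant", "content": "full" if done else "cut"},
        "finish_reason": "stop" if done else "length",
    }]}


result = complete_with_retry(fake_send, {"max_tokens": 32})
print(result["choices"][0]["finish_reason"])
```

Each retry is billed as a full request, so capping the number of attempts keeps costs bounded.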

JSON mode

Set response_format to {"type": "json_object"} to constrain the reply to valid JSON, which is useful for downstream parsing. JSON mode guarantees syntax, not structure, so always state the schema you want in the prompt as well.

Request body
{
  "model": "qwen3.6-27b",
  "messages": [
    {"role": "user", "content": "Return {\"city\": string, \"country\": string} for Brussels."}
  ],
  "response_format": { "type": "json_object" }
}
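
Because the schema lives only in the prompt, it is worth checking the parsed reply before using it. A minimal sketch, assuming the reply text has already been taken from choices[0].message.content; parse_city_reply is a hypothetical helper and the example content is made up.

```python
import json


def parse_city_reply(content):
    """Parse a JSON-mode reply and check it has the shape the prompt asked for."""
    data = json.loads(content)  # raises json.JSONDecodeError on invalid JSON
    if not isinstance(data, dict):
        raise ValueError("expected a JSON object")
    for key in ("city", "country"):
        if not isinstance(data.get(key), str):
            raise ValueError(f"missing or non-string field: {key}")
    return data


# Made-up reply text of the kind the request above might produce.
content = '{"city": "Brussels", "country": "Belgium"}'
print(parse_city_reply(content))
```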

Tool calling

Describe functions the model may call. When it decides to call one, the reply contains tool_calls instead of content; you run the function and send the result back as a tool message.

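import json

from openai import OpenAI

# Assumption: the endpoint is OpenAI-compatible and reached through the
# openai Python SDK. The base URL and key below are placeholders.
client = OpenAI(base_url="https://api.example.com/v1", api_key="YOUR_API_KEY")

# JSON Schema description of the one function the model may call.
tools = [{
    "type": "function",
    "function": {
        "name": "get_weather",
        "description": "Current weather for a city",
        "parameters": {
            "type": "object",
            "properties": {"city": {"type": "string"}},
            "required": ["city"],
        },
    },
}]

resp = client.chat.completions.create(
    model="qwen3.6-27b",
    messages=[{"role": "user", "content": "Weather in Ghent?"}],
    tools=tools,
)

# tool_calls is None when the model answers directly instead of calling a tool.
message = resp.choices[0].message
if message.tool_calls:
    call = message.tool_calls[0]
    # arguments arrives as a JSON-encoded string, so decode it before use.
    print(call.function.name, json.loads(call.function.arguments))
else:
    print(message.content)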
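
The second half of the round trip described above (run the function, then send the result back as a tool message) might look like the sketch below. It works on plain dicts so it runs without a network call. The assistant message, the call id, the get_weather body, and the tool_call_id field are assumptions that follow the convention used by OpenAI-compatible APIs; this section does not confirm them.

```python
import json


def get_weather(city):
    """Placeholder implementation; a real one would query a weather service."""
    return {"city": city, "forecast": "unknown"}


# Map tool names the model may request to local Python functions.
TOOLS = {"get_weather": get_weather}


def tool_result_messages(assistant_message):
    """Run each requested tool and build the tool messages to send back.

    Assumes each tool call is a dict with an "id" and a "function" holding
    "name" and JSON-encoded "arguments", and that results are linked back
    to the call through "tool_call_id".
    """
    messages = []
    for call in assistant_message.get("tool_calls") or []:
        func = TOOLS[call["function"]["name"]]
        args = json.loads(call["function"]["arguments"])
        result = func(**args)
        messages.append({
            "role": "tool",
            "tool_call_id": call["id"],
            "content": json.dumps(result),
        })
    return messages


# A made-up assistant message of the kind returned when a tool is requested.
assistant_message = {
    "role": "assistant",
    "content": None,
    "tool_calls": [{
        "id": "call_1",
        "type": "function",
        "function": {"name": "get_weather", "arguments": "{\"city\": \"Ghent\"}"},
    }],
}

history = [{"role": "user", "content": "Weather in Ghent?"}]
history.append(assistant_message)                         # keep the model's request
history.extend(tool_result_messages(assistant_message))   # add the tool results
print(history[-1])
```

Sending history back as the messages of a new request lets the model turn the tool result into a final answer.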