LLM API
Chat completions
The primary endpoint for generating text. It accepts a list of messages and returns the model's reply.
Messages
A request is a list of messages, each with a role and content. The roles are:
- system — instructions that steer behaviour. Put it first.
- user — input from the end user.
- assistant — the model's prior turns, included to continue a conversation.
- tool — the result of a tool the model called.
{
  "model": "qwen3.6-27b",
  "messages": [
    {"role": "system", "content": "You answer in British English."},
    {"role": "user", "content": "Summarise photosynthesis in one sentence."}
  ],
  "max_tokens": 256,
  "temperature": 0.7
}
Common parameters
- max_tokens — cap the length of the reply.
- temperature (0–2) — higher is more creative, lower is more deterministic.
- top_p (0–1) — nucleus sampling; an alternative to temperature.
- stop — up to four strings that end generation.
- seed — best-effort reproducibility.
- stream — stream the reply token by token (see Streaming & reasoning).
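
The ranges listed above can be checked on the client before a request is sent. The following is a minimal sketch rather than part of the API: build_chat_request is a hypothetical helper that assembles the request body as a plain dict and rejects values outside the documented ranges. It does not send anything.

```python
from typing import Any, Optional


def build_chat_request(
    model: str,
    messages: list[dict[str, str]],
    *,
    max_tokens: Optional[int] = None,
    temperature: Optional[float] = None,
    top_p: Optional[float] = None,
    stop: Optional[list[str]] = None,
    seed: Optional[int] = None,
    stream: bool = False,
) -> dict[str, Any]:
    """Hypothetical helper: assemble a chat completions request body.

    Only the ranges stated in this document are enforced.
    """
    if temperature is not None and not 0 <= temperature <= 2:
        raise ValueError("temperature must be between 0 and 2")
    if top_p is not None and not 0 <= top_p <= 1:
        raise ValueError("top_p must be between 0 and 1")
    if stop is not None and len(stop) > 4:
        raise ValueError("stop accepts at most four strings")

    body: dict[str, Any] = {"model": model, "messages": messages}
    # Include only the parameters the caller set, so that unset ones
    # are left out of the request entirely.
    optional = {
        "max_tokens": max_tokens,
        "temperature": temperature,
        "top_p": top_p,
        "stop": stop,
        "seed": seed,
    }
    body.update({k: v for k, v in optional.items() if v is not None})
    if stream:
        body["stream"] = True
    return body


request = build_chat_request(
    "qwen3.6-27b",
    [
        {"role": "system", "content": "You answer in British English."},
        {"role": "user", "content": "Summarise photosynthesis in one sentence."},
    ],
    max_tokens=256,
    temperature=0.7,
)
```

The resulting dict can be serialised with json.dumps and posted to the endpoint, or unpacked into an SDK call as keyword arguments.
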
The response
The reply is in choices[0].message.content. Token accounting is in usage, and finish_reason tells you why generation stopped (stop, length, or tool_calls).
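
Reading those fields from a decoded response might look like the sketch below. extract_reply is a hypothetical helper, and the response is assumed to have already been parsed from JSON into a dict; the sample values are illustrative.

```python
from typing import Any


def extract_reply(response: dict[str, Any]) -> tuple[str, bool]:
    """Hypothetical helper: return (text, truncated) from a decoded response.

    truncated is True when finish_reason is "length", meaning the reply
    hit the max_tokens cap.
    """
    choice = response["choices"][0]
    # content may be absent or null when the model calls a tool instead.
    text = choice["message"].get("content") or ""
    truncated = choice["finish_reason"] == "length"
    return text, truncated


# Sample data shaped like the fields described above.
sample = {
    "choices": [
        {
            "index": 0,
            "message": {"role": "assistant", "content": "Plants convert..."},
            "finish_reason": "stop",
        }
    ],
    "usage": {"prompt_tokens": 24, "completion_tokens": 31, "total_tokens": 55},
}

text, truncated = extract_reply(sample)
total_tokens = sample["usage"]["total_tokens"]
print(text, truncated, total_tokens)
```

If truncated comes back True, the same request can be retried with a larger max_tokens.
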
{
  "id": "chatcmpl-3031...",
  "object": "chat.completion",
  "model": "qwen3.6-27b",
  "choices": [
    {
      "index": 0,
      "message": { "role": "assistant", "content": "Plants convert..." },
      "finish_reason": "stop"
    }
  ],
  "usage": { "prompt_tokens": 24, "completion_tokens": 31, "total_tokens": 55 }
}
Watch finish_reason
A finish_reason of length means the reply was cut off by max_tokens. Raise the limit if you need the full answer.
JSON mode
Set response_format to constrain the model to valid JSON — useful for downstream parsing. Always state the schema you want in the prompt as well.
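
Valid JSON is not necessarily JSON with the shape you asked for, so it can help to check the parsed reply on the client as well. A minimal sketch, assuming the reply text has already been read from choices[0].message.content; parse_json_reply is a hypothetical helper and the reply string is illustrative.

```python
import json
from typing import Any


def parse_json_reply(content: str, required: dict[str, type]) -> dict[str, Any]:
    """Hypothetical helper: decode a JSON-mode reply and check expected keys."""
    data = json.loads(content)  # raises json.JSONDecodeError on invalid JSON
    if not isinstance(data, dict):
        raise ValueError("expected a JSON object")
    for key, expected_type in required.items():
        if key not in data:
            raise ValueError(f"missing key: {key}")
        if not isinstance(data[key], expected_type):
            raise ValueError(f"{key} should be {expected_type.__name__}")
    return data


# Illustrative reply text; a real one would come from the response body.
reply = '{"city": "Brussels", "country": "Belgium"}'
parsed = parse_json_reply(reply, {"city": str, "country": str})
print(parsed["country"])
```
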
{
  "model": "qwen3.6-27b",
  "messages": [
    {"role": "user", "content": "Return {\"city\": string, \"country\": string} for Brussels."}
  ],
  "response_format": { "type": "json_object" }
}
Tool calling
Describe functions the model may call. When it decides to call one, the reply contains tool_calls instead of content; you run the function and send the result back as a tool message.
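# Assumes `client` is an OpenAI-compatible SDK client created elsewhere,
# for example with openai.OpenAI(base_url=..., api_key=...).

# JSON Schema description of the function the model may call.
tools = [{
    "type": "function",
    "function": {
        "name": "get_weather",
        "description": "Current weather for a city",
        "parameters": {
            "type": "object",
            "properties": {"city": {"type": "string"}},
            "required": ["city"],
        },
    },
}]

resp = client.chat.completions.create(
    model="qwen3.6-27b",
    messages=[{"role": "user", "content": "Weather in Ghent?"}],
    tools=tools,
)

# tool_calls is empty when the model answers directly, so check before indexing.
tool_calls = resp.choices[0].message.tool_calls
if tool_calls:
    call = tool_calls[0]
    # With OpenAI-compatible SDKs, arguments is a JSON-encoded string.
    print(call.function.name, call.function.arguments)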
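
The second half of the round trip, running the function and sending its result back as a tool message, can be sketched without a network call. Several things here are assumptions: the message shapes (an assistant message carrying tool_calls, followed by a tool message with a tool_call_id) follow the common OpenAI-compatible convention, which this page does not spell out; get_weather is a stub with placeholder output; run_tool_call is a hypothetical helper; and the tool call is written as a plain dict rather than an SDK object.

```python
import json
from typing import Any, Callable


def get_weather(city: str) -> dict[str, Any]:
    """Stub tool: a real implementation would query a weather service."""
    return {"city": city, "conditions": "unknown"}


# Map the tool names the model may request to local Python callables.
TOOLS: dict[str, Callable[..., Any]] = {"get_weather": get_weather}


def run_tool_call(call: dict[str, Any]) -> dict[str, str]:
    """Hypothetical helper: execute one tool call and build the tool message."""
    name = call["function"]["name"]
    # Assumed: arguments arrive as a JSON-encoded string, not a dict.
    args = json.loads(call["function"]["arguments"])
    result = TOOLS[name](**args)
    return {
        "role": "tool",
        "tool_call_id": call["id"],  # assumed field linking result to call
        "content": json.dumps(result),
    }


# Illustrative tool call, shaped like one entry of message.tool_calls.
call = {
    "id": "call_0",
    "type": "function",
    "function": {"name": "get_weather", "arguments": '{"city": "Ghent"}'},
}

messages = [
    {"role": "user", "content": "Weather in Ghent?"},
    {"role": "assistant", "content": None, "tool_calls": [call]},
    run_tool_call(call),
]
```

Sending messages in a second request, with the same tools list, lets the model write its final answer from the tool result.
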