LLM API
Streaming & reasoning
Render the reply as it is produced, and inspect how reasoning models think.
Streaming
Set stream: true to receive the completion as server-sent events. Each event is a chat.completion.chunk carrying a delta with the next slice of content. The stream ends with a literal data: [DONE] line.
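For readers consuming the stream without an SDK, the following is a minimal sketch of how the event format described above could be parsed. The helper name collect_content is hypothetical, and the sketch assumes each event arrives as a single data: line and that only the first choice matters.

```python
import json


def collect_content(lines):
    """Join the delta.content slices found in an iterable of SSE lines."""
    parts = []
    for raw in lines:
        line = raw.strip()
        if not line.startswith("data:"):
            continue  # skip blank separator lines and non-data fields
        payload = line[len("data:"):].strip()
        if payload == "[DONE]":
            break  # literal end-of-stream marker
        chunk = json.loads(payload)
        choices = chunk.get("choices") or []
        if not choices:
            continue  # nothing to render in this chunk
        piece = (choices[0].get("delta") or {}).get("content")
        if piece:
            parts.append(piece)
    return "".join(parts)


sample = [
    'data: {"object":"chat.completion.chunk","choices":[{"delta":{"role":"assistant","content":""}}]}',
    "",
    'data: {"object":"chat.completion.chunk","choices":[{"delta":{"content":"Paris"}}]}',
    "",
    'data: {"object":"chat.completion.chunk","choices":[{"delta":{"content":", Berlin"}}]}',
    "",
    "data: [DONE]",
]
print(collect_content(sample))  # Paris, Berlin
```

In a real client the lines would come from the HTTP response body rather than a list, and each piece would be rendered as it arrives instead of being joined at the end. The raw events such a parser would consume look like this: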
data: {"object":"chat.completion.chunk","choices":[{"delta":{"role":"assistant","content":""}}]}
data: {"object":"chat.completion.chunk","choices":[{"delta":{"content":"Paris"}}]}
data: {"object":"chat.completion.chunk","choices":[{"delta":{"content":", Berlin"}}]}
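data: [DONE]

# Assumes `client` is an already configured OpenAI-compatible SDK client.
stream = client.chat.completions.create(
    model="qwen3.6-27b",
    messages=[{"role": "user", "content": "Write a haiku about wind."}],
    stream=True,
)
for chunk in stream:
    delta = chunk.choices[0].delta
    # Skip deltas with empty or missing content, such as the opening role-only delta.
    if delta.content:
        print(delta.content, end="", flush=True)

Reasoning models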
Several models (for example qwen3.6-27b and qwen3-5-122b-a10b) think before answering. Their chain-of-thought is returned separately in a reasoning field, leaving content for the final answer. While streaming, reasoning arrives in delta.reasoning before delta.content.
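As a rough sketch of handling that ordering, the helper below accumulates the two fields separately. The name split_stream is hypothetical, and reading the field with getattr is an assumption about how an SDK exposes delta.reasoning; the stand-in objects exist only so the example runs on its own.

```python
from types import SimpleNamespace


def split_stream(deltas):
    """Accumulate reasoning and answer text from an iterable of delta objects."""
    reasoning, answer = [], []
    for delta in deltas:
        # getattr with a default tolerates deltas that omit either field.
        thought = getattr(delta, "reasoning", None)
        text = getattr(delta, "content", None)
        if thought:
            reasoning.append(thought)
        if text:
            answer.append(text)
    return "".join(reasoning), "".join(answer)


# Stand-in deltas for illustration only. With a real stream, pass
# (chunk.choices[0].delta for chunk in stream) instead.
fake_deltas = [
    SimpleNamespace(reasoning="The user asked for capitals. ", content=None),
    SimpleNamespace(reasoning="Paris is one.", content=None),
    SimpleNamespace(content="Paris"),
    SimpleNamespace(content=", Berlin"),
]
thinking, reply = split_stream(fake_deltas)
print(reply)
```

Keeping two buffers means a UI can update the thinking panel and the answer independently. For comparison, a complete non-streamed response carries both fields on the message: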
{
  "choices": [
    {
      "message": {
        "role": "assistant",
        "reasoning": "The user asked for three capitals. EU members include...",
        "content": "Paris, Berlin and Madrid."
      },
      "finish_reason": "stop"
    }
  ]
}

Showing the thinking
Render reasoning in a collapsible “thinking” panel and content as the answer. If you only need the answer, ignore the field; it does not change the response shape.

Budget for reasoning tokens
Reasoning consumes completion tokens. If replies look truncated, raise max_tokens so the model has room to think and answer.
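One way to act on this is sketched below. It assumes a truncated reply is reported with finish_reason set to "length", as in other OpenAI-compatible APIs; this page only shows "stop", so check what your responses actually contain. The helper names and the 8192 ceiling are hypothetical.

```python
def is_truncated(choice):
    """True when the choice stopped on the token limit (assumed value: "length")."""
    return choice.get("finish_reason") == "length"


def next_max_tokens(current, factor=2, ceiling=8192):
    """Grow the budget geometrically, capped at a hypothetical ceiling."""
    return min(current * factor, ceiling)


# Illustrative choice: the model spent its whole budget on reasoning.
choice = {
    "message": {"role": "assistant", "reasoning": "The user asked...", "content": ""},
    "finish_reason": "length",
}
budget = 512
if is_truncated(choice):
    budget = next_max_tokens(budget)
print(budget)  # 1024
```

The request would then be retried with max_tokens set to the new budget. Doubling is only one possible policy; a fixed, generous limit may be simpler when cost is not a concern.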