LLM API

Streaming & reasoning

Render the reply as it is produced, and inspect how reasoning models think.

Streaming

Set stream: true to receive the completion as server-sent events. Each event is a chat.completion.chunk carrying a delta with the next slice of content. The stream ends with a literal data: [DONE] line.

Event stream

data: {"object":"chat.completion.chunk","choices":[{"delta":{"role":"assistant","content":""}}]}

data: {"object":"chat.completion.chunk","choices":[{"delta":{"content":"Paris"}}]}

data: {"object":"chat.completion.chunk","choices":[{"delta":{"content":", Berlin"}}]}

data: [DONE]

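from openai import OpenAI

# Assumption: the API is OpenAI-compatible and reached through the openai SDK.
# The base_url and api_key values are placeholders.
client = OpenAI(base_url="https://api.example.com/v1", api_key="YOUR_API_KEY")

stream = client.chat.completions.create(
    model="qwen3.6-27b",
    messages=[{"role": "user", "content": "Write a haiku about wind."}],
    stream=True,
)

for chunk in stream:
    if not chunk.choices:
        continue  # skip chunks that carry no choices
    delta = chunk.choices[0].delta
    if delta.content:
        # Print each slice as it arrives, without a trailing newline.
        print(delta.content, end="", flush=True)
print()  # finish the line once the stream ends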
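
Without an SDK, the event stream can be read by hand. The following is a minimal sketch, assuming the raw lines have already been read from the HTTP response; iter_content and sample are illustrative names, not part of the API. It keeps only data: lines, stops at the [DONE] sentinel, and yields each non-empty content slice.

```python
import json
from typing import Iterable, Iterator


def iter_content(lines: Iterable[str]) -> Iterator[str]:
    """Yield content slices from raw SSE lines until the [DONE] sentinel."""
    for line in lines:
        line = line.strip()
        # Skip blank separator lines and anything that is not a data field.
        if not line.startswith("data:"):
            continue
        payload = line[len("data:"):].strip()
        if payload == "[DONE]":
            break
        chunk = json.loads(payload)
        choices = chunk.get("choices") or []
        if not choices:
            continue
        content = choices[0].get("delta", {}).get("content")
        if content:
            yield content


# The same events as in the event stream sample above.
sample = [
    'data: {"object":"chat.completion.chunk","choices":[{"delta":{"role":"assistant","content":""}}]}',
    "",
    'data: {"object":"chat.completion.chunk","choices":[{"delta":{"content":"Paris"}}]}',
    "",
    'data: {"object":"chat.completion.chunk","choices":[{"delta":{"content":", Berlin"}}]}',
    "",
    "data: [DONE]",
]
print("".join(iter_content(sample)))  # prints "Paris, Berlin"
```

The first chunk carries an empty content string alongside the role, so it is skipped rather than yielded.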

Reasoning models

Several models (for example qwen3.6-27b and qwen3-5-122b-a10b) think before answering. Their chain-of-thought is returned separately in a reasoning field, leaving content for the final answer. While streaming, reasoning arrives in delta.reasoning before delta.content.

Non-streamed reasoning response

{
  "choices": [
    {
      "message": {
        "role": "assistant",
        "reasoning": "The user asked for three capitals. EU members include...",
        "content": "Paris, Berlin and Madrid."
      },
      "finish_reason": "stop"
    }
  ]
}
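
When streaming, the two fields can be accumulated separately. The sketch below assumes chunk objects shaped like those in the streaming example, with reasoning exposed as an attribute on the delta; collect and fake_chunk are hypothetical helpers, and fake_chunk exists only to make the example runnable without a network call.

```python
from types import SimpleNamespace
from typing import Iterable, Tuple


def collect(stream: Iterable) -> Tuple[str, str]:
    """Accumulate streamed reasoning and content into two separate strings."""
    reasoning_parts, content_parts = [], []
    for chunk in stream:
        if not chunk.choices:
            continue
        delta = chunk.choices[0].delta
        # getattr with a default: a delta may carry only one of the two fields.
        reasoning = getattr(delta, "reasoning", None)
        content = getattr(delta, "content", None)
        if reasoning:
            reasoning_parts.append(reasoning)
        if content:
            content_parts.append(content)
    return "".join(reasoning_parts), "".join(content_parts)


def fake_chunk(**delta):
    """Hypothetical stand-in for an SDK chunk object, for demonstration only."""
    return SimpleNamespace(choices=[SimpleNamespace(delta=SimpleNamespace(**delta))])


demo = [
    fake_chunk(reasoning="The user asked for three capitals. "),
    fake_chunk(reasoning="EU members include..."),
    fake_chunk(content="Paris, Berlin "),
    fake_chunk(content="and Madrid."),
]
reasoning, content = collect(demo)
print(content)  # prints "Paris, Berlin and Madrid."
```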

Showing the thinking

Render reasoning in a collapsible “thinking” panel and content as the answer. If you only need the answer, ignore the reasoning field; its presence does not change the rest of the response shape.
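
One possible rendering, sketched under the assumption that the UI is HTML: the reasoning goes into a details element, which browsers show collapsed by default, and the answer follows in a paragraph. render_message is an illustrative name, and the markup is only one of many reasonable choices.

```python
from html import escape


def render_message(message: dict) -> str:
    """Return an HTML fragment: optional collapsed reasoning, then the answer."""
    parts = []
    reasoning = message.get("reasoning")
    if reasoning:
        # Escape model output before embedding it in markup.
        parts.append(
            "<details><summary>Thinking</summary>"
            f"<pre>{escape(reasoning)}</pre></details>"
        )
    parts.append(f"<p>{escape(message.get('content') or '')}</p>")
    return "\n".join(parts)


message = {
    "role": "assistant",
    "reasoning": "The user asked for three capitals. EU members include...",
    "content": "Paris, Berlin and Madrid.",
}
print(render_message(message))
```

A message without a reasoning field produces only the answer paragraph, so the same function serves non-reasoning models.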

Budget for reasoning tokens

Reasoning consumes completion tokens, so it counts against max_tokens along with the answer. If replies look truncated, raise max_tokens so the model has room to both think and answer.
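
A retry loop is one way to find a workable budget. The sketch below assumes that a reply cut off by the token limit reports finish_reason "length", as OpenAI-compatible APIs commonly do; this page only shows "stop", so check that against real responses. complete_with_budget, call, and fake_call are hypothetical names, and call stands for any function that sends the request with a given max_tokens and returns the parsed response.

```python
from typing import Callable


def complete_with_budget(
    call: Callable[[int], dict], max_tokens: int = 512, ceiling: int = 8192
) -> dict:
    """Double max_tokens until the reply is not cut off or the ceiling is hit."""
    while True:
        response = call(max_tokens)
        finish_reason = response["choices"][0].get("finish_reason")
        # Assumption: "length" marks a reply cut off by the token limit.
        if finish_reason != "length" or max_tokens >= ceiling:
            return response
        max_tokens = min(max_tokens * 2, ceiling)


def fake_call(max_tokens: int) -> dict:
    """Hypothetical backend: needs 2000 tokens to think and answer."""
    done = max_tokens >= 2000
    return {"choices": [{
        "message": {"content": "Paris, Berlin and Madrid." if done else ""},
        "finish_reason": "stop" if done else "length",
    }]}


result = complete_with_budget(fake_call)
print(result["choices"][0]["finish_reason"])  # prints "stop"
```

The ceiling keeps a runaway prompt from doubling the budget indefinitely; when it is reached, the last truncated response is returned as is.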