LLM API

An OpenAI-compatible inference API. Bring the SDK you already use, change the base URL and key, and call the UPGREAT models.

The UPGREAT LLM API speaks the same wire format as the OpenAI API. Anything that talks to /v1/chat/completions, /v1/embeddings, /v1/images/generations or /v1/audio/transcriptions works unchanged once you point it at our base URL and supply an UPGREAT inference key.

Base URL: https://llm.upgreat.ai/v1
Compatibility: OpenAI API

Why it is a drop-in

Every official OpenAI SDK accepts a custom base_url. Set it to https://llm.upgreat.ai/v1, set your API key, and the rest of your code is identical.

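import os

from openai import OpenAI

client = OpenAI(
    base_url="https://llm.upgreat.ai/v1",
    # Read the key from the environment; Python does not expand "$UPGREAT_API_KEY" inside a string.
    api_key=os.environ["UPGREAT_API_KEY"],
)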

resp = client.chat.completions.create(
    model="qwen3.6-27b",
    messages=[{"role": "user", "content": "Hello from UPGREAT!"}],
)
print(resp.choices[0].message.content)
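
Because only the URL, key and JSON body matter, the same call can be made without an SDK. The sketch below builds (but does not send) the request with the Python standard library. It assumes the key is sent as an Authorization: Bearer header, as in the OpenAI wire format; build_chat_request is a hypothetical helper name, and "sk-example" is a placeholder key.

```python
import json
import urllib.request

BASE_URL = "https://llm.upgreat.ai/v1"


def build_chat_request(api_key: str, model: str, prompt: str) -> urllib.request.Request:
    """Build (but do not send) a POST to /v1/chat/completions."""
    body = {
        "model": model,
        "messages": [{"role": "user", "content": prompt}],
    }
    return urllib.request.Request(
        url=f"{BASE_URL}/chat/completions",
        data=json.dumps(body).encode("utf-8"),
        headers={
            # Assumption: Bearer authentication, as in the OpenAI wire format.
            "Authorization": f"Bearer {api_key}",
            "Content-Type": "application/json",
        },
        method="POST",
    )


req = build_chat_request("sk-example", "qwen3.6-27b", "Hello from UPGREAT!")
print(req.get_method(), req.full_url)
```

To send it, pass req to urllib.request.urlopen and parse the response body with json.load; the reply should follow the OpenAI chat completion schema.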

What you can do

  • Chat completions — multi-turn conversations with streaming, tool calling and reasoning models.
  • Embeddings — dense vectors for semantic search, clustering and RAG.
  • Image generation — text-to-image with the Flux and Qwen-Image families.
  • Audio transcription — speech-to-text with Whisper.
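
As an illustration of the semantic search use case for embeddings, the sketch below ranks documents by cosine similarity to a query vector. The vectors are toy three-dimensional stand-ins; in practice they would presumably come from the embedding field of each item in a /v1/embeddings response, as in the OpenAI schema. The helper names are hypothetical.

```python
import math


def cosine_similarity(a: list[float], b: list[float]) -> float:
    """Cosine of the angle between two equal-length, non-zero vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


def rank_by_similarity(query: list[float], docs: dict[str, list[float]]) -> list[tuple[str, float]]:
    """Return (doc_id, score) pairs, best match first."""
    scored = [(doc_id, cosine_similarity(query, vec)) for doc_id, vec in docs.items()]
    return sorted(scored, key=lambda pair: pair[1], reverse=True)


# Toy vectors standing in for real embeddings.
docs = {"a": [1.0, 0.0, 0.0], "b": [0.0, 1.0, 0.0], "c": [0.7, 0.7, 0.0]}
ranking = rank_by_similarity([1.0, 0.1, 0.0], docs)
print([doc_id for doc_id, _ in ranking])  # prints ['a', 'c', 'b']
```

For more than a few thousand documents, a vector index is usually a better fit than this linear scan.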

Differences from OpenAI

The API is a drop-in for the common endpoints, with a few things to know:

  • Reasoning field. Reasoning models return their thinking in a separate reasoning field on the message (and delta.reasoning while streaming). It is additive — code that only reads content is unaffected. See Streaming & reasoning.
  • Tool calling varies by model. Function calling follows the OpenAI schema but is not supported by every model. Confirm with the model you intend to use.
  • The models you can call are listed by GET /v1/models and differ from OpenAI's catalog. There is no per-model retrieve endpoint (GET /v1/models/{id}).
  • Image output is returned as base64 (b64_json); URL responses are not provided.
  • Some endpoints found on OpenAI or other OpenAI-compatible servers are not offered (for example reranking, files, batch and assistants).
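
Two of the differences above affect client code directly: the extra reasoning field and base64 image output. The sketch below shows one way to handle both. It runs on stand-in objects rather than live responses, and the helper names are hypothetical. It assumes the SDK exposes the additional reasoning field as an attribute, which is worth verifying against the SDK version you use; getattr with a default keeps the code safe when the field is absent.

```python
import base64
from types import SimpleNamespace


def split_message(message):
    """Return (reasoning, content); reasoning is None when the field is absent."""
    return getattr(message, "reasoning", None), message.content


def collect_stream(chunks):
    """Accumulate delta.reasoning and delta.content from streamed chunks."""
    reasoning_parts, content_parts = [], []
    for chunk in chunks:
        if not chunk.choices:
            continue  # skip chunks that carry no choices
        delta = chunk.choices[0].delta
        reasoning_parts.append(getattr(delta, "reasoning", None) or "")
        content_parts.append(getattr(delta, "content", None) or "")
    return "".join(reasoning_parts), "".join(content_parts)


def decode_image(b64_json: str) -> bytes:
    """Turn a b64_json payload into raw image bytes."""
    return base64.b64decode(b64_json)


def _chunk(**delta):
    """Build a stand-in for one streamed chunk (illustration only)."""
    return SimpleNamespace(choices=[SimpleNamespace(delta=SimpleNamespace(**delta))])


fake_stream = [_chunk(reasoning="2+2 "), _chunk(reasoning="is 4."), _chunk(content="4")]
reasoning, content = collect_stream(fake_stream)
print(reasoning, "|", content)  # prints 2+2 is 4. | 4
```

With the SDK, the chunks would be the iterator returned by client.chat.completions.create(..., stream=True), and the image payload would be read from data[0].b64_json on the image response before writing the decoded bytes to a file.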

Data handling

For how prompts and outputs are handled, including retention, see your UPGREAT agreement or ask your contact.

Next steps

  • Authentication — create a key and send it correctly.
  • Quickstart — your first request in three languages.
  • Models — discover what you can call, by capability.