LLM API

Audio transcription

Convert spoken audio into text with whisper-large-v3-turbo.

Transcribe a file

Unlike the JSON endpoints, transcription takes a multipart form upload (multipart/form-data) rather than a JSON body. Send the audio in the file field and set the model field to whisper-large-v3-turbo.

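import os
from openai import OpenAI

# Assumes the OpenAI Python SDK, pointed at this API's base URL.
client = OpenAI(base_url="https://llm.upgreat.ai/v1", api_key=os.environ["UPGREAT_API_KEY"])

with open("meeting.mp3", "rb") as f:
    resp = client.audio.transcriptions.create(
        model="whisper-large-v3-turbo",
        file=f,
    )

print(resp.text)  # the default JSON response exposes the transcript as .text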
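To show what the multipart upload looks like on the wire, here is a stdlib-only sketch that builds (but does not send) the request. The helper name build_transcription_request is hypothetical. The URL and Bearer header mirror the curl example on this page, and the field names model, file and response_format come from the text. Treat it as an illustration of the request format, not a supported client.

```python
import urllib.request
import uuid
from typing import Optional

# URL taken from the curl example on this page; the helper below is illustrative.
API_URL = "https://llm.upgreat.ai/v1/audio/transcriptions"


def build_transcription_request(
    audio: bytes,
    filename: str,
    api_key: str,
    model: str = "whisper-large-v3-turbo",
    response_format: Optional[str] = None,
) -> urllib.request.Request:
    """Build (but do not send) a multipart/form-data transcription request.

    Hypothetical helper: field names follow the docs (model, file,
    response_format); the boundary is random per request.
    """
    boundary = uuid.uuid4().hex
    fields = {"model": model}
    if response_format is not None:
        fields["response_format"] = response_format

    parts = []
    # One part per plain form field: headers, blank line, value.
    for name, value in fields.items():
        part = (
            f"--{boundary}\r\n"
            f'Content-Disposition: form-data; name="{name}"\r\n\r\n'
            f"{value}\r\n"
        )
        parts.append(part.encode())

    # The file part adds a filename and carries the raw audio bytes.
    file_header = (
        f"--{boundary}\r\n"
        f'Content-Disposition: form-data; name="file"; filename="{filename}"\r\n'
        "Content-Type: application/octet-stream\r\n\r\n"
    )
    parts.append(file_header.encode() + audio + b"\r\n")

    # Closing delimiter: the boundary followed by two dashes.
    parts.append(f"--{boundary}--\r\n".encode())

    return urllib.request.Request(
        API_URL,
        data=b"".join(parts),
        method="POST",
        headers={
            "Authorization": f"Bearer {api_key}",
            "Content-Type": f"multipart/form-data; boundary={boundary}",
        },
    )
```

Sending it would be urllib.request.urlopen(req). In practice an HTTP library or the SDK builds this body for you; the point is that model and response_format travel as form fields next to the file, not as JSON.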

Response formats

By default you get JSON with a text field. Use response_format to choose another shape:

json — { "text": "…" } (default).
text — plain text, no envelope.
verbose_json — text plus segment metadata.
srt / vtt — subtitle files.
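Because the shape of the body depends on response_format, a client has to decide whether to JSON-decode it. A minimal sketch with hypothetical helper names parse_transcription and transcript_text; it assumes verbose_json keeps the top-level text field, as the list above implies.

```python
import json
from typing import Union

JSON_FORMATS = {"json", "verbose_json"}
RAW_FORMATS = {"text", "srt", "vtt"}


def parse_transcription(raw: str, response_format: str = "json") -> Union[dict, str]:
    """Decode a raw response body according to the response_format that was requested."""
    if response_format in JSON_FORMATS:
        # {"text": ...}, plus segment metadata for verbose_json (assumed shape).
        return json.loads(raw)
    if response_format in RAW_FORMATS:
        # Plain text or a subtitle file: no envelope to decode.
        return raw
    raise ValueError(f"unknown response_format: {response_format!r}")


def transcript_text(raw: str, response_format: str = "json") -> str:
    """Return just the transcript for json, verbose_json and text responses."""
    parsed = parse_transcription(raw, response_format)
    if isinstance(parsed, dict):
        return parsed["text"]
    if response_format == "text":
        return parsed.strip()
    # srt/vtt bodies interleave timings with text, so they need a cue parser.
    raise ValueError("srt/vtt responses are subtitle files; parse the cues instead")
```

Keeping decoding in one place means switching response_format only changes the argument you pass, not every call site that reads the transcript.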

Subtitles (SRT)

curl https://llm.upgreat.ai/v1/audio/transcriptions \
  -H "Authorization: Bearer $UPGREAT_API_KEY" \
  -F "model=whisper-large-v3-turbo" \
  -F "file=@talk.wav" \
  -F "response_format=srt"
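If you want to post-process the SRT output (for example to re-time or restyle cues), the format is simple: a numeric index, a HH:MM:SS,mmm --> HH:MM:SS,mmm timing line, one or more text lines, and a blank line between cues. Here is a small parser sketch; parse_srt and Cue are hypothetical names, and it skips blocks that do not match that layout rather than trying to repair them.

```python
import re
from typing import List, NamedTuple


class Cue(NamedTuple):
    index: int
    start: float  # seconds
    end: float  # seconds
    text: str


# Timing line, e.g. "00:00:02,500 --> 00:00:05,000".
TIMING = re.compile(
    r"(\d{2}):(\d{2}):(\d{2}),(\d{3})\s*-->\s*(\d{2}):(\d{2}):(\d{2}),(\d{3})"
)


def _seconds(h: str, m: str, s: str, ms: str) -> float:
    return int(h) * 3600 + int(m) * 60 + int(s) + int(ms) / 1000


def parse_srt(srt: str) -> List[Cue]:
    """Split an SRT document into cues (index line, timing line, text lines)."""
    cues: List[Cue] = []
    # Cues are separated by blank lines; normalise Windows line endings first.
    for block in re.split(r"\n\s*\n", srt.replace("\r\n", "\n").strip()):
        lines = block.split("\n")
        if len(lines) < 2 or not lines[0].strip().isdigit():
            continue  # not a well-formed cue
        match = TIMING.match(lines[1].strip())
        if not match:
            continue
        g = match.groups()
        cues.append(
            Cue(int(lines[0]), _seconds(*g[:4]), _seconds(*g[4:]), "\n".join(lines[2:]))
        )
    return cues
```

Converting timestamps to float seconds makes it easy to shift or merge cues before writing them back out.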

Word and segment timings

Pass response_format=verbose_json together with timestamp_granularities[]=word (or segment) to get per-word or per-segment timestamps, which you can use to align captions or build a transcript player.
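Here is a sketch of turning word timings into caption lines. It assumes the verbose_json response carries a words list whose entries have word, start and end fields (in seconds), which is how OpenAI's Whisper API shapes it; check an actual response from this endpoint before relying on those names. group_words and its max_chars and max_gap thresholds are hypothetical choices.

```python
from typing import Dict, List


def group_words(words: List[Dict], max_chars: int = 42, max_gap: float = 0.8) -> List[Dict]:
    """Group word timings into caption lines.

    Assumes each entry looks like {"word": str, "start": float, "end": float}.
    A new caption starts when adding the next word would push the line past
    max_chars, or when the silence before it is longer than max_gap seconds.
    """
    captions: List[Dict] = []
    current: List[Dict] = []

    def flush() -> None:
        # Emit the words collected so far as one caption, then reset.
        if current:
            captions.append({
                "start": current[0]["start"],
                "end": current[-1]["end"],
                "text": " ".join(w["word"].strip() for w in current),
            })
            current.clear()

    for w in words:
        if current:
            candidate = " ".join(x["word"].strip() for x in current + [w])
            gap = w["start"] - current[-1]["end"]
            if len(candidate) > max_chars or gap > max_gap:
                flush()
        current.append(w)
    flush()
    return captions
```

Each returned dict has start, end and text, which maps directly onto an SRT or VTT cue; splitting on pauses as well as length keeps captions from spanning a silence.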