LLM API

Audio transcription

Convert spoken audio into text with whisper-large-v3-turbo.

Transcribe a file

Unlike the JSON endpoints, transcription takes a multipart form upload. Send the audio in the file field and set the model field to whisper-large-v3-turbo.

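import os
from openai import OpenAI

# Assumption: client is the OpenAI SDK pointed at this API's base URL.
client = OpenAI(base_url="https://llm.upgreat.ai/v1", api_key=os.environ["UPGREAT_API_KEY"])

with open("meeting.mp3", "rb") as f:  # binary mode for the multipart upload
    resp = client.audio.transcriptions.create(
        model="whisper-large-v3-turbo",
        file=f,
    )

print(resp.text)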

Response formats

By default you get JSON with a text field. Use response_format to choose another shape:

  • json — { "text": "…" } (default).
  • text — plain text, no envelope.
  • verbose_json — text plus segment metadata.
  • srt / vtt — subtitle files.
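
To make the difference between these shapes concrete, here is a minimal client-side sketch that pulls the transcript out of a raw response body. The helper name extract_transcript is hypothetical, and the sketch assumes that json and verbose_json bodies both carry a top-level text field, as described above, while text, srt, and vtt come back as plain text.

```python
import json

# Formats from the list above. srt and vtt are subtitle text, not JSON.
JSON_FORMATS = {"json", "verbose_json"}
PLAIN_FORMATS = {"text", "srt", "vtt"}


def extract_transcript(body: str, response_format: str = "json") -> str:
    """Return the transcript (or subtitle text) from a raw response body."""
    if response_format in JSON_FORMATS:
        # Assumed shape: a JSON object with a top-level "text" field.
        return json.loads(body)["text"]
    if response_format in PLAIN_FORMATS:
        # No envelope to unwrap: the body is already the payload.
        return body
    raise ValueError(f"unsupported response_format: {response_format!r}")
```

Rejecting unknown formats up front keeps a typo in response_format from being silently treated as plain text.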

Subtitles (SRT)

curl https://llm.upgreat.ai/v1/audio/transcriptions \
  -H "Authorization: Bearer $UPGREAT_API_KEY" \
  -F "model=whisper-large-v3-turbo" \
  -F "file=@talk.wav" \
  -F "response_format=srt"

Word and segment timings

Pass response_format=verbose_json together with timestamp_granularities[]=word (or segment) to get timestamps you can use to align captions or build a transcript player.
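
As one way to use word timings for caption alignment, the sketch below groups words into short caption cues. It assumes each word entry is an object with word, start, and end fields (times in seconds); this page does not spell out the field names, so treat that shape as an assumption and check it against an actual verbose_json response. The function name group_words and its thresholds are illustrative.

```python
def _to_cue(words):
    """Collapse a run of word entries into one (start, end, text) cue."""
    text = " ".join(w["word"].strip() for w in words)
    return (words[0]["start"], words[-1]["end"], text)


def group_words(words, max_chars=32, max_gap=0.6):
    """Group word timings into caption cues.

    A new cue starts when adding the next word would exceed max_chars,
    or when the pause before it is longer than max_gap seconds.
    """
    cues = []
    current = []
    for word in words:
        if current:
            current_len = len(" ".join(w["word"].strip() for w in current))
            next_len = current_len + 1 + len(word["word"].strip())
            gap = word["start"] - current[-1]["end"]
            if next_len > max_chars or gap > max_gap:
                cues.append(_to_cue(current))
                current = []
        current.append(word)
    if current:
        cues.append(_to_cue(current))
    return cues
```

Breaking on pauses as well as on length tends to keep captions aligned with natural phrase boundaries; both thresholds are starting points to tune per use case.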