LLM API

Embeddings

An embedding maps a piece of text to a vector of numbers, so you can measure how semantically similar two texts are.

Use embeddings to power semantic search, deduplication, clustering, classification and retrieval-augmented generation (RAG). Similar meanings produce nearby vectors, which you compare with cosine similarity.

Create an embedding

Pass a single string or an array of strings. Each input gets one vector back, in request order.

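# `client` is assumed to be an already-configured API client (setup not shown here).
resp = client.embeddings.create(
    model="bge-m3",
    input=["semantic search over documents", "retrieval-augmented generation"],
)

# Each item carries its position in the request and the vector itself.
for item in resp.data:
    print(item.index, len(item.embedding))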

Response

json

{
  "object": "list",
  "model": "bge-m3",
  "data": [
    { "index": 0, "object": "embedding", "embedding": [-0.0399, 0.0373, ...] },
    { "index": 1, "object": "embedding", "embedding": [0.0112, -0.0287, ...] }
  ],
  "usage": { "prompt_tokens": 8, "total_tokens": 8 }
}
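
Because every item carries its own `index`, vectors can be matched back to the inputs that produced them. The following is a minimal sketch, assuming the response body has been parsed into a plain dict shaped like the JSON above. The helper name `pair_inputs_with_vectors` is hypothetical and not part of the API, and the vectors are short made-up values. Sorting by `index` is only a defensive step, since results already arrive in request order.

```python
import json

def pair_inputs_with_vectors(inputs, response):
    """Return (text, vector) pairs, aligned via each item's `index` field."""
    items = sorted(response["data"], key=lambda item: item["index"])
    if len(items) != len(inputs):
        raise ValueError(f"expected {len(inputs)} embeddings, got {len(items)}")
    return [(inputs[item["index"]], item["embedding"]) for item in items]

# Toy response body with illustrative vectors (not real model output),
# deliberately listed out of order to show the alignment step.
body = json.loads(
    '{"data": ['
    '{"index": 1, "object": "embedding", "embedding": [0.2, 0.1]},'
    '{"index": 0, "object": "embedding", "embedding": [0.3, 0.4]}'
    ']}'
)

texts = ["semantic search over documents", "retrieval-augmented generation"]
for text, vector in pair_inputs_with_vectors(texts, body):
    print(text, vector)
```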

Cosine similarity

To compare two embeddings, take the cosine of the angle between them. The result ranges from -1 to 1, and higher values mean closer meanings:

python

import numpy as np

def cosine(a, b):
    a, b = np.array(a), np.array(b)
    return float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b)))

sim = cosine(resp.data[0].embedding, resp.data[1].embedding)
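
For semantic search you usually compare one query vector against many document vectors. The sketch below is one way to do that with NumPy: it normalizes every vector once, so that a single matrix product yields all cosine scores. The function name `top_k` and the 3-dimensional toy vectors are illustrative assumptions; real embeddings have far more dimensions.

```python
import numpy as np

def top_k(query_vec, doc_vecs, k=3):
    """Return (row_index, cosine_score) for the k documents most similar to the query."""
    q = np.asarray(query_vec, dtype=float)
    docs = np.asarray(doc_vecs, dtype=float)
    # Normalize once so a plain dot product equals cosine similarity.
    q = q / np.linalg.norm(q)
    docs = docs / np.linalg.norm(docs, axis=1, keepdims=True)
    scores = docs @ q
    # Sort by descending score and keep the first k rows.
    order = np.argsort(-scores)[:k]
    return [(int(i), float(scores[i])) for i in order]

# Toy vectors stand in for real embeddings.
docs = [[1.0, 0.0, 0.0], [0.6, 0.8, 0.0], [0.0, 0.0, 1.0]]
hits = top_k([1.0, 0.1, 0.0], docs, k=2)
print(hits)
```

Normalizing up front matters most when the document vectors are reused across many queries, since the document norms then only need to be computed once.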

Choosing a model

bge-m3 is a strong multilingual default. Reach for qwen3-embedding-8b when retrieval quality matters more than cost or latency. Use one model consistently: vectors from different models live in different vector spaces, so similarity scores between them are meaningless.
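
One way to enforce this is to store the model name next to each vector and refuse to compare across models. This is a minimal sketch of that idea, not a feature of the API: the `StoredVector` class and `safe_cosine` function are hypothetical names, and the vectors are toy values.

```python
import math
from dataclasses import dataclass

@dataclass(frozen=True)
class StoredVector:
    model: str     # name of the model that produced the vector
    values: tuple  # the embedding itself

def safe_cosine(a, b):
    """Cosine similarity that rejects vectors from different models."""
    if a.model != b.model:
        raise ValueError(f"cannot compare {a.model!r} with {b.model!r}")
    if len(a.values) != len(b.values):
        raise ValueError("vectors have different dimensions")
    dot = sum(x * y for x, y in zip(a.values, b.values))
    norm_a = math.sqrt(sum(x * x for x in a.values))
    norm_b = math.sqrt(sum(y * y for y in b.values))
    return dot / (norm_a * norm_b)

# Toy vectors tagged with the model names used in this guide.
u = StoredVector("bge-m3", (1.0, 0.0))
v = StoredVector("bge-m3", (0.0, 1.0))
w = StoredVector("qwen3-embedding-8b", (1.0, 0.0))

print(safe_cosine(u, v))
```

If you switch models later, the same tag tells you which stored vectors need to be re-embedded.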