LLM API
Embeddings
Embeddings map text to a vector of numbers so you can measure semantic similarity.
Use embeddings to power semantic search, deduplication, clustering, classification and retrieval-augmented generation (RAG). Similar meanings produce nearby vectors, which you compare with cosine similarity.
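As a rough illustration of how semantic search builds on this, the sketch below ranks a few documents against a query by cosine similarity. The helper name `rank_by_cosine` and the 3-dimensional toy vectors are assumptions made for the example. Real embeddings come from the API and have many more dimensions.

```python
import numpy as np

def rank_by_cosine(query_vec, doc_vecs):
    """Return (document index, cosine similarity) pairs, best match first."""
    q = np.asarray(query_vec, dtype=float)
    d = np.asarray(doc_vecs, dtype=float)
    # Normalise to unit length so a plain dot product equals cosine similarity.
    q = q / np.linalg.norm(q)
    d = d / np.linalg.norm(d, axis=1, keepdims=True)
    scores = d @ q
    # Sort by descending score.
    order = np.argsort(-scores)
    return [(int(i), float(scores[i])) for i in order]

# Toy vectors standing in for real embeddings (illustrative values only).
docs = [
    [1.0, 0.0, 0.0],
    [0.0, 1.0, 0.0],
    [0.9, 0.1, 0.0],
]
query = [1.0, 0.0, 0.0]

ranking = rank_by_cosine(query, docs)
for doc_index, score in ranking:
    print(doc_index, round(score, 4))
```

Normalising once up front means a large corpus can be scored with a single matrix-vector product, which is usually the simplest starting point before moving to a dedicated vector index.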
Create an embedding
Pass a single string or an array of strings. Each input gets one vector back, in request order.
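# `client` is an API client that has already been configured.
resp = client.embeddings.create(
    model="bge-m3",
    input=["semantic search over documents", "retrieval-augmented generation"],
)
# One item per input, in request order.
for item in resp.data:
    print(item.index, len(item.embedding))
Response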
{
  "object": "list",
  "model": "bge-m3",
  "data": [
    { "index": 0, "object": "embedding", "embedding": [-0.0399, 0.0373, ...] },
    { "index": 1, "object": "embedding", "embedding": [0.0112, -0.0287, ...] }
  ],
  "usage": { "prompt_tokens": 8, "total_tokens": 8 }
}
Cosine similarity
To compare two embeddings, take the cosine of the angle between them:
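import numpy as np

def cosine(a, b):
    """Cosine similarity: dot product divided by the product of the norms."""
    a, b = np.array(a), np.array(b)
    return float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b)))

# Compare the two vectors returned by the request above.
sim = cosine(resp.data[0].embedding, resp.data[1].embedding)
Choosing a model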
bge-m3 is a strong multilingual default. Reach for qwen3-embedding-8b when retrieval quality matters more than cost or latency. Use one model consistently for both indexing and querying: vectors from different models are not comparable.
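One way to guard against mixing models is to store the model name next to each vector and refuse cross-model comparisons. The following is a minimal sketch under that assumption. `StoredEmbedding` and `checked_cosine` are hypothetical names and not part of the API, and the 2-dimensional vectors are toy values.

```python
import math
from dataclasses import dataclass

@dataclass(frozen=True)
class StoredEmbedding:
    """Hypothetical record: a vector plus the name of the model that produced it."""
    model: str
    vector: tuple

def checked_cosine(a: StoredEmbedding, b: StoredEmbedding) -> float:
    """Cosine similarity that refuses to compare vectors from different models."""
    if a.model != b.model:
        raise ValueError(f"cannot compare {a.model!r} with {b.model!r} vectors")
    if len(a.vector) != len(b.vector):
        raise ValueError("vectors have different dimensions")
    dot = sum(x * y for x, y in zip(a.vector, b.vector))
    norm_a = math.sqrt(sum(x * x for x in a.vector))
    norm_b = math.sqrt(sum(y * y for y in b.vector))
    return dot / (norm_a * norm_b)

# Toy vectors standing in for real embeddings (illustrative values only).
a = StoredEmbedding("bge-m3", (1.0, 0.0))
b = StoredEmbedding("bge-m3", (0.0, 1.0))
c = StoredEmbedding("qwen3-embedding-8b", (1.0, 0.0))

print(checked_cosine(a, b))
try:
    checked_cosine(a, c)
except ValueError as err:
    print("refused:", err)
```

Recording the model name also makes it clear which stored vectors need to be re-embedded if you later switch models.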