What Are Embedding Dimensions?
Question
When an embedding model outputs a vector with 1024 or 3072 dimensions, what does that mean? Does more = better?
Explanation
A dimension is one number in the vector. Each number captures one aspect of the text's meaning.
Think of describing a person:
- 2 dimensions (height, weight) - you can tell people apart, but not very well
- 5 dimensions (+ age, hair color, eye color) - much better
- 3072 dimensions - captures small details you couldn't even name, learned by the model during training
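The person analogy can be sketched in code: with only two features, two distinct people can look identical, and adding dimensions separates them. A toy illustration (the feature values are made up):

```python
import math

# Two people described by (height_cm, weight_kg): indistinguishable in 2D.
alice_2d = (170.0, 65.0)
bob_2d = (170.0, 65.0)
print(math.dist(alice_2d, bob_2d))  # 0.0 - the 2D "embedding" can't tell them apart

# Add dimensions (+ age, hair color and eye color as rough numeric codes)
# and the same two people separate cleanly.
alice_5d = (170.0, 65.0, 34.0, 1.0, 2.0)
bob_5d = (170.0, 65.0, 51.0, 3.0, 1.0)
print(math.dist(alice_5d, bob_5d))  # > 0 - more dimensions, finer distinctions
```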
More dimensions = better?
Generally yes, but with diminishing returns:
- 384d (MiniLM, runs locally, free) - decent
- 768d (older Google models) - good
- 1024d (Cohere embed-v3) - very good
- 3072d (Google gemini-embedding-001) - excellent
Going from 384 to 1024 is a big jump in quality. Going from 1024 to 3072 is a smaller improvement for 3x the storage (and similarity-computation) cost.
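The storage trade-off is simple arithmetic. Assuming float32 vectors (4 bytes per dimension) and a hypothetical corpus of one million chunks:

```python
BYTES_PER_FLOAT32 = 4
num_vectors = 1_000_000  # hypothetical corpus size

for dims in (384, 1024, 3072):
    gb = dims * BYTES_PER_FLOAT32 * num_vectors / 1e9
    print(f"{dims}d: {gb:.1f} GB")
# 1024d -> ~4.1 GB, 3072d -> ~12.3 GB: exactly 3x, before any index overhead.
```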
Is it configurable?
Usually no - the dimension is fixed by the model. The exception is Matryoshka embeddings (named after Russian nesting dolls): the model is trained so that you can truncate the vector to fewer dimensions and it still works. The first dimensions capture the most important information, the later ones add fine detail.
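A minimal sketch of how a Matryoshka vector is cut down, assuming the model was actually trained this way (the 8d vector below is made up, standing in for a real 3072d one): keep the first k dimensions, then L2-renormalize so cosine similarity still behaves.

```python
import math

def truncate_matryoshka(vec: list[float], k: int) -> list[float]:
    """Keep the first k dims of a Matryoshka-trained embedding, then renormalize."""
    head = vec[:k]
    norm = math.sqrt(sum(x * x for x in head))
    return [x / norm for x in head]

full = [0.5, -0.3, 0.8, 0.1, 0.05, -0.02, 0.01, 0.005]  # made-up "embedding"
short = truncate_matryoshka(full, 4)
print(len(short))                            # 4
print(math.sqrt(sum(x * x for x in short)))  # ~1.0 - unit length again
```

Note this only works because training pushed the important information into the early dimensions; truncating an ordinary embedding this way degrades it badly.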
What matters more than dimensions
- Model quality - a good 384d model beats a bad 1024d model
- Training domain - a model trained on English tech docs is better for your PDFs than a generic one
- Chunking - bad chunks = bad embeddings, no matter the dimension
- Consistency - you MUST use the same model for indexing and searching
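The consistency rule is easy to enforce mechanically: store the model name (and dimension) alongside the index and refuse to query with anything else. A sketch with hypothetical names:

```python
# Hypothetical metadata saved next to the vector index at indexing time.
index_meta = {"model": "gemini-embedding-001", "dims": 3072}

def check_query_model(model: str, dims: int) -> None:
    """Fail loudly instead of silently returning garbage matches."""
    if model != index_meta["model"] or dims != index_meta["dims"]:
        raise ValueError(
            f"Index built with {index_meta['model']} ({index_meta['dims']}d), "
            f"got {model} ({dims}d). Re-index or switch the query model."
        )

check_query_model("gemini-embedding-001", 3072)  # OK, passes silently
# check_query_model("embed-v3", 1024)            # would raise ValueError
```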
Example
We switched from Cohere (1024d) to Google gemini-embedding-001 (3072d). This required a full re-index because the old vectors live on a completely different "map" - you can't compare 1024-number coordinates with 3072-number coordinates, and even same-size vectors from different models aren't comparable.
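Why the re-index is mandatory: similarity is computed term by term, so the two vectors must have the same length, and even then the numbers only mean the same thing if they came from the same model. A sketch (vector values are made up):

```python
import math

def cosine_similarity(a: list[float], b: list[float]) -> float:
    """Dot product of a and b divided by the product of their norms."""
    if len(a) != len(b):
        raise ValueError(f"Dimension mismatch: {len(a)} vs {len(b)}")
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)

old_vec = [0.1] * 1024  # vector from the old 1024d model
new_vec = [0.1] * 3072  # vector from the new 3072d model
try:
    cosine_similarity(old_vec, new_vec)
except ValueError as e:
    print(e)  # Dimension mismatch: 1024 vs 3072
```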