Data Types for Vector Search
Data for vector search can be stored in raw or pre-quantized formats using several data types,
as outlined below.
Text and numerical data that is not pre-quantized can be further quantized in the generated embeddings.
Learn more in Quantization options.
In this page:
Supported data types for vector search
Textual data
string
- A single text entry.
string[]
- An array of text entries.
Numerical data
You can store pre-generated embedding vectors in your documents,
typically created by machine-learning models from text, images, or other sources.
When storing numerical embeddings in a document field:
- Ensure that all vectors within this field across all documents in the collection are generated by the same model and model version and have the same dimensions.
- Consistency in both dimensionality and model source is crucial for meaningful comparisons in the vector space.
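The dimensionality requirement above can be checked in application code before documents are saved. The following is a minimal sketch, not part of the RavenDB API: the `EmbeddingChecks` class and `RequireSameDimension` method are hypothetical names introduced for illustration. It verifies length only; it cannot tell whether vectors came from the same model or model version, which still has to be tracked separately.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical helper (not a RavenDB type): verifies that all vectors
// destined for the same document field share one dimension.
public static class EmbeddingChecks
{
    // Returns the shared dimension, or throws if any vector differs.
    public static int RequireSameDimension(IReadOnlyList<float[]> vectors)
    {
        if (vectors == null || vectors.Count == 0)
            throw new ArgumentException("At least one vector is required.", nameof(vectors));

        int expected = vectors[0].Length;
        for (int i = 1; i < vectors.Count; i++)
        {
            if (vectors[i].Length != expected)
                throw new InvalidOperationException(
                    $"Vector {i} has {vectors[i].Length} dimensions, expected {expected}.");
        }

        return expected;
    }
}
```

Both `float[][]` and `List<float[]>` can be passed to this method, since each implements `IReadOnlyList<float[]>`.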
In addition to the native types described below, we highly recommend using RavenVector
for efficient storage and fast queries when working with numerical embeddings.
Raw embedding data:
Use when precision is critical.
float[]
- A single vector of numerical values representing raw embedding data.
float[][]
- An array of vectors, where each entry is a separate embedding vector.
Pre-quantized data:
Use when you prioritize storage efficiency and query speed.
byte[] / sbyte[]
- A single pre-quantized embedding vector in the Int8 or Binary quantization format.
byte[][] / sbyte[][]
- An array of pre-quantized embedding vectors.
When storing data in these formats in your documents, you should use RavenDB’s vector quantizer methods.
Base64-encoded data:
Use when embedding data needs to be represented in a compact, easily serializable string format.
string
- A single vector encoded as a Base64 string.
string[]
- An array of Base64-encoded vectors.
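As a rough illustration of how a vector might be turned into a Base64 string, the sketch below copies the raw bytes of a `float[]` and encodes them. This is an assumption about the byte layout (32-bit floats in the machine's native byte order), not a statement of the layout RavenDB expects, so confirm that against the product documentation before storing data this way. `Base64VectorSketch` is a hypothetical name.

```csharp
using System;

// Hypothetical helper: round-trips a float vector through a Base64 string.
// ASSUMPTION: the string holds the raw 32-bit float bytes in native byte
// order (little-endian on typical platforms). The layout RavenDB expects
// is not confirmed by this page.
public static class Base64VectorSketch
{
    public static string Encode(float[] vector)
    {
        var bytes = new byte[vector.Length * sizeof(float)];
        // Copy the floats' in-memory bytes without converting values.
        Buffer.BlockCopy(vector, 0, bytes, 0, bytes.Length);
        return Convert.ToBase64String(bytes);
    }

    public static float[] Decode(string base64)
    {
        byte[] bytes = Convert.FromBase64String(base64);
        if (bytes.Length % sizeof(float) != 0)
            throw new FormatException("Byte count is not a multiple of 4.");

        var vector = new float[bytes.Length / sizeof(float)];
        Buffer.BlockCopy(bytes, 0, vector, 0, bytes.Length);
        return vector;
    }
}
```

Because the bytes are copied rather than reformatted, decoding an encoded vector returns exactly the original values.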
Using lists:
While arrays (float[]) are the most direct representation of numerical embeddings,
you can also use lists (for example, List<float> or List<float[]>) for dynamic sizing in your application code.
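A list-backed document class might look like the sketch below. The class and property names are hypothetical and chosen only to mirror the `List<float>` and `List<float[]>` shapes mentioned above.

```csharp
using System.Collections.Generic;

// Hypothetical document class; names are illustrative only.
public class ListBackedDocument
{
    public string Id { get; set; }

    // A single embedding that can grow as values are produced.
    public List<float> Embedding { get; set; } = new List<float>();

    // Several embeddings; each entry is one complete vector.
    public List<float[]> Embeddings { get; set; } = new List<float[]>();
}
```

Values can then be appended with `Add` as they become available, and `ToArray()` yields a `float[]` when a fixed-size array is needed.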
RavenVector
RavenVector is RavenDB's dedicated data type for storing and querying numerical embeddings.
It is highly optimized to minimize storage space and improve the speed of reading arrays from disk,
making it ideal for vector search.
For example, you can define:
RavenVector<float>; // A single vector of floating-point values.
List<RavenVector<float>>; // A collection of float-based vectors.
RavenVector<sbyte>; // A single pre-quantized vector in Int8 format (8-bit signed integer).
List<RavenVector<sbyte>>; // A collection of sbyte-based vectors.
RavenVector<byte>; // A single pre-quantized vector in Binary format (8-bit unsigned integer).
List<RavenVector<byte>>; // A collection of byte-based vectors.
When a class property is stored as a RavenVector, the vector's content will appear under the @vector
field in the JSON document stored in the database.
For example:
public class SampleClass
{
    public string Id { get; set; }
    public string Title { get; set; }

    // Storing data in a RavenVector property for optimized storage and performance
    public RavenVector<float> EmbeddingRavenVector { get; set; }

    // Storing data in a regular array property
    public float[] EmbeddingVector { get; set; }
}

RavenVector in a JSON document