Generating Embeddings - Overview
RavenDB can serve as a vector database; see Why choose RavenDB as your vector database.
Vector search can be performed on:
- Raw text stored in your documents.
- Pre-made embeddings that you created yourself and stored using these Data types.
- Pre-made embeddings that are automatically generated from your document content by RavenDB's tasks
using external service providers, as explained below.
Embeddings generation - overview
Embeddings generation - process flow
Define an Embeddings Generation Task:
- Specify a connection string that defines the AI provider and model for generating embeddings.
- Define the source content - what parts of the documents will be used to create the embeddings.
Source content is processed:
- The task extracts the specified content from the documents.
- If a processing script is defined, it transforms the content before further processing.
- The text is split according to the defined chunking method; a separate embedding will be created for each chunk.
- Before contacting the provider, RavenDB checks the embeddings cache to determine whether an embedding already exists for the given content from that provider.
- If a matching embedding is found, it is reused, avoiding unnecessary requests.
- If no cached embedding is found, the transformed and chunked content is sent to the configured AI provider.
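The chunk-then-check-cache flow described above can be sketched roughly as follows. This is a minimal illustration only, not RavenDB's actual implementation: the fixed-size chunking, the SHA-256 cache key, and the `request_from_provider` callback are all assumptions made for the example.

```python
import hashlib

def chunk_text(text, max_len=200):
    """Naive chunking: split the source text into fixed-size pieces.
    RavenDB offers configurable chunking methods; this is a stand-in."""
    return [text[i:i + max_len] for i in range(0, len(text), max_len)]

def get_embeddings(text, cache, request_from_provider):
    """Return one embedding per chunk, consulting the cache before
    contacting the provider."""
    embeddings = []
    for chunk in chunk_text(text):
        # Key the cache by a hash of the exact chunk content.
        key = hashlib.sha256(chunk.encode("utf-8")).hexdigest()
        if key not in cache:
            # Only chunks not seen before trigger a provider request.
            cache[key] = request_from_provider(chunk)
        embeddings.append(cache[key])
    return embeddings
```

Processing the same content twice results in a single provider call; the second pass is served entirely from the cache.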
Embeddings are generated by the AI provider:
- The provider generates embeddings and sends them back to RavenDB.
- If quantization was defined in the task, RavenDB applies it to the embeddings before storing them.
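Quantization shrinks each embedding by storing its components at lower precision. A rough sketch of the idea, using symmetric int8 quantization (the specific scheme is an assumption for illustration; RavenDB's quantization options may differ):

```python
def quantize_int8(vector):
    """Map float components to signed 8-bit integers,
    scaled by the largest absolute component."""
    scale = max(abs(x) for x in vector) or 1.0
    return [round(x / scale * 127) for x in vector], scale

def dequantize_int8(quantized, scale):
    """Approximate reconstruction of the original floats."""
    return [q / 127 * scale for q in quantized]
```

Each component now fits in one byte instead of four (float32), trading a small amount of precision for a 4x storage reduction.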
Embeddings are stored in your database:
- Each embedding is stored as an attachment in a dedicated collection.
- RavenDB maintains an embeddings cache, allowing reuse of embeddings for the same source content and reducing provider calls. Cached embeddings expire after a configurable duration.
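The expiration behavior mentioned above can be pictured with a toy time-to-live cache. This is purely illustrative; RavenDB manages its embeddings cache internally, and the class and parameter names here are invented for the example:

```python
import time

class ExpiringCache:
    """Toy cache whose entries expire after ttl_seconds."""
    def __init__(self, ttl_seconds):
        self.ttl = ttl_seconds
        self._store = {}  # key -> (value, stored_at)

    def put(self, key, value, now=None):
        self._store[key] = (value, now if now is not None else time.time())

    def get(self, key, now=None):
        now = now if now is not None else time.time()
        entry = self._store.get(key)
        if entry is None:
            return None
        value, stored_at = entry
        if now - stored_at > self.ttl:
            del self._store[key]  # expired: evict and report a miss
            return None
        return value
```

After the configured duration passes, a lookup for the same content misses, and a fresh embedding would be requested from the provider.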
Perform vector search:
Once the embeddings are stored, you can perform vector searches on your document content by:
- Running a dynamic query, which automatically creates an auto-index for the search.
- Defining a static index to store and query embeddings efficiently.
- The query search term is split into chunks, and each chunk is looked up in the cache.
- If a chunk is not found in the cache, RavenDB requests an embedding from the provider and caches it.
- The embedding (cached or newly created) is then used to compare against the stored vectors.
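Comparing a query embedding against stored vectors typically relies on a similarity metric such as cosine similarity. The brute-force sketch below shows the idea; production vector indexes use approximate nearest-neighbor structures rather than scanning every vector, and the function names here are illustrative:

```python
import math

def cosine_similarity(a, b):
    """Cosine of the angle between two vectors: 1.0 means identical direction."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0

def top_k(query, stored, k=2):
    """Rank stored (doc_id, vector) pairs by similarity to the query embedding."""
    ranked = sorted(stored, key=lambda item: cosine_similarity(query, item[1]),
                    reverse=True)
    return [doc_id for doc_id, _ in ranked[:k]]
```

A query vector pointing in nearly the same direction as a stored vector scores close to 1.0 and ranks first in the results.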
Continuous processing:
- Embeddings generation tasks are Ongoing Tasks that process documents as they change.
- Before contacting the provider after a document change, the task first checks the cache to see if a matching embedding already exists, avoiding unnecessary requests.
- The requests to generate embeddings from the source text are sent to the provider in batches.
  The batch size is configurable; see the Ai.Embeddings.MaxBatchSize configuration key.
- A failed embeddings generation task will retry after the duration set in the
  Ai.Embeddings.MaxFallbackTimeInSec configuration key.
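Batching reduces the number of round-trips to the provider. The sketch below shows the general technique of capping each request at a maximum batch size, conceptually similar to what Ai.Embeddings.MaxBatchSize controls; the helper names and the `provider_batch_call` callback are assumptions, not RavenDB internals:

```python
def batched(items, max_batch_size):
    """Yield successive batches of at most max_batch_size items."""
    for i in range(0, len(items), max_batch_size):
        yield items[i:i + max_batch_size]

def send_all(chunks, provider_batch_call, max_batch_size=128):
    """Send pending chunks to the provider in batches: one request per batch,
    preserving the original chunk order in the returned embeddings."""
    embeddings = []
    for batch in batched(chunks, max_batch_size):
        embeddings.extend(provider_batch_call(batch))
    return embeddings
```

With 5 pending chunks and a batch size of 2, this issues three requests (2 + 2 + 1) instead of five.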
Supported providers
The following service providers are supported for auto-generating embeddings using tasks:
- OpenAI & OpenAI-compatible providers
- Azure OpenAI
- Google AI
- Hugging Face
- Ollama
- Mistral AI
- bge-micro-v2 (a local embedded model within RavenDB)


Creating an embeddings generation task
An embeddings generation task can be created from:
- The AI Tasks view in the Studio, where you can create, edit, and delete tasks. Learn more in AI Tasks - list view.
- The Client API - see Configuring an embeddings generation task - from the Client API
From the Studio:
Add AI Task
- Go to the AI Hub menu.
- Open the AI Tasks view.
- Click Add AI Task to add a new task.
Add a new Embeddings Generation Task
See the complete details of the task configuration in the Embeddings generation task article.
Monitoring the tasks
The status and state of each embeddings generation task are visible in the AI Tasks - list view.
Task performance and activity over time can be analyzed in the AI Tasks Stats view,
where you can track processing duration, batch sizes, and overall progress.
Learn more about the functionality of the stats view in the Ongoing Tasks Stats article.
The number of embeddings generation tasks across all databases can also be monitored using SNMP.
The following SNMP OIDs provide relevant metrics: