Bulk Insert: How to Work With Bulk Insert Operation

  • BulkInsert is useful when inserting a large quantity of data from the client to the server.
  • It is an optimized, time-saving approach, though it has a few limitations, such as the possibility that the operation will be interrupted mid-way.

In this page:

  • Syntax
  • BulkInsertOperation
  • Limitations
  • Example
  • BulkInsertOptions

Syntax

BulkInsertOperation BulkInsert(string database = null, CancellationToken token = default);
Parameters Type Description
database string The name of the database to perform the bulk operation on.
If null, the DocumentStore Database will be used.
token CancellationToken Cancellation token used to halt the worker operation.
Return Value
BulkInsertOperation Instance of BulkInsertOperation used for interaction.

BulkInsertOperation BulkInsert(string database, BulkInsertOptions options, CancellationToken token = default);
Parameters Type Description
database string The name of the database to perform the bulk operation on.
If null, the DocumentStore Database will be used.
options BulkInsertOptions Options to configure BulkInsert.
token CancellationToken Cancellation token used to halt the worker operation.
Return Value
BulkInsertOperation Instance of BulkInsertOperation used for interaction.

BulkInsertOperation BulkInsert(BulkInsertOptions options, CancellationToken token = default);
Parameters Type Description
options BulkInsertOptions Options to configure BulkInsert.
token CancellationToken Cancellation token used to halt the worker operation.
Return Value
BulkInsertOperation Instance of BulkInsertOperation used for interaction.
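
As a usage sketch, the overload that takes a database name and a cancellation token might be called like this (the database name "NorthwindBackup" and the timeout are illustrative, not part of the API):

```csharp
using (var cts = new CancellationTokenSource(TimeSpan.FromMinutes(5)))
// Target a database other than the store's default, and pass the token
// so the operation can be halted if it runs too long.
using (BulkInsertOperation bulkInsert = store.BulkInsert("NorthwindBackup", cts.Token))
{
    bulkInsert.Store(new Employee { FirstName = "John" });
}
```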

BulkInsertOperation

The following methods can be used when creating a bulk insert.

Methods

Signature Description
void Abort() Abort the operation.
void Store(object entity, IMetadataDictionary metadata = null) Store the entity; the identifier is generated automatically on the client side. Optionally, metadata can be provided for the stored entity.
void Store(object entity, string id, IMetadataDictionary metadata = null) Store the entity, using the id parameter to explicitly declare the entity identifier. Optionally, metadata can be provided for the stored entity.
Task StoreAsync(object entity, IMetadataDictionary metadata = null) Store the entity asynchronously; the identifier is generated automatically on the client side. Optionally, metadata can be provided for the stored entity.
Task StoreAsync(object entity, string id, IMetadataDictionary metadata = null) Store the entity asynchronously, using the id parameter to explicitly declare the entity identifier. Optionally, metadata can be provided for the stored entity.
void Dispose() Dispose of the operation.
Task DisposeAsync() Dispose of the operation asynchronously.
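
For example, the Store overload with an explicit id and metadata might be used as follows (a sketch; the metadata key "Import-Source" is illustrative, and MetadataAsDictionary is assumed as the client's IMetadataDictionary implementation):

```csharp
using (BulkInsertOperation bulkInsert = store.BulkInsert())
{
    var metadata = new MetadataAsDictionary
    {
        // Hypothetical custom metadata key for this example
        ["Import-Source"] = "legacy-hr-system"
    };

    // Explicit identifier instead of a client-generated one
    bulkInsert.Store(
        new Employee { FirstName = "Jane", LastName = "Doe" },
        "employees/1-A",
        metadata);
}
```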

Limitations

  • BulkInsert is designed to efficiently push large volumes of data.
    Data is therefore streamed and processed by the server in batches.
    Each batch is fully transactional, but there are no transaction guarantees between the batches and the operation as a whole is non-transactional.
    If the bulk insert operation is interrupted mid-way, some of your data might be persisted on the server while some of it might not.
    • Make sure that your logic accounts for the possibility of an interruption that leaves some of your data not yet persisted on the server.
    • If the operation was interrupted and you choose to re-insert the whole dataset in a new operation, you can set SkipOverwriteIfUnchanged to true so that the operation overwrites existing documents only if they have changed since the last insertion.
    • If you need full transactionality, using a session may be a better option.
      Note that if a session is used, all of the data is processed in a single transaction, so the server must have sufficient resources to handle the entire dataset included in the transaction.
  • Bulk insert is not thread-safe.
    A single bulk insert should not be accessed concurrently.
    • Using multiple bulk inserts concurrently on the same client is supported.
    • Usage in an async context is also supported.
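
For instance, two bulk inserts can run concurrently as long as each task creates and disposes its own operation (a sketch under that assumption; InsertRangeAsync is a helper introduced here for illustration):

```csharp
// Each task creates its own BulkInsertOperation; a single operation
// instance must never be shared between threads.
async Task InsertRangeAsync(int from, int to)
{
    BulkInsertOperation bulkInsert = null;
    try
    {
        bulkInsert = store.BulkInsert();
        for (int i = from; i < to; i++)
        {
            await bulkInsert.StoreAsync(new Employee { FirstName = "FirstName #" + i });
        }
    }
    finally
    {
        if (bulkInsert != null)
            await bulkInsert.DisposeAsync().ConfigureAwait(false);
    }
}

await Task.WhenAll(InsertRangeAsync(0, 500_000), InsertRangeAsync(500_000, 1_000_000));
```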

Example

Create bulk insert

Here we create a bulk insert operation and insert a million documents of type Employee:

using (BulkInsertOperation bulkInsert = store.BulkInsert())
{
    for (int i = 0; i < 1000 * 1000; i++)
    {
        bulkInsert.Store(new Employee
        {
            FirstName = "FirstName #" + i,
            LastName = "LastName #" + i
        });
    }
}
The same operation can be performed asynchronously with StoreAsync:

BulkInsertOperation bulkInsert = null;
try
{
    bulkInsert = store.BulkInsert();
    for (int i = 0; i < 1000 * 1000; i++)
    {
        await bulkInsert.StoreAsync(new Employee
        {
            FirstName = "FirstName #" + i,
            LastName = "LastName #" + i
        });
    }
}
finally
{
    if (bulkInsert != null)
    {
        await bulkInsert.DisposeAsync().ConfigureAwait(false);
    }
}

BulkInsertOptions

The following options can be configured for BulkInsert.

CompressionLevel:

Value Description
Optimal Compress optimally, even if the operation takes longer to complete.
Fastest Compress as quickly as possible, even if the output is not optimally compressed.
NoCompression No compression is performed.

Default compression level

For RavenDB versions up to 6.2, bulk-insert compression is Disabled (NoCompression) by default.
For RavenDB versions from 7.0 on, bulk-insert compression is Enabled (set to Fastest) by default.
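
Assuming BulkInsertOptions exposes the standard System.IO.Compression.CompressionLevel, the level can be set explicitly rather than relying on the version-dependent default:

```csharp
using System.IO.Compression;

using (var bulkInsert = store.BulkInsert(new BulkInsertOptions
{
    // Favor compression ratio over speed for this run
    CompressionLevel = CompressionLevel.Optimal
}))
{
    // Store documents as usual
}
```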

SkipOverwriteIfUnchanged:

Use this option to avoid overwriting documents when the inserted document is unchanged relative to the one already stored.

Enabling this flag spares the server many operations that are triggered by document changes, such as re-indexing and updating subscriptions or ETL tasks.
There is a slight potential cost in the additional comparison that must be made between the existing documents and the ones being inserted.

using (var bulk = store.BulkInsert(new BulkInsertOptions
{
    SkipOverwriteIfUnchanged = true
}))
{
    // Store documents as usual
}