Bulk Insert: How to Work With Bulk Insert Operation
BulkInsert is useful when inserting a large quantity of data from the client to the server.

- It is an optimized, time-saving approach with a few limitations, such as the lack of full transactionality and the possibility of interruptions during the operation.
In this page:

- Syntax
- BulkInsertOperation
- Limitations
- Example
- BulkInsertOptions
Syntax
BulkInsertOperation BulkInsert(string database = null, CancellationToken token = default);
Parameters | Type | Description |
---|---|---|
database | string | Name of the database on which the bulk insert should be performed. If null, the default database of the DocumentStore is used. |
token | CancellationToken | Cancellation token used to halt the operation. |
Return Value | Description |
---|---|
BulkInsertOperation | Instance of BulkInsertOperation used for interaction. |
BulkInsertOperation BulkInsert(string database, BulkInsertOptions options, CancellationToken token = default);
Parameters | Type | Description |
---|---|---|
database | string | Name of the database on which the bulk insert should be performed. If null, the default database of the DocumentStore is used. |
options | BulkInsertOptions | Options used to configure the bulk insert. |
token | CancellationToken | Cancellation token used to halt the operation. |
Return Value | Description |
---|---|
BulkInsertOperation | Instance of BulkInsertOperation used for interaction. |
BulkInsertOperation BulkInsert(BulkInsertOptions options, CancellationToken token = default);
Parameters | Type | Description |
---|---|---|
options | BulkInsertOptions | Options used to configure the bulk insert. |
token | CancellationToken | Cancellation token used to halt the operation. |
Return Value | Description |
---|---|
BulkInsertOperation | Instance of BulkInsertOperation used for interaction. |
BulkInsertOperation
The following methods can be used when creating a bulk insert.
Methods
Signature | Description |
---|---|
void Abort() | Aborts the operation. |
void Store(object entity, IMetadataDictionary metadata = null) | Stores the entity; the identifier is generated automatically on the client side. Metadata can optionally be provided for the stored entity. |
void Store(object entity, string id, IMetadataDictionary metadata = null) | Stores the entity, with the id parameter explicitly declaring the entity identifier. Metadata can optionally be provided for the stored entity. |
Task StoreAsync(object entity, IMetadataDictionary metadata = null) | Stores the entity asynchronously; the identifier is generated automatically on the client side. Metadata can optionally be provided for the stored entity. |
Task StoreAsync(object entity, string id, IMetadataDictionary metadata = null) | Stores the entity asynchronously, with the id parameter explicitly declaring the entity identifier. Metadata can optionally be provided for the stored entity. |
void Dispose() | Disposes of the operation. |
Task DisposeAsync() | Disposes of the operation asynchronously. |
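As a brief sketch of the Store overloads listed above (using the Employee class from the example further down; the document identifier shown is illustrative):

    using (BulkInsertOperation bulkInsert = store.BulkInsert())
    {
        // identifier is generated automatically on the client side
        bulkInsert.Store(new Employee { FirstName = "Jane", LastName = "Doe" });

        // identifier is declared explicitly via the id parameter
        bulkInsert.Store(new Employee { FirstName = "John", LastName = "Doe" }, "employees/1");
    }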
Limitations
- BulkInsert is designed to efficiently push high quantities of data.
As such, data is streamed and processed by the server in batches.
Each batch is fully transactional, but there are no transaction guarantees between batches. The operation as a whole is non-transactional. If your bulk insert is interrupted mid-way, some of your data might be persisted on the server while some of it might not.
  - Make sure that your logic accounts for the possibility of an interruption before all of your data has been persisted on the server.
  - If the operation was interrupted and you choose to re-insert the whole dataset in a new operation, you can set SkipOverwriteIfUnchanged to true. Existing documents are then overwritten only if they have changed since the last insertion.
  - If you need full transactionality, the session may be a better option. When using the session, all of the data is processed in one transaction, so your server resources must be able to handle the entire dataset included in that transaction.
- Bulk insert is not thread-safe.
A single bulk insert should not be accessed concurrently.
  - Using multiple bulk inserts concurrently on the same client is supported.
  - Use in an async context is also supported.
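When full transactionality is required, a session-based insert along these lines could be used instead (a sketch; all documents stored in the session are persisted in a single transaction when SaveChanges is called, so this approach only suits dataset sizes the server can process in one transaction):

    using (IDocumentSession session = store.OpenSession())
    {
        for (int i = 0; i < 1000; i++)
        {
            session.Store(new Employee
            {
                FirstName = "FirstName #" + i,
                LastName = "LastName #" + i
            });
        }

        // all stored documents are persisted in one transaction;
        // if this call fails, none of them are saved
        session.SaveChanges();
    }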
Example
Create bulk insert
Here we create a bulk insert operation and insert a million documents of type Employee:
using (BulkInsertOperation bulkInsert = store.BulkInsert())
{
for (int i = 0; i < 1000 * 1000; i++)
{
bulkInsert.Store(new Employee
{
FirstName = "FirstName #" + i,
LastName = "LastName #" + i
});
}
}
The same example, using the async StoreAsync method:
BulkInsertOperation bulkInsert = null;
try
{
bulkInsert = store.BulkInsert();
for (int i = 0; i < 1000 * 1000; i++)
{
await bulkInsert.StoreAsync(new Employee
{
FirstName = "FirstName #" + i,
LastName = "LastName #" + i
});
}
}
finally
{
if (bulkInsert != null)
{
await bulkInsert.DisposeAsync().ConfigureAwait(false);
}
}
BulkInsertOptions
The following options can be configured for BulkInsert.
CompressionLevel
Value | Description |
---|---|
Optimal | The compression should be optimal, even if the operation takes longer to complete. |
Fastest | The compression should complete as quickly as possible, even if the output is not optimally compressed. |
NoCompression | No compression is performed. |
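A minimal sketch of configuring the compression level, assuming BulkInsertOptions exposes a CompressionLevel property of type System.IO.Compression.CompressionLevel:

    using (var bulk = store.BulkInsert(new BulkInsertOptions
    {
        // assumption: the options type exposes a CompressionLevel property
        CompressionLevel = CompressionLevel.Fastest
    }))
    {
        bulk.Store(new Employee { FirstName = "FirstName", LastName = "LastName" });
    }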
SkipOverwriteIfUnchanged
Prevents overwriting documents that are unchanged compared to the ones already stored.
Enabling this can avoid a lot of additional work, including triggering re-indexing, subscriptions, and ETL processes.
It introduces a slight overhead to the bulk insert process because the existing documents need to be compared with the ones being inserted.
using (var bulk = store.BulkInsert(new BulkInsertOptions
{
    SkipOverwriteIfUnchanged = true
}))
{
    // documents stored here overwrite existing documents
    // only if their content has changed since the last insertion
}