Knowledge Base: Document Identifier Generation
- A document identifier (ID for short) is a unique identification string associated with the document.
- IDs are unique within the scope of the database: no two documents in the same database will have the same ID.
- IDs can be generated using different strategies: by the client, by the server, by client-server collaboration, or by a Map-Reduce index output.
Document IDs
ID Generation Strategies
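Document identifiers can be generated using the following strategies:
- Semantic ID (generated by the client)
- Guid (generated by the server)
- Server-Side ID (generated by the server)
- Identity (generated by the server)
- HiLo Algorithm (generated by client-server collaboration)
- Artificial Document ID (generated by a Map-Reduce index output)
Each strategy is described in its own section below.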
ID Structure
The document ID is typically composed of the collection name as a prefix, a slash (/), and the unique ID portion (including a node tag, e.g. A, identifying the server node that generated the ID).
E.g.: users/1-A
Note that this behavior is not mandatory:
- RavenDB does not require the collection prefix to be included in the ID string.
- The Identifier Parts Separator can be replaced.
ID Limitations
The following limitations apply to document IDs:
- Identifier length limit: 512 bytes (in UTF-8)
- Identifiers cannot end with the following reserved characters:
  - / (reserved for Server-Side ID)
  - | (reserved for Identity generation)
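The two limitations above can be checked before an ID is sent to the server. The following is a minimal sketch, not part of the RavenDB client API: `DocumentIdRules` and `IsValidFinalId` are hypothetical names, and rejecting an empty string is an assumption made for this illustration only.

```csharp
using System;
using System.Text;

// Hypothetical helper (not part of the RavenDB client API) that checks a
// final, fully specified document ID against the limitations listed above.
public static class DocumentIdRules
{
    public const int MaxIdLengthInBytes = 512;

    public static bool IsValidFinalId(string id)
    {
        // Assumption for this sketch: an empty ID is not a final ID.
        if (string.IsNullOrEmpty(id))
            return false;

        // The limit is stated in bytes, so measure the UTF-8 encoding, not id.Length.
        if (Encoding.UTF8.GetByteCount(id) > MaxIdLengthInBytes)
            return false;

        // A trailing '/' requests a server-side ID; a trailing '|' requests an identity.
        char last = id[id.Length - 1];
        return last != '/' && last != '|';
    }
}

public static class Program
{
    public static void Main()
    {
        Console.WriteLine(DocumentIdRules.IsValidFinalId("users/1-A")); // True
        Console.WriteLine(DocumentIdRules.IsValidFinalId("users/"));    // False
        Console.WriteLine(DocumentIdRules.IsValidFinalId("users|"));    // False

        // 200 euro signs take 3 bytes each in UTF-8, so this 206-character ID is 606 bytes long.
        string longId = "users/" + new string('\u20AC', 200);
        Console.WriteLine(DocumentIdRules.IsValidFinalId(longId));      // False
    }
}
```

Note that the length check counts bytes rather than characters, so an ID made of non-ASCII characters can exceed the limit well before it reaches 512 characters.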
ID Generation by Client
Strategy: Semantic ID
Generated by:
The user
Description:
- The semantic ID is generated by the user (using the Client API or from the Studio) and not by the RavenDB server. As such, it is the user's responsibility to generate unique IDs.
- Creating a new document with an existing semantic ID will overwrite the existing document.
When to use:
Use a semantic ID when you want the document to have an identifier that has some meaningful value.
Example:
- Documents with a unique semantic ID containing a user's email can be generated under the 'Users' collection:
  - users/ayende@ayende.com
  - users/john@john.doe
- For clarity, the document content can be indicated within the semantic ID string:
  - accounts/591-192/txs/2017-05-17
    Implying that the document holds all the transactions from May 17th, 2017 for account 591-192
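Semantic IDs like the ones in this section are ordinary strings, so they can be composed on the client from the values they describe. The sketch below assumes hypothetical helper names (`SemanticIds`, `ForUser`, `ForDailyTransactions`); only the resulting ID shapes come from the examples in this section.

```csharp
using System;
using System.Globalization;

// Hypothetical helpers that compose semantic IDs in the shapes shown above.
public static class SemanticIds
{
    // e.g. users/john@john.doe
    public static string ForUser(string email)
    {
        return "users/" + email;
    }

    // e.g. accounts/591-192/txs/2017-05-17
    public static string ForDailyTransactions(string accountNumber, DateTime day)
    {
        // The invariant culture keeps the date portion identical on every machine.
        return "accounts/" + accountNumber + "/txs/"
            + day.ToString("yyyy-MM-dd", CultureInfo.InvariantCulture);
    }
}

public static class Program
{
    public static void Main()
    {
        Console.WriteLine(SemanticIds.ForUser("john@john.doe"));
        // users/john@john.doe

        Console.WriteLine(SemanticIds.ForDailyTransactions("591-192", new DateTime(2017, 5, 17)));
        // accounts/591-192/txs/2017-05-17
    }
}
```

The resulting string would then be passed as the explicit ID when storing the document, for example with `session.Store(entity, id)` in the .NET client. Since the server does not check semantic IDs for uniqueness, building them from naturally unique values (an email address, an account number plus a date) is what prevents accidental overwrites.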
ID Generation by Server
Identifier Parts Separator
By default, the components of document IDs created by the server are separated by the / character.
This default separator can be replaced with any other character except | in the Document Store Conventions.
Examples:
// Example 1: identity with a custom separator

// Change the ID separator from the default `/` to `-`
store.Maintenance.Send(
    new PutClientConfigurationOperation(
        new ClientConfiguration { IdentityPartsSeparator = '-' }));

using (var session = store.OpenSession())
{
    // The `|` causes the cluster to generate an identity
    // The ID is unique over the whole cluster
    // The first generated ID will be `Prefix-1`
    session.Store(new User
    {
        Name = "John",
        Id = "Prefix|"
    });

    session.SaveChanges();
}

// Example 2: no ID provided, with a custom separator

// Change the ID separator from the default `/` to `-`
store.Maintenance.Send(
    new PutClientConfigurationOperation(
        new ClientConfiguration { IdentityPartsSeparator = '-' }));

using (var session = store.OpenSession())
{
    // Since an ID wasn't explicitly provided, the server generates one.
    // The first generated ID on node A will be `users-1-A`
    session.Store(new User
    {
        Name = "John",
    });

    session.SaveChanges();
}
Strategy: Guid
Generated by:
The server
Description:
- When a document ID is not specified, the RavenDB server generates a globally unique identifier (GUID) for the new document.
- Although this is the simplest way to generate a document ID, GUIDs are not human-friendly when it comes to debugging or troubleshooting, so this strategy is less recommended.
When to use:
Use a GUID only when you don't care about the exact ID generated or about the ease of troubleshooting your app.
Strategy: Server-Side ID
Generated by:
The server
Description:
- Upon document creation, providing a document ID string that ends with a slash (/) will cause the server to generate a server-side ID.
- The RavenDB server that handles the request increments the value of its Last Document Etag. This etag and the server node tag are appended by the server to the end of the ID string provided.
- Since the etag on which the ID is based changes whenever a document is added, deleted, or updated, the only guarantee about the server-side ID is that it is always increasing, but not necessarily sequential.
When to use:
- Use the server-side ID when you don't care about the exact ID that is given to a newly created document.
- Recommended when a large number of documents need to be created, such as in bulk insert scenarios, as this method requires the least amount of work from RavenDB.
Example:
- From a server running on node 'A':
  - Creating the first document with 'users/' results in document ID: 'users/0000000000000000001-A'
  - Creating a second document with 'users/' results in document ID: 'users/0000000000000000002-A'
- From a server running on node 'B':
  - Creating a third document with 'users/' can result, for example, in document ID: 'users/0000000000000000034-B'
  - Note: node tag 'B' was appended to the generated ID because the server handling the request is on node 'B'. However, since each server has its own local etag, the numeric part of the ID is not necessarily sequential (or unique) across the nodes within the database group in the cluster, as can happen when documents are created during a network partition.
Note:
If you manually generate a document ID with a pattern that matches the server-side generated IDs, RavenDB will not check for that and will overwrite the existing document.
The leading zeros help avoid such accidental conflicts with existing documents.
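The behavior described in this section can be illustrated with a small in-memory simulation. This is a sketch under stated assumptions, not RavenDB's implementation: `SimulatedNode` is a hypothetical class, and the 19-digit zero padding is inferred from the example IDs above.

```csharp
using System;
using System.Globalization;

// Hypothetical in-memory model of one server node and its local etag.
public sealed class SimulatedNode
{
    private readonly string _nodeTag;
    private long _lastDocumentEtag;

    public SimulatedNode(string nodeTag)
    {
        _nodeTag = nodeTag;
    }

    // Any other write on this node (add, update, delete) also advances the etag.
    public void RecordOtherWrite()
    {
        _lastDocumentEtag++;
    }

    public string CreateDocument(string idEndingWithSlash)
    {
        if (!idEndingWithSlash.EndsWith("/", StringComparison.Ordinal))
            throw new ArgumentException("Expected an ID that ends with '/'.", nameof(idEndingWithSlash));

        _lastDocumentEtag++;

        // Append the zero-padded etag and the tag of the node handling the request.
        return idEndingWithSlash
            + _lastDocumentEtag.ToString("D19", CultureInfo.InvariantCulture)
            + "-" + _nodeTag;
    }
}

public static class Program
{
    public static void Main()
    {
        var nodeA = new SimulatedNode("A");
        var nodeB = new SimulatedNode("B");

        Console.WriteLine(nodeA.CreateDocument("users/")); // users/0000000000000000001-A
        Console.WriteLine(nodeA.CreateDocument("users/")); // users/0000000000000000002-A

        nodeA.RecordOtherWrite();                          // e.g. an update to some other document
        Console.WriteLine(nodeA.CreateDocument("users/")); // users/0000000000000000004-A (3 was skipped)

        // Node B has its own etag, so the numeric part repeats; only the node tag differs.
        Console.WriteLine(nodeB.CreateDocument("users/")); // users/0000000000000000001-B
    }
}
```

The simulation shows both properties stated above: IDs from one node always increase but may skip values, and the numeric part alone is not unique across nodes.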
Strategy: Identity
Generated by:
The server
Description:
- Upon document creation, providing a document ID string that ends with a pipe symbol (|) will cause the server to generate an identity.
- RavenDB will create a simple, always-incrementing value and append it to the ID string provided (replacing the pipe with a slash).
- As opposed to the server-side ID, this value is unique across all the nodes within the database group in the cluster.
When to use:
Use an identity only if you really need documents with incremental IDs, e.g. when generating invoices, or upon legal obligation.
Using an identity guarantees that IDs will be incremental, but does not guarantee that there will be no gaps in the sequence.
The ID sequence can therefore be, for example, companies/1, companies/2, companies/4.
This is because:
- Documents could have been deleted.
- A failed transaction still increments the identity value, thus causing a gap in the sequence.
Example:
- From a server running on node 'A':
  - Creating the first document with 'users|' results in document ID: 'users/1'
  - Creating a second document with 'users|' results in document ID: 'users/2'
- From a server running on node 'B':
  - Creating a third document with 'users|' results in document ID: 'users/3'
Note:
- Identity has a real cost associated with it. Generating identities in a cluster where the database is replicated across more than one node requires a lot of work.
- Network round trips are required, as the nodes must coordinate with one another so that the same identity is not generated on two different nodes in the cluster.
- Moreover, in a failure scenario, if the node cannot communicate with the other cluster members, or a majority of the nodes cannot be reached, saving the document will fail because the requested identity cannot be generated.
- All the other ID generation methods work without any issue when the server is disconnected from the cluster, so unless you truly need incremental IDs, use one of the other options.
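The identity semantics described in this section (one cluster-wide counter per prefix, pipe replaced with a slash, gaps after failed transactions) can be sketched with a single shared counter. `SimulatedCluster` is a hypothetical class used for illustration only; the coordination between real nodes, which is where the cost comes from, is deliberately left out.

```csharp
using System;
using System.Collections.Generic;
using System.Globalization;

// Hypothetical model of the cluster-wide identity counters.
// Every node asks this one object, which stands in for the coordination between nodes.
public sealed class SimulatedCluster
{
    private readonly Dictionary<string, long> _identities = new Dictionary<string, long>();

    public string NextIdentity(string idEndingWithPipe)
    {
        if (!idEndingWithPipe.EndsWith("|", StringComparison.Ordinal))
            throw new ArgumentException("Expected an ID that ends with '|'.", nameof(idEndingWithPipe));

        string prefix = idEndingWithPipe.Substring(0, idEndingWithPipe.Length - 1);

        long current;
        _identities.TryGetValue(prefix, out current); // stays 0 when the prefix is new
        long next = current + 1;

        // The value is consumed here, whether or not the caller's transaction later succeeds.
        _identities[prefix] = next;

        // The pipe is replaced with a slash, followed by the incremented value.
        return prefix + "/" + next.ToString(CultureInfo.InvariantCulture);
    }
}

public static class Program
{
    public static void Main()
    {
        var cluster = new SimulatedCluster();

        Console.WriteLine(cluster.NextIdentity("companies|")); // companies/1 (request via node A)
        Console.WriteLine(cluster.NextIdentity("companies|")); // companies/2 (request via node A)

        // Simulate a transaction that fails after the identity was generated:
        // the value 3 is never used by any document.
        string lost = cluster.NextIdentity("companies|");

        Console.WriteLine(cluster.NextIdentity("companies|")); // companies/4 (request via node B)
    }
}
```

The output reproduces the companies/1, companies/2, companies/4 sequence mentioned above: the counter never repeats a value, but it does not roll back either.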
ID Generation by Client-Server Collaboration
Strategy: HiLo Algorithm
Generated by:
Both the server and the client
Description:
- The HiLo algorithm enables generating document IDs on the client.
- The client reserves a range of identifiers from the server, and the server ensures that this range is provided only to this client.
- Different clients receive different ranges.
- Each client can then safely generate identifiers within the range it was given. No further coordination with the server is required.
- For a more detailed explanation, see HiLo Algorithm.
When to use:
When you want client code to create a new document and use its ID immediately in the same transaction, without another call to the server to get the ID and a separate transaction to use it.
Example:
people/128-A, people/129-B
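The range reservation described in this section can be sketched as follows. This is a simplified in-memory model, not the RavenDB client's implementation: `SimulatedHiLoServer` and `SimulatedHiLoClient` are hypothetical names, and the range size of 32 is an assumption chosen for the illustration.

```csharp
using System;
using System.Collections.Generic;
using System.Globalization;

// Hypothetical server side: hands out non-overlapping ranges per collection.
public sealed class SimulatedHiLoServer
{
    private readonly Dictionary<string, long> _lastReserved = new Dictionary<string, long>();

    public string NodeTag { get; }
    public int RangeSize { get; }

    public SimulatedHiLoServer(string nodeTag, int rangeSize = 32) // 32 is an assumed size
    {
        NodeTag = nodeTag;
        RangeSize = rangeSize;
    }

    public (long Low, long High) ReserveRange(string collection)
    {
        long last;
        _lastReserved.TryGetValue(collection, out last); // stays 0 for a new collection
        long low = last + 1;
        long high = last + RangeSize;
        _lastReserved[collection] = high; // the next caller starts after this range
        return (low, high);
    }
}

// Hypothetical client side: generates IDs locally until its range is used up.
public sealed class SimulatedHiLoClient
{
    private readonly SimulatedHiLoServer _server;
    private readonly string _collection;
    private long _next = 1;
    private long _high = 0; // empty range, so the first call reserves one

    public SimulatedHiLoClient(SimulatedHiLoServer server, string collection)
    {
        _server = server;
        _collection = collection;
    }

    public string NextId()
    {
        if (_next > _high)
        {
            // The only point at which the client talks to the server.
            var range = _server.ReserveRange(_collection);
            _next = range.Low;
            _high = range.High;
        }

        long value = _next++;
        return _collection + "/" + value.ToString(CultureInfo.InvariantCulture) + "-" + _server.NodeTag;
    }
}

public static class Program
{
    public static void Main()
    {
        var server = new SimulatedHiLoServer("A");
        var client1 = new SimulatedHiLoClient(server, "people");
        var client2 = new SimulatedHiLoClient(server, "people");

        Console.WriteLine(client1.NextId()); // people/1-A  (client1 reserved 1..32)
        Console.WriteLine(client2.NextId()); // people/33-A (client2 reserved 33..64)
        Console.WriteLine(client1.NextId()); // people/2-A  (no server call needed)
    }
}
```

Because each client draws only from its own range, the ID is known on the client before the document is saved, which is what allows it to be used immediately in the same transaction.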
ID Generation by Map-Reduce Index Output
Strategy: Artificial Document ID
Generated by:
The server
Description:
- The output from a Map-Reduce index can be saved as Artificial Documents in a new collection.
- You have no control over the IDs of these artificial documents. They are generated by RavenDB based on the hash of the reduce key.
- For a more detailed explanation, see Artificial Documents and the creation of their IDs.
When to use:
When you need to further process the Map-Reduce index results by:
- Running a recursive Map-Reduce operation, setting up indexes on top of the Artificial Documents.
- Setting up a RavenDB ETL Task on the Artificial Documents collection to a dedicated database on a separate cluster for further processing, as well as other ongoing tasks such as SQL ETL and Subscriptions.
Example:
MonthlyProductSales/13770576973199715021
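The idea of deriving an ID from a hash of the reduce key can be sketched as below. The hash function RavenDB actually uses is not specified in this article, so the sketch substitutes 64-bit FNV-1a as a stand-in; the class name `ArtificialIdSketch` and the reduce-key strings are hypothetical, and the numbers produced will not match real artificial document IDs.

```csharp
using System;
using System.Globalization;
using System.Text;

// Hypothetical illustration only: FNV-1a (64-bit) stands in for whatever hash the server uses.
public static class ArtificialIdSketch
{
    public static string ForReduceKey(string outputCollection, string reduceKey)
    {
        ulong hash = 14695981039346656037UL;        // FNV-1a 64-bit offset basis

        unchecked
        {
            foreach (byte b in Encoding.UTF8.GetBytes(reduceKey))
            {
                hash ^= b;
                hash *= 1099511628211UL;            // FNV-1a 64-bit prime
            }
        }

        // Collection name, separator, then the hash rendered as a decimal number.
        return outputCollection + "/" + hash.ToString(CultureInfo.InvariantCulture);
    }
}

public static class Program
{
    public static void Main()
    {
        // The same reduce key always maps to the same ID, so re-running the
        // reduce step updates the same artificial document rather than adding a new one.
        string first = ArtificialIdSketch.ForReduceKey("MonthlyProductSales", "products/1|2017-05");
        string again = ArtificialIdSketch.ForReduceKey("MonthlyProductSales", "products/1|2017-05");
        string other = ArtificialIdSketch.ForReduceKey("MonthlyProductSales", "products/1|2017-06");

        Console.WriteLine(first == again); // True
        Console.WriteLine(first == other); // False
    }
}
```

The point of the sketch is the determinism: because the ID is a function of the reduce key alone, it cannot be chosen by the user, which matches the "no control" statement above.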