Import from CSV
RavenDB can import CSV data files into .json document databases.
Because some CSV files are formatted differently, the RavenDB Studio enables you to configure the CSV import by specifying the type of CSV
- Fields delimiter (comma, semi-colon, tab, or space)
- Text qualifier (double or single quote)
- How to handle whitespace
In this page:
What is CSV
A Comma-Separated Values (CSV) file is a delimited text file that typically uses a comma to separate values (from Wikipedia).
The first line of a CSV file defines the column headers in a table or the keys in a .json document.
How should I format my documents as CSV
RavenDB uses JSON format for storing documents, thus the CSV lines representing documents should have a specific format. There are three types of properties in JSON:
- Primitive: values that are numbers, strings, or booleans
- Nested Object: where the value of the property is a JSON object
- Array: an array of values that can either be primitives, nested objects, or arrays
Let's look at a sample JSON document:
{
"Name": "Import from CSV",
"NestedObject": {
"Name": "Inner Object"
},
"ArrayObject": [
1,
2,
3,
4
],
"@metadata": {
"@collection": "Samples"
}
}
The Name
property is a primitive and should appear unescaped in the CSV like so:
Name
Import from CSV
The NestedObject
property is a nested JSON object and as such should be decomposed into multiple properties - one for each nested property.
The decomposition rule goes as follows:
[the name of the parent property
].[name of the inner object property
]
like so:
NestedObject.Name
Inner Object
The import process will combine properties with the same prefix back into one JSON object.
The ArrayObject
property is an array and as such contains multiple values. These should be escaped as a string like so:
ArrayObject
"[1,2,3,4]"
The last thing we need in order for an import to work is to add the collection
property. If we don't, the name of the collection will derive from the CSV file name.
@metadata.@collection
Samples
If we want to import the document with a specific id
we need to include an @id
property too.
@id
Samples/1-A
The complete CSV line should look like this:
@id,Name,NestedObject.Name,ArrayObject,@metadata.@collection
Samples/1-A,Import from CSV,Inner Object,"[1,2,3,4]",Samples
Note that the first line of a .csv file applies to the header rows.
It names the column headers in a table, but in a .json document, it names the key properties.
How to Import a CSV via RavenDB Studio
Now that we've got a valid CSV file we can import it to RavenDB.
Import CSV file
- Tasks tab.
Click to see task options. - Import Data.
Select to see import options. - Current database
Make sure that you are importing into the correct database. - From CSV File
Select to open the Import from CSV interface. - Import from File
Select the file to import. - Import to Collection
Name the collection where the file will be imported. If you don't choose a name, RavenDB will use the CSV file name. When running an RQL query, the codefrom
often refers to the collection that contains the desired document. -
CSV options
Define CSV formatting configurations. Make sure they match the source .csv file.- Fields Delimiter
Comma is the standard default. Some CSVs separate fields with semi-colons, spaces or tabs instead. - Text Qualifier
Double quote is the standard default. Some CSVs use single quotation marks instead.
Commas or other field delimiters will be ignored inside the chosen text qualifier (quotation marks). -
Whitespace
CSV data is meant to be clean of white spaces because in strings, they can lead to misinterpretations. Trimming is an opportunity to clean-up unwanted white spaces. See if your source data has unwanted white spaces before you decide if or how to trim.- The following sample CSV has white spaces between fields that are unnecessary because commas separate fields. Raven DB Studio import can clean these white spaces with the "Trim whitespace around fields" option.
- If there were whitespaces in the fields (e.g. between a date and its comma), it could be more problematic and the option "Trim whitespace inside strings" would clean these, but it would also clean up the desired whitespaces inside the titles.
- The RavenDB import from CSV tool does not confuse between commas that separate fields and commas in the title
stings because the titles are inside double quotation marks which makes it clear to RavenDB not to use these
commas to separate.
- Fields Delimiter
- Select Import Collection
After importing the CSV file, the resulting .json document should look like the document above.