Serialization
Graphio can serialize NodeSet and RelationshipSet objects to different formats.
This can be used to store processed, graph-ready data in a file.
Graphio supports the following formats for both NodeSet and RelationshipSet objects:
- combined CSV and JSON files (CSV file with all data and JSON file with metadata), can be deserialized again
- CSV files with all data (useful for quick tests, cannot be fully deserialized again)
- JSON files with all data (useful for quick tests with small datasets, contains redundant data)
Combined CSV and JSON files
The most useful serialization format stores the data in a CSV file and the metadata in a JSON file. This avoids redundancy and allows the data to be deserialized again.
Data Format
Nodes
The JSON file with metadata contains at least the following information:
- the labels (labels)
- property keys used for MERGE operations (merge_keys)
The CSV file contains the properties of one node per row; the header row contains the property keys.
Example:
nodeset.json:
{
  "labels": [
    "Person"
  ],
  "merge_keys": [
    "name"
  ]
}
nodeset.csv:
name,age
Lisa,42
Bob,23
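As a rough illustration of how the two files fit together, the following sketch reads the example pair with Python's standard library only. It does not use Graphio: the helper name `read_nodeset_files` is hypothetical, and all values stay strings because the CSV file itself carries no type information.

```python
import csv
import json
import os
import tempfile


def read_nodeset_files(json_path, csv_path):
    """Hypothetical helper: return (metadata, rows) from a JSON/CSV pair."""
    with open(json_path) as f:
        metadata = json.load(f)
    with open(csv_path, newline="") as f:
        # The header row supplies the property keys for every node.
        rows = list(csv.DictReader(f))
    return metadata, rows


# Recreate the example files from above in a temporary directory.
tmp_dir = tempfile.mkdtemp()
json_path = os.path.join(tmp_dir, "nodeset.json")
csv_path = os.path.join(tmp_dir, "nodeset.csv")
with open(json_path, "w") as f:
    json.dump({"labels": ["Person"], "merge_keys": ["name"]}, f)
with open(csv_path, "w", newline="") as f:
    f.write("name,age\nLisa,42\nBob,23\n")

metadata, rows = read_nodeset_files(json_path, csv_path)
print(metadata["labels"], metadata["merge_keys"])
print(rows)
```

The metadata tells a loader which labels and merge keys apply to every row, so that information does not need to be repeated per node.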
Relationships
The JSON file with metadata contains at least the following information:
- start node labels
- end node labels
- start node property keys to MATCH the start node
- end node property keys to MATCH the end node
- relationship type
The CSV file contains one relationship per row. Header prefixes (start_, end_, rel_) indicate whether a column holds a start node, end node, or relationship property.
Example:
relset.json:
{
  "start_node_labels": ["Person"],
  "end_node_labels": ["Person"],
  "start_node_properties": ["name"],
  "end_node_properties": ["name"],
  "rel_type": "KNOWS"
}
relset.csv:
start_name,end_name,rel_since
Lisa,Bob,2018
Bob,Lisa,2018
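The prefix convention can be made concrete with a small sketch that splits each CSV row back into start node, end node, and relationship properties. This is an illustration built on the standard library, not Graphio's own parsing code; `split_relationship_row` is a hypothetical helper.

```python
import csv
import io

PREFIXES = ("start_", "end_", "rel_")


def split_relationship_row(row):
    """Hypothetical helper: split one CSV row into start, end and rel dicts."""
    parts = {prefix: {} for prefix in PREFIXES}
    for column, value in row.items():
        for prefix in PREFIXES:
            if column.startswith(prefix):
                # Strip the prefix to recover the original property key.
                parts[prefix][column[len(prefix):]] = value
                break
    return parts["start_"], parts["end_"], parts["rel_"]


# Same content as the relset.csv example above.
relset_csv = "start_name,end_name,rel_since\nLisa,Bob,2018\nBob,Lisa,2018\n"
reader = csv.DictReader(io.StringIO(relset_csv))
relationships = [split_relationship_row(row) for row in reader]
for start, end, props in relationships:
    print(start, end, props)
```

Because the prefixes keep the three groups apart, the start and end node can use the same property key (here `name`) without colliding in the header.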
Serialize to CSV and JSON
To serialize a NodeSet or RelationshipSet object, use to_csv_json_set():
from graphio import NodeSet, RelationshipSet

people = NodeSet(['Person'], merge_keys=['name'])
people.add_node({'name': 'Lisa'})
people.add_node({'name': 'Bob'})
people.to_csv_json_set('people.json', 'people.csv')
knows = RelationshipSet('KNOWS', ['Person'], ['Person'], ['name'], ['name'])
knows.add_relationship({'name': 'Lisa'}, {'name': 'Bob'}, {'since': '2018'})
knows.to_csv_json_set('knows.json', 'knows.csv')
CSV files
Graphio can serialize NodeSet and RelationshipSet objects to CSV files in the same format as the CSV files in the combined CSV/JSON format. This can be useful for quick tests with small datasets.
See NodeSet.to_csv() and RelationshipSet.to_csv() for details:
from graphio import NodeSet, RelationshipSet

people = NodeSet(['Person'], merge_keys=['name'])
people.add_node({'name': 'Lisa'})
people.add_node({'name': 'Bob'})
people.to_csv('nodeset.csv')
knows = RelationshipSet('KNOWS', ['Person'], ['Person'], ['name'], ['name'])
knows.add_relationship({'name': 'Lisa'}, {'name': 'Bob'}, {'since': '2018'})
knows.to_csv('relset.csv')
Graphio can generate matching Cypher queries to load these CSV files into Neo4j:
# NodeSet CREATE query
people.create_csv_query('nodeset.csv')
# NodeSet MERGE query
people.merge_csv_query('nodeset.csv')
# RelationshipSet CREATE query
knows.create_csv_query('relset.csv')
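The exact query text that Graphio generates is not reproduced here. To give a sense of the general shape such a query can take, the sketch below builds a `LOAD CSV` statement with a `MERGE` clause from the NodeSet metadata. The function `sketch_merge_csv_query` and the `file:///` URL layout are assumptions made for illustration, not Graphio's implementation.

```python
def sketch_merge_csv_query(labels, merge_keys, property_keys, filename):
    """Hypothetical helper: build a LOAD CSV ... MERGE query string."""
    label_string = ":".join(labels)
    # Merge keys identify the node, so they go into the MERGE pattern.
    merge_part = ", ".join(f"{key}: line.{key}" for key in merge_keys)
    lines = [
        f"LOAD CSV WITH HEADERS FROM 'file:///{filename}' AS line",
        f"MERGE (n:{label_string} {{ {merge_part} }})",
    ]
    # All remaining properties are set after the node was matched or created.
    other_keys = [key for key in property_keys if key not in merge_keys]
    if other_keys:
        set_part = ", ".join(f"n.{key} = line.{key}" for key in other_keys)
        lines.append(f"SET {set_part}")
    return "\n".join(lines)


query = sketch_merge_csv_query(["Person"], ["name"], ["name", "age"], "nodeset.csv")
print(query)
```

Note that `LOAD CSV` reads every field as a string, so a real query may need explicit conversions for numeric properties.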
JSON files
Note: Deserialization of simple JSON representations is currently not supported. Use the combined JSON/CSV format instead. The JSON serialization can still be useful to test small datasets.
NodeSet and RelationshipSet objects can be serialized to JSON:
from graphio import NodeSet, RelationshipSet

people = NodeSet(['Person'], merge_keys=['name'])
people.add_node({'name': 'Lisa'})
people.to_json('nodeset.json')
This will create a JSON file with full node descriptions:
nodeset.json:
{
  "labels": [
    "Person"
  ],
  "merge_keys": [
    "name"
  ],
  "nodes": [
    {
      "name": "Lisa"
    }
  ]
}
The same works with RelationshipSet objects:
person_like_food = RelationshipSet('LIKES', ['Person'], ['Food'], ['name'], ['type'])
person_like_food.add_relationship({'name': 'Lisa'}, {'type': 'Sushi'}, {'since': 'always'})
person_like_food.to_json('relset.json')
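Since Graphio cannot deserialize the simple JSON files, one way to work with them is to read them directly. The following sketch uses only the standard library and assumes the node JSON layout shown earlier in this section. It separates the metadata from the node list and writes the nodes as CSV text, which mirrors the split used by the combined format.

```python
import csv
import io
import json

# Same content as the nodeset.json example in this section.
nodeset_json = '{"labels": ["Person"], "merge_keys": ["name"], "nodes": [{"name": "Lisa"}]}'

data = json.loads(nodeset_json)
nodes = data.pop("nodes")
metadata = data  # what remains: labels and merge_keys

# Collect property keys in first-seen order to build the CSV header.
fieldnames = []
for node in nodes:
    for key in node:
        if key not in fieldnames:
            fieldnames.append(key)

buffer = io.StringIO()
writer = csv.DictWriter(buffer, fieldnames=fieldnames, lineterminator="\n")
writer.writeheader()
writer.writerows(nodes)
csv_text = buffer.getvalue()

print(metadata)
print(csv_text)
```

In the JSON file every node repeats its property keys, while the CSV form states them once in the header. This is the redundancy that the combined format avoids.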