Basic Workflow
NodeSets
With graphio you predefine the NodeSet and add nodes:
from graphio import NodeSet
people = NodeSet(['Person'], merge_keys=['name'])
people.add_node({'name': 'Peter', 'city': 'Munich'})
The first argument for the NodeSet is a list of labels used for all nodes in this NodeSet.
The second, optional argument is merge_keys, a list of properties that confer uniqueness on the nodes in this NodeSet. All operations based on MERGE queries need unique properties to identify nodes.
When you add a node to the NodeSet you can add arbitrary properties to the node.
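To make the role of merge_keys more concrete, the following sketch builds the kind of Cypher MERGE statement that identifies a node by its merge keys and then sets the remaining properties. The helper build_merge_query is hypothetical and is not part of graphio; the queries graphio actually runs are likely to differ (for example, by working on batches of nodes).

```python
def build_merge_query(labels, merge_keys):
    """Build an illustrative Cypher MERGE statement (hypothetical helper, not graphio's API)."""
    label_string = ":".join(labels)
    # Only the merge keys go into the MERGE pattern; they identify the node.
    key_pattern = ", ".join(f"{key}: $props.{key}" for key in merge_keys)
    # All properties are written after the node has been matched or created.
    return f"MERGE (n:{label_string} {{ {key_pattern} }}) SET n += $props"


query = build_merge_query(['Person'], ['name'])
print(query)
```

If a NodeSet had no merge keys, such a pattern would have nothing to match on, which is why MERGE-based operations need them.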
Uniqueness of nodes
The uniqueness of the nodes is not checked when adding to the NodeSet. Thus, you can create multiple nodes with the same 'name' property.
Use NodeSet.add_unique() to check whether a node with the same properties already exists:
people = NodeSet(['Person'], merge_keys=['name'])
# first time
people.add_unique({'name': 'Jack', 'city': 'London'})
len(people.nodes)  # -> 1
# second time
people.add_unique({'name': 'Jack', 'city': 'London'})
len(people.nodes)  # -> 1
Warning
This function iterates over all nodes already in the NodeSet each time a node is added, so it does not scale well. Use it only for small NodeSets.
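For larger inputs, one possible alternative is to deduplicate the records yourself before adding them. The sketch below uses a set of merge key values, so each lookup takes constant time on average. The helper unique_by_keys is hypothetical and not part of graphio. Unlike add_unique(), which compares all properties, it compares only the merge keys, and it assumes their values are hashable.

```python
def unique_by_keys(records, merge_keys):
    """Yield the first record seen for each combination of merge key values."""
    seen = set()
    for record in records:
        # A tuple of the merge key values acts as the identity of the node.
        identity = tuple(record.get(key) for key in merge_keys)
        if identity not in seen:
            seen.add(identity)
            yield record


rows = [
    {'name': 'Jack', 'city': 'London'},
    {'name': 'Jack', 'city': 'London'},
    {'name': 'Jill', 'city': 'Paris'},
]
unique_rows = list(unique_by_keys(rows, ['name']))
print(len(unique_rows))
```

Each record in unique_rows could then be passed to add_node() without a per-node scan of the NodeSet.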
RelationshipSets
In a similar manner, a RelationshipSet is predefined and you add relationships:
from graphio import RelationshipSet
person_likes_food = RelationshipSet('KNOWS', ['Person'], ['Food'], ['name'], ['type'])
person_likes_food.add_relationship(
    {'name': 'Peter'}, {'type': 'Pizza'}, {'reason': 'cheese'}
)
The arguments for the RelationshipSet are:
- relationship type
- labels of start node
- labels of end node
- property keys to match start node
- property keys to match end node
When you add a relationship to a RelationshipSet, all you have to do is define the matching properties for the start node and end node. You can also add relationship properties.
Default properties
You can set default properties on the RelationshipSet that are added to all relationships when loading data:
person_likes_food = RelationshipSet('KNOWS', ['Person'], ['Food'], ['name'], ['type'],
                                    default_props={'source': 'survey'})
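As a rough mental model, default properties can be thought of as a dictionary that is combined with the properties of each relationship. The sketch below is plain Python with a hypothetical helper, with_defaults. It assumes that explicitly given relationship properties take precedence over defaults when both define the same key, which is an assumption and not confirmed graphio behavior.

```python
def with_defaults(rel_props, default_props):
    """Combine relationship properties with defaults (hypothetical helper, not graphio's API)."""
    # Assumption: properties given on the relationship win over defaults on key clashes.
    return {**default_props, **rel_props}


combined = with_defaults({'reason': 'cheese'}, {'source': 'survey'})
print(combined)
```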
Create Indexes
Both NodeSet and RelationshipSet allow you to create indexes to speed up data loading.
For a NodeSet, create_index() creates indexes for all individual merge_keys properties as well as a compound index.
For a RelationshipSet, create_index() creates the indexes required for matching the start node and end node:
from graphio import RelationshipSet
from neo4j import GraphDatabase
driver = GraphDatabase.driver('neo4j://localhost:7687', auth=('neo4j', 'password'))
person_likes_food = RelationshipSet('KNOWS', ['Person'], ['Food'], ['name'], ['type'])
person_likes_food.create_index(driver)
This will create single-property indexes for :Person(name) and :Food(type).
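To illustrate what such indexes look like in Cypher, the sketch below generates CREATE INDEX statements for single properties and, when there is more than one property, a compound index. The helper index_statements is hypothetical. The statements use the index syntax of recent Neo4j versions and are not necessarily the ones graphio issues.

```python
def index_statements(labels, property_keys):
    """Generate illustrative CREATE INDEX statements (hypothetical helper, not graphio's API)."""
    statements = []
    for label in labels:
        # One single-property index per key.
        for key in property_keys:
            statements.append(f"CREATE INDEX IF NOT EXISTS FOR (n:{label}) ON (n.{key})")
        # A compound index only makes sense for more than one key.
        if len(property_keys) > 1:
            compound = ", ".join(f"n.{key}" for key in property_keys)
            statements.append(f"CREATE INDEX IF NOT EXISTS FOR (n:{label}) ON ({compound})")
    return statements


statements = index_statements(['Person'], ['name']) + index_statements(['Food'], ['type'])
for statement in statements:
    print(statement)
```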
Load Data
After building NodeSet and RelationshipSet you can create or merge everything in Neo4j.
You need a neo4j.Driver instance to create data. See: https://neo4j.com/docs/api/python-driver/current/api.html#api-documentation
from neo4j import GraphDatabase
driver = GraphDatabase.driver('neo4j://localhost:7687', auth=('neo4j', 'password'))
people.create(driver)
person_likes_food.create(driver)
Warning
Graphio does not check if the nodes referenced in the RelationshipSet actually exist. It is meant to quickly build data sets and load them into Neo4j, not to maintain consistency.
Create
create() will, as the name suggests, create all data. This will create duplicate nodes even if merge_keys are set on a NodeSet.
Merge
merge() will merge on the merge_keys defined on the NodeSet.
The merge operation for NodeSet offers more control. You can pass a list of properties that should not be overwritten on existing nodes:
NodeSet.merge(driver, preserve=['name', 'currency'])
This is equivalent to:
ON CREATE SET ..all properties..
ON MATCH SET ..all properties except 'name' and 'currency'..
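The effect of preserve can be simulated without a database. The sketch below merges nodes into an in-memory dictionary that stands in for Neo4j. The function merge_node is hypothetical and only mirrors the ON CREATE / ON MATCH logic described above.

```python
def merge_node(store, props, merge_keys, preserve=()):
    """Merge one node into an in-memory store keyed by its merge key values."""
    identity = tuple(props[key] for key in merge_keys)
    if identity not in store:
        # ON CREATE SET: all properties
        store[identity] = dict(props)
    else:
        # ON MATCH SET: all properties except the preserved ones
        for key, value in props.items():
            if key not in preserve:
                store[identity][key] = value
    return store[identity]


store = {}
merge_node(store, {'name': 'Peter', 'currency': 'EUR', 'city': 'Munich'},
           ['name'], preserve=['name', 'currency'])
merge_node(store, {'name': 'Peter', 'currency': 'USD', 'city': 'Berlin'},
           ['name'], preserve=['name', 'currency'])
peter = store[('Peter',)]
print(peter)
```

After the second merge, the city is updated while the preserved currency keeps the value from the first load.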
Graphio can also append properties to arrays:
NodeSet.merge(driver, append_props=['source'])
This will create a list for the node property source and append values ON MATCH.
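The append behavior can be simulated in the same in-memory style. The function merge_with_append is hypothetical. It assumes that the first load wraps the value in a list and that repeated values are not deduplicated; both are assumptions about the behavior, not confirmed details of graphio.

```python
def merge_with_append(store, props, merge_keys, append_props=()):
    """Merge one node into an in-memory store, appending selected properties to lists."""
    identity = tuple(props[key] for key in merge_keys)
    if identity not in store:
        # On create: properties listed in append_props start out as one-element lists.
        store[identity] = {
            key: [value] if key in append_props else value
            for key, value in props.items()
        }
    else:
        node = store[identity]
        for key, value in props.items():
            if key in append_props:
                # On match: append to the existing list instead of overwriting.
                node.setdefault(key, []).append(value)
            else:
                node[key] = value
    return store[identity]


store = {}
merge_with_append(store, {'name': 'Peter', 'source': 'survey'}, ['name'], append_props=['source'])
merge_with_append(store, {'name': 'Peter', 'source': 'import'}, ['name'], append_props=['source'])
sources = store[('Peter',)]['source']
print(sources)
```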
Both can also be set on the NodeSet:
nodeset = NodeSet(['Person'], ['name'], preserve=['country'], append_props=['source'])
Group Data Sets in a Container
A Container can be used to group NodeSet and RelationshipSet:
my_data = Container()
my_data.add(people)
my_data.add(person_likes_food)
Note
This is particularly useful if you build many NodeSet and RelationshipSet objects and want to group data sets (e.g. because of dependencies).
You can iterate over the NodeSet and RelationshipSet objects in the Container:
for nodeset in my_data.nodesets:
    nodeset.create(driver)