Main Classes

NodeSet

class graphio.NodeSet(labels=None, merge_keys=None, batch_size=None, default_props=None, preserve=None, append_props=None, indexed=False, additional_labels: List[str] = None, source: bool = False)

Container for a set of Nodes with the same labels and the same properties that define uniqueness.

add_node(properties)

Create a node in this NodeSet.

Parameters: properties (dict) – Node properties.
add_unique(properties)

Add a node to this NodeSet only if a node with the same merge_keys does not exist yet.

Note: This function currently iterates over all nodes in the NodeSet, which is slow for large numbers of nodes. A better solution would be to create an ‘index’, as is done for RelationshipSet.

Parameters: properties (dict) – Node properties.
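
The index mentioned in the note could look roughly like the following sketch. It is not graphio's implementation: `IndexedNodes` and its attributes are hypothetical names, and the sketch only assumes that uniqueness is decided by the tuple of merge-key values.

```python
from typing import Dict, List, Tuple


class IndexedNodes:
    """Hypothetical stand-in for a node container; not graphio's implementation."""

    def __init__(self, merge_keys: List[str]) -> None:
        self.merge_keys = merge_keys
        self.nodes: List[dict] = []
        # Maps a tuple of merge-key values to the node's position in self.nodes.
        self._index: Dict[Tuple, int] = {}

    def _key(self, properties: dict) -> Tuple:
        # Build the lookup key from the merge-key values only.
        return tuple(properties.get(k) for k in self.merge_keys)

    def add_unique(self, properties: dict) -> bool:
        """Add the node unless a node with the same merge-key values exists."""
        key = self._key(properties)
        if key in self._index:
            return False
        self._index[key] = len(self.nodes)
        self.nodes.append(properties)
        return True


people = IndexedNodes(merge_keys=["name"])
added_first = people.add_unique({"name": "Alice", "age": 33})
added_again = people.add_unique({"name": "Alice", "age": 34})
```

With a dictionary lookup, each uniqueness check takes constant time on average instead of a scan over all nodes.
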
all_property_keys() → Set[str]

Return a set of all property keys in this NodeSet.

Returns: A set of the unique property keys in this NodeSet.
create(graph, database: str = None, batch_size=None)

Create all nodes from NodeSet.

create_csv_query(filename: str = None, periodic_commit=1000)

Create a Cypher query to load a CSV file created with NodeSet.to_csv() into Neo4j (CREATE statement).

Parameters:
  • filename – Optional filename. A filename will be autocreated if not passed.
  • periodic_commit – Number of rows to commit in one transaction.
Returns: Cypher query.

create_index(graph, database=None)

Create indices for all label/merge key combinations, as well as a composite index if multiple merge keys exist.

classmethod from_csv_json_set(csv_file_path, json_file_path, load_items: bool = False, labels_key: str = None, mergekey_key: str = None)

Read the default CSV/JSON file combination. Needs paths to CSV and JSON file.

JSON keys can be overwritten by passing the respective parameters.

Parameters:
  • csv_file_path – Path to the CSV file.
  • json_file_path – Path to the JSON file.
  • load_items – Yield items from file (False, default) or load them to memory (True).
Returns: The NodeSet.
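
The difference between the two `load_items` modes can be illustrated with plain CSV reading. This is a minimal sketch, not graphio's code: `read_rows` and `load_rows` are hypothetical helpers, and the JSON metadata file is left out.

```python
import csv
import os
import tempfile
from typing import Iterable, Iterator


def read_rows(csv_file_path: str) -> Iterator[dict]:
    # Generator: rows are read from disk only as they are consumed.
    with open(csv_file_path, newline="") as handle:
        yield from csv.DictReader(handle)


def load_rows(csv_file_path: str, load_items: bool = False) -> Iterable[dict]:
    rows = read_rows(csv_file_path)
    # load_items=True materializes all rows in memory; False keeps the generator.
    return list(rows) if load_items else rows


with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "Person_name.csv")
    with open(path, "w", newline="") as handle:
        handle.write("name,age\nAlice,33\nBob,44\n")

    eager = load_rows(path, load_items=True)
    lazy = load_rows(path, load_items=False)
    lazy_is_list = isinstance(lazy, list)
    lazy_rows = list(lazy)  # consume the generator while the file still exists
```

Yielding keeps memory use low for large files, while loading to memory allows the items to be iterated more than once.
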

merge(graph, merge_properties=None, batch_size=None, preserve=None, append_props=None, database=None)

Merge nodes from NodeSet on merge properties.

Parameters: merge_properties – The merge properties.
merge_csv_query(filename: str = None, periodic_commit=1000)

Create a Cypher query to load a CSV file created with NodeSet.to_csv() into Neo4j (MERGE statement).

Parameters:
  • filename – Optional filename. A filename will be autocreated if not passed.
  • periodic_commit – Number of rows to commit in one transaction.
Returns: Cypher query.

node_properties()

Yield properties of the nodes in this set. Used for create function.

object_file_name(suffix: str = None) → str

Create a unique name for this NodeSet that indicates its content. An optional suffix can be passed. NOTE: The suffix must include the ‘.’ to form a filename!

Without suffix:

nodeset_Label_merge-key_uuid

With suffix:

nodeset_Label_merge-key_uuid.json
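
A name following this pattern could be assembled as in the sketch below. The helper is hypothetical and not taken from graphio; in particular, joining multiple labels or merge keys with an underscore is an assumption.

```python
import uuid
from typing import List, Optional


def object_file_name(
    labels: List[str], merge_keys: List[str], suffix: Optional[str] = None
) -> str:
    """Hypothetical helper following the nodeset_Label_merge-key_uuid pattern."""
    # Assumption: several labels or merge keys are joined with underscores.
    name = "nodeset_{}_{}_{}".format(
        "_".join(labels), "_".join(merge_keys), uuid.uuid4()
    )
    if suffix:
        # The suffix is appended verbatim, so it has to contain the dot itself.
        name += suffix
    return name


plain = object_file_name(["Person"], ["name"])
with_dot = object_file_name(["Person"], ["name"], suffix=".json")
without_dot = object_file_name(["Person"], ["name"], suffix="json")
```

The last call shows why the note matters: a suffix passed without the ‘.’ is glued directly onto the UUID and does not yield a usable file extension.
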
to_csv(filepath: str, quoting: int = None) → str

Create a CSV file for this nodeset. Header row is created with all properties. Each row contains the properties of a node.

Example:

>>> nodeset = NodeSet(labels=["Person"], merge_keys=["name"])
>>> nodeset.add_node({"name": "Alice", "age": 33})
>>> nodeset.add_node({"name": "Bob", "age": 44})
>>> nodeset.to_csv("/tmp/Person_name.csv")
'/tmp/Person_name.csv'

name,age
Alice,33
Bob,44

Parameters:
  • filepath – Full path to the CSV file.
  • quoting – Optional quoting setting for csv writer (any of csv.QUOTE_MINIMAL, csv.QUOTE_NONE, csv.QUOTE_ALL etc).
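
The effect of the `quoting` setting can be seen with the standard library's `csv` module alone. The sketch below is not graphio's implementation: `nodes_to_csv_text` is a hypothetical helper, it writes to a string instead of a file, and falling back to `csv.QUOTE_MINIMAL` when no quoting is given is an assumption.

```python
import csv
import io
from typing import List


def nodes_to_csv_text(nodes: List[dict], quoting: int = csv.QUOTE_MINIMAL) -> str:
    # Collect every property key in first-seen order for the header row.
    fieldnames = list(dict.fromkeys(key for node in nodes for key in node))
    buffer = io.StringIO()
    writer = csv.DictWriter(
        buffer, fieldnames=fieldnames, quoting=quoting, lineterminator="\n"
    )
    writer.writeheader()
    writer.writerows(nodes)  # one row per node
    return buffer.getvalue()


nodes = [{"name": "Alice", "age": 33}, {"name": "Bob", "age": 44}]
minimal = nodes_to_csv_text(nodes)
quoted = nodes_to_csv_text(nodes, quoting=csv.QUOTE_ALL)
```

With `csv.QUOTE_ALL` every field, including the header and the numeric values, is wrapped in double quotes.
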
to_csv_json_set(csv_file_path, json_file_path, type_conversion: dict = None)

Write the default CSV/JSON file combination.

Needs paths to CSV and JSON file.

Parameters:
  • csv_file_path – Path to the CSV file.
  • json_file_path – Path to the JSON file.
  • type_conversion – Optional dictionary to convert types of properties.
to_definition()

Create a NodeSetDefinition from this NodeSet. NodeSetDefinition may later become the parent class of NodeSet.

to_dict()

Create dictionary defining the nodeset.

to_json(target_dir: str, filename: str = None)

Serialize NodeSet to a JSON file in a target directory.

This function is meant for dumping and reloading, not for creating a general transport format. The output will likely be optimized for disk space or compressed in the future.

update_node(properties: dict)

Update an existing node by overwriting all properties.

Note that this requires NodeSet(…, indexed=True), which is not the default!

Parameters: properties – Node property dictionary.

RelationshipSet

class graphio.RelationshipSet(rel_type, start_node_labels, end_node_labels, start_node_properties, end_node_properties, batch_size=None, default_props=None, source=False)

Container for a set of Relationships with the same type of start and end nodes.

Parameters:
  • rel_type – Relationship type.
  • start_node_labels – Labels of the start node.
  • end_node_labels – Labels of the end node.
  • start_node_properties – Property keys to identify the start node.
  • end_node_properties – Property keys to identify the end node.
  • batch_size – Batch size for Neo4j operations.
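
To illustrate what a batch size means for the Neo4j operations, the sketch below splits a collection of items into chunks of at most `batch_size`. `batches` is a hypothetical helper; how graphio sends each batch to Neo4j is not shown here.

```python
from itertools import islice
from typing import Iterable, Iterator, List


def batches(items: Iterable[dict], batch_size: int) -> Iterator[List[dict]]:
    """Yield consecutive chunks with at most batch_size items each."""
    iterator = iter(items)
    while True:
        batch = list(islice(iterator, batch_size))
        if not batch:
            return
        yield batch


relationships = [{"since": year} for year in range(2000, 2005)]
sizes = [len(batch) for batch in batches(relationships, batch_size=2)]
```

Five relationships with a batch size of 2 result in three batches, the last one holding the single remaining item.
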
add_relationship(start_node_properties: dict, end_node_properties: dict, properties: dict = None)

Add a relationship to this RelationshipSet.

Parameters: properties – Relationship properties.
all_property_keys() → Set[str]

Return a set of all property keys in this RelationshipSet.

Returns: A set of the unique property keys in this RelationshipSet.
create(graph, database=None, batch_size=None)

Create all relationships in this RelationshipSet.

create_csv_query(query_type: str, filename: str = None, periodic_commit=1000) → str

Generate the CREATE CSV query for this RelationshipSet. The function tries to take care of type conversions.

Note: You can’t use arrays as properties for nodes/relationships when creating CSV files.

LOAD CSV WITH HEADERS FROM xyz AS line
MATCH (a:Gene), (b:Protein)
WHERE a.sid = line.a_sid AND b.sid = line.b_sid AND b.taxid = line.b_taxid
CREATE (a)-[r:MAPS]->(b)
SET r.key1 = line.rel_key1, r.key2 = line.rel_key2
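
A query with the shape of the example above could be assembled from the RelationshipSet definition roughly as follows. This is only a sketch: `relationship_create_query` and its parameters are hypothetical, the `a_`/`b_`/`rel_` column prefixes are copied from the example, and type conversions and the `periodic_commit` setting are left out.

```python
from typing import List


def relationship_create_query(
    rel_type: str,
    start_labels: List[str],
    end_labels: List[str],
    start_keys: List[str],
    end_keys: List[str],
    rel_keys: List[str],
    source: str = "xyz",
) -> str:
    """Hypothetical helper that assembles a LOAD CSV ... CREATE query string."""
    start = ":".join(start_labels)
    end = ":".join(end_labels)
    # One condition per property key that identifies the start or end node.
    conditions = [f"a.{k} = line.a_{k}" for k in start_keys]
    conditions += [f"b.{k} = line.b_{k}" for k in end_keys]
    # Relationship properties are read from the rel_-prefixed columns.
    assignments = [f"r.{k} = line.rel_{k}" for k in rel_keys]
    lines = [
        f"LOAD CSV WITH HEADERS FROM {source} AS line",
        f"MATCH (a:{start}), (b:{end})",
        "WHERE " + " AND ".join(conditions),
        f"CREATE (a)-[r:{rel_type}]->(b)",
    ]
    if assignments:
        lines.append("SET " + ", ".join(assignments))
    return "\n".join(lines)


query = relationship_create_query(
    rel_type="MAPS",
    start_labels=["Gene"],
    end_labels=["Protein"],
    start_keys=["sid"],
    end_keys=["sid", "taxid"],
    rel_keys=["key1", "key2"],
)
```
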

create_index(graph, database=None)

Create indices for the start node and end node definitions of this RelationshipSet. If more than one start or end node property is defined, all single-property indices as well as the composite index are created.

classmethod from_csv_json_set(csv_file_path, json_file_path, load_items: bool = False, reltype_key=None, startnodeproperties_key=None, endnodeproperties_key=None, startnodelables_key=None, endnodelables_key=None)

Read the default CSV/JSON file combination. Needs paths to CSV and JSON file.

JSON keys can be overwritten by passing the respective parameters.

Parameters:
  • csv_file_path – Path to the CSV file.
  • json_file_path – Path to the JSON file.
  • load_items – Yield items from file (False, default) or load them to memory (True).
Returns: The RelationshipSet.

merge(graph, database=None, batch_size=None)

Merge all relationships in this RelationshipSet.

object_file_name(suffix: str = None) → str

Create a unique name for this RelationshipSet that indicates its content. An optional suffix can be passed. NOTE: The suffix must include the ‘.’ to form a filename!

Without suffix:

relationshipset_StartLabel_TYPE_EndLabel_uuid

With suffix:

relationshipset_StartLabel_TYPE_EndLabel_uuid.json
to_csv(filepath: str, quoting: int = None) → str

Write the RelationshipSet to a CSV file. The CSV file will be written to the given filepath.

Note: You can’t use arrays as properties for nodes/relationships when creating CSV files.

# CSV file header
start_sid, end_sid, end_taxid, rel_key1, rel_key2

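Parameters:
  • filepath – Full path to the CSV file.
  • quoting – Optional quoting setting for csv writer (any of csv.QUOTE_MINIMAL, csv.QUOTE_NONE, csv.QUOTE_ALL etc).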
to_csv_json_set(csv_file_path, json_file_path, write_mode: str = 'w')

Write the default CSV/JSON file combination.

Needs paths to CSV and JSON file.

Parameters:
  • csv_file_path – Path to the CSV file.
  • json_file_path – Path to the JSON file.
  • write_mode – Write mode for the CSV file.
to_json(target_dir, filename: str = None)

Serialize RelationshipSet to a JSON file in a target directory.

This function is meant for dumping and reloading, not for creating a general transport format. The output will likely be optimized for disk space or compressed in the future.

Container

class graphio.Container(objects=None)

A container for a collection of Nodes, Relationships, NodeSets and RelationshipSets.

A typical parser function, e.g. one that reads an Excel file, produces mixed output that then has to be processed accordingly.

Sanity checks and data statistics on the collected objects are also useful.

merge_nodesets()

Merge all node sets if merge_key is defined.

nodesets

Get the NodeSets in the Container.

relationshipsets

Get the RelationshipSets in the Container.
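
How a container can separate mixed parser output by type is sketched below. All classes here are hypothetical placeholders (`DemoNodeSet`, `DemoRelationshipSet`, `DemoContainer`) and do not reproduce graphio's implementation; the sketch only assumes that objects are filtered by their type.

```python
from typing import Iterable, List, Optional


class DemoNodeSet:
    """Placeholder standing in for a NodeSet."""

    def __init__(self, labels: List[str]) -> None:
        self.labels = labels


class DemoRelationshipSet:
    """Placeholder standing in for a RelationshipSet."""

    def __init__(self, rel_type: str) -> None:
        self.rel_type = rel_type


class DemoContainer:
    """Hypothetical container that filters its objects by type."""

    def __init__(self, objects: Optional[Iterable[object]] = None) -> None:
        self.objects = list(objects) if objects is not None else []

    @property
    def nodesets(self) -> List[DemoNodeSet]:
        return [o for o in self.objects if isinstance(o, DemoNodeSet)]

    @property
    def relationshipsets(self) -> List[DemoRelationshipSet]:
        return [o for o in self.objects if isinstance(o, DemoRelationshipSet)]


# Mixed output, as a parser function might return it.
parsed = [DemoNodeSet(["Gene"]), DemoRelationshipSet("MAPS"), DemoNodeSet(["Protein"])]
container = DemoContainer(parsed)
node_labels = [ns.labels for ns in container.nodesets]
rel_types = [rs.rel_type for rs in container.relationshipsets]
```
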