Main Classes¶
NodeSet¶
-
class
graphio.
NodeSet
(labels=None, merge_keys=None, batch_size=None, default_props=None, preserve=None, append_props=None, indexed=False, additional_labels: List[str] = None, source: bool = False)¶ Container for a set of Nodes with the same labels and the same properties that define uniqueness.
-
add_node
(properties)¶ Create a node in this NodeSet.
Parameters: properties (dict) – Node properties.
-
add_unique
(properties)¶ Add a node to this NodeSet only if a node with the same merge_keys does not exist yet.
Note: This function currently iterates over all nodes in the NodeSet, which is slow for large numbers of nodes. A better solution would be to create an ‘index’, as is done for RelationshipSet.
Parameters: properties (dict) – Node properties.
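The index mentioned in the note can be illustrated in plain Python. This is a minimal sketch of the idea, not graphio's implementation: the class `IndexedNodes` and its attributes are hypothetical, and it assumes that merge-key values are hashable.

```python
from typing import Dict, List, Set, Tuple


class IndexedNodes:
    """Hypothetical stand-in for a node container; not part of graphio."""

    def __init__(self, merge_keys: List[str]):
        self.merge_keys = merge_keys
        self.nodes: List[Dict] = []
        # Tuples of merge-key values that have already been added.
        self._seen: Set[Tuple] = set()

    def add_unique(self, properties: Dict) -> bool:
        """Add the node unless one with the same merge-key values exists."""
        key = tuple(properties.get(k) for k in self.merge_keys)
        if key in self._seen:
            return False  # duplicate: found by hash lookup, no scan needed
        self._seen.add(key)
        self.nodes.append(properties)
        return True


nodes = IndexedNodes(merge_keys=["name"])
added_first = nodes.add_unique({"name": "Alice", "age": 33})
added_again = nodes.add_unique({"name": "Alice", "age": 34})
```

With the set in place, each uniqueness check is a single lookup instead of a pass over all stored nodes.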
-
all_property_keys
() → Set[str]¶ Return a set of all property keys in this NodeSet.
Returns: A set of unique property keys of a NodeSet
-
create
(graph, database: str = None, batch_size=None)¶ Create all nodes from NodeSet.
-
create_csv_query
(filename: str = None, periodic_commit=1000)¶ Create a Cypher query to load a CSV file created with NodeSet.to_csv() into Neo4j (CREATE statement).
Parameters: - filename – Optional filename. A filename will be autocreated if not passed.
- periodic_commit – Number of rows to commit in one transaction.
Returns: Cypher query.
-
create_index
(graph, database=None)¶ Create indices for all label/merge key combinations as well as a composite index if multiple merge keys exist.
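To make the combinations concrete, the following sketch only enumerates which (label, keys) pairs would receive an index. The helper `index_targets` is hypothetical, it issues no Cypher, and it assumes that the composite index is created once per label.

```python
from typing import List, Tuple


def index_targets(labels: List[str], merge_keys: List[str]) -> List[Tuple[str, Tuple[str, ...]]]:
    """List (label, property keys) pairs that would need an index."""
    targets = []
    for label in labels:
        # One single-property index per label/merge key combination.
        for key in merge_keys:
            targets.append((label, (key,)))
        # A composite index only makes sense with more than one merge key.
        if len(merge_keys) > 1:
            targets.append((label, tuple(merge_keys)))
    return targets


targets = index_targets(["Person"], ["name", "city"])
single_key_targets = index_targets(["Person"], ["name"])
```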
-
classmethod
from_csv_json_set
(csv_file_path, json_file_path, load_items: bool = False, labels_key: str = None, mergekey_key: str = None)¶ Read the default CSV/JSON file combination. Needs paths to CSV and JSON file.
JSON keys can be overwritten by passing the respective parameters.
Parameters: - csv_file_path – Path to the CSV file.
- json_file_path – Path to the JSON file.
- load_items – Yield items from file (False, default) or load them to memory (True).
Returns: The NodeSet.
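The difference between the two `load_items` modes can be sketched with the standard `csv` module. The helper `read_rows` is hypothetical and only mirrors the described behaviour; how graphio reads the file internally is not shown here.

```python
import csv
import io
from typing import Dict, Iterable


def read_rows(handle, load_items: bool = False) -> Iterable[Dict[str, str]]:
    """Return rows lazily (default) or as a list held in memory."""
    reader = csv.DictReader(handle)
    if load_items:
        return list(reader)  # everything is read and kept in memory
    return reader  # rows are produced one at a time while iterating


data = "name,age\nAlice,33\nBob,44\n"
lazy_rows = read_rows(io.StringIO(data))
eager_rows = read_rows(io.StringIO(data), load_items=True)
first_lazy = next(iter(lazy_rows))
```

Lazy reading keeps memory use flat for large files, but the rows can only be consumed once and the file has to stay available while iterating.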
-
merge
(graph, merge_properties=None, batch_size=None, preserve=None, append_props=None, database=None)¶ Merge nodes from NodeSet on merge properties.
Parameters: merge_properties – The merge properties.
-
merge_csv_query
(filename: str = None, periodic_commit=1000)¶ Create a Cypher query to load a CSV file created with NodeSet.to_csv() into Neo4j (MERGE statement).
Parameters: - filename – Optional filename. A filename will be autocreated if not passed.
- periodic_commit – Number of rows to commit in one transaction.
Returns: Cypher query.
-
node_properties
()¶ Yield properties of the nodes in this set. Used for create function.
-
object_file_name
(suffix: str = None) → str¶ Create a unique name for this NodeSet that indicates content. Pass an optional suffix. NOTE: suffix has to include the ‘.’ for a filename!
nodeset_Label_merge-key_uuid
With suffix:
nodeset_Label_merge-key_uuid.json
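A name of this shape could be assembled as follows. This is a sketch under assumptions: the standalone function, the use of `uuid.uuid4()`, and joining multiple labels or merge keys with underscores are illustrative choices, not confirmed graphio behaviour.

```python
import uuid
from typing import List, Optional


def object_file_name(labels: List[str], merge_keys: List[str], suffix: Optional[str] = None) -> str:
    """Build 'nodeset_<labels>_<merge keys>_<uuid>' plus an optional suffix."""
    base = "nodeset_{}_{}_{}".format("_".join(labels), "_".join(merge_keys), uuid.uuid4())
    # The suffix is appended verbatim, so it must carry its own '.'.
    return base + suffix if suffix else base


plain = object_file_name(["Person"], ["name"])
with_suffix = object_file_name(["Person"], ["name"], suffix=".json")
```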
-
to_csv
(filepath: str, quoting: int = None) → str¶ Create a CSV file for this nodeset. Header row is created with all properties. Each row contains the properties of a node.
Example:
>>> nodeset = NodeSet(labels=["Person"], merge_keys=["name"])
>>> nodeset.add_node({"name": "Alice", "age": 33})
>>> nodeset.add_node({"name": "Bob", "age": 44})
>>> nodeset.to_csv("/tmp/Person_name.csv")
'/tmp/Person_name.csv'

name,age
Alice,33
Bob,44
Parameters: - filepath – Full path to the CSV file.
- quoting – Optional quoting setting for csv writer (any of csv.QUOTE_MINIMAL, csv.QUOTE_NONE, csv.QUOTE_ALL etc).
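The effect of the header row and the `quoting` setting can be reproduced with the standard `csv` module alone. The helper `nodes_to_csv` below is hypothetical and writes to a string instead of a file; it assumes the header is the union of all property keys in first-seen order.

```python
import csv
import io
from typing import Dict, List


def nodes_to_csv(nodes: List[Dict], quoting: int = csv.QUOTE_MINIMAL) -> str:
    """Write one row per node, with a header covering all property keys."""
    # Collect every property key, keeping first-seen order for the header.
    fieldnames: List[str] = []
    for node in nodes:
        for key in node:
            if key not in fieldnames:
                fieldnames.append(key)
    buffer = io.StringIO()
    writer = csv.DictWriter(buffer, fieldnames=fieldnames, quoting=quoting, lineterminator="\n")
    writer.writeheader()
    writer.writerows(nodes)  # missing properties are written as empty fields
    return buffer.getvalue()


people = [{"name": "Alice", "age": 33}, {"name": "Bob", "age": 44}]
minimal = nodes_to_csv(people)
quoted = nodes_to_csv(people, quoting=csv.QUOTE_ALL)
```

With `csv.QUOTE_ALL` every field, including the header and numeric values, is wrapped in double quotes.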
-
to_csv_json_set
(csv_file_path, json_file_path, type_conversion: dict = None)¶ Write the default CSV/JSON file combination.
Needs paths to CSV and JSON file.
Parameters: - csv_file_path – Path to the CSV file.
- json_file_path – Path to the JSON file.
- type_conversion – Optional dictionary to convert types of properties.
-
to_definition
()¶ Create a NodeSetDefinition from this NodeSet. Later, NodeSetDefinition can become the parent class of NodeSet.
-
to_dict
()¶ Create dictionary defining the nodeset.
-
to_json
(target_dir: str, filename: str = None)¶ Serialize NodeSet to a JSON file in a target directory.
This function is meant for dumping/reloading, not as a general transport format. The output will likely be optimized for disk space or compressed in the future.
-
update_node
(properties: dict)¶ Update an existing node by overwriting all properties.
Note that this requires NodeSet(…, indexed=True) which is not the default!
Parameters: properties – Node property dictionary.
-
RelationshipSet¶
-
class
graphio.
RelationshipSet
(rel_type, start_node_labels, end_node_labels, start_node_properties, end_node_properties, batch_size=None, default_props=None, source=False)¶ Container for a set of Relationships with the same type of start and end nodes.
Parameters: - rel_type – Relationship type.
- start_node_labels – Labels of the start node.
- end_node_labels – Labels of the end node.
- start_node_properties – Property keys to identify the start node.
- end_node_properties – Property keys to identify the end node.
- batch_size – Batch size for Neo4j operations.
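What a batch size means in practice can be shown with a small chunking helper. This is a generic sketch; the function `batches` is hypothetical and how graphio actually splits its Neo4j operations is not documented here.

```python
from itertools import islice
from typing import Dict, Iterable, Iterator, List


def batches(items: Iterable[Dict], batch_size: int) -> Iterator[List[Dict]]:
    """Yield consecutive lists of at most batch_size items."""
    iterator = iter(items)
    while True:
        batch = list(islice(iterator, batch_size))
        if not batch:
            return  # input exhausted
        yield batch


relationships = [{"id": i} for i in range(5)]
sizes = [len(batch) for batch in batches(relationships, batch_size=2)]
```

Each yielded list would correspond to one round trip to the database, so a larger batch size means fewer, bigger operations.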
-
add_relationship
(start_node_properties: dict, end_node_properties: dict, properties: dict = None)¶ Add a relationship to this RelationshipSet.
Parameters: properties – Relationship properties.
-
all_property_keys
() → Set[str]¶ Return a set of all property keys in this RelationshipSet.
Returns: A set of unique property keys of a RelationshipSet
-
create
(graph, database=None, batch_size=None)¶ Create relationships in this RelationshipSet.
-
create_csv_query
(query_type: str, filename: str = None, periodic_commit=1000) → str¶ Generate the CREATE CSV query for this RelationshipSet. The function tries to take care of type conversions.
Note: You can’t use arrays as properties for nodes/relationships when creating CSV files.
LOAD CSV WITH HEADERS FROM xyz AS line
MATCH (a:Gene), (b:Protein)
WHERE a.sid = line.a_sid AND b.sid = line.b_sid AND b.taxid = line.b_taxid
CREATE (a)-[r:MAPS]->(b)
SET r.key1 = line.rel_key1, r.key2 = line.rel_key2
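The structure of the query above can be derived from the relationship definition with plain string formatting. The function `create_query` is a hypothetical illustration, not graphio's code; the `a_`, `b_` and `rel_` column prefixes are taken from the query as printed, and type conversion is left out.

```python
from typing import List


def create_query(rel_type: str, start_label: str, end_label: str,
                 start_keys: List[str], end_keys: List[str],
                 rel_keys: List[str], source: str = "xyz") -> str:
    """Assemble a LOAD CSV ... CREATE query from a relationship definition."""
    # One equality condition per identifying property of start and end node.
    conditions = [f"a.{k} = line.a_{k}" for k in start_keys]
    conditions += [f"b.{k} = line.b_{k}" for k in end_keys]
    # One assignment per relationship property.
    assignments = [f"r.{k} = line.rel_{k}" for k in rel_keys]
    return "\n".join([
        f"LOAD CSV WITH HEADERS FROM {source} AS line",
        f"MATCH (a:{start_label}), (b:{end_label})",
        "WHERE " + " AND ".join(conditions),
        f"CREATE (a)-[r:{rel_type}]->(b)",
        "SET " + ", ".join(assignments),
    ])


query = create_query("MAPS", "Gene", "Protein", ["sid"], ["sid", "taxid"], ["key1", "key2"])
query_lines = query.splitlines()
```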
-
create_index
(graph, database=None)¶ Create indices for the start node and end node definitions of this RelationshipSet. If more than one start or end node property is defined, all single-property indices as well as the composite index are created.
-
classmethod
from_csv_json_set
(csv_file_path, json_file_path, load_items: bool = False, reltype_key=None, startnodeproperties_key=None, endnodeproperties_key=None, startnodelables_key=None, endnodelables_key=None)¶ Read the default CSV/JSON file combination. Needs paths to CSV and JSON file.
JSON keys can be overwritten by passing the respective parameters.
Parameters: - csv_file_path – Path to the CSV file.
- json_file_path – Path to the JSON file.
- load_items – Yield items from file (False, default) or load them to memory (True).
Returns: The RelationshipSet.
-
merge
(graph, database=None, batch_size=None)¶ Merge relationships in this RelationshipSet.
-
object_file_name
(suffix: str = None) → str¶ Create a unique name for this RelationshipSet that indicates content. Pass an optional suffix. NOTE: suffix has to include the ‘.’ for a filename!
relationshipset_StartLabel_TYPE_EndLabel_uuid
With suffix:
relationshipset_StartLabel_TYPE_EndLabel_uuid.json
-
to_csv
(filepath: str, quoting: int = None) → str¶ Write the RelationshipSet to a CSV file. The CSV file will be written to the given filepath.
Note: You can’t use arrays as properties for nodes/relationships when creating CSV files.
# CSV file header
start_sid, end_sid, end_taxid, rel_key1, rel_key2
Parameters: - filepath – Path to the CSV file.
- quoting – Optional quoting setting for csv writer (any of csv.QUOTE_MINIMAL, csv.QUOTE_NONE, csv.QUOTE_ALL etc).
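The header shown above suggests that each relationship is flattened into one row with prefixed column names. A minimal sketch of that flattening, assuming the `start_`, `end_` and `rel_` prefixes from the header; the helper `relationship_row` is hypothetical.

```python
from typing import Dict


def relationship_row(start: Dict, end: Dict, properties: Dict) -> Dict:
    """Flatten one relationship into a single CSV row with prefixed keys."""
    row = {f"start_{k}": v for k, v in start.items()}
    row.update({f"end_{k}": v for k, v in end.items()})
    row.update({f"rel_{k}": v for k, v in properties.items()})
    return row


row = relationship_row(
    {"sid": "g1"},
    {"sid": "p1", "taxid": "9606"},
    {"key1": "a", "key2": "b"},
)
header = list(row)
```

The prefixes keep start node, end node and relationship properties apart even when they share a property name such as `sid`.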
-
to_csv_json_set
(csv_file_path, json_file_path, write_mode: str = 'w')¶ Write the default CSV/JSON file combination.
Needs paths to CSV and JSON file.
Parameters: - csv_file_path – Path to the CSV file.
- json_file_path – Path to the JSON file.
- write_mode – Write mode for the CSV file.
-
to_json
(target_dir, filename: str = None)¶ Serialize RelationshipSet to a JSON file in a target directory.
This function is meant for dumping/reloading, not as a general transport format. The output will likely be optimized for disk space or compressed in the future.
Container¶
-
class
graphio.
Container
(objects=None)¶ A container for a collection of Nodes, Relationships, NodeSets and RelationshipSets.
A typical parser function, e.g. one that reads an Excel file, produces mixed output that then has to be processed accordingly.
Also, sanity checks and data statistics are useful.
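How a container can sort mixed parser output by type is sketched below. All classes here are hypothetical stand-ins so that the example runs without graphio; the real Container may be implemented differently.

```python
from typing import Any, Iterable, List, Optional


class NodeSetStub:
    """Stand-in for graphio.NodeSet (assumption, for illustration only)."""


class RelationshipSetStub:
    """Stand-in for graphio.RelationshipSet (assumption, for illustration only)."""


class ContainerSketch:
    """Holds mixed objects and exposes them filtered by type."""

    def __init__(self, objects: Optional[Iterable[Any]] = None):
        self.objects: List[Any] = list(objects) if objects else []

    @property
    def nodesets(self) -> List[NodeSetStub]:
        return [o for o in self.objects if isinstance(o, NodeSetStub)]

    @property
    def relationshipsets(self) -> List[RelationshipSetStub]:
        return [o for o in self.objects if isinstance(o, RelationshipSetStub)]


# Mixed output as a parser might return it; the string is ignored by both views.
parsed = [NodeSetStub(), RelationshipSetStub(), NodeSetStub(), "row count: 3"]
container = ContainerSketch(parsed)
```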
-
merge_nodesets
()¶ Merge all node sets if merge_key is defined.
-
nodesets
¶ Get the NodeSets in the Container.
-
relationshipsets
¶ Get the RelationshipSets in the Container.
-