Getting Started: Bulk Loading Track
Best for: ETL processes, large datasets, data migration, and high-performance data ingestion.
Prerequisites
- Neo4j Database: Running locally or remotely
- Graphio: install it with `pip install graphio`
Step 1: Set Up Connection
```python
from graphio import NodeSet, RelationshipSet
from neo4j import GraphDatabase

# Connect to Neo4j (replace the URI and credentials with your own)
driver = GraphDatabase.driver('neo4j://localhost:7687', auth=('neo4j', 'password'))

# Optional (Enterprise Edition): choose a target database and pass it to
# each load call in Step 5, e.g. create(driver, database='mydb')
# database = 'mydb'
```
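Hard-coding credentials is fine for a first run, but ETL jobs usually read them from the environment. The sketch below shows one way to do that. It assumes the hypothetical variable names `NEO4J_URI`, `NEO4J_USER`, `NEO4J_PASSWORD` and `NEO4J_DATABASE`, falling back to the values used above. `verify_connectivity()` is the official driver's check that the server is reachable and the credentials are accepted.

```python
import os


def connection_settings(env=None):
    """Read Neo4j connection settings from environment variables.

    The variable names are hypothetical choices for this sketch; the
    defaults mirror the hard-coded values in Step 1.
    """
    env = os.environ if env is None else env
    return {
        "uri": env.get("NEO4J_URI", "neo4j://localhost:7687"),
        "auth": (env.get("NEO4J_USER", "neo4j"), env.get("NEO4J_PASSWORD", "password")),
        # None means "use the server's default database"
        "database": env.get("NEO4J_DATABASE"),
    }


def connect(settings):
    """Create a driver and fail fast if Neo4j is unreachable."""
    # Imported here so connection_settings() works without the driver installed
    from neo4j import GraphDatabase

    driver = GraphDatabase.driver(settings["uri"], auth=settings["auth"])
    # Raises an exception if the server cannot be reached or auth fails
    driver.verify_connectivity()
    return driver
```

For example, `settings = connection_settings()` followed by `driver = connect(settings)`. When `settings["database"]` is set, pass it as `database=` to the load calls in Step 5.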
Step 2: Define Data Containers
```python
# Define node containers; merge_keys are the properties that identify a node
people = NodeSet(['Person'], merge_keys=['email'])
companies = NodeSet(['Company'], merge_keys=['name'], deduplicate=True)  # Prevent duplicate companies

# Define relationship container
employments = RelationshipSet(
    'WORKS_AT',  # Relationship type
    ['Person'],  # Start node labels
    ['Company'],  # End node labels
    ['email'],  # Start node match keys
    ['name']  # End node match keys
)
```
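Merge keys decide when two records describe the same node. The plain-Python sketch below only illustrates that idea: it deduplicates rows on the tuple of their merge-key values and keeps the first occurrence. It is not Graphio's implementation, and keeping the first duplicate is an assumption of this sketch, not documented Graphio behavior.

```python
def deduplicate(rows, merge_keys):
    """Return rows with duplicate merge-key values removed.

    The first row seen for each combination of merge-key values is kept
    (an assumption of this sketch).
    """
    seen = set()
    unique = []
    for row in rows:
        # e.g. ('ACME Corp',) for merge_keys=['name']
        key = tuple(row[k] for k in merge_keys)
        if key not in seen:
            seen.add(key)
            unique.append(row)
    return unique


rows = [
    {'name': 'ACME Corp', 'industry': 'Technology'},
    {'name': 'ACME Corp', 'industry': 'Software'},
    {'name': 'Globex'},
]
print(deduplicate(rows, ['name']))
# [{'name': 'ACME Corp', 'industry': 'Technology'}, {'name': 'Globex'}]
```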
Step 3: Add Data in Batches
```python
# Add nodes; a NodeSet can collect thousands of records before loading
people.add({'name': 'Alice Smith', 'email': 'alice@example.com', 'age': 30})
people.add({'name': 'Bob Johnson', 'email': 'bob@example.com', 'age': 25})

# With the hybrid approach, you can also add OGM model instances:
# from your_models import Person
# people.add(Person(name='Alice', email='alice@example.com', age=30))

companies.add({'name': 'ACME Corp', 'industry': 'Technology'})

# Add relationships: start node keys, end node keys, relationship properties
employments.add(
    {'email': 'alice@example.com'},  # Start node
    {'name': 'ACME Corp'},  # End node
    {'position': 'Developer'}  # Relationship properties
)
```
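In an ETL job the records usually come from a file rather than from literal dicts. The sketch below reads CSV lines and makes the same `add()` calls shown above. The column names (`name`, `email`, `age`, `company`, `position`) and the file `employees.csv` are hypothetical. Any object with a matching `add()` method can be passed in, which also makes the function easy to test without a database.

```python
import csv


def add_rows(lines, people, companies, employments):
    """Feed CSV rows into node and relationship containers.

    `lines` is any iterable of CSV text lines (an open file works). The
    header must contain the hypothetical columns name, email, age,
    company and position. Returns the number of data rows processed.
    """
    count = 0
    for row in csv.DictReader(lines):
        people.add({'name': row['name'], 'email': row['email'], 'age': int(row['age'])})
        # The same company appears on many rows; Step 2 declared
        # companies with deduplicate=True for this case
        companies.add({'name': row['company']})
        employments.add(
            {'email': row['email']},  # Start node match keys
            {'name': row['company']},  # End node match keys
            {'position': row['position']},  # Relationship properties
        )
        count += 1
    return count


# Usage (employees.csv is a hypothetical file):
# with open('employees.csv', newline='') as f:
#     add_rows(f, people, companies, employments)
```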
Step 4: Create Indexes (Performance)
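Indexes on the properties used as merge and match keys (`Person.email`, `Company.name`) let Neo4j find nodes by key directly instead of scanning every node with that label. This matters most when relationships are loaded and both endpoints must be looked up. Graphio may provide its own helper for this. The sketch below instead sends plain Cypher through the official driver, and it assumes a Neo4j version recent enough to accept `CREATE INDEX ... IF NOT EXISTS`.

```python
def index_statements(specs):
    """Build one CREATE INDEX statement per (label, property) pair.

    Labels and property names are inserted into the query text directly
    (Cypher cannot parameterize them), so pass only trusted values.
    IF NOT EXISTS makes reruns harmless.
    """
    return [
        f"CREATE INDEX IF NOT EXISTS FOR (n:{label}) ON (n.{prop})"
        for label, prop in specs
    ]


def create_indexes(driver, specs, database=None):
    """Run the index statements, one auto-commit query each."""
    with driver.session(database=database) as session:
        for statement in index_statements(specs):
            # consume() fetches the result summary, surfacing server errors here
            session.run(statement).consume()


# Usage, before Step 5:
# create_indexes(driver, [('Person', 'email'), ('Company', 'name')])
```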
Step 5: Bulk Load to Neo4j
```python
# Load nodes before relationships: relationships are matched to existing nodes
companies.create(driver)  # Load companies first
people.create(driver)  # Then people
employments.create(driver)  # Finally relationships

# For Enterprise Edition, specify the target database:
# companies.create(driver, database='production')
# people.create(driver, database='production')
# employments.create(driver, database='production')

print(f"Loaded {len(people.nodes)} people and {len(companies.nodes)} companies")
```
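Bulk loading is fast mainly because records travel in batches: a single query with `UNWIND` handles many rows per round trip. The sketch below shows that general pattern in plain Cypher through the official driver. It is an illustration, not Graphio's internal code; the query shapes, the use of `MERGE` for nodes, and the batch size of 1000 are all assumptions. The relationship query also shows why nodes must be loaded first.

```python
def chunks(items, size):
    """Yield successive lists of at most `size` items."""
    for start in range(0, len(items), size):
        yield items[start:start + size]


def merge_nodes_query(label, merge_keys):
    """MERGE one node per row on its merge keys, then copy all properties."""
    keys = ", ".join(f"{k}: row.{k}" for k in merge_keys)
    return f"UNWIND $rows AS row MERGE (n:{label} {{{keys}}}) SET n += row"


def create_relationships_query(rel_type, start_label, start_keys, end_label, end_keys):
    """Match both endpoints per row, then create the relationship.

    Rows look like {'start': {...}, 'end': {...}, 'props': {...}}. If an
    endpoint does not exist yet, MATCH yields no row and nothing is
    created, which is why nodes are loaded before relationships.
    """
    start = ", ".join(f"{k}: row.start.{k}" for k in start_keys)
    end = ", ".join(f"{k}: row.end.{k}" for k in end_keys)
    return (
        "UNWIND $rows AS row "
        f"MATCH (a:{start_label} {{{start}}}) "
        f"MATCH (b:{end_label} {{{end}}}) "
        f"CREATE (a)-[r:{rel_type}]->(b) SET r = row.props"
    )


def run_batched(driver, query, rows, batch_size=1000, database=None):
    """Send `rows` to `query` as the $rows parameter, one batch per call."""
    with driver.session(database=database) as session:
        for batch in chunks(rows, batch_size):
            # One round trip per batch instead of one per record
            session.run(query, rows=batch).consume()


# Usage (person_rows is a hypothetical list of dicts):
# run_batched(driver, merge_nodes_query('Person', ['email']), person_rows)
```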
What You've Learned
✅ How to create NodeSet and RelationshipSet containers
✅ How to batch data for efficient loading
✅ How to create indexes for performance
✅ Proper loading order (nodes before relationships)
✅ How to prevent duplicates with built-in deduplication
Next Steps
- Need data validation? → OGM Track
- Want to combine both? → Hybrid Approach
- Deep dive into bulk loading → Bulk Loading Guide