Skip to main content

Query Optimization

RushDB executes every find() call as a Cypher query against Neo4j. The shape of your SearchQuery directly controls how much of the graph is traversed and how many graph operations are charged. This tutorial identifies the dominant cost patterns and shows how to reshape queries to reduce them.


Cost drivers

FactorImpact
Number of where traversal hopsMultiplied graph scan per additional hop
$collect in selectMaterializes all matched records into a list
Missing labels filterScans all record types
High limit without indexed field orderingFull sort on unindexed data
Per-record enrichment inside loopsO(n) queries for n results
$nin / $in with large arraysEvaluated against every candidate record

Tip 1: Always specify labels

Without labels, RushDB scans every record node in your project.

# ❌ Avoid
slow = db.records.find({"where": {"status": "active"}})

# ✅ Better
fast = db.records.find({"labels": ["USER"], "where": {"status": "active"}})

Tip 2: Filter at the deepest hop

Push the most selective filter as deep into the traversal as possible. This prunes the graph early and avoids materializing large intermediate sets.

# ✅ Push filter to the deepest hop
result = db.records.find({
"labels": ["CUSTOMER"],
"where": {
"status": "active",
"ORDER": {
"$relation": {"type": "PLACED", "direction": "out"},
"totalUsd": {"$gte": 500}
}
}
})

Tip 3: Limit traversal depth

Every additional hop multiplies the graph surface explored. Two hops is usually fine; three or more should be benchmarked at production volume.

orders = db.records.find({
"labels": ["ORDER"],
"where": {
"PRODUCT": {
"$relation": {"type": "CONTAINS", "direction": "out"},
"sku": "PROD-001"
}
}
})

order_ids = [o.id for o in orders.data]

teams = db.records.find({
"labels": ["TEAM"],
"where": {
"USER": {
"$relation": {"type": "MEMBER_OF", "direction": "in"},
"ORDER": {
"$relation": {"type": "PLACED", "direction": "out"},
"__id": {"$in": order_ids}
}
}
}
})

Tip 4: Prefer $count over $collect

$collect materializes every matched record into a list before returning. $count just increments a counter. Use $collect only when you need the actual records.

# ✅ Prefer $count over $collect unless records are needed
result = db.records.find({
"labels": ["ORDER"],
"where": {"status": "pending"},
"select": {"count": {"$count": "*"}}
})

Tip 5: Batch writes instead of serial create() loops

A loop calling create() once per record makes one HTTP round-trip per write. importJson sends all records in a single request.

records = [{"eventId": f"EVT-{i}", "type": "click", "ts": "2025-01-01T00:00:00Z"} for i in range(500)]

# ✅ Single call
db.records.import_json({"label": "EVENT", "data": records})

Tip 6: Cap limit in API endpoints

Never pass user-controlled limit values directly. Cap at a safe maximum.

MAX_LIMIT = 100

def build_query(params: dict) -> dict:
return {
"labels": ["ARTICLE"],
"limit": min(params.get("limit", 20), MAX_LIMIT),
"skip": params.get("skip", 0)
}

Quick-reference checklist

CheckWhy it matters
labels always setAvoids full graph scan
Most selective filter at deepest hopPrunes intermediate node sets early
Traversal depth ≤ 2 hopsThree+ hops need production benchmarking
$collect replaced with $count where possible$collect materializes all matched IDs
Bulk writes via importJsonOne round-trip instead of N
limit capped server-sidePrevents runaway result sets

Next steps