Query Optimization
RushDB executes every find() call as a Cypher query against Neo4j. The shape of your SearchQuery directly controls how much of the graph is traversed and how many graph operations are charged. This tutorial identifies the dominant cost patterns and shows how to reshape queries to reduce them.
Cost drivers
| Factor | Impact |
|---|---|
Number of where traversal hops | Multiplied graph scan per additional hop |
$collect in select | Materializes all matched records into a list |
Missing labels filter | Scans all record types |
High limit without indexed field ordering | Full sort on unindexed data |
| Per-record enrichment inside loops | O(n) queries for n results |
$nin / $in with large arrays | Evaluated against every candidate record |
Tip 1: Always specify labels
Without labels, RushDB scans every record node in your project.
- Python
- TypeScript
- shell
# ❌ Avoid
slow = db.records.find({"where": {"status": "active"}})
# ✅ Better
fast = db.records.find({"labels": ["USER"], "where": {"status": "active"}})
// ❌ Avoid — scans all labels
const slow = await db.records.find({
where: { status: 'active' }
})
// ✅ Better — restricts scan to one label
const fast = await db.records.find({
labels: ['USER'],
where: { status: 'active' }
})
# ✅ Always pass labels
curl -s -X POST "$BASE/records/search" \
-H "Content-Type: application/json" \
-H "Authorization: Bearer $TOKEN" \
-d '{"labels":["USER"],"where":{"status":"active"}}'
Tip 2: Filter at the deepest hop
Push the most selective filter as deep into the traversal as possible. This prunes the graph early and avoids materializing large intermediate sets.
- Python
- TypeScript
# ✅ Push filter to the deepest hop
result = db.records.find({
"labels": ["CUSTOMER"],
"where": {
"status": "active",
"ORDER": {
"$relation": {"type": "PLACED", "direction": "out"},
"totalUsd": {"$gte": 500}
}
}
})
// ❌ Broad: fetches all orders, then filters by status at outer level
const broad = await db.records.find({
labels: ['CUSTOMER'],
where: {
status: 'active', // customer filter
ORDER: { $relation: { type: 'PLACED', direction: 'out' } }
// No filter on ORDER itself — all orders are traversed
}
})
// ✅ Narrow: prune at the ORDER hop with totalUsd filter
const narrow = await db.records.find({
labels: ['CUSTOMER'],
where: {
status: 'active',
ORDER: {
$relation: { type: 'PLACED', direction: 'out' },
totalUsd: { $gte: 500 } // filter at the ORDER hop
}
}
})
Tip 3: Limit traversal depth
Every additional hop multiplies the graph surface explored. Two hops is usually fine; three or more should be benchmarked at production volume.
- Python
- TypeScript
orders = db.records.find({
"labels": ["ORDER"],
"where": {
"PRODUCT": {
"$relation": {"type": "CONTAINS", "direction": "out"},
"sku": "PROD-001"
}
}
})
order_ids = [o.id for o in orders.data]
teams = db.records.find({
"labels": ["TEAM"],
"where": {
"USER": {
"$relation": {"type": "MEMBER_OF", "direction": "in"},
"ORDER": {
"$relation": {"type": "PLACED", "direction": "out"},
"__id": {"$in": order_ids}
}
}
}
})
// Three-hop traversal — verify this performs at your data volume
const threeHop = await db.records.find({
labels: ['TEAM'],
where: {
USER: {
$relation: { type: 'MEMBER_OF', direction: 'in' },
ORDER: {
$relation: { type: 'PLACED', direction: 'out' },
PRODUCT: {
$relation: { type: 'CONTAINS', direction: 'out' },
sku: 'PROD-001'
}
}
}
}
})
// ✅ Equivalent two-step approach: first resolve products, then query teams
const orders = await db.records.find({
labels: ['ORDER'],
where: {
PRODUCT: {
$relation: { type: 'CONTAINS', direction: 'out' },
sku: 'PROD-001'
}
}
})
const orderIds = orders.data.map(o => o.__id)
const teams = await db.records.find({
labels: ['TEAM'],
where: {
USER: {
$relation: { type: 'MEMBER_OF', direction: 'in' },
ORDER: {
$relation: { type: 'PLACED', direction: 'out' },
__id: { $in: orderIds }
}
}
}
})
Tip 4: Prefer $count over $collect
$collect materializes every matched record into a list before returning. $count just increments a counter. Use $collect only when you need the actual records.
- Python
- TypeScript
# ✅ Prefer $count over $collect unless records are needed
result = db.records.find({
"labels": ["ORDER"],
"where": {"status": "pending"},
"select": {"count": {"$count": "*"}}
})
// ❌ Expensive: collects all record IDs just to check if any exist
const collected = await db.records.find({
labels: ['ORDER'],
where: { status: 'pending' },
select: { ids: { $collect: { from: '$record' } } }
})
// ✅ Cheap: count is all you need
const counted = await db.records.find({
labels: ['ORDER'],
where: { status: 'pending' },
select: { count: { $count: '*' } }
})
Tip 5: Batch writes instead of serial create() loops
A loop calling create() once per record makes one HTTP round-trip per write. importJson sends all records in a single request.
- Python
- TypeScript
- shell
records = [{"eventId": f"EVT-{i}", "type": "click", "ts": "2025-01-01T00:00:00Z"} for i in range(500)]
# ✅ Single call
db.records.import_json({"label": "EVENT", "data": records})
const records = Array.from({ length: 500 }, (_, i) => ({
eventId: `EVT-${i}`,
type: 'click',
ts: new Date().toISOString()
}))
// ❌ 500 round-trips
for (const r of records) {
await db.records.create({ label: 'EVENT', data: r })
}
// ✅ 1 round-trip
await db.records.importJson({ label: 'EVENT', data: records })
# Build the data array with jq then send in one call
PAYLOAD=$(jq -n '[range(500)] | map({"eventId": ("EVT-" + tostring), "type":"click"})')
curl -s -X POST "$BASE/records/import/json" \
-H "Content-Type: application/json" -H "Authorization: Bearer $TOKEN" \
-d "{\"label\":\"EVENT\",\"data\":$PAYLOAD}"
Tip 6: Cap limit in API endpoints
Never pass user-controlled limit values directly. Cap at a safe maximum.
- Python
- TypeScript
MAX_LIMIT = 100
def build_query(params: dict) -> dict:
return {
"labels": ["ARTICLE"],
"limit": min(params.get("limit", 20), MAX_LIMIT),
"skip": params.get("skip", 0)
}
const MAX_LIMIT = 100
function buildQuery(params: { limit?: number; skip?: number }): object {
return {
labels: ['ARTICLE'],
limit: Math.min(params.limit ?? 20, MAX_LIMIT),
skip: params.skip ?? 0
}
}
Quick-reference checklist
| Check | Why it matters |
|---|---|
labels always set | Avoids full graph scan |
| Most selective filter at deepest hop | Prunes intermediate node sets early |
| Traversal depth ≤ 2 hops | Three+ hops need production benchmarking |
$collect replaced with $count where possible | $collect materializes all matched IDs |
Bulk writes via importJson | One round-trip instead of N |
limit capped server-side | Prevents runaway result sets |
Next steps
- Testing SearchQuery — verify query correctness across TS, Python, and REST
- Discovery Queries — explore unknown schemas before committing to traversal patterns
- Hybrid Retrieval — structured filter + semantic ranking patterns