Skip to main content

Thinking in Graphs: From Tables to Traversals

Most developers arrive at a graph database carrying mental models built from SQL tables or JSON documents. Both are useful starting points. Neither maps directly to the query patterns that make graphs worth using.

This tutorial uses a realistic e-commerce scenario — customers, orders, products, and reviews — to show the same data across three mental models:

  1. Relational (normalized tables)
  2. Document (nested JSON)
  3. Graph (labeled nodes + typed relationships)

Then it translates five common business questions into RushDB queries so you can see exactly where the graph model pays off.


The scenario

A minimal e-commerce platform has:

  • Customers who place orders
  • Orders that contain line items referencing products
  • Products that belong to categories
  • Customers who write reviews for products

Three mental models for the same data

Relational model

customers(id, name, email)
products(id, name, category_id, price)
categories(id, name)
orders(id, customer_id, placed_at, status)
order_items(order_id, product_id, quantity, unit_price)
reviews(id, customer_id, product_id, rating, body)

In SQL you answer "which customers reviewed a product they never ordered?" with a NOT EXISTS subquery across three joins. The query is correct but the intent is buried in JOIN columns.

Document model

{
"customerId": "c1",
"name": "Lena Müller",
"orders": [
{
"orderId": "o1",
"status": "shipped",
"items": [
{ "productId": "p1", "qty": 2 }
]
}
],
"reviews": [
{ "productId": "p2", "rating": 5 }
]
}

Documents are fast for loading a single customer's full history. The problem appears at the edges: "which products from the same category did this customer's network also buy?" requires post-processing across multiple documents.

Graph model

The same data becomes labeled nodes connected by typed, directed relationships:

The graph stores relationships as first-class data, which is why multi-hop questions become natural instead of awkward.


Ingesting the dataset

from rushdb import RushDB

db = RushDB("RUSHDB_API_KEY", base_url="https://api.rushdb.com/api/v1")

# Categories
photography = db.records.create("CATEGORY", {"name": "Photography"})
audio = db.records.create("CATEGORY", {"name": "Audio"})

# Products
lens_cap = db.records.create("PRODUCT", {"name": "Lens Cap 58mm", "price": 12.99})
camera_bag = db.records.create("PRODUCT", {"name": "Camera Bag Pro", "price": 89.00})
headphones = db.records.create("PRODUCT", {"name": "Studio Headphones", "price": 149.00})

# Link products to categories
db.records.attach(lens_cap.id, photography.id, {"type": "IN_CATEGORY"})
db.records.attach(camera_bag.id, photography.id, {"type": "IN_CATEGORY"})
db.records.attach(headphones.id, audio.id, {"type": "IN_CATEGORY"})

# Customers
lena = db.records.create("CUSTOMER", {"name": "Lena Müller", "email": "lena@example.com"})
marco = db.records.create("CUSTOMER", {"name": "Marco Rossi", "email": "marco@example.com"})

# Orders
order1 = db.records.create("ORDER", {"status": "shipped", "placedAt": "2025-01-10"})
db.records.attach(lena.id, order1.id, {"type": "PLACED"})
db.records.attach(order1.id, lens_cap.id, {"type": "CONTAINS"})

order2 = db.records.create("ORDER", {"status": "delivered", "placedAt": "2025-02-14"})
db.records.attach(marco.id, order2.id, {"type": "PLACED"})
db.records.attach(order2.id, camera_bag.id, {"type": "CONTAINS"})

# Reviews
review1 = db.records.create("REVIEW", {"rating": 5, "body": "Perfect fit."})
db.records.attach(lena.id, review1.id, {"type": "WROTE_REVIEW"})
db.records.attach(review1.id, camera_bag.id, {"type": "ABOUT"})

Five business questions, translated

Q1: Which orders has a given customer placed?

Relational intuition: SELECT * FROM orders WHERE customer_id = 'c1'

Graph query:

results = db.records.find({
"labels": ["ORDER"],
"where": {
"CUSTOMER": {
"$alias": "$customer",
"$relation": {"type": "PLACED", "direction": "in"},
"email": "lena@example.com"
}
},
"orderBy": {"placedAt": "desc"}
})

Q2: Which products did a customer purchase, grouped by category?

Three hops: CUSTOMER → ORDER → PRODUCT → CATEGORY.

results = db.records.find({
"labels": ["PRODUCT"],
"where": {
"ORDER": {
"$alias": "$order",
"$relation": {"type": "CONTAINS", "direction": "in"},
"CUSTOMER": {
"email": "lena@example.com"
}
},
"CATEGORY": {
"$alias": "$cat"
}
},
"select": {
"productName": "$record.name",
"price": "$record.price",
"categoryName": "$cat.name",
"orderedAt": "$order.placedAt"
}
})

Q3: Which products received 5-star reviews but have not been ordered yet?

# Products that appear in orders
ordered = db.records.find({
"labels": ["PRODUCT"],
"where": {
"ORDER": {"$relation": {"type": "CONTAINS", "direction": "in"}}
},
"select": {"id": "$record.__id"}
})
ordered_ids = [r["id"] for r in ordered.data]

# 5-star reviewed products NOT in ordered set
unordered_high = db.records.find({
"labels": ["PRODUCT"],
"where": {
"__id": {"$nin": ordered_ids},
"REVIEW": {
"$relation": {"type": "ABOUT", "direction": "in"},
"rating": 5
}
}
})

Q4: How many orders per customer, with average order recency?

summary = db.records.find({
"labels": ["CUSTOMER"],
"where": {
"ORDER": {
"$alias": "$order",
"$relation": {"type": "PLACED", "direction": "out"}
}
},
"select": {
"customerName": "$record.name",
"orderCount": {"$count": "$order"},
"lastOrderDate": {"$max": "$order.placedAt"}
},
"groupBy": ["customerName", "orderCount", "lastOrderDate"],
"orderBy": {"orderCount": "desc"}
})

Q5: Which products in the Photography category have been both ordered and reviewed?

results = db.records.find({
"labels": ["PRODUCT"],
"where": {
"CATEGORY": {"name": "Photography"},
"ORDER": {"$relation": {"type": "CONTAINS", "direction": "in"}},
"REVIEW": {
"$alias": "$review",
"$relation": {"type": "ABOUT", "direction": "in"}
}
},
"select": {
"productName": "$record.name",
"reviewCount": {"$count": "$review"},
"avgRating": {"$avg": "$review.rating", "$precision": 1}
},
"groupBy": ["productName", "reviewCount", "avgRating"],
"orderBy": {"avgRating": "desc"}
})

What changed between the mental models

ConcernRelationalDocumentGraph
Multi-hop traversalMulti-join SQLIteration across documentsSingle query with nested where
Adding a new relationship typeNew foreign key column or join tableSchema migration or array appendNew attach call, zero schema changes
Querying along a new pathRewrite query or add indexRewrite aggregation logicExtend existing where block
Metrics along pathGROUP BY with joinsMap-reducePer-hop metrics in same query

Production caveat

Relationship traversal queries become expensive when each hop fans out to thousands of related records. Before deploying traversal-heavy queries in production, scope them aggressively with limit on the leaf label and property filters that eliminate most candidates early. The SearchQuery Deep Dive tutorial covers aggregation and traversal optimization in more detail.


Next steps