Discovery Queries: Exploring an Unknown Schema
When you inherit a RushDB project — from a teammate, a data import, or a legacy integration — your immediate problem is: what is in here and how is it connected?
This tutorial walks a systematic exploration workflow that works in code, in the MCP server, and in the REST API. It teaches how to trust your findings and build reliable queries from scratch without guessing at label names or property keys.
Why schema discovery matters
RushDB does not enforce a fixed schema. Labels and properties are inferred from what you insert. That means:
- Labels can vary in casing (
Article,ARTICLE,article) depending on who wrote the ingest code - Properties may be absent on some records
- Relationship types may be present in one direction but not the other
- You cannot assume anything from type names alone — verify before querying
The ontology tools give you a live view of what exists.
Step 1: Get the ontology
The ontology describes every label and property that exists in your project, including property types and relationship paths.
- Python
- TypeScript
- shell
from rushdb import RushDB
import os, json
db = RushDB(os.environ["RUSHDB_API_KEY"], base_url="https://api.rushdb.com/api/v1")
ontology = db.ai.get_ontology()
print(json.dumps(ontology, indent=2))
markdown = db.ai.get_ontology_markdown()
print(markdown)
import RushDB from '@rushdb/javascript-sdk'
const db = new RushDB(process.env.RUSHDB_API_KEY!)
// Returns labels, properties, and their types in structured JSON
const ontology = await db.ai.getOntology()
console.log(JSON.stringify(ontology, null, 2))
// Returns the same data formatted as compact Markdown — useful for pasting to an LLM
const markdown = await db.ai.getOntologyMarkdown()
console.log(markdown)
BASE="https://api.rushdb.com/api/v1"
TOKEN="RUSHDB_API_KEY"
H='Content-Type: application/json'
# JSON ontology
curl -s -X POST "$BASE/ai/ontology" \
-H "$H" -H "Authorization: Bearer $TOKEN" \
-d '{}'
# Markdown ontology
curl -s -X POST "$BASE/ai/ontology/md" \
-H "$H" -H "Authorization: Bearer $TOKEN" \
-d '{}'
Read the output carefully before writing any queries. Note every label name exactly as it appears — casing is significant.
Step 2: Sample records from each label
After identifying labels, pull a small sample from each to see real data shapes.
- Python
- TypeScript
- shell
labels = ["CUSTOMER", "ORDER", "PRODUCT"] # exact names from ontology
for label in labels:
sample = db.records.find({
"labels": [label],
"limit": 3
})
print(f"\n=== {label} ({sample.total} total) ===")
for r in sample.data:
print(r.data)
// Sample 3 records from each label you found
const labels = ['CUSTOMER', 'ORDER', 'PRODUCT'] // exact names from ontology
for (const label of labels) {
const sample = await db.records.find({
labels: [label],
limit: 3
})
console.log(`\n=== ${label} (${sample.total} total) ===`)
console.log(JSON.stringify(sample.data, null, 2))
}
for LABEL in CUSTOMER ORDER PRODUCT; do
echo "=== $LABEL ==="
curl -s -X POST "$BASE/records/search" \
-H "$H" -H "Authorization: Bearer $TOKEN" \
-d "{\"labels\":[\"$LABEL\"],\"limit\":3}"
done
Step 3: Discover distinct values for key properties
Before filtering, check what values actually exist for categorical fields. Use select + groupBy to enumerate distinct values.
- Python
- TypeScript
- shell
status_counts = db.records.find({
"labels": ["ORDER"],
"select": {
"count": {"$count": "*"},
"status": "$record.status"
},
"groupBy": ["status", "count"],
"orderBy": {"count": "desc"}
})
print("ORDER statuses:")
for row in status_counts.data:
print(f" {row.data.get('status')}: {row.data.get('count')}")
// Find all distinct statuses on ORDER records
const statusCounts = await db.records.find({
labels: ['ORDER'],
select: {
count: { $count: '*' },
status: '$record.status'
},
groupBy: ['status', 'count'],
orderBy: { count: 'desc' }
})
console.log('ORDER statuses:')
for (const row of statusCounts.data) {
console.log(` ${row.status}: ${row.count}`)
}
curl -s -X POST "$BASE/records/search" \
-H "$H" -H "Authorization: Bearer $TOKEN" \
-d '{
"labels": ["ORDER"],
"select": {
"count": {"$count": "*"},
"status": "$record.status"
},
"groupBy": ["status", "count"],
"orderBy": {"count": "desc"}
}'
Do this for every categorical field you encounter. This prevents you from querying status: 'active' when the actual values are ACTIVE, Active, or 1.
Step 4: Verify a relationship exists before traversing it
Ontology shows you that a relationship path exists. Before building a traversal query, confirm direction and label names by searching for records that have a specific relationship.
- Python
- TypeScript
- shell
customers = db.records.find({
"labels": ["CUSTOMER"],
"where": {
"ORDER": {
"$relation": {"type": "PLACED_ORDER", "direction": "out"}
}
},
"limit": 1
})
if not customers.data:
reverse = db.records.find({
"labels": ["CUSTOMER"],
"where": {
"ORDER": {
"$relation": {"type": "PLACED_ORDER", "direction": "in"}
}
},
"limit": 1
})
print("Reverse direction hit:", len(reverse.data) > 0)
// Test: do CUSTOMER records have outbound PLACED_ORDER relationships to ORDER?
const customers = await db.records.find({
labels: ['CUSTOMER'],
where: {
ORDER: {
$relation: { type: 'PLACED_ORDER', direction: 'out' }
}
},
limit: 1
})
if (customers.data.length === 0) {
// Try reverse direction or different type name
const reverse = await db.records.find({
labels: ['CUSTOMER'],
where: {
ORDER: {
$relation: { type: 'PLACED_ORDER', direction: 'in' }
}
},
limit: 1
})
console.log('Reverse direction hit:', reverse.data.length > 0)
}
# Test outbound direction
curl -s -X POST "$BASE/records/search" \
-H "$H" -H "Authorization: Bearer $TOKEN" \
-d '{
"labels": ["CUSTOMER"],
"where": {
"ORDER": {
"$relation": {"type": "PLACED_ORDER", "direction": "out"}
}
},
"limit": 1
}'
Step 5: Identify nullable properties with $exists
When a property appears in some records but not others, use $exists: false to find records missing it and $exists: true to confirm presence before filtering.
- Python
- TypeScript
- shell
missing_email = db.records.find({
"labels": ["CUSTOMER"],
"where": {"email": {"$exists": False}},
"select": {"count": {"$count": "*"}},
"groupBy": ["count"]
})
has_email = db.records.find({
"labels": ["CUSTOMER"],
"where": {"email": {"$exists": True}},
"select": {"count": {"$count": "*"}},
"groupBy": ["count"]
})
print(f"Missing email: {missing_email.data[0]['count'] if missing_email.data else 0}")
print(f"Has email: {has_email.data[0]['count'] if has_email.data else 0}")
// How many CUSTOMER records are missing an email property?
const missingEmail = await db.records.find({
labels: ['CUSTOMER'],
where: { email: { $exists: false } },
select: { count: { $count: '*' } },
groupBy: ['count']
})
// How many have email?
const hasEmail = await db.records.find({
labels: ['CUSTOMER'],
where: { email: { $exists: true } },
select: { count: { $count: '*' } },
groupBy: ['count']
})
console.log(`Missing email: ${missingEmail.data[0]?.count ?? 0}`)
console.log(`Has email: ${hasEmail.data[0]?.count ?? 0}`)
# Missing email
curl -s -X POST "$BASE/records/search" \
-H "$H" -H "Authorization: Bearer $TOKEN" \
-d '{"labels":["CUSTOMER"],"where":{"email":{"$exists":false}},"select":{"count":{"$count":"*"}},"groupBy":["count"]}'
# Has email
curl -s -X POST "$BASE/records/search" \
-H "$H" -H "Authorization: Bearer $TOKEN" \
-d '{"labels":["CUSTOMER"],"where":{"email":{"$exists":true}},"select":{"count":{"$count":"*"}},"groupBy":["count"]}'
Step 6: Build the reliable query bottom-up
After the above steps you know:
- exact label names and their counts
- real property names and value shapes
- which categorical values are in use
- which properties may be absent
- relationship type names and their directions
Now build your query incrementally, adding one filter at a time and verifying results at each step.
- Python
- TypeScript
- shell
# Incremental query building
base = db.records.find({"labels": ["ORDER"], "limit": 1})
print("Base query:", base.total)
with_status = db.records.find({
"labels": ["ORDER"],
"where": {"status": "shipped"},
"limit": 1
})
print("With status filter:", with_status.total)
with_customer = db.records.find({
"labels": ["ORDER"],
"where": {
"status": "shipped",
"CUSTOMER": {
"$relation": {"type": "PLACED_ORDER", "direction": "in"},
"region": "EU"
}
},
"limit": 1
})
print("With customer traversal:", with_customer.total)
// Step 1: verify base label works
const base = await db.records.find({ labels: ['ORDER'], limit: 1 })
console.log('Base query:', base.total)
// Step 2: add first filter
const withStatus = await db.records.find({
labels: ['ORDER'],
where: { status: 'shipped' }, // confirmed from Step 3
limit: 1
})
console.log('With status filter:', withStatus.total)
// Step 3: add traversal
const withCustomer = await db.records.find({
labels: ['ORDER'],
where: {
status: 'shipped',
CUSTOMER: {
$relation: { type: 'PLACED_ORDER', direction: 'in' },
region: 'EU'
}
},
limit: 1
})
console.log('With customer traversal:', withCustomer.total)
// Step 4: switch to production query
const result = await db.records.find({
labels: ['ORDER'],
where: {
status: { $in: ['shipped', 'delivered'] },
CUSTOMER: {
$relation: { type: 'PLACED_ORDER', direction: 'in' },
region: 'EU'
}
},
orderBy: { createdAt: 'desc' },
limit: 50
})
# Incremental: base
curl -s -X POST "$BASE/records/search" \
-H "$H" -H "Authorization: Bearer $TOKEN" \
-d '{"labels":["ORDER"],"limit":1}'
# Add status filter
curl -s -X POST "$BASE/records/search" \
-H "$H" -H "Authorization: Bearer $TOKEN" \
-d '{"labels":["ORDER"],"where":{"status":"shipped"},"limit":1}'
# Add traversal
curl -s -X POST "$BASE/records/search" \
-H "$H" -H "Authorization: Bearer $TOKEN" \
-d '{"labels":["ORDER"],"where":{"status":"shipped","CUSTOMER":{"$relation":{"type":"PLACED_ORDER","direction":"in"},"region":"EU"}},"limit":1}'
MCP server workflow equivalent
If you are using the MCP server with Claude or Cursor, follow the same pattern via tool calls:
- Call
getOntologyMarkdown— read and understand the output before anything else - Call
getSearchQuerySpec— load the canonical query structure for the LLM's context - Use
findRecordswithlimit: 3for sampling, notlimit: 1000 - Use
select+groupByqueries to enumerate categorical values - Add traversal filters one at a time, verifying counts match expectations
The getQueryBuilderPrompt tool returns a system prompt you can paste into any LLM session to enforce the ontology-first discipline described in this tutorial.
Production caveat
Ontology output reflects the current state of the database. In a live system it changes over time. If you are building an automated pipeline that depends on a specific label or property, add an existence check at startup and fail fast if the expected ontology elements are absent. Discovery workflows are interactive; production code should be explicit.
Next steps
- Agent-Safe Query Planning with Ontology First — automate the discovery loop in an agent
- MCP Quickstart for Real Operators — the same workflow via the MCP server
- Thinking in Graphs — mental model for working with connected data