
Semantic Search in 5 Minutes

This tutorial walks you through the full semantic search workflow:

  1. Push some records
  2. Create an embedding index on a text property
  3. Poll until the index is ready
  4. Run semantic search
  5. Run semantic search with a filter

Prerequisites: a running RushDB instance with RUSHDB_EMBEDDING_MODEL configured (or RushDB Cloud with AI enabled).


Step 1: Push records

from rushdb import RushDB

db = RushDB("RUSHDB_API_KEY")

db.records.import_json({
    "label": "Article",
    "data": [
        {
            "title": "Intro to Machine Learning",
            "description": "A beginner guide to supervised learning, neural networks, and model evaluation.",
            "tags": ["ml", "beginner"]
        },
        {
            "title": "Graph Databases Explained",
            "description": "How graph databases store relationships and why they outperform SQL for connected data.",
            "tags": ["databases", "graphs"]
        },
        {
            "title": "Climate Science Overview",
            "description": "Current research on global warming, carbon cycles, and renewable energy policy.",
            "tags": ["science", "climate"]
        }
    ]
})

Step 2: Create an embedding index

Tell RushDB to vectorize the description field on Article records.

response = db.ai.indexes.create({
    "label": "Article",
    "propertyName": "description"
})
index = response.data
print(index["id"], index["status"])  # e.g. 'idx_abc123', 'pending'

Attempting to create a duplicate (label, propertyName) pair returns 409 Conflict.
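If a setup script may run more than once, it can help to treat that conflict as "already created" rather than a failure. The sketch below is only an illustration: IndexAlreadyExists is a hypothetical stand-in for whatever error your client raises on 409 Conflict (this tutorial does not name it), and ensure_index and fake_create are illustrative helpers, not part of the RushDB SDK.

```python
class IndexAlreadyExists(Exception):
    """Hypothetical stand-in for the error a client raises on HTTP 409."""


def ensure_index(create_index, label, property_name):
    """Call create_index and treat a duplicate (label, propertyName) as success.

    create_index is any callable that accepts the payload dict; returns the
    created index, or None if the pair was already indexed.
    """
    payload = {"label": label, "propertyName": property_name}
    try:
        return create_index(payload)
    except IndexAlreadyExists:
        return None


# In-memory fake that mimics the 409 behaviour, used only for demonstration.
_seen = set()


def fake_create(payload):
    key = (payload["label"], payload["propertyName"])
    if key in _seen:
        raise IndexAlreadyExists(key)
    _seen.add(key)
    return {"id": "idx_demo", "status": "pending"}


first = ensure_index(fake_create, "Article", "description")
second = ensure_index(fake_create, "Article", "description")
```

With a real client you would pass a small wrapper around db.ai.indexes.create that maps the client's conflict error to the exception you catch here.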


Step 3: Wait for the index to become ready

Backfill is asynchronous. Poll the index stats until indexedRecords equals totalRecords.

import time

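def wait_for_index(index_id: str, interval: float = 2.0):
    """Block until every record covered by the index has been embedded."""
    while True:
        stats = db.ai.indexes.stats(index_id).data
        print(f"{stats['indexedRecords']} / {stats['totalRecords']} embedded")
        # Require totalRecords > 0 so an index that has not been counted yet is not treated as ready.
        if stats["indexedRecords"] >= stats["totalRecords"] > 0:
            break
        time.sleep(interval)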

wait_for_index(index["id"])
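The loop above waits indefinitely, which can hang a script if backfill stalls or the label has no records. A minimal sketch of the same polling idea with a timeout follows. The helper wait_until_indexed, its parameters, and the 300-second default are assumptions made for illustration; only the indexedRecords and totalRecords fields come from this tutorial.

```python
import time
from typing import Callable, Mapping


def wait_until_indexed(
    fetch_stats: Callable[[], Mapping[str, int]],
    interval: float = 2.0,
    timeout: float = 300.0,
    sleep: Callable[[float], None] = time.sleep,
    clock: Callable[[], float] = time.monotonic,
) -> Mapping[str, int]:
    """Poll fetch_stats until all records are embedded or the timeout elapses.

    fetch_stats returns a mapping with 'indexedRecords' and 'totalRecords'.
    sleep and clock are injectable so the loop can be tested without waiting.
    """
    deadline = clock() + timeout
    while True:
        stats = fetch_stats()
        # Same readiness condition as the tutorial loop.
        if stats["indexedRecords"] >= stats["totalRecords"] > 0:
            return stats
        if clock() >= deadline:
            raise TimeoutError(
                f"index not ready after {timeout}s: "
                f"{stats['indexedRecords']} / {stats['totalRecords']} embedded"
            )
        sleep(interval)
```

With the client from Step 1 this could be called as wait_until_indexed(lambda: db.ai.indexes.stats(index["id"]).data).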

Step 4: Semantic search

RushDB always narrows candidates to the current project before ranking them by vector similarity.

response = db.ai.search({
    "propertyName": "description",
    "query": "neural networks and deep learning",
    "labels": ["Article"],
    "limit": 3
})

for result in response.data:
    print(f"[{result.score:.3f}] {result['title']}")
# [0.921] Intro to Machine Learning
# [0.743] Graph Databases Explained
# [0.612] Climate Science Overview

Step 5: Semantic search with filter

Adding a where clause narrows the project-scoped candidate set further before cosine similarity ranking.

response = db.ai.search({
    "propertyName": "description",
    "query": "renewable energy and climate",
    "labels": ["Article"],
    "where": {
        "tags": {"$in": ["science", "climate"]}
    },
    "limit": 5
})

for result in response.data:
    print(f"[{result.score:.3f}] {result['title']}")
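To make the filter-then-rank order concrete, here is a small self-contained sketch in plain Python. It is not RushDB's implementation: the two-dimensional vectors are invented toy values, filtered_search is a hypothetical helper, and the tag check assumes $in matches a record when any of its tags appears in the allowed list.

```python
import math


def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length vectors (0.0 if either is zero)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def filtered_search(records, query_vector, allowed_tags, limit):
    """Filter first, then rank the survivors by cosine similarity."""
    allowed = set(allowed_tags)
    # Step 1: narrow the candidate set using the tag filter.
    candidates = [r for r in records if allowed & set(r["tags"])]
    # Step 2: score only the remaining candidates.
    scored = [(cosine_similarity(r["vector"], query_vector), r["title"]) for r in candidates]
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return scored[:limit]


# Toy embeddings, invented for this example only.
records = [
    {"title": "Intro to Machine Learning", "tags": ["ml", "beginner"], "vector": [1.0, 0.0]},
    {"title": "Graph Databases Explained", "tags": ["databases", "graphs"], "vector": [0.0, 1.0]},
    {"title": "Climate Science Overview", "tags": ["science", "climate"], "vector": [0.6, 0.8]},
]

for score, title in filtered_search(records, [0.6, 0.8], ["science", "climate"], limit=5):
    print(f"[{score:.3f}] {title}")
```

Because the filter runs before scoring, records that fail the where clause never compete for a slot in the limit, however similar their text is to the query.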

Next steps