
Manage Embedding Indexes

An embedding index is a policy that tells RushDB to vectorize a specific string property for a label. Once its status is ready, every record with that label and property is searchable via db.ai.search().

Indexes are scoped to a (label, propertyName) pair: Book:description and Article:description are completely independent, with separate vector stores.


How RushDB stores embeddings

Most databases store vectors directly on the record — polluting the schema with fields like description_emb_v1_1536 alongside regular metadata. Agents retrieving records get back a mix of business fields and raw float arrays, making schemas noisy and hard to reason about.

RushDB stores each vector on a dedicated Property node, connected to the Record via an internal edge. Your record metadata stays clean and uniform; the vector index lives separately and is only accessed when you explicitly call db.ai.search().

Decoupling vectors from record data means:

  • Agents see clean schemas — no float arrays mixed with business fields
  • Multiple indexes on one property — cosine + euclidean, different dimensions, side by side
  • Zero record writes on index create/delete — vector policy changes don't touch your data nodes
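To make the decoupling concrete, here is a toy in-memory model of the layout described above. It is a sketch only: the Record and Graph classes and their methods are hypothetical and are not RushDB internals. The vector property names borrow the pattern shown in the response shape later on this page. Records hold only business fields, while vectors live in a separate map keyed by (record id, vector property name). Adding or dropping one index's vectors therefore never rewrites a record.

```python
from dataclasses import dataclass, field

# Hypothetical toy model of the storage layout; not RushDB's actual implementation.

@dataclass
class Record:
    id: str
    label: str
    data: dict  # business fields only, no float arrays


@dataclass
class Graph:
    records: dict = field(default_factory=dict)
    # (record_id, vector_property_name) -> vector.
    # Stands in for Property nodes linked to a Record by an internal edge.
    vectors: dict = field(default_factory=dict)

    def add_record(self, record: Record) -> None:
        self.records[record.id] = record

    def attach_vector(self, record_id: str, vector_property_name: str, vector: list) -> None:
        # Writes only to the vector store; the Record itself is untouched.
        self.vectors[(record_id, vector_property_name)] = vector

    def drop_vectors(self, vector_property_name: str) -> None:
        # Removing one index's vectors leaves every Record as it was.
        self.vectors = {k: v for k, v in self.vectors.items() if k[1] != vector_property_name}


g = Graph()
g.add_record(Record("r1", "Article", {"title": "Intro", "description": "Graph databases"}))
# Two indexes on the same property, side by side.
g.attach_vector("r1", "_emb_managed_cosine_3", [0.1, 0.2, 0.3])
g.attach_vector("r1", "_emb_managed_euclidean_3", [0.3, 0.2, 0.1])
g.drop_vectors("_emb_managed_cosine_3")
print(g.records["r1"].data)  # the record's fields are unchanged
```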

Create an Index

db.ai.indexes.create()

# Simplest form: uses the server-configured model and dimensions
response = db.ai.indexes.create({
    "label": "Article",
    "propertyName": "description"
})
print(response.data["status"])  # 'pending': the backfill scheduler picks it up automatically

# Alternative with explicit parameters. Use one form or the other: if the
# server default is 1536 dimensions, this is the same tuple as above and
# returns 409 Conflict.
response = db.ai.indexes.create({
    "label": "Article",
    "propertyName": "description",
    "similarityFunction": "cosine",
    "dimensions": 1536
})

Create parameters

| Parameter | Type | Required | Description |
|---|---|---|---|
| label | string | yes | Label to scope this index to (e.g. "Article") |
| propertyName | string | yes | Property to embed (e.g. "description") |
| sourceType | string | no | "managed" (default) or "external". See Bring Your Own Vectors. |
| similarityFunction | string | no | "cosine" (default) or "euclidean" |
| dimensions | number | no | Vector dimensionality. Defaults to the server's RUSHDB_EMBEDDING_DIMENSIONS setting. Required for external indexes. |

Attempting to create a duplicate (label, propertyName, sourceType, similarityFunction, dimensions) tuple returns 409 Conflict.

Model configuration is server-side. The embedding model and the default dimensionality are set via the RUSHDB_EMBEDDING_MODEL and RUSHDB_EMBEDDING_DIMENSIONS environment variables.

Index lifecycle

| Status | Description |
|---|---|
| pending | Policy created, waiting for the backfill scheduler |
| indexing | Backfill in progress |
| awaiting_vectors | External index, waiting for the client to push vectors |
| ready | All existing records have vectors; search is available |
| error | Backfill failed; check the server logs for the cause |
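The lifecycle can be folded into a few predicates that client code can branch on. This is a minimal sketch derived only from the table above; the helper names are hypothetical. The key design point is that awaiting_vectors is not something the server will resolve on its own, so a polling loop should stop on it rather than wait for a timeout.

```python
# Status groups taken from the lifecycle table above (hypothetical helper names).
IN_PROGRESS = {"pending", "indexing"}    # server-side work still to do
AWAITING_CLIENT = {"awaiting_vectors"}   # external index: client must push vectors


def is_searchable(status: str) -> bool:
    """True only for 'ready', when all existing records have vectors."""
    return status == "ready"


def should_keep_polling(status: str) -> bool:
    """Keep polling only while the server is still working on the index."""
    return status in IN_PROGRESS


def needs_client_action(status: str) -> bool:
    """External indexes stay 'awaiting_vectors' until the client pushes vectors."""
    return status in AWAITING_CLIENT
```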

List Indexes

db.ai.indexes.find()

response = db.ai.indexes.find()
for index in response.data:
    print(f"{index['label']}.{index['propertyName']}: {index['status']}")
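Because one (label, propertyName) pair can carry several indexes, it is often handy to group the list by that pair. A minimal sketch, assuming the items are dicts shaped like the Index Response Shape below; group_by_property and the sample data are hypothetical.

```python
from collections import defaultdict


def group_by_property(indexes: list) -> dict:
    """Group index dicts (e.g. db.ai.indexes.find().data) by (label, propertyName)."""
    groups = defaultdict(list)
    for idx in indexes:
        groups[(idx["label"], idx["propertyName"])].append(idx)
    return dict(groups)


# Hypothetical sample data in the documented response shape.
sample = [
    {"id": "idx_1", "label": "Product", "propertyName": "description", "similarityFunction": "cosine", "status": "ready"},
    {"id": "idx_2", "label": "Product", "propertyName": "description", "similarityFunction": "euclidean", "status": "indexing"},
    {"id": "idx_3", "label": "Article", "propertyName": "description", "similarityFunction": "cosine", "status": "ready"},
]
for (label, prop), items in group_by_property(sample).items():
    summary = ", ".join(f"{i['similarityFunction']}={i['status']}" for i in items)
    print(f"{label}.{prop}: {summary}")
```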

Index Stats

Returns the fill rate for an index — useful for progress monitoring.

db.ai.indexes.stats(index_id)

response = db.ai.indexes.stats(index_id)
stats = response.data
print(f"{stats['indexedRecords']} / {stats['totalRecords']} records indexed")
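The two counters can be turned into a percentage for progress reporting. A minimal sketch that uses only the indexedRecords and totalRecords fields shown above; fill_rate is a hypothetical helper, and treating a label with zero records as fully indexed is an assumption, not documented server behavior.

```python
def fill_rate(stats: dict) -> float:
    """Fraction of records indexed, from a stats payload with indexedRecords/totalRecords."""
    total = stats["totalRecords"]
    if total == 0:
        return 1.0  # assumption: nothing to index counts as complete
    return stats["indexedRecords"] / total


print(f"{fill_rate({'indexedRecords': 450, 'totalRecords': 600}):.0%}")  # 75%
```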

Delete an Index

db.ai.indexes.delete(index_id)

db.ai.indexes.delete(index_id)

Deleting a policy does not always drop the underlying Neo4j DDL vector index: that index is dropped only when zero embeddings remain across the entire project. This avoids unnecessary rebuilds when multiple policies share the same (dimensions, similarityFunction).
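To remove every policy on a label (or on one property of it), combine find() with delete(). A minimal sketch that relies only on db.ai.indexes.find() and db.ai.indexes.delete(index_id) as documented on this page; delete_indexes_for is a hypothetical helper, and it assumes each listed index dict carries id, label, and propertyName.

```python
from typing import Optional


def delete_indexes_for(db, label: str, property_name: Optional[str] = None) -> list:
    """Delete every index policy on `label` (optionally one property); return the deleted ids."""
    deleted = []
    for idx in db.ai.indexes.find().data:
        if idx["label"] != label:
            continue
        if property_name is not None and idx["propertyName"] != property_name:
            continue
        db.ai.indexes.delete(idx["id"])
        deleted.append(idx["id"])
    return deleted
```

For example, delete_indexes_for(db, "Product", "description") removes both the cosine and euclidean policies on Product.description.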


Index Response Shape

{
  "id": "idx_abc123",
  "projectId": "proj_xyz",
  "label": "Article",
  "propertyName": "description",
  "modelKey": "text-embedding-3-small",
  "sourceType": "managed",
  "similarityFunction": "cosine",
  "dimensions": 1536,
  "vectorPropertyName": "_emb_managed_cosine_1536",
  "enabled": true,
  "status": "ready",
  "createdAt": "2025-01-10T12:00:00.000Z",
  "updatedAt": "2025-01-10T12:05:00.000Z"
}
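For type-checked client code, the shape above can be described with a TypedDict. This is a client-side sketch, not a type exported by the RushDB SDK: field names come from the example JSON, the Literal value sets come from this page's tables, and field types are inferred from the example values (they may differ, for instance, for external indexes). The timestamp helper normalizes the trailing "Z", since datetime.fromisoformat accepts it only from Python 3.11 on.

```python
from datetime import datetime
from typing import Literal, TypedDict


class EmbeddingIndex(TypedDict):
    # Hypothetical client-side type mirroring the example response.
    id: str
    projectId: str
    label: str
    propertyName: str
    modelKey: str
    sourceType: Literal["managed", "external"]
    similarityFunction: Literal["cosine", "euclidean"]
    dimensions: int
    vectorPropertyName: str
    enabled: bool
    status: Literal["pending", "indexing", "awaiting_vectors", "ready", "error"]
    createdAt: str
    updatedAt: str


def parse_timestamp(value: str) -> datetime:
    """Parse timestamps such as '2025-01-10T12:00:00.000Z' into aware datetimes."""
    # Replace 'Z' with an explicit UTC offset for Python versions before 3.11.
    return datetime.fromisoformat(value.replace("Z", "+00:00"))


created = parse_timestamp("2025-01-10T12:00:00.000Z")
updated = parse_timestamp("2025-01-10T12:05:00.000Z")
print(updated - created)  # time between creation and the last status update
```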

Wait for Index Ready

For managed indexes, backfill runs asynchronously. Poll until status is ready:

import time

def wait_for_index_ready(db, index_id, timeout_s=90):
    # monotonic() is unaffected by system clock changes
    deadline = time.monotonic() + timeout_s
    while time.monotonic() < deadline:
        response = db.ai.indexes.find()
        idx = next((i for i in response.data if i["id"] == index_id), None)
        if idx and idx["status"] == "ready":
            return
        if idx and idx["status"] == "error":
            raise RuntimeError(f"Index {index_id} entered error state")
        time.sleep(3)  # poll every 3 seconds
    raise TimeoutError(f"Index {index_id} did not become ready in {timeout_s}s")

response = db.ai.indexes.create({"label": "Book", "propertyName": "description"})
wait_for_index_ready(db, response.data["id"])
# now safe to call db.ai.search(...)
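The loop above polls at a fixed 3-second interval. On large backfills, an exponential backoff reduces the number of find() calls. Below is a sketch of one variant; poll_until and index_is_ready are hypothetical helpers, and sleep and clock are injectable so the loop can be exercised without real waiting.

```python
import time
from typing import Callable


def poll_until(check: Callable[[], bool], timeout_s: float = 90.0,
               initial_delay_s: float = 1.0, max_delay_s: float = 15.0,
               sleep: Callable[[float], None] = time.sleep,
               clock: Callable[[], float] = time.monotonic) -> None:
    """Call check() until it returns True, doubling the delay between attempts.

    check() may raise to abort early. Raises TimeoutError once the deadline passes.
    """
    deadline = clock() + timeout_s
    delay = initial_delay_s
    while True:
        if check():
            return
        remaining = deadline - clock()
        if remaining <= 0:
            raise TimeoutError("condition not met before timeout")
        sleep(min(delay, remaining))  # never sleep past the deadline
        delay = min(delay * 2, max_delay_s)


def index_is_ready(db, index_id) -> bool:
    """Check function for poll_until, reusing the find()-based lookup above."""
    idx = next((i for i in db.ai.indexes.find().data if i["id"] == index_id), None)
    if idx is None:
        return False
    if idx["status"] == "error":
        raise RuntimeError(f"Index {index_id} entered error state")
    return idx["status"] == "ready"
```

Usage: poll_until(lambda: index_is_ready(db, response.data["id"])). Capping the delay at max_delay_s keeps the wait after a finished backfill short.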

Multiple Indexes on the Same Property

You can have more than one index per (label, propertyName) pair, provided the rest of the signature (sourceType, similarityFunction, dimensions) differs:

# Cosine index
db.ai.indexes.create({
    "label": "Product",
    "propertyName": "description",
    "similarityFunction": "cosine",
    "dimensions": 768,
})

# Euclidean index on the same property
db.ai.indexes.create({
    "label": "Product",
    "propertyName": "description",
    "similarityFunction": "euclidean",
    "dimensions": 768,
})

When searching or writing vectors against a property with multiple indexes, specify similarityFunction to disambiguate.
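On the client side, it can help to resolve which index a request targets before sending it, and to fail loudly when the choice is ambiguous. A minimal sketch operating on a list shaped like db.ai.indexes.find().data; select_index is a hypothetical helper, and it does not show the parameters of db.ai.search() itself.

```python
from typing import Optional


def select_index(indexes: list, label: str, property_name: str,
                 similarity_function: Optional[str] = None) -> dict:
    """Pick the single index for (label, propertyName), narrowed by similarityFunction if given.

    Raises LookupError if nothing matches and ValueError if the choice is still ambiguous.
    """
    matches = [
        i for i in indexes
        if i["label"] == label and i["propertyName"] == property_name
        and (similarity_function is None or i["similarityFunction"] == similarity_function)
    ]
    if not matches:
        raise LookupError(f"no index for {label}.{property_name}")
    if len(matches) > 1:
        raise ValueError(f"{len(matches)} indexes match {label}.{property_name}; narrow the selection")
    return matches[0]


# Hypothetical data matching the two Product indexes created above.
indexes = [
    {"id": "idx_1", "label": "Product", "propertyName": "description", "similarityFunction": "cosine", "dimensions": 768},
    {"id": "idx_2", "label": "Product", "propertyName": "description", "similarityFunction": "euclidean", "dimensions": 768},
]
print(select_index(indexes, "Product", "description", "euclidean")["id"])
```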


Error Reference

| HTTP | Cause |
|---|---|
| 404 | Property does not exist in the project graph |
| 409 | An index for this (label, propertyName, sourceType, similarityFunction, dimensions) tuple already exists |
| 422 | Property is not of type string |
| 422 | Embedding model is not configured on the server |
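When surfacing these errors to users or logs, a lookup keyed on the status code keeps the messages consistent. A minimal sketch whose causes are copied from the table above; explain_index_error is hypothetical, and how the SDK exposes the HTTP status on a failed call is not covered on this page, so obtaining status_code is left to the caller.

```python
# Causes copied from the error table above; 422 has two possible causes.
INDEX_ERROR_CAUSES = {
    404: ["Property does not exist in the project graph"],
    409: ["An index for this (label, propertyName, sourceType, similarityFunction, dimensions) tuple already exists"],
    422: ["Property is not of type string", "Embedding model is not configured on the server"],
}


def explain_index_error(status_code: int) -> str:
    """Return a readable explanation for an index-management HTTP status code."""
    causes = INDEX_ERROR_CAUSES.get(status_code)
    if causes is None:
        return f"HTTP {status_code}: not listed in the index error reference"
    return f"HTTP {status_code}: " + " or ".join(causes)


print(explain_index_error(422))
```

In an idempotent setup script, a 409 usually means the desired index already exists and can be treated as success.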

See also