Skip to main content

Embedding Indexes

An embedding index is a policy that tells RushDB to vectorize a specific string property for a label. Once status is ready, every record matching that label+property pair is searchable via db.ai.search().


How indexes work

Indexes are scoped to (label, propertyName). Book:description and Article:description are completely independent — they maintain separate vector stores and never interfere.

Index policy
label: "Book"
propertyName: "description"
sourceType: "managed"
dimensions: 1536
status: "ready"

↓ backfill runs automatically

Book records get vectors stored on their VALUE relationships:
rel._emb_managed_cosine_1536 = [0.1, 0.2, ...]

When new records are created or existing records are updated, the index transitions back to pending and vectors are recomputed on the next backfill cycle.


List Indexes

db.ai.indexes.find()

List all embedding index policies for the current project.

response = db.ai.indexes.find()
for index in response.data:
print(f"{index['label']}.{index['propertyName']}{index['status']}")

Example response data

[
{
"id": "idx_abc123",
"projectId": "proj_xyz",
"label": "Article",
"propertyName": "description",
"sourceType": "managed",
"similarityFunction": "cosine",
"modelKey": "text-embedding-3-small",
"dimensions": 1536,
"vectorPropertyName": "_emb_managed_cosine_1536",
"enabled": True,
"status": "ready",
"createdAt": "2025-01-10T12:00:00.000Z",
"updatedAt": "2025-01-10T12:05:00.000Z",
}
]

Create Index

db.ai.indexes.create()

Create a new embedding index policy for a string property.

db.ai.indexes.create(params: dict) -> ApiResponse[dict]
params keyTypeRequiredDescription
labelstringyesLabel to scope this index to (e.g. "Article")
propertyNamestringyesName of the property to embed (e.g. "description")
sourceTypestringno"managed" (default) or "external". See Advanced Indexing.
similarityFunctionstringno"cosine" (default) or "euclidean"
dimensionsnumbernoVector dimensionality. Defaults to server RUSHDB_EMBEDDING_DIMENSIONS. Required for external indexes.
# Simplest form — uses server-configured model and dimensions
response = db.ai.indexes.create({
"label": "Article",
"propertyName": "description"
})
print(response.data["status"]) # 'pending' → backfill starts immediately

# With explicit parameters
response = db.ai.indexes.create({
"label": "Article",
"propertyName": "description",
"similarityFunction": "cosine",
"dimensions": 1536
})

Attempting to create a duplicate (label, propertyName, sourceType, similarityFunction, dimensions) tuple returns 409 Conflict.

Index lifecycle

StatusDescription
pendingPolicy created, waiting for backfill scheduler
indexingBackfill in progress
awaiting_vectorsExternal index — waiting for client to push vectors
readyAll existing records have vectors; search is available
errorBackfill failed; check server logs for the cause

Get Index Stats

db.ai.indexes.stats(index_id)

Returns the fill rate for an index — useful for progress monitoring or health checks.

response = db.ai.indexes.stats(index_id)
stats = response.data
print(f"{stats['indexedRecords']} / {stats['totalRecords']} records indexed")

Delete Index

db.ai.indexes.delete(index_id)

Remove an embedding index policy. The underlying Neo4j DDL vector index is only dropped when zero embeddings remain across the entire project.

db.ai.indexes.delete(index_id)

Waiting for an index to become ready

For managed indexes, backfill runs asynchronously. Poll db.ai.indexes.find() until status is ready:

import time

def wait_for_index_ready(db, index_id, timeout_s=90):
deadline = time.time() + timeout_s
while time.time() < deadline:
response = db.ai.indexes.find()
idx = next((i for i in response.data if i["id"] == index_id), None)
if idx and idx["status"] == "ready":
return
if idx and idx["status"] == "error":
raise RuntimeError("Index entered error state")
time.sleep(3)
raise TimeoutError("Index did not become ready in time")

response = db.ai.indexes.create({"label": "Book", "propertyName": "description"})
wait_for_index_ready(db, response.data["id"])
# now safe to call db.ai.search(...)

Multiple indexes on the same property

You can have more than one index per (label, propertyName) pair, provided the signature differs:

# Cosine index
db.ai.indexes.create({
"label": "Product",
"propertyName": "description",
"similarityFunction": "cosine",
"dimensions": 768,
})

# Euclidean index on the same property
db.ai.indexes.create({
"label": "Product",
"propertyName": "description",
"similarityFunction": "euclidean",
"dimensions": 768,
})

When searching or writing vectors against a property with multiple indexes, specify similarityFunction to disambiguate. See Advanced Indexing for details.


Index shape

{
"id": str,
"projectId": str,
"label": str,
"propertyName": str,
"modelKey": str,
"sourceType": str, # 'managed' | 'external'
"similarityFunction": str, # 'cosine' | 'euclidean'
"dimensions": int,
"vectorPropertyName": str, # internal Neo4j property name for the vector
"enabled": bool,
"status": str, # 'pending' | 'indexing' | 'awaiting_vectors' | 'ready' | 'error'
"createdAt": str,
"updatedAt": str,
}

String List Properties

List[str]

String list properties are supported. Each item in the list is embedded individually, then mean-pooled into a single vector stored on the relationship.