
Bring Your Own Vectors (BYOV)

External indexes let you supply pre-computed embedding vectors instead of having the server compute them. Use them when you need:

  • A custom or private model the server cannot access
  • Multimodal embeddings (image, audio, document structure)
  • Vectors already produced by your ML pipeline
  • Reproducible embeddings not tied to the server's active model

External vs Managed Comparison

                              | Managed                          | External
sourceType                    | "managed"                        | "external"
Initial status                | "pending"                        | "awaiting_vectors"
Who computes embeddings       | RushDB server (configured model) | Your application
dimensions required           | No (uses server default)         | Yes
Backfill for existing records | Automatic                        | Manual via upsertVectors / upsert_vectors or inline writes

Create an External Index

An external index starts with status awaiting_vectors and transitions to ready once at least one vector has been written.

db.ai.indexes.create()

response = db.ai.indexes.create({
    "label": "Article",
    "propertyName": "body",
    "sourceType": "external",
    "dimensions": 768,
    "similarityFunction": "cosine",
})
print(response.data["status"])  # 'awaiting_vectors'

dimensions is required for external indexes — the server cannot infer it without an embedding model.
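
Because every vector written to an external index has to match the declared dimensions, it can help to check the length on the client before sending anything. The helper below is a minimal sketch; validate_vector is a hypothetical name and not part of the RushDB SDK, and this page does not say how the server responds to a vector of the wrong length.

```python
import math


def validate_vector(vector, dimensions):
    """Return the vector as a list of floats, or raise ValueError if it cannot fit the index.

    Hypothetical client-side helper: it only checks length and finiteness.
    """
    values = [float(x) for x in vector]
    if len(values) != dimensions:
        raise ValueError(f"expected {dimensions} dimensions, got {len(values)}")
    if not all(math.isfinite(x) for x in values):
        raise ValueError("vector contains NaN or infinite values")
    return values


print(validate_vector([1, 0, 0], 3))  # [1.0, 0.0, 0.0]
```

Running the check right after your embedder returns catches a model or configuration mismatch before any write request is made.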


Bulk-Upsert Vectors

Use upsertVectors / upsert_vectors to seed an external index from an existing dataset or batch pipeline. The request is idempotent — calling it again with the same recordId replaces the stored vector.

db.ai.indexes.upsert_vectors(index_id, params)

# Fetch records and embed with your own model
records_response = db.records.find({"where": {"__label": "Article"}})

items = []
for record in records_response.data:
    vector = my_embedder.embed(record["body"])
    items.append({"recordId": record["__id"], "vector": vector})

db.ai.indexes.upsert_vectors(ext_index_id, {"items": items})
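
For large datasets you may prefer to send the items in several smaller requests. The sketch below splits the list into batches and calls an upsert function once per batch. The names chunked and upsert_in_batches, and the default batch size of 500, are assumptions made for illustration; this page does not document a per-request limit.

```python
def chunked(items, size):
    """Yield consecutive slices of `items`, each holding at most `size` entries."""
    if size < 1:
        raise ValueError("size must be at least 1")
    for start in range(0, len(items), size):
        yield items[start:start + size]


def upsert_in_batches(upsert, index_id, items, batch_size=500):
    """Call upsert(index_id, {"items": batch}) once per batch and return the call count.

    `upsert` is any callable with the same shape as db.ai.indexes.upsert_vectors.
    The default batch_size is an arbitrary illustrative value.
    """
    calls = 0
    for batch in chunked(items, batch_size):
        upsert(index_id, {"items": batch})
        calls += 1
    return calls
```

With the client from this page, the call would look like upsert_in_batches(db.ai.indexes.upsert_vectors, ext_index_id, items). Since the request is idempotent, a batch that fails partway through can be sent again without creating duplicates.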

Inline Write (Preferred for New Records)

Instead of a two-step create → upsert_vectors flow, write vectors inline with any record write operation. See Write Records with Vectors for the full reference.

# One step: create record AND write its vector
record = db.records.create(
    label="Article",
    data={"title": "Warp drives", "body": "Alcubierre metric..."},
    vectors=[{"propertyName": "body", "vector": my_embedder.embed("Alcubierre metric...")}],
)

Disambiguation

When the same (label, propertyName) pair has multiple external indexes (e.g. cosine and euclidean), specify similarityFunction to resolve which index to use.

# Create two indexes on the same property
db.ai.indexes.create({
    "label": "Product", "propertyName": "embedding",
    "sourceType": "external", "similarityFunction": "cosine", "dimensions": 768,
})
db.ai.indexes.create({
    "label": "Product", "propertyName": "embedding",
    "sourceType": "external", "similarityFunction": "euclidean", "dimensions": 768,
})

# ✅ Write to the cosine index only
db.records.create(
    label="Product",
    data={"name": "Widget"},
    vectors=[{
        "propertyName": "embedding",
        "vector": vec,
        "similarityFunction": "cosine",  # required when ambiguous
    }],
)

# ✅ Search the euclidean index only
db.ai.search({
    "labels": ["Product"],
    "propertyName": "embedding",
    "queryVector": vec,
    "similarityFunction": "euclidean",
})

# ❌ Missing similarityFunction → 422 Unprocessable Entity
db.records.create(
    label="Product",
    data={"name": "Gadget"},
    vectors=[{"propertyName": "embedding", "vector": vec}],  # ambiguous!
)
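
The resolution rule can be modelled locally to make the ambiguity explicit. The function below is an illustrative sketch, not the server's implementation: resolve_index is a hypothetical name, and the Python exceptions only stand in for the HTTP errors the API would return.

```python
def resolve_index(indexes, label, property_name, similarity_function=None):
    """Pick the single index config that a vector entry targets.

    Illustrative model of the disambiguation rule described above.
    """
    candidates = [
        idx for idx in indexes
        if idx["label"] == label and idx["propertyName"] == property_name
    ]
    if similarity_function is not None:
        candidates = [
            idx for idx in candidates
            if idx["similarityFunction"] == similarity_function
        ]
    if not candidates:
        raise LookupError("no index matches this label and property")
    if len(candidates) > 1:
        # Corresponds to the ambiguous case that the API rejects with 422.
        raise ValueError("ambiguous target: specify similarityFunction")
    return candidates[0]


indexes = [
    {"label": "Product", "propertyName": "embedding", "similarityFunction": "cosine"},
    {"label": "Product", "propertyName": "embedding", "similarityFunction": "euclidean"},
]
print(resolve_index(indexes, "Product", "embedding", "cosine")["similarityFunction"])  # cosine
```

Calling resolve_index(indexes, "Product", "embedding") without a similarity function raises ValueError here, mirroring the rejected write above.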

Index signature uniqueness

Two index policies are considered identical (and a second create returns 409 Conflict) when all five fields match:

  • label
  • propertyName
  • sourceType
  • similarityFunction
  • dimensions
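
One way to picture the signature is as a five-element tuple. The sketch below compares the literal values supplied in each config; index_signature and would_conflict are hypothetical helpers, and the server may apply defaults to omitted fields before comparing, which this sketch does not attempt to reproduce.

```python
SIGNATURE_FIELDS = ("label", "propertyName", "sourceType", "similarityFunction", "dimensions")


def index_signature(config):
    """Return the tuple of the five fields that identify an index policy."""
    return tuple(config.get(field) for field in SIGNATURE_FIELDS)


def would_conflict(existing_configs, new_config):
    """True if new_config has the same signature as any existing config."""
    existing = {index_signature(cfg) for cfg in existing_configs}
    return index_signature(new_config) in existing


cosine = {
    "label": "Product", "propertyName": "embedding",
    "sourceType": "external", "similarityFunction": "cosine", "dimensions": 768,
}
euclidean = dict(cosine, similarityFunction="euclidean")

print(would_conflict([cosine], dict(cosine)))  # True
print(would_conflict([cosine], euclidean))     # False
```

Changing any one of the five fields yields a different signature, which is why the cosine and euclidean indexes in the Disambiguation section can coexist.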

Complete BYOV Worked Example

from rushdb import RushDB

db = RushDB("your-api-key")

# 1. Create the external index
idx_response = db.ai.indexes.create({
    "label": "Doc",
    "propertyName": "content",
    "sourceType": "external",
    "dimensions": 3,
    "similarityFunction": "cosine",
})
ext_index_id = idx_response.data["id"]

# 2. Create records with inline vectors (one round trip per record)
articles = [
    {"title": "Alpha", "content": "First article", "vector": [1, 0, 0]},
    {"title": "Beta", "content": "Second article", "vector": [0, 1, 0]},
    {"title": "Gamma", "content": "Third article", "vector": [0, 0, 1]},
]

for article in articles:
    db.records.create(
        label="Doc",
        data={"title": article["title"], "content": article["content"]},
        vectors=[{"propertyName": "content", "vector": article["vector"]}],
    )

# 3. Search using a pre-computed query vector
response = db.ai.search({
    "labels": ["Doc"],
    "propertyName": "content",
    "queryVector": [1, 0, 0],  # closest to Alpha
    "limit": 3,
})

print(response.data[0].get("title"))    # "Alpha"
print(response.data[0].get("__score"))  # ~1.0
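
To see why Alpha ranks first, the textbook cosine similarity can be computed by hand for the three toy vectors. This is a plain-Python sketch of the standard formula; how the server derives __score from the similarity is not specified on this page, so treat the numbers as the mathematical similarity only.

```python
import math


def cosine_similarity(a, b):
    """Standard cosine similarity: dot(a, b) / (|a| * |b|)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        raise ValueError("cosine similarity is undefined for a zero vector")
    return dot / (norm_a * norm_b)


docs = {"Alpha": [1, 0, 0], "Beta": [0, 1, 0], "Gamma": [0, 0, 1]}
query = [1, 0, 0]

scores = {title: cosine_similarity(query, vec) for title, vec in docs.items()}
best = max(scores, key=scores.get)
print(best, scores[best])  # Alpha 1.0
```

The query is identical to Alpha's vector and orthogonal to the other two, so Alpha scores 1.0 while Beta and Gamma score 0.0.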

Batch Import with createMany

For bulk seeding with flat rows, use createMany / create_many with the top-level vectors parameter. The parameter is index-aligned: the entry at position i holds the vectors for row i of data.

db.records.create_many(
    label="Doc",
    data=[
        {"title": "Alpha", "content": "First article"},
        {"title": "Beta", "content": "Second article"},
        {"title": "Gamma", "content": "Third article"},
    ],
    vectors=[
        [{"propertyName": "content", "vector": [1, 0, 0]}],  # row 0
        [{"propertyName": "content", "vector": [0, 1, 0]}],  # row 1
        [{"propertyName": "content", "vector": [0, 0, 1]}],  # row 2
    ],
)
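
If your rows carry their vector alongside the other fields, the two parallel lists can be derived in one pass so they cannot drift out of alignment. The helper below is a sketch; split_rows and its vector_key parameter are hypothetical names, not SDK functions.

```python
def split_rows(rows, property_name, vector_key="vector"):
    """Split rows into (data, vectors) lists aligned by position.

    Each row is a dict holding its record fields plus one vector under vector_key.
    """
    data, vectors = [], []
    for row in rows:
        # Everything except the vector becomes the record's data.
        data.append({k: v for k, v in row.items() if k != vector_key})
        # One list of vector entries per row, at the same position.
        vectors.append([{"propertyName": property_name, "vector": row[vector_key]}])
    return data, vectors


rows = [
    {"title": "Alpha", "content": "First article", "vector": [1, 0, 0]},
    {"title": "Beta", "content": "Second article", "vector": [0, 1, 0]},
]
data, vectors = split_rows(rows, "content")
print(len(data) == len(vectors))  # True
```

The results could then be passed as db.records.create_many(label="Doc", data=data, vectors=vectors), matching the call shape shown above.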

Mixing Managed and External Indexes

You can have both a managed index and an external index on the same property simultaneously:

# Managed — server embeds for full-text semantic search
db.ai.indexes.create({"label": "Product", "propertyName": "description"})

# External — your custom multimodal model
db.ai.indexes.create({
    "label": "Product",
    "propertyName": "description",
    "sourceType": "external",
    "dimensions": 512,
    "similarityFunction": "cosine",
})

When searching against a property with both types, specify similarityFunction (and optionally sourceType) to select the target index.

