
Bring Your Own Vectors (BYOV)

By default, RushDB generates embeddings server-side when you create an embedding index on a string property. With BYOV (Bring Your Own Vectors) you compute the embeddings yourself and push them alongside your records. RushDB stores, indexes, and searches them — you stay in full control of the model and the pipeline.

Why BYOV?

| Scenario | Why BYOV helps |
| --- | --- |
| Domain-specific or fine-tuned model | Use any model (a locally fine-tuned LLM, a multimodal encoder, a document-structure model) without configuring it server-side |
| Compliance / data residency | Raw text never leaves your infrastructure; only the numeric vector is sent to RushDB |
| Multimodal embeddings | Encode images, audio, or structured documents into vectors before storing them |
| Existing ML pipeline | Reuse vectors your data pipeline already produces |
| Reproducibility | Pin embedding logic to a specific model version, with no coupling to server-side model upgrades |

Managed vs. External

| Aspect | Managed | External (BYOV) |
| --- | --- | --- |
| sourceType | managed (default) | external |
| Who generates embeddings | RushDB server | Your application |
| Search input | Natural-language query string | Pre-computed queryVector array |
| dimensions required on create | No (uses the server default) | Yes (must match your model) |
| Initial index status | pending, then ready after backfill | awaiting_vectors, then ready once the first vector is written |
| Backfill of existing records | Automatic | Manual, via upsertVectors or inline writes |

Both index types can coexist on the same (label, propertyName) pair.
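
Because an external index fixes dimensions at creation time and RushDB never sees your model, a dimension mismatch is easiest to catch on the client before anything is sent. The sketch below shows one such guard, assuming vectors are plain `number[]` arrays; `assertVectorFits` is a hypothetical helper, not part of the RushDB SDK.

```typescript
// Hypothetical client-side guard (not part of the RushDB SDK).
// Checks that a vector has exactly the `dimensions` the external index was
// created with, and that every component is a finite number
// (NaN or Infinity would make similarity scores meaningless).
function assertVectorFits(vector: number[], dimensions: number): void {
  if (vector.length !== dimensions) {
    throw new Error(
      `Vector has ${vector.length} dimensions, index expects ${dimensions}`
    )
  }
  for (let i = 0; i < vector.length; i++) {
    if (!Number.isFinite(vector[i])) {
      throw new Error(`Component ${i} is not a finite number: ${vector[i]}`)
    }
  }
}
```

Calling it on each vector right before an inline write or an upsertVectors call turns a malformed vector into an immediate, local error.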

Write Flows

There are two ways to push vectors into an external index.

Option A — Inline at write time

Attach vectors directly inside any record create or import call. The index must already exist before vectors are written.

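```typescript
// `db` is an initialized RushDB client; `embed` is your own embedding function
// returning a number[] whose length matches the index's dimensions.
const body = 'Graphs provide context that plain vector search lacks...'

await db.records.create('Article', {
  title: 'Understanding Graph RAG',
  body,
  __vectors: [
    {
      propertyName: 'body',
      // Embed the same text the `body` property holds
      vector: await embed(body)
    }
  ]
})
```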

This is the lowest-latency path: one round-trip creates the record and stores its vector.
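
The __vectors array holds one entry per embedded property, so a record covered by several external indexes carries several entries. The sketch below assembles that array, assuming the `{ propertyName, vector }` entry shape from the example above. `buildVectorEntries` and its `embed` parameter are hypothetical names used only for illustration.

```typescript
// Entry shape taken from the __vectors example above.
type VectorEntry = { propertyName: string; vector: number[] }

// Hypothetical helper: embeds each listed string property of `data`
// and returns the array to attach as `__vectors`.
async function buildVectorEntries(
  data: Record<string, unknown>,
  properties: string[],
  embed: (text: string) => Promise<number[]>
): Promise<VectorEntry[]> {
  const entries: VectorEntry[] = []
  for (const propertyName of properties) {
    const value = data[propertyName]
    if (typeof value !== 'string') {
      throw new Error(`Property "${propertyName}" is not a string`)
    }
    // Embed the property's own text so the vector matches what the index covers
    entries.push({ propertyName, vector: await embed(value) })
  }
  return entries
}
```

The result can be spread into the create payload as `__vectors: await buildVectorEntries(data, ['title', 'body'], embed)`, provided an external index already exists for each listed property.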

Option B — Upsert after the fact

Push vectors separately, useful for seeding an index from an existing dataset or syncing after a batch embedding job.

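```typescript
// `indexId` identifies the external index. Each vector must contain exactly
// `dimensions` numbers; they are shortened here for readability.
await db.ai.indexes.upsertVectors(indexId, {
  items: [
    { recordId: 'rec_001', vector: [0.1, 0.2 /* , ... */] },
    { recordId: 'rec_002', vector: [0.7, 0.8 /* , ... */] }
  ]
})
```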

The upsert call is idempotent — re-running it with the same recordId replaces the stored vector.
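
For large backfills it can help to split items across several upsertVectors calls and to collapse duplicate recordIds on the client, keeping the last vector to mirror the replace-on-upsert behavior. A minimal sketch follows; the default batch size of 500 is an arbitrary assumption, not a documented RushDB limit, and `toUpsertBatches` is a hypothetical helper.

```typescript
type UpsertItem = { recordId: string; vector: number[] }

// Hypothetical helper: deduplicates items by recordId (the last one wins,
// mirroring replace-on-upsert) and splits them into batches of at most batchSize.
function toUpsertBatches(items: UpsertItem[], batchSize = 500): UpsertItem[][] {
  if (!Number.isInteger(batchSize) || batchSize < 1) {
    throw new Error('batchSize must be a positive integer')
  }
  const latest = new Map<string, UpsertItem>()
  for (const item of items) {
    // A later item for the same recordId overwrites the earlier one
    latest.set(item.recordId, item)
  }
  const unique = Array.from(latest.values())
  const batches: UpsertItem[][] = []
  for (let i = 0; i < unique.length; i += batchSize) {
    batches.push(unique.slice(i, i + batchSize))
  }
  return batches
}
```

Each batch would then be sent as `await db.ai.indexes.upsertVectors(indexId, { items: batch })`.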

Searching with a Pre-computed Vector

Once vectors are stored, search with queryVector instead of query:

```typescript
const results = await db.ai.search({
  label: 'Article',
  propertyName: 'body',
  queryVector: await embed('graph databases and retrieval'), // your embedding function
  limit: 10
})
```

The result shape is the same as for a managed semantic search: records ranked by cosine (or Euclidean) similarity, with an optional __score field.
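
To sanity-check results locally, for example against your model's own nearest-neighbour output, cosine ranking can be reproduced in a few lines. This is an illustrative sketch of the similarity measure only, not RushDB's search implementation; the `Candidate` shape and `rankByCosine` name are hypothetical, and the sketch does not claim that __score uses the same scale.

```typescript
type Candidate = { id: string; vector: number[] }

// Cosine similarity: dot(a, b) / (|a| * |b|), a value in [-1, 1].
function cosine(a: number[], b: number[]): number {
  if (a.length !== b.length) throw new Error('Dimension mismatch')
  let dot = 0
  let normA = 0
  let normB = 0
  for (let i = 0; i < a.length; i++) {
    dot += a[i] * b[i]
    normA += a[i] * a[i]
    normB += b[i] * b[i]
  }
  if (normA === 0 || normB === 0) return 0 // a zero vector has no direction
  return dot / (Math.sqrt(normA) * Math.sqrt(normB))
}

// Brute-force ranking: score every candidate, sort descending, keep the top `limit`.
function rankByCosine(
  queryVector: number[],
  candidates: Candidate[],
  limit: number
): Array<Candidate & { score: number }> {
  return candidates
    .map((c) => ({ ...c, score: cosine(queryVector, c.vector) }))
    .sort((x, y) => y.score - x.score)
    .slice(0, limit)
}
```

A brute-force scan like this is fine for spot checks on a few thousand vectors; it is not meant to replace the index.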

Lifecycle

An external index stays in awaiting_vectors until at least one vector has been written. After that it is ready and searchable.
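
If a search may run before the first vectors land, a small polling wrapper can wait for the ready status. The sketch below assumes nothing about the SDK: `getStatus` is a caller-supplied function (for example, one wrapping whatever call your client uses to read index metadata), and `waitUntilReady`, its interval, and its timeout defaults are hypothetical choices.

```typescript
// Status values named in this page (pending applies to managed indexes).
type IndexStatus = 'pending' | 'awaiting_vectors' | 'ready'

// Hypothetical helper: polls the caller-supplied getStatus until the index
// reports 'ready', or throws once timeoutMs has elapsed.
async function waitUntilReady(
  getStatus: () => Promise<IndexStatus>,
  intervalMs = 1000,
  timeoutMs = 30000
): Promise<void> {
  const deadline = Date.now() + timeoutMs
  while (true) {
    const status = await getStatus()
    if (status === 'ready') return
    if (Date.now() >= deadline) {
      throw new Error(`Index still "${status}" after ${timeoutMs} ms`)
    }
    // Sleep before the next poll
    await new Promise((resolve) => setTimeout(resolve, intervalMs))
  }
}
```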


Implementation Reference