BYOV in Practice: When and Why to Bring Your Own Vectors

"I looked over the docs, but am struggling to understand how to apply it in practice. Is there a case study I can read?"

That's a fair question. This tutorial exists to answer it.

BYOV is one of those features that feels abstract until you hit the problem it solves — at which point it becomes obvious. This walkthrough describes a realistic scenario, explains the decision, and shows the full implementation.


The scenario

A team is building a job listing search product. They have:

  • A PostgreSQL database with ~200 000 job postings (title, description, company, location, salary range)
  • An existing ML pipeline that produces 768-dimensional embeddings using a domain-specific model fine-tuned on job descriptions — better recall than a generic model
  • A compliance requirement: raw job descriptions must not leave their infrastructure (the fine-tuned model runs on-premise)

They want to migrate to RushDB to get graph-based relationships (company → department → role, applicant → skills → jobs), unified filtering, and semantic search — all in one place.


Why managed embeddings don't fit here

Managed embeddings are the right default. You point RushDB at a property, it handles the rest — generation, storage, backfill, and search.

But this team has two hard blockers:

| Blocker | Why managed doesn't solve it |
| --- | --- |
| Fine-tuned domain model | RushDB's managed models are general-purpose. A fine-tuned job-description model consistently outperforms them on this dataset. |
| Compliance: no raw text off-prem | Managed embeddings require RushDB to call an embedding model (OpenAI, etc.) with the raw text. That's off-prem. |

If neither of these applies to you — if a general-purpose model is good enough and compliance is not a concern — use managed embeddings. They're simpler.

BYOV makes sense when:

  • You have or need a fine-tuned, specialized, or multimodal embedding model
  • Compliance prevents raw text from leaving your infrastructure
  • You already produce vectors in a separate pipeline and want to avoid double-embedding
  • You need different embedding models for different fields on the same record

The plan

[PostgreSQL]   ──►   [embedding pipeline]   ──►   [RushDB]
raw jobs             on-prem model                store records + vectors
                     (768-dim, cosine)            search with queryVector

  1. Create an external embedding index in RushDB
  2. Import existing jobs with inline vectors in a batch
  3. Keep the pipeline running: new jobs → embed → write to RushDB
  4. Search using pre-computed query vectors

Step 1: Create the external index

```typescript
import RushDB from '@rushdb/javascript-sdk'

const db = new RushDB(process.env.RUSHDB_API_KEY!)

const index = await db.ai.indexes.create({
  label: 'Job',
  propertyName: 'description',
  external: true, // your pipeline supplies vectors
  similarityFunction: 'cosine',
  dimensions: 768 // must match your model's output
})

console.log(index.status) // 'awaiting_vectors'
```

The index starts with status awaiting_vectors. It becomes ready once the first vector is written.
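Because dimensions must match your model's output (768 here), a malformed vector is the most likely write-time failure. A minimal sketch of a client-side check you could run before each write, assuming the 768-dimension cosine index above; assertVector and EXPECTED_DIMENSIONS are hypothetical names, not part of the RushDB SDK:

```typescript
// Hypothetical constant: must equal the `dimensions` value passed to db.ai.indexes.create
const EXPECTED_DIMENSIONS = 768

// Throws if a vector cannot belong to the index: wrong length, non-finite entries, or all zeros.
function assertVector(vector: number[], expected: number = EXPECTED_DIMENSIONS): void {
  if (vector.length !== expected) {
    throw new Error(`Expected ${expected} dimensions, got ${vector.length}`)
  }
  let sumOfSquares = 0
  for (let i = 0; i < vector.length; i++) {
    const value = vector[i]
    if (!Number.isFinite(value)) {
      throw new Error(`Non-finite value at position ${i}`)
    }
    sumOfSquares += value * value
  }
  // Cosine similarity is undefined for a zero vector, so reject it up front
  if (sumOfSquares === 0) {
    throw new Error('Zero vector has no direction for cosine similarity')
  }
}
```

Calling assertVector on each element of the embedBatch output before writing surfaces a model or configuration mismatch on your side, before any record is stored.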


Step 2: Backfill existing records

Your PostgreSQL table has 200 000 rows. The backfill reads them page by page, embeds each page with the on-prem model, and loads the records and their vectors into RushDB together:

```typescript
import RushDB from '@rushdb/javascript-sdk'

// Hypothetical: your existing pipeline's embed function and Postgres pager
import { embedBatch } from './embedding-pipeline'
import { fetchJobsPage } from './postgres'

const db = new RushDB(process.env.RUSHDB_API_KEY!)

const BATCH_SIZE = 100

async function backfill() {
  let offset = 0

  while (true) {
    const rows = await fetchJobsPage({ limit: BATCH_SIZE, offset })
    if (rows.length === 0) break

    // Embed the descriptions using your on-prem model
    const vectors = await embedBatch(rows.map((r) => r.description))

    // Write records with inline vectors: one round-trip per batch
    await db.records.createMany(
      rows.map((row, i) => ({
        __label: 'Job',
        id: row.id, // preserve your existing IDs
        title: row.title,
        description: row.description,
        company: row.company,
        location: row.location,
        salaryMin: row.salary_min,
        salaryMax: row.salary_max,
        __vectors: [
          {
            propertyName: 'description',
            vector: vectors[i]
          }
        ]
      }))
    )

    // Advance by the rows actually read, so the count stays exact on a partial last page
    offset += rows.length
    console.log(`Imported ${offset} jobs`)
  }
}

backfill().catch((err) => {
  console.error('Backfill failed:', err)
  process.exit(1)
})
```

__vectors is the inline write path: the record and its vector are stored in one call, with no separate upsert step.
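The mapping inside createMany pairs each row with the vector at the same index, so the two arrays must line up exactly. A minimal sketch of that pairing as a standalone function with a length check added; JobRow and toJobRecords are hypothetical names, and the output shape simply mirrors the payload in the backfill script rather than a published SDK type:

```typescript
// Hypothetical row shape, matching the columns used in the backfill script
interface JobRow {
  id: string
  title: string
  description: string
  company: string
  location: string
  salary_min: number
  salary_max: number
}

// Pairs each row with the vector at the same position and builds the inline-vector payload
function toJobRecords(rows: JobRow[], vectors: number[][]) {
  if (rows.length !== vectors.length) {
    // A silent mismatch would attach vectors to the wrong jobs
    throw new Error(`Got ${rows.length} rows but ${vectors.length} vectors`)
  }
  return rows.map((row, i) => ({
    __label: 'Job',
    id: row.id,
    title: row.title,
    description: row.description,
    company: row.company,
    location: row.location,
    salaryMin: row.salary_min,
    salaryMax: row.salary_max,
    __vectors: [{ propertyName: 'description', vector: vectors[i] }]
  }))
}
```

With a helper like this, the loop body reduces to db.records.createMany(toJobRecords(rows, vectors)), and a misbehaving embedding call fails loudly instead of mislabeling records.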


Step 3: Keep the pipeline current

For new jobs added after migration, the same pattern applies. In your ingestion handler:

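```typescript
import RushDB from '@rushdb/javascript-sdk'

// Same hypothetical helpers as in Step 2; JobRow is a hypothetical row type
import { embedBatch } from './embedding-pipeline'
import type { JobRow } from './postgres'

const db = new RushDB(process.env.RUSHDB_API_KEY!)

async function ingestJob(job: JobRow) {
  // Embed on-prem before sending to RushDB
  const [vector] = await embedBatch([job.description])

  await db.records.create('Job', {
    id: job.id,
    title: job.title,
    description: job.description,
    company: job.company,
    location: job.location,
    salaryMin: job.salary_min,
    salaryMax: job.salary_max,
    __vectors: [{ propertyName: 'description', vector }]
  })
}
```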

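The vector travels with the record, and because embedding happens on-prem, the description is never sent to an external embedding API. Note that the payload also stores description itself in RushDB, so this setup assumes RushDB runs somewhere your compliance rule allows that text to live.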


Step 4: Search with pre-computed query vectors

At query time, embed the user's search query on-prem and pass the resulting vector to RushDB:

```typescript
async function searchJobs(userQuery: string, location?: string) {
  // Embed the query with the same model: consistency is critical
  const [queryVector] = await embedBatch([userQuery])

  const results = await db.ai.search({
    label: 'Job',
    propertyName: 'description',
    queryVector, // pre-computed vector, not a text string
    where: location ? { location: { $contains: location } } : undefined,
    limit: 20
  })

  return results.data
}

// Usage
const jobs = await searchJobs('senior backend engineer distributed systems', 'Berlin')
```

Filtering with where, pagination, and orderBy work exactly as they do with managed search. The only difference is passing queryVector instead of query.
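With similarityFunction: 'cosine', results are ranked by the angle between the query vector and each stored vector. This is also why the query must come from the same model: vectors from different models live in unrelated spaces, so their cosine scores are meaningless. A brute-force sketch of that ranking for intuition only; it illustrates what the score means, not how RushDB computes it internally, and cosineSimilarity and rankByCosine are hypothetical helpers:

```typescript
// Cosine similarity: dot(a, b) / (|a| * |b|), ranging from -1 to 1
function cosineSimilarity(a: number[], b: number[]): number {
  if (a.length !== b.length) {
    throw new Error('Vectors must have the same dimensions')
  }
  let dot = 0
  let normA = 0
  let normB = 0
  for (let i = 0; i < a.length; i++) {
    dot += a[i] * b[i]
    normA += a[i] * a[i]
    normB += b[i] * b[i]
  }
  return dot / (Math.sqrt(normA) * Math.sqrt(normB))
}

// Returns the `limit` stored items most similar to the query, with scores, best first
function rankByCosine(
  queryVector: number[],
  stored: { id: string; vector: number[] }[],
  limit: number
): { id: string; score: number }[] {
  return stored
    .map((item) => ({ id: item.id, score: cosineSimilarity(queryVector, item.vector) }))
    .sort((x, y) => y.score - x.score)
    .slice(0, limit)
}
```

Because cosine ignores vector length, scaling a vector leaves its score unchanged; only its direction matters.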


What you get

After migration:

  • Semantic search that outperforms generic models on job-specific vocabulary
  • Embeddings were computed entirely on your own infrastructure; raw text never reached an external embedding API
  • Graph relationships (Company → Department → Job) queryable from the same API
  • Unified filter + vector search — e.g. "jobs semantically similar to X, in Berlin, salary > 80k"
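The Berlin example translates into a where object passed alongside queryVector. A sketch of building it from optional inputs, assuming RushDB accepts a $gt comparison on numeric fields in the same way the search example uses $contains (check the operator names against the RushDB filtering docs). buildJobFilter is a hypothetical helper, and reading "salary > 80k" as salaryMin > 80000 is one interpretation:

```typescript
// Hypothetical filter options for the job search
interface JobFilterOptions {
  location?: string
  minSalary?: number
}

// Builds a `where` object containing only the filters that were actually provided.
// Assumption: `$gt` is a supported numeric operator, alongside `$contains`.
function buildJobFilter(options: JobFilterOptions): Record<string, unknown> | undefined {
  const where: Record<string, unknown> = {}
  if (options.location) {
    where['location'] = { $contains: options.location }
  }
  if (options.minSalary !== undefined) {
    // One reading of "salary > 80k": the advertised minimum exceeds the threshold
    where['salaryMin'] = { $gt: options.minSalary }
  }
  return Object.keys(where).length > 0 ? where : undefined
}
```

The result could replace the inline ternary in searchJobs, for example where: buildJobFilter({ location: 'Berlin', minSalary: 80000 }).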

When you don't need BYOV

If you find yourself asking "should I use BYOV?" — you probably don't need it yet. Ask these questions:

| Question | If yes → |
| --- | --- |
| Is a general-purpose model (OpenAI, Cohere, etc.) good enough for my use case? | Use managed |
| Can I send raw text to a third-party embedding API? | Use managed |
| Do I want the simplest possible setup? | Use managed |
| Do I have a fine-tuned or multimodal model I can't replace? | Use BYOV |
| Do compliance rules prevent raw text from leaving my infrastructure? | Use BYOV |
| Am I already producing vectors in a separate pipeline? | Use BYOV |

Managed embeddings are the right default for most projects. BYOV is the escape hatch for when you've hit a wall that managed can't cross.
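The decision table reduces to a simple rule: managed is the default, and any single BYOV condition is enough to justify switching. A sketch of that rule as a function; the field and type names are hypothetical:

```typescript
// Answers to the "Use BYOV" rows of the table; the "Use managed" rows are the default case
interface EmbeddingNeeds {
  irreplaceableCustomModel: boolean // fine-tuned or multimodal model you can't replace
  rawTextMustStayInternal: boolean // compliance keeps raw text inside your infrastructure
  vectorsAlreadyProduced: boolean // a separate pipeline already produces vectors
}

type EmbeddingMode = 'managed' | 'byov'

// Managed is the default; any one BYOV condition is enough to switch
function chooseEmbeddingMode(needs: EmbeddingNeeds): EmbeddingMode {
  const needsByov =
    needs.irreplaceableCustomModel ||
    needs.rawTextMustStayInternal ||
    needs.vectorsAlreadyProduced
  return needsByov ? 'byov' : 'managed'
}
```

For the job-search team, the first two answers are yes, so the function returns 'byov'; a project with all three answers no stays on managed.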

