Semantic Search

RushDB lets you search records by meaning, not just exact field values. Create an embedding index on any string property and every record that carries that property becomes searchable by natural-language similarity — while still supporting all the usual field filters, pagination, and graph traversal.

How It Works

  1. Create an index — Pick a label and a string property. RushDB creates a vector index policy.
  2. Embed — For managed indexes, RushDB generates embeddings automatically on write and backfills existing records. For external (BYOV) indexes, your application supplies pre-computed vectors.
  3. Search — Pass a natural-language query (managed) or a pre-computed vector (external). RushDB ranks candidates by cosine or euclidean similarity and returns scored results.
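
To make the ranking step concrete, here is a minimal sketch of the two metrics named in step 3. It is plain TypeScript and does not call RushDB; the function names (`cosineSimilarity`, `euclideanDistance`, `rankByCosine`) are illustrative assumptions, not part of any SDK.

```typescript
// Illustrative only: not RushDB's implementation, just the two metrics named above.
type Vec = number[];

function dot(a: Vec, b: Vec): number {
  if (a.length !== b.length) throw new Error("dimension mismatch");
  let sum = 0;
  for (let i = 0; i < a.length; i++) sum += a[i] * b[i];
  return sum;
}

// Cosine similarity: 1 means same direction, 0 means orthogonal.
function cosineSimilarity(a: Vec, b: Vec): number {
  const denom = Math.sqrt(dot(a, a)) * Math.sqrt(dot(b, b));
  if (denom === 0) throw new Error("cosine similarity is undefined for a zero vector");
  return dot(a, b) / denom;
}

// Euclidean distance: 0 means identical vectors; smaller is closer.
function euclideanDistance(a: Vec, b: Vec): number {
  if (a.length !== b.length) throw new Error("dimension mismatch");
  let sum = 0;
  for (let i = 0; i < a.length; i++) {
    const d = a[i] - b[i];
    sum += d * d;
  }
  return Math.sqrt(sum);
}

interface Candidate { id: string; vector: Vec }
interface ScoredCandidate { id: string; score: number }

// Rank candidates against a query vector, best match first.
function rankByCosine(query: Vec, candidates: Candidate[]): ScoredCandidate[] {
  return candidates
    .map((c) => ({ id: c.id, score: cosineSimilarity(query, c.vector) }))
    .sort((x, y) => y.score - x.score);
}
```

Cosine compares direction only, so it ignores vector length; euclidean distance is sensitive to magnitude. Which one suits an index depends on how the embedding model was trained.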

Managed vs. External Indexes

Aspect | Managed | External (BYOV)
Embeddings generated by | RushDB (server-side) | Your application
Write flow | Automatic on record create/update | Supply vectors via upsertVectors or inline on write
Search input | Natural-language query string | Pre-computed queryVector array
Model control | RushDB-managed model | Any model, any dimension

Both types store vectors on the value relationship between the property node and the record node, using Neo4j's native vector index for fast retrieval.
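
The difference in search input can be sketched as a small client-side check. The type and function names below (`IndexKind`, `SearchInput`, `validateSearchInput`) are hypothetical and exist only for illustration; only the `query` and `queryVector` field names come from this page. The dimension check assumes the caller knows the dimension its external index was created with.

```typescript
// Hypothetical types for illustration; the real SDK's type names may differ.
type IndexKind = "managed" | "external";

interface SearchInput {
  query?: string;         // natural-language text (managed indexes)
  queryVector?: number[]; // pre-computed embedding (external / BYOV indexes)
}

// Returns an error message, or null when the input fits the index kind.
function validateSearchInput(
  kind: IndexKind,
  input: SearchInput,
  dimensions?: number // assumed to be known by the caller for external indexes
): string | null {
  if (kind === "managed") {
    if (typeof input.query !== "string" || input.query.trim() === "") {
      return "managed index: pass a non-empty query string";
    }
    return null;
  }
  const v = input.queryVector;
  if (!Array.isArray(v) || v.length === 0) {
    return "external index: pass a pre-computed queryVector array";
  }
  if (dimensions !== undefined && v.length !== dimensions) {
    return `external index: expected ${dimensions} dimensions, got ${v.length}`;
  }
  return null;
}
```

Checking this before sending a request gives a clearer error than a failed search, particularly when an application mixes both index types.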

Combining with Field Filters

Semantic search composes with RushDB's structured query capabilities rather than replacing them. Pass a where clause to pre-filter candidates before similarity ranking:

Search "space exploration"
WHERE genre = "sci-fi" AND year >= 2000
LIMIT 10

This narrows the vector search to only matching records, keeping results precise and fast.
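
The order of operations (filter first, then rank) can be sketched in memory. This is not RushDB code: `filterThenRank` is a hypothetical helper, the movie data is made up, and the `sim` field stands in for a similarity that a real index would compute.

```typescript
// A sketch of "pre-filter, then rank". Nothing here calls RushDB.
interface Scored<T> { record: T; __score: number }

function filterThenRank<T>(
  records: T[],
  where: (r: T) => boolean, // structured pre-filter
  score: (r: T) => number,  // similarity to the query; higher is better
  limit: number
): Scored<T>[] {
  return records
    .filter(where)                                          // only matching records are scored
    .map((record) => ({ record, __score: score(record) }))
    .sort((a, b) => b.__score - a.__score)                  // best match first
    .slice(0, limit);
}

// Made-up sample data; `sim` is a stand-in for similarity to "space exploration".
interface Movie { title: string; genre: string; year: number; sim: number }
const movies: Movie[] = [
  { title: "A", genre: "sci-fi", year: 2014, sim: 0.91 },
  { title: "B", genre: "sci-fi", year: 1968, sim: 0.95 },
  { title: "C", genre: "drama", year: 2013, sim: 0.88 },
  { title: "D", genre: "sci-fi", year: 2009, sim: 0.62 },
];

const top = filterThenRank(
  movies,
  (m) => m.genre === "sci-fi" && m.year >= 2000,
  (m) => m.sim,
  10
);
```

Here `top` contains A and then D. B has the highest similarity but never reaches the ranking step because it fails the year constraint.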

Two Ways to Search Semantically

1. db.ai.search() — dedicated semantic endpoint

The simplest path. Returns records ranked by similarity score (__score):

  • Accepts query (text) for managed indexes or queryVector for external indexes.
  • Supports where pre-filtering, limit, and skip.
  • Results are always ordered by __score descending (best match first).
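
How limit and skip interact with the descending __score order can be sketched as follows. `pageOfHits` and the `Hit` shape are hypothetical; the sketch only mirrors the behavior described in the bullets above and says nothing about how the endpoint implements paging.

```typescript
// Hypothetical result shape: only the __score field name comes from this page.
interface Hit { id: string; __score: number }

// Sort by score (best first), then take one page of results.
function pageOfHits(hits: Hit[], skip: number, limit: number): Hit[] {
  if (skip < 0 || limit < 0) throw new Error("skip and limit must be non-negative");
  return [...hits]                            // copy so the caller's array is not reordered
    .sort((a, b) => b.__score - a.__score)
    .slice(skip, skip + limit);
}

// Made-up scores for illustration.
const hits: Hit[] = [
  { id: "a", __score: 0.2 },
  { id: "b", __score: 0.9 },
  { id: "c", __score: 0.5 },
  { id: "d", __score: 0.7 },
];

const firstPage = pageOfHits(hits, 0, 2);  // b, d
const secondPage = pageOfHits(hits, 2, 2); // c, a
```

Because ordering is by score, consecutive pages walk down from the best match toward weaker ones.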

2. vector.similarity aggregation in SearchQuery

For advanced use cases, add a vector.similarity.cosine or vector.similarity.euclidean aggregation to any db.records.find() call. This gives you the full SearchQuery feature set (groupBy, collect, multi-hop relationships) alongside similarity scoring.

→ See Search — Select Expressions for the aggregation syntax.
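
As an in-memory analogy for why similarity and grouping are useful together, the sketch below groups already-scored rows and keeps the best match per group. It is not SearchQuery syntax; `ScoredRow` and `bestPerGroup` are hypothetical names, and the rows are made up.

```typescript
// Rows that already carry a similarity value, as an aggregation would produce.
interface ScoredRow { genre: string; title: string; similarity: number }
interface GroupSummary { count: number; best: ScoredRow }

// Group rows by genre, counting members and tracking the closest match in each group.
function bestPerGroup(rows: ScoredRow[]): Map<string, GroupSummary> {
  const groups = new Map<string, GroupSummary>();
  for (const row of rows) {
    const existing = groups.get(row.genre);
    if (!existing) {
      groups.set(row.genre, { count: 1, best: row });
    } else {
      existing.count += 1;
      if (row.similarity > existing.best.similarity) existing.best = row;
    }
  }
  return groups;
}

// Made-up rows for illustration.
const summary = bestPerGroup([
  { genre: "sci-fi", title: "A", similarity: 0.9 },
  { genre: "drama", title: "B", similarity: 0.4 },
  { genre: "sci-fi", title: "C", similarity: 0.95 },
]);
```

A flat ranked list cannot express "the closest match in each genre"; treating the similarity as a computed value that other aggregations can consume is what makes this kind of query possible.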

Index Lifecycle

State | Description
pending | Index created, backfill not yet started.
indexing | Backfill in progress; existing records are being embedded.
ready | All records indexed. New records are embedded on write automatically.

You can check index status at any time and list all indexes for a project.
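
One way to use the status check is to poll until the index reports ready. The sketch below assumes a caller-supplied `fetchState` function that stands in for whatever status call your client exposes; that function, `waitUntilReady`, and the polling parameters are all hypothetical.

```typescript
// The three lifecycle states listed above.
type IndexState = "pending" | "indexing" | "ready";

// Hypothetical status fetcher: a stand-in for the real status call in your client.
type FetchState = () => Promise<IndexState>;

// Polls until the index is ready and resolves with the number of checks made.
async function waitUntilReady(
  fetchState: FetchState,
  intervalMs: number,
  maxAttempts: number
): Promise<number> {
  for (let attempt = 1; attempt <= maxAttempts; attempt++) {
    const state = await fetchState();
    if (state === "ready") return attempt;
    // Still pending or indexing: wait before checking again.
    await new Promise<void>((resolve) => setTimeout(resolve, intervalMs));
  }
  throw new Error(`index not ready after ${maxAttempts} checks`);
}
```

Bounding the number of attempts keeps a stalled backfill from hanging the caller indefinitely. For large datasets a longer interval is usually kinder to the API.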

Scenario | Approach
User knows the exact value | Structured where filter
User describes what they want in natural language | db.ai.search() with query
Combine meaning + exact constraints | db.ai.search() with where pre-filter
Need groupBy, collect, or multi-hop alongside similarity | db.records.find() with vector.similarity aggregation

→ See also Agent Memory Model for how semantic search fits into the three-layer retrieval stack.


Implementation Reference

Each interface covers search, indexing, and BYOV — pick the one that fits your stack: