Storage

RushDB leverages Neo4j (version 5.25.1 or higher) as its underlying storage engine, enhanced with the APOC (Awesome Procedures On Cypher) and GDS (Graph Data Science) plugins to perform efficient vector similarity searches and advanced graph operations.

Graph Database vs. Traditional Databases

Unlike traditional database models, Neo4j's graph approach offers distinct advantages for connected data:

Database Type	Core Concept	RushDB Analogy
Relational DB	Tables with rows and columns	A table would be a label, a row would be a record, but relationships would require complex JOIN operations
Document DB	Collections of JSON documents	Each JSON document would be a record, but connecting documents requires explicit reference fields
Graph DB (Neo4j)	Nodes and relationships	Records are nodes with properties, and relationships are first-class citizens that connect related data

In Neo4j, relationships are physical connections in the database, not just foreign key references, enabling:

Traversing connections without costly JOIN operations
Discovering patterns across different data types
Modeling complex, interconnected data naturally

Neo4j Foundation

Neo4j provides RushDB with a robust graph database foundation, allowing for:

High-performance graph traversals
ACID-compliant transactions
Property graph model flexibility
Scalable data storage and retrieval

The integration with APOC and GDS plugins extends Neo4j's native capabilities with vector-based operations critical for machine learning workflows and similarity search functions.

Data Overhead

Each record in RushDB (a meaningful key-value data piece) is extended with several internal properties that enable advanced functionality:

Internal Key	Client Representation	Description
`__RUSHDB__KEY__ID__`	`__id`	UUIDv7 that enables lexicographic ordering without relying on user-defined fields like `createdAt`. RushDB SDKs support converting `__id` to timestamp and ISO8601 date. For more details, see UUIDv7 specification.
`__RUSHDB__KEY__PROPERTIES__META__`	`__proptypes`	Stringified meta-object holding the types of data in the current record, e.g., `{ name: "string", active: "boolean", ... }`
`__RUSHDB__KEY__LABEL__`	`__label`	Record Label identifier. Every record has two labels: a default one (`__RUSHDB__LABEL__RECORD__`) and a user-defined one that is searchable. Currently, RushDB allows only one custom label per record, and it is required by default. For more details about labels, see Labels.
`__RUSHDB__KEY__PROJECT__ID__`	`__projectId`	Project identifier for multitenancy isolation. This property is never exposed to clients via UI or API.

Data Structure

RushDB organizes data in a graph structure that represents both records and properties as first-class entities:

In this structure:

Graph Element	Internal Label	Description
Records	`__RUSHDB__LABEL__RECORD__`	Nodes that represent user-defined entities (Car, Engine, Motorcycle)
Properties	`__RUSHDB__LABEL__PROPERTY__`	Nodes that represent data attributes with a "name" field storing the property name
Value Relations	`__RUSHDB__RELATION__VALUE__`	Edges connecting properties to their records (the property → record direction)
Default Relations	`__RUSHDB__RELATION__DEFAULT__`	Edges connecting related records (like Car to Engine)

This structure allows efficient traversals across related records while maintaining property-based connections that can reveal relationships between otherwise unrelated entities.

Property Graph Model

RushDB implements a unique property graph model where properties are first-class citizens:

Properties interconnect records through a unique set of field name and type. For example, both a Car record and a Flower record can share a common property color:string. This approach:

Enables performant queries across different record types
Facilitates the discovery of hidden insights in data
Creates a natural graph structure that leverages Neo4j's native traversal capabilities

Properties are not shared amongst projects (database instances), ensuring complete isolation in multi-tenant environments.

Data Types

RushDB supports a wide range of data types to accommodate diverse data needs and provide a flexible environment for your applications. Below is a comprehensive find of the supported data types along with their descriptions:

`string`

This data type is used for any textual information and can hold text of unlimited length.

`number`

This data type accommodates both floating-point numbers and integers. For instance, it can handle values like -120.209817 (a float) or 42 (an integer).

`datetime`

This data type adheres to the ISO 8601 format, including timezones. For example: 2012-12-21T18:29:37Z.

`boolean`

This data type can only have two possible values: true or false.

`null`

This data type has only one possible value: null.

`vector`

This data type accommodates arrays of both floating-point numbers and integers. It handles values like [0.99070,0.78912, 1, 0]. This is particularly useful for vector similarity searches and machine learning operations.

Arrays

In essence, RushDB supports all the data types that JSON does. However, when it comes to arrays (or Lists), RushDB can indeed hold them as Property values, but it's important to note that it can only store consistent values within those arrays. To learn more, check out the Properties section.

Note: Every data type mentioned above (except vector, since it's already an array by default) supports an array representation.

Here some valid examples:

["apple", "banana", "carrot"] - good
[null, null, null, null, null] - weird, but works fine 🤔
[4, 8, 15, 16, 23, 42] - works as well
["2023-09-17T02:47:54+04:00", "1990-08-18T04:35:00+05:00"] - also good
[true, false, true, false, true] - love is an answer (🌼)

Type Handling

When records are imported into RushDB, data types are automatically inferred and stored in the __RUSHDB__KEY__PROPERTIES__META__ internal field (exposed to clients as __proptypes). This metadata is crucial for maintaining type consistency and enabling efficient property-based connections across the graph.

To learn more about how RushDB uses data types for property values and type inference during data import, see REST API - Import Data.

Data Import Mechanism

RushDB applies a breadth-first search (BFS) algorithm to parse JSON tree structures, enhancing each flat level with:

A unique ID (__id), automatically generated or provided by the user for top-level records
Labels (__label), either explicitly provided by the user for top-level records or derived from parent keys for nested objects
Type inference, automatically suggesting data types for all properties
Relationship establishment, connecting nested records with __RUSHDB__RELATION__DEFAULT__ relationships

This approach allows for intuitive transformation of hierarchical JSON data into graph structures without requiring users to understand the underlying graph model.

Example of JSON to graph transformation:

{
  "car": {
    "make": "Tesla",
    "model": "Model 3",
    "engine": {
      "power": 283,
      "type": "electric"
    }
  }
}

Which transforms into the following graph structure:

In this representation:

Records are nodes with the label __RUSHDB__LABEL__RECORD__ plus a user-defined label (car, engine)
Records store the actual values of properties directly as node attributes
Records also store __proptypes metadata about property types
Properties are nodes with the single label __RUSHDB__LABEL__PROPERTY__ and contain only the name and type fields (not values)
Nested objects become connected records with __RUSHDB__RELATION__DEFAULT__ relationships
Properties are connected to their records via __RUSHDB__RELATION__VALUE__ relationships (the property → record direction)

The JSON representation of these records as stored in the database would look like:

[
  {
    "__RUSHDB__KEY__ID__": "01968aa4-22c1-781a-8e8c-8fe6be6c3fd4",
    "__RUSHDB__KEY__LABEL__": "car",
    "__RUSHDB__KEY__PROJECT__ID__": "01968aa4-4225-7833-ba60-2f5e4383bf1b",
    "__RUSHDB__KEY__PROPERTIES__META__": "{\"make\":\"string\",\"model\":\"string\"}",
    "make": "Tesla",
    "model": "Model 3"
  },
  {
    "__RUSHDB__KEY__ID__": "01968aa4-74af-73e4-984d-3888d63ec72e",
    "__RUSHDB__KEY__LABEL__": "engine",
    "__RUSHDB__KEY__PROJECT__ID__": "01968aa4-4225-7833-ba60-2f5e4383bf1b",
    "__RUSHDB__KEY__PROPERTIES__META__": "{\"power\":\"number\",\"type\":\"string\"}",
    "power": 283,
    "type": "electric"
  }
]

When these records are returned through RushDB's API or SDKs, the internal keys are transformed to their client-friendly aliases:

[
  {
    "__id": "01968aa4-22c1-781a-8e8c-8fe6be6c3fd4",
    "__label": "car",
    "__proptypes": {"make":"string","model":"string"},
    "make": "Tesla",
    "model": "Model 3"
  },
  {
    "__id": "01968aa4-74af-73e4-984d-3888d63ec72e",
    "__label": "engine",
    "__proptypes": {"power":"number","type":"string"},
    "power": 283,
    "type": "electric"
  }
]

Note that the __projectId field is never exposed to clients via the API or SDKs as noted in the Data Overhead section.

Each of these records is also connected to Property nodes which define the metadata for their fields, but those Property nodes don't store the actual values.

Database Indexes and Constraints

When RushDB initializes a new database connection, it automatically creates several indexes and constraints to ensure data integrity and optimize query performance:

Core Constraints

The following uniqueness constraints are created to enforce data consistency:

Constraint Name	Node Label	Property	Description
`constraint_user_login`	`__RUSHDB__LABEL__USER__`	`login`	Ensures each user has a unique login
`constraint_user_id`	`__RUSHDB__LABEL__USER__`	`id`	Ensures each user has a unique ID
`constraint_token_id`	`__RUSHDB__LABEL__TOKEN__`	`id`	Ensures each token has a unique ID
`constraint_project_id`	`__RUSHDB__LABEL__PROJECT__`	`id`	Ensures each project has a unique ID
`constraint_workspace_id`	`__RUSHDB__LABEL__WORKSPACE__`	`id`	Ensures each workspace has a unique ID
`constraint_record_id`	`__RUSHDB__LABEL__RECORD__`	`__RUSHDB__KEY__ID__`	Ensures each record has a unique ID
`constraint_property_id`	`__RUSHDB__LABEL__PROPERTY__`	`id`	Ensures each property has a unique ID

Performance Indexes

The following indexes are created to optimize query performance:

Index Name	Node Label	Properties	Description
`index_record_id`	`__RUSHDB__LABEL__RECORD__`	`__RUSHDB__KEY__ID__`	Speeds up record lookups by ID
`index_record_projectid`	`__RUSHDB__LABEL__RECORD__`	`__RUSHDB__KEY__PROJECT__ID__`	Enables fast filtering of records by project
`index_property_name`	`__RUSHDB__LABEL__PROPERTY__`	`name`	Enables fast property lookups by name
`index_property_mergerer`	`__RUSHDB__LABEL__PROPERTY__`	`name`, `type`, `projectId`, `metadata`	Optimizes property node merging operations during data imports

These indexes and constraints are essential for RushDB's performance and data integrity, particularly when dealing with large datasets and complex queries across the property graph model. They ensure that:

Record IDs are always unique within the database
Project isolation is maintained in multi-tenant environments
Property lookups are efficient, especially during joins and traversals
User management operations perform optimally

Learn more at REST API - Import Data or through the language-specific SDKs:

Performance Considerations

This approach is carefully designed to:

Enable efficient indexing and querying
Support advanced graph traversals and pattern matching
Facilitate vector similarity searches with minimal computational cost

By structuring data this way, RushDB achieves a balance between storage overhead and query performance, optimizing for use cases that require both traditional database operations and advanced graph analytics capabilities.

Graph Database vs. Traditional Databases​

Neo4j Foundation​

Data Overhead​

Data Structure​

Property Graph Model​

Data Types​

string​

number​

datetime​

boolean​

null​

vector​

Arrays​

Type Handling​

Data Import Mechanism​

Database Indexes and Constraints​

Core Constraints​

Performance Indexes​

Performance Considerations​