Aggregations

SearchQuery provides powerful aggregation capabilities that allow you to perform calculations and collect data from your records and their relationships.

Aggregation Placement in SearchQuery DTO

All aggregation clauses are defined in the aggregate key of the SearchQuery DTO, which is at the same level as other query parameters such as where, limit, skip, orderBy, and labels:

// SearchQuery
{
  labels: ['COMPANY'],       // Record labels to search
  where: { /* conditions */ },  // Filtering conditions
  limit: 10,                 // Results limit
  skip: 0,                   // Results offset
  orderBy: { name: 'asc' },  // Sorting
  aggregate: {               // Aggregation definitions
    count: {
      fn: 'count',
      alias: '$record'
    },
    // More aggregations...
  }
}

Note: Aggregations are applied only when fetching records and have no effect on other endpoints supporting SearchQuery. The main goal of aggregations is to fetch Records in the desired shape, optimizing data retrieval and transformation in a single query.

Available Aggregation Functions

The following aggregation functions are supported:

avg - Calculate average value of a numeric field
count - Count records (with optional uniq parameter)
max - Get maximum value from a field
min - Get minimum value from a field
sum - Calculate sum of a numeric field
collect - Gather field values or entire records into an array
gds.similarity.* - Calculate vector similarity using various algorithms:
- cosine - Cosine similarity [-1,1]
- euclidean - Euclidean distance normalized to (0,1]
- euclideanDistance - Raw euclidean distance [0,∞)
- jaccard - Jaccard similarity [0,1]
- overlap - Overlap coefficient [0,1]
- pearson - Pearson correlation [-1,1]

Aliases

Every aggregation clause requires an alias parameter that specifies which record from graph traversal should be used. To reference fields from related records in aggregations, you need to define aliases in the where clause using the $alias parameter. By default, the root record has alias $record:

{
  labels: ['COMPANY'],
  where: {
    DEPARTMENT: {
      $alias: '$department',
      PROJECT: {
        $alias: '$project',
        EMPLOYEE: {
          $alias: '$employee'
        }
      }
    }
  },
  aggregate: {
    // Referencing to root record using '$record' alias
    companyName: '$record.name',
    // Now can use $employee in aggregations
    avgSalary: {
      fn: 'avg',
      field: 'salary',
      alias: '$employee'
    }
  }
}

Basic Aggregations

Example topology:

avg

Parameters:

fn: 'avg' - The aggregation function name
field: string - The field to calculate average for
alias: string - The record alias to use
precision?: number - Optional decimal precision for the result

{
  labels: ['COMPANY'],
  where: {
    EMPLOYEE: {
      $alias: '$employee',
      salary: {
        $gte: 50000  // Filter employees by salary
      }
    }
  },
  aggregate: {
    avgSalary: {
      fn: 'avg',
      field: 'salary',
      alias: '$employee',
      precision: 2  // Optional: Set precision for the result
    }
  }
}

count

Parameters:

fn: 'count' - The aggregation function name
alias: string - The record alias to use
field?: string - Optional field to count
uniq?: boolean - Optional flag to count unique values

{
  labels: ['COMPANY'],
  where: {
    EMPLOYEE: {
      $alias: '$employee'
    }
  },
  aggregate: {
    employeesCount: {
      fn: 'count',
      uniq: true,  // Count unique employees
      alias: '$employee'
    }
  }
}

max

Parameters:

fn: 'max' - The aggregation function name
field: string - The field to find maximum value from
alias: string - The record alias to use

{
  labels: ['COMPANY'],
  where: {
    EMPLOYEE: {
      $alias: '$employee'
    }
  },
  aggregate: {
    maxSalary: {
      fn: 'max',
      field: 'salary',
      alias: '$employee'
    }
  }
}

min

Parameters:

fn: 'min' - The aggregation function name
field: string - The field to find minimum value from
alias: string - The record alias to use

{
  labels: ['COMPANY'],
  where: {
    EMPLOYEE: {
      $alias: '$employee'
    }
  },
  aggregate: {
    minSalary: {
      fn: 'min',
      field: 'salary',
      alias: '$employee'
    }
  }
}

sum

Parameters:

fn: 'sum' - The aggregation function name
field: string - The field to calculate sum for
alias: string - The record alias to use

{
  labels: ['COMPANY'],
  where: {
    EMPLOYEE: {
      $alias: '$employee'
    }
  },
  aggregate: {
    totalWage: {
      fn: 'sum',
      field: 'salary',
      alias: '$employee'
    }
  }
}

collect

Parameters:

fn: 'collect' - The aggregation function name
alias: string - The record alias to use
field?: string - Optional field to collect (if not provided, collects entire records)
uniq?: boolean - Optional flag to collect unique values only. True by default.
limit?: number - Optional maximum number of items to collect
skip?: number - Optional number of items to skip
orderBy?: TSearchSort - Optional sorting configuration

{
  labels: ['COMPANY'],
  where: {
    EMPLOYEE: {
      $alias: '$employee'
    }
  },
  aggregate: {
    employeeNames: {
      fn: 'collect',
      field: 'name',
      alias: '$employee',
      uniq: true  // Optional: true by default
    }
  }
}

Complete Example

{
  labels: ['COMPANY'],
  where: {
    EMPLOYEE: {
      $alias: '$employee',  // Define alias for employee records
      salary: {
        $gte: 50000  // Filter employees by salary
      }
    }
  },
  aggregate: {
    // Use field directly from record
    companyName: '$record.name',

    // Count unique employees using the defined alias
    employeesCount: {
      fn: 'count',
      uniq: true,
      alias: '$employee'
    },

    // Calculate total salary using the defined alias
    totalWage: {
      fn: 'sum',
      field: 'salary',
      alias: '$employee'
    },

    // Collect unique employees names
    employeeNames: {
      fn: 'collect',
      field: 'name',
      alias: '$employee'
    },

    // Get average salary with precision
    avgSalary: {
      fn: 'avg',
      field: 'salary',
      alias: '$employee',
      precision: 0
    },

    // Get min and max salary
    minSalary: {
      fn: 'min',
      field: 'salary',
      alias: '$employee'
    },
    maxSalary: {
      fn: 'max',
      field: 'salary',
      alias: '$employee'
    }
  }
}

Example data and response

// Company record
{
  __id: "018838b8-2e1f-7000-8000-a29392548450",
  __label: "COMPANY",
  name: "TechCorp",
  stage: ["seed"]
}

// Employee records
[
  {
    __id: "018838b8-2e1f-7000-8000-b45fd8932abc",
    __label: "EMPLOYEE",
    name: "John Doe",
    salary: 550000
  },
  {
    __id: "018838b8-2e1f-7000-8000-c67de9043def",
    __label: "EMPLOYEE",
    name: "Jane Smith",
    salary: 600000
  }
]

// Query result:
{
  data: [{
    __id: "018838b8-2e1f-7000-8000-a29392548450",
    __label: "COMPANY",
    companyName: "TechCorp",
    employeesCount: 2,
    totalWage: 1150000,
    avgSalary: 575000,
    minSalary: 550000,
    maxSalary: 600000,
    employeeNames: ["Jane Smith", "John Doe"]
  }],
  total: 1,
  success: true
}

Nested Aggregations

SearchQuery supports two types of nested aggregations:

1. Collecting Nested Records

You can use the collect operator to build nested JSON structures containing arrays of related records. Due to Cypher limitations, when using nested collection, only the collect operator is supported at nested levels.

Example topology:

Example with nested where clauses and corresponding aggregations:

{
  labels: ['COMPANY'],
  where: {
    DPEARTMENT: {
      $alias: '$department',  // Level 1 alias
      PROJECT: {
        $alias: '$project',   // Level 2 alias
        EMPLOYEE: {
          $alias: '$employee', // Level 3 alias
          salary: {
            $gte: 100000      // Filter condition
          }
        }
      }
    }
  },
  aggregate: {
    departments: {
      fn: 'collect',
      alias: '$department',   // Use Level 1 alias
      aggregate: {
        projects: {
          fn: 'collect',
          alias: '$project',  // Use Level 2 alias
          orderBy: {
            projectName: 'asc'
          },
          aggregate: {
            employees: {
              fn: 'collect',
              alias: '$employee', // Use Level 3 alias
              orderBy: {
                salary: 'desc'
              },
              limit: 3
            }
          }
        }
      }
    }
  }
}

Example data and response

// Company record
{
  __id: "018838b8-2e1f-7000-8000-a29392548450",
  __label: "COMPANY",
  name: "TechCorp",
  rating: 4
}

// Department records
[
  {
    __id: "018838b8-2e1f-7000-8000-d89ef1154abc",
    __label: "departments",
    name: "Engineering"
  },
  {
    __id: "018838b8-2e1f-7000-8000-e92fg2265bcd",
    __label: "departments",
    name: "Sales"
  }
]

// Project records
[
  {
    __id: "018838b8-2e1f-7000-8000-f34gh3376cde",
    __label: "projects",
    projectName: "Mobile App"
  },
  {
    __id: "018838b8-2e1f-7000-8000-g56hi4487def",
    __label: "projects",
    projectName: "Web Platform"
  }
]

// Employee records
[
  {
    __id: "018838b8-2e1f-7000-8000-h78ij5598efg",
    __label: "employees",
    name: "John Doe",
    salary: 500000
  },
  {
    __id: "018838b8-2e1f-7000-8000-i90kl6609fgh",
    __label: "employees",
    name: "Jane Smith",
    salary: 550000
  },
  {
    __id: "018838b8-2e1f-7000-8000-j12mn7710ghi",
    __label: "employees",
    name: "Bob Wilson",
    salary: 600000
  }
]

// Query result with nested collection:
{
  data: [{
    __id: "018838b8-2e1f-7000-8000-a29392548450",
    __label: "COMPANY",
    departments: [{
      __id: "018838b8-2e1f-7000-8000-d89ef1154abc",
      name: "Engineering",
      projects: [{
        __id: "018838b8-2e1f-7000-8000-f34gh3376cde",
        projectName: "Mobile App",
        employees: [
          {
            __id: "018838b8-2e1f-7000-8000-j12mn7710ghi",
            name: "Bob Wilson",
            salary: 600000
          },
          {
            __id: "018838b8-2e1f-7000-8000-i90kl6609fgh",
            name: "Jane Smith",
            salary: 550000
          },
          {
            __id: "018838b8-2e1f-7000-8000-h78ij5598efg",
            name: "John Doe",
            salary: 500000
          }
        ]
      }]
    }]
  }],
  total: 1,
  success: true
}

2. Aggregating Values from Nested Records

Example topology:

Example with deep nested aggregation:

{
  labels: ['COMPANY'],
  where: {
    PROJECT: {
      EMPLOYEE: {
        $alias: '$employee',
        salary: {
          $gte: 50000  // Filter condition
        }
      }
    }
  },
  aggregate: {
    avgEmployeeSalary: {
      fn: 'avg',
      field: 'salary',
      alias: '$employee' // Use alias from deepest level
    }
  }
}

Example data and response

// Company record
{
  __id: "018838b8-2e1f-7000-8000-a29392548450",
  __label: "COMPANY",
  name: "TechCorp",
  rating: 4
}

// Project records
[
  {
    __id: "018838b8-2e1f-7000-8000-f34gh3376cde",
    __label: "PROJECT",
    name: "Mobile App"
  }
]

// Employee records under projects
[
  {
    __id: "018838b8-2e1f-7000-8000-h78ij5598efg",
    __label: "EMPLOYEE",
    name: "John Doe",
    salary: 500000
  },
  {
    __id: "018838b8-2e1f-7000-8000-i90kl6609fgh",
    __label: "EMPLOYEE",
    name: "Jane Smith",
    salary: 550000
  }
]

// Query result with aggregated values:
{
  data: [{
    __id: "018838b8-2e1f-7000-8000-a29392548450",
    __label: "COMPANY",
    avgEmployeeSalary: 525000
  }],
  total: 1,
  success: true
}

Collect Operator Options

The collect operator supports additional options for pagination and sorting:

limit - Maximum number of records to collect
skip - Number of records to skip
orderBy - Sort collected records by specified fields
uniq - Collect only unique values (when collecting field values)
field - Collect specific field values instead of entire records

Example:

{
  labels: ['COMPANY'],
  where: {
    DEPARTMENT: {
      $alias: '$department'
    }
  },
  aggregate: {
    // Collect unique tags from departments
    tags: {
      fn: 'collect',
      alias: '$department',
      field: 'tags',    // Collect only tags field
      uniq: true,       // Remove duplicates
      limit: 100,       // Collect up to 100 tags
      orderBy: {        // Sort alphabetically
        name: 'asc'
      }
    }
  }
}

Example data and response

// Company record
{
  __id: "018838b8-2e1f-7000-8000-a29392548450",
  __label: "COMPANY",
  name: "TechCorp"
}

// Department records
[
  {
    __id: "018838b8-2e1f-7000-8000-d89ef1154abc",
    __label: "DEPARTMENT",
    name: "Engineering",
    tags: ["tech", "development", "agile"]
  },
  {
    __id: "018838b8-2e1f-7000-8000-e92fg2265bcd",
    __label: "DEPARTMENT",
    name: "Sales",
    tags: ["sales", "business", "development"]
  }
]

// Query result with collected tags:
{
  data: [{
    __id: "018838b8-2e1f-7000-8000-a29392548450",
    __label: "COMPANY",
    tags: ["agile", "business", "development", "sales", "tech"]
  }],
  total: 1,
  success: true
}

Vector Similarity Aggregations

Example topology:

gds.similarity.*

Parameters:

fn: 'gds.similarity.[algorithm]' - The similarity algorithm to use
- gds.similarity.cosine - Cosine similarity [-1,1]
- gds.similarity.euclidean - Euclidean distance normalized to (0,1]
- gds.similarity.euclideanDistance - Raw euclidean distance [0,∞)
- gds.similarity.jaccard - Jaccard similarity [0,1]
- gds.similarity.overlap - Overlap coefficient [0,1]
- gds.similarity.pearson - Pearson correlation [-1,1]
field: string - The vector field to compare
alias: string - The record alias to use
query: number[] - The query vector to calculate similarity against

Example showing vector search with where clause and similarity aggregation:

{
  labels: ['DOCUMENT'],
  where: {},
  aggregate: {
    // Calculate similarity score using root level alias
    similarity: {
      fn: 'gds.similarity.cosine',
      field: 'embedding',
      query: [1, 2, 3, 4, 5],
      alias: '$record'
    }
  }
}

Example data and response

// Document record
{
  __id: "018838b8-2e1f-7000-8000-k34op8821hij",
  __label: "DOCUMENT",
  title: "Machine Learning Basics"
}

// Chunk records
[
  {
    __id: "018838b8-2e1f-7000-8000-l56pq9932ijk",
    __label: "CHUNK",
    content: "Introduction to neural networks",
    embedding: [1.2, 0.5, -0.3, 0.8, 0.1]
  },
  {
    __id: "018838b8-2e1f-7000-8000-m78rs0043jkl",
    __label: "CHUNK",
    content: "Deep learning architectures",
    embedding: [0.9, 0.4, -0.2, 0.7, 0.3]
  }
]

// Query result with similarity scores:
{
  data: [{
    __id: "018838b8-2e1f-7000-8000-k34op8821hij",
    __label: "DOCUMENT",
    similarity: 0.82,
    chunks: [
      {
        __id: "018838b8-2e1f-7000-8000-l56pq9932ijk",
        __label: "CHUNK",
        content: "Introduction to neural networks",
        similarity: 0.78
      },
      {
        __id: "018838b8-2e1f-7000-8000-m78rs0043jkl",
        __label: "CHUNK",
        content: "Deep learning architectures",
        similarity: 0.65
      }
    ]
  }],
  total: 1,
  success: true
}

Aggregation Placement in SearchQuery DTO​

Available Aggregation Functions​

Aliases​

Basic Aggregations​

avg​

count​

max​

min​

sum​

collect​

Complete Example​

Nested Aggregations​

1. Collecting Nested Records​

2. Aggregating Values from Nested Records​

Collect Operator Options​

Vector Similarity Aggregations​

gds.similarity.*​

Aggregation Placement in SearchQuery DTO

Available Aggregation Functions

Aliases

Basic Aggregations

avg

count

max

min

sum

collect

Complete Example

Nested Aggregations

1. Collecting Nested Records

2. Aggregating Values from Nested Records

Collect Operator Options

Vector Similarity Aggregations

gds.similarity.*