Version: 2.x(alpha)

`DEFINE INDEX` statement

Like other databases, SurrealDB uses indexes to optimize query performance. An index can cover one or more fields in a table and can enforce a uniqueness constraint. If the index is not meant to enforce uniqueness, choose fields with high cardinality, meaning that their values vary widely across the records of the indexed table.
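
One rough way to gauge the cardinality of a candidate field is to count how many records share each value. The sketch below assumes a hypothetical status field on the user table. A handful of large groups suggests low cardinality, whereas many small groups suggests the field is a better fit for a non-unique index.

```surql
-- Hypothetical field `status`: count the records that share each value
SELECT status, count() AS total FROM user GROUP BY status;
```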

Requirements

You must be authenticated as a root or namespace user before you can use the DEFINE INDEX statement.
You must select your namespace and database before you can use the DEFINE INDEX statement.
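
As a minimal sketch of the second requirement, the USE statement selects the namespace and database for the session. The names test_ns and test_db are placeholders, not values taken from this page.

```surql
-- Placeholder namespace and database names
USE NS test_ns DB test_db;

-- Subsequent DEFINE INDEX statements apply to tables in test_db
```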

Statement syntax

SurrealQL Syntax
DEFINE INDEX [ IF NOT EXISTS ] @name ON [ TABLE ] @table [ FIELDS | COLUMNS ] @fields
	[ UNIQUE | SEARCH ANALYZER @analyzer [ BM25 [(@k1, @b)] ] [ HIGHLIGHTS ] | MTREE DIMENSION @dimension ] [ COMMENT @string ]
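
To illustrate how the optional clauses combine, here is a sketch that uses TABLE, COLUMNS and COMMENT together. The index name userCreatedIndex and the created_at field are hypothetical.

```surql
-- Hypothetical non-unique index with a descriptive comment
DEFINE INDEX userCreatedIndex ON TABLE user COLUMNS created_at
	COMMENT 'Speeds up lookups by creation time';
```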

Index Types

SurrealDB offers a range of indexing capabilities designed to optimize data retrieval and search efficiency.

Unique Index

Ensures each value in the index is unique. A unique index helps enforce uniqueness across records by preventing duplicate entries in fields such as user IDs, email addresses, and other unique identifiers.

Let's create a unique index for the email address field on a user table.

-- Makes sure that the email address in the user table is always unique
DEFINE INDEX userEmailIndex ON TABLE user COLUMNS email UNIQUE;

You can verify that the index was created by using the INFO statement.

INFO FOR TABLE user;

The INFO statement lists the indexes defined on the table.

{
    "events": {},
    "fields": {},
    "indexes": {
        "userEmailIndex": "DEFINE INDEX userEmailIndex ON user FIELDS email UNIQUE"
    },
    "lives": {},
    "tables": {}
}

As we defined a UNIQUE index on the email column, a duplicate entry for that column or field will throw an error.

-- Create a user record and set an email ID. 
CREATE user:1 SET email = 'test@surrealdb.com';

Response
[
    {
        "email": "test@surrealdb.com",
        "id": "user:1"
    }
]

Creating another record with the same email ID will throw an error.

-- Create another user record and set the same email ID.
CREATE user:2 SET email = 'test@surrealdb.com';

Response
Database index `userEmailIndex` already contains 'test@surrealdb.com', with record `user:1`

To set the same email for user:2, the original record must first be deleted.

DELETE user:1;
CREATE user:2 SET email = 'test@surrealdb.com';

Response
[
    {
        "email": "test@surrealdb.com",
        "id": "user:2"
    }
]
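
Deleting the record is one way to free up a value. As a sketch of an alternative, and assuming the unique index entry follows the record when its field changes, updating the email on the existing record should also release the original address. The record IDs and addresses below are hypothetical.

```surql
-- Hypothetical records, separate from the ones used above
CREATE user:3 SET email = 'first@example.com';

-- Move user:3 to a different address, which should free the original one
UPDATE user:3 SET email = 'second@example.com';

-- The original address is no longer held by user:3
CREATE user:4 SET email = 'first@example.com';
```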

Non-Unique Index

A non-unique index covers fields whose values may repeat across records, while still speeding up retrieval. It suits data that appears frequently in queries but does not need to be unique, such as categorization tags or status indicators.

Let's create a non-unique index for an age field on a user table.

-- optimise queries looking for users of a given age
DEFINE INDEX userAgeIndex ON TABLE user COLUMNS age;
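
A query that filters on the indexed field is the kind of statement this index is meant to help. The records below are hypothetical sample data; each one carries a distinct email so that it does not collide with the unique index defined earlier.

```surql
-- Hypothetical sample data; several users may share the same age
CREATE user:10 SET age = 30, email = 'a@example.com';
CREATE user:11 SET age = 30, email = 'b@example.com';
CREATE user:12 SET age = 42, email = 'c@example.com';

-- Equality filter on the indexed field
SELECT * FROM user WHERE age = 30;
```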

Composite Index

A composite index spans multiple fields (columns) of a table. Like a single-field index, a composite index can also be UNIQUE.

-- Create an index on the account and email fields of the user table
DEFINE INDEX test ON user FIELDS account, email;
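
As a sketch of the UNIQUE variant, the statement below defines a composite unique index on a hypothetical membership table. The assumption here is that uniqueness applies to the combination of values, so the same email may appear under different accounts, while repeating an existing account and email pair would be rejected.

```surql
-- Hypothetical table and index name
DEFINE INDEX membershipAccountEmail ON TABLE membership FIELDS account, email UNIQUE;

-- Same email under two different accounts: the combinations differ
CREATE membership:1 SET account = 'acme', email = 'test@surrealdb.com';
CREATE membership:2 SET account = 'globex', email = 'test@surrealdb.com';

-- Same account and email as membership:1: expected to violate the index
CREATE membership:3 SET account = 'acme', email = 'test@surrealdb.com';
```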

Full-Text Search Index

Enables efficient searching through textual data, supporting advanced text-matching features like proximity searches and keyword highlighting.

The Full-Text search index helps implement comprehensive search functionalities in applications, such as searching through articles, product descriptions, and user-generated content. Let's create a full-text search index for a name field on a user table.

-- Allow full-text search queries on the name of the user
DEFINE INDEX userNameIndex ON TABLE user COLUMNS name SEARCH ANALYZER ascii BM25 HIGHLIGHTS;

SEARCH: Enables full-text search on the specified column.
ANALYZER ascii: Uses a custom analyzer called ascii to tokenize and analyze the text.
BM25: The ranking algorithm used for relevance scoring.
HIGHLIGHTS: Allows keyword highlighting in search results when using the search::highlight function.
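
The index above refers to an analyzer named ascii, which has to be defined separately. The sketch below shows one possible definition and a search query against the index. The tokenizer and filter choices, the sample record and the search term are assumptions for illustration. The number 1 ties the @1@ operator to the search::score and search::highlight calls.

```surql
-- One possible analyzer definition (assumed); define it before the index
DEFINE ANALYZER ascii TOKENIZERS class FILTERS lowercase, ascii;

-- ...then define userNameIndex as shown above

-- Hypothetical sample record
CREATE user:20 SET name = 'Jane Doe', email = 'jane@example.com';

-- Match, score and highlight in one query
SELECT id,
	search::score(1) AS score,
	search::highlight('<b>', '</b>', 1) AS name
FROM user
WHERE name @1@ 'jane'
ORDER BY score DESC;
```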

Vector Search Indexes

Vector search indexes in SurrealDB support efficient k-nearest neighbors (kNN) and Approximate Nearest Neighbor (ANN) operations, which are pivotal in performing similarity searches within complex, high-dimensional datasets and data types.

M-Tree index

The M-Tree index is suited to finding exact nearest neighbors based on a distance metric. In SurrealDB, you can use an M-Tree based index to perform such searches. M-Tree currently supports the Euclidean, Cosine, Manhattan and Minkowski distance functions.

Note: When no distance function is specified, Euclidean is used by default.

For example, consider records that each hold a 4-dimensional vector.

CREATE pts:1 SET point = [1,2,3,4];
CREATE pts:2 SET point = [4,5,6,7];
CREATE pts:3 SET point = [8,9,10,11];

To define an M-tree index on the point field of the pts table, you can use the query below:

DEFINE INDEX mt_pt ON pts FIELDS point MTREE DIMENSION 4;

In the above example, the MTREE DIMENSION clause specifies the dimension of the vector. The point field holds a 4-dimensional vector, so we set the dimension to 4. Once the index is successfully created, you can use it to perform vector searches that find records based on their distance from a given point.

In the example below, the query searches for the two points closest to the vector [2,3,4,5]. The <|2|> operator requests the 2 nearest neighbors, and the vector::distance::euclidean function returns the distance of each match from the query vector.

LET $pt = [2,3,4,5];
SELECT id, vector::distance::euclidean(point, $pt) AS dist FROM pts WHERE point <|2|> $pt;

Output
[
    {
        "dist": 2,
        "id": "pts:1"
    },
    {
        "dist": 4,
        "id": "pts:2"
    }
]
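
The distances in the output can be checked by hand. Each coordinate of pts:1 differs from the query vector by 1, so the Euclidean distance is sqrt(1+1+1+1) = 2. For pts:2 each coordinate differs by 2, giving sqrt(4+4+4+4) = 4. For pts:3 each coordinate differs by 6, giving sqrt(144) = 12, so it falls outside the two nearest neighbors. As a sketch, the same ranking can be expressed without the kNN operator by sorting on the computed distance:

```surql
LET $pt = [2,3,4,5];

-- Brute-force equivalent: compute every distance, keep the two smallest
SELECT id, vector::distance::euclidean(point, $pt) AS dist
FROM pts
ORDER BY dist
LIMIT 2;
```

This form computes the distance for every record, so it is only meant as a cross-check on small tables.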

Verifying Index Utilization in Queries

The EXPLAIN clause from SurrealQL helps you understand the execution plan of the query and provides transparency around index utilization.

SELECT * FROM user WHERE email='test@surrealdb.com' EXPLAIN FULL;

It also reveals details about which operation was used by the query planner and how many records matched the search criteria.

[
    {
        "detail": {
            "plan": {
                "index": "userEmailIndex",
                "operator": "=",
                "value": "test@surrealdb.com"
            },
            "table": "user"
        },
        "operation": "Iterate Index"
    },
    {
        "detail": {
            "count": 1
        },
        "operation": "Fetch"
    }
]
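
For comparison, the same check can be run against a field that has no index. The country field below is hypothetical. Since no index covers it, the plan would be expected to report a table iteration instead of Iterate Index.

```surql
-- Hypothetical un-indexed field
SELECT * FROM user WHERE country = 'GB' EXPLAIN;
```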

Using `IF NOT EXISTS` clause (since 1.3.0)

The IF NOT EXISTS clause can be used to define an index only if it does not already exist. If the index already exists, the DEFINE INDEX statement will return an error.

-- Create an INDEX if it does not already exist
DEFINE INDEX IF NOT EXISTS example ON example FIELDS example;

Performance Implications

When defining indexes, focus on the fields that your queries filter on most frequently.

Indexes can improve the performance of SurrealQL statements. The gain may not be noticeable on small tables, but it can be significant on large ones, especially when the indexed fields are used in the WHERE clause of a SELECT statement.

Indexes can also impact the performance of write operations (INSERT, UPDATE, DELETE) since the index needs to be updated accordingly. Therefore, it's essential to balance the need for read performance with write performance.
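
If an index turns out to slow down writes more than it speeds up reads, it can be dropped again. A minimal sketch, using the userAgeIndex defined earlier:

```surql
-- Remove an index that is no longer worth its write overhead
REMOVE INDEX userAgeIndex ON TABLE user;
```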
