Full-Text Search
Full-Text Search lets you search the textual content stored in a database by words and phrases rather than by exact values. It can be customized with specific analyzers and tokenizers to refine the search process.
In SurrealDB, Full-Text Search supports features such as basic and advanced text matching, proximity searches, result ranking, and keyword highlighting.
Full-Text Search in SurrealDB is ACID-compliant, which ensures data integrity and reliability.
Core Concepts and Definitions
To perform Full-Text Search on textual data in SurrealDB, we use analyzers, tokenizers, filters and indexes to control search precision. We also use search functions to highlight matching keywords and to rank results by relevance.
Analyzers
Analyzers are configurations that combine tokenizers and filters to prepare text data for searching.
They are foundational to how text is processed in Full-Text Search. An analyzer is defined by its name, a set of tokenizers, and a collection of filters.
Tokenizers
Tokenizers break text down into tokens based on specified rules, such as splitting on spaces or punctuation. If we tokenize the sentence 'Getting started with SurrealDB' on whitespace, it is broken down into the tokens Getting, started, with, SurrealDB.
-- Defining an analyzer whose tokenizer splits text into words based on spaces
DEFINE ANALYZER space_tokenizer TOKENIZERS blank;
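To check what an analyzer actually produces, the search::analyze function can be run against a sample string. The following is a minimal sketch that assumes the space_tokenizer analyzer above has already been defined; per the description above, the result should be the four tokens Getting, started, with, SurrealDB.

```surql
-- Inspecting the tokens produced by the space_tokenizer analyzer
-- (assumes the analyzer has been defined as shown above)
RETURN search::analyze("space_tokenizer", "Getting started with SurrealDB");
```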
Filters
Filters process tokens for further refinement, such as converting to lowercase, removing special characters or breaking down tokens into useful prefixes to prepare for more effective searching.
-- Combining tokenizers and filters into a custom analyzer
DEFINE ANALYZER custom_analyzer TOKENIZERS blank FILTERS lowercase, snowball(english);
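The prefix filter mentioned above can be sketched with the edgengram filter. The analyzer name prefix_analyzer and the prefix lengths (2 to 4 characters) are illustrative choices, not values taken from this page.

```surql
-- Hypothetical analyzer: lowercases each token, then keeps only its
-- leading prefixes of 2 to 4 characters
DEFINE ANALYZER prefix_analyzer TOKENIZERS blank FILTERS lowercase, edgengram(2, 4);

-- Inspecting the result; each word should be reduced to short lowercase prefixes
RETURN search::analyze("prefix_analyzer", "Getting started");
```

An analyzer like this is typically useful for search-as-you-type behavior, where a partially typed word should already match.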
Define a Full-Text Index
To make a text field searchable, you need to define a full-text index on it using the SEARCH keyword. Without such an index, the field cannot be queried with the Full-Text Search operators and functions.
Depending on the use case, each field can be associated with a different analyzer.
-- Defining two full-text indexes on the 'title' and 'content' fields of the 'book' table
-- The HIGHLIGHTS clause is needed for search::highlight and search::offsets
DEFINE INDEX book_title ON book FIELDS title SEARCH ANALYZER custom_analyzer BM25 HIGHLIGHTS;
DEFINE INDEX book_content ON book FIELDS content SEARCH ANALYZER custom_analyzer BM25 HIGHLIGHTS;
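As a sketch of using a different analyzer per field, the following indexes a hypothetical author field with an analyzer that lowercases but does not stem, so names are matched as written. The analyzer name, the author field, and the sample records are all assumptions made for illustration.

```surql
-- Hypothetical analyzer for names: lowercase only, no stemming
DEFINE ANALYZER name_analyzer TOKENIZERS blank FILTERS lowercase;

-- Hypothetical 'author' field indexed with its own analyzer
DEFINE INDEX book_author ON book FIELDS author SEARCH ANALYZER name_analyzer BM25;

-- Made-up sample records; punctuation is left out because the blank
-- tokenizer splits on whitespace and would keep it attached to words
CREATE book:1 SET title = 'Linux Basics', author = 'Ada Example',
    content = 'Ubuntu ships with many command line tools for Linux users';
CREATE book:2 SET title = 'Cooking at Home', author = 'Sam Sample',
    content = 'Simple recipes and kitchen tools for beginners';
```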
The MATCHES Operator
The MATCHES operator (@@) is used in queries to find documents that contain the given keywords based on the full-text indexes.
-- Using the MATCHES (@@) operator in a query
SELECT * FROM book WHERE content @@ 'tools';
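MATCHES predicates on different indexed fields can be combined in one WHERE clause. A minimal sketch, assuming the book_title and book_content indexes defined earlier exist:

```surql
-- Books whose title mentions 'linux' and whose content mentions 'tools'
SELECT title FROM book WHERE title @@ 'linux' AND content @@ 'tools';
```

Because the search terms are processed by the same analyzer as the indexed text, a stemming filter such as snowball(english) should let 'tools' and 'tool' match each other.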
Highlighting
The search::highlight function highlights the matching keywords for the given predicate reference number.
-- Using search::highlight('<b>', '</b>', 1) to highlight search terms
-- The third argument is the predicate reference number used in @1@
SELECT title, search::highlight('<b>', '</b>', 1) AS highlighted_content
FROM book WHERE content @1@ 'Linux';
The search::offsets function returns the positions of the matching keywords for the given predicate reference number.
SELECT title, search::offsets(1) AS title_offsets
FROM book WHERE title @1@ 'linux';
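Highlighting and offsets can be requested for several predicates in one query, each identified by its own reference number. The sketch below assumes both the title and content indexes were defined with the HIGHLIGHTS clause; the <em> tags and the output aliases are arbitrary choices.

```surql
-- Reference 0 is tied to the title predicate, reference 1 to the content predicate
SELECT
    search::highlight('<em>', '</em>', 0) AS title_highlighted,
    search::highlight('<em>', '</em>', 1) AS content_highlighted,
    search::offsets(1) AS content_offsets
FROM book
WHERE title @0@ 'linux' AND content @1@ 'ubuntu';
```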
Scoring and Ranking Search Results
The search::score function scores and ranks the search results based on their relevance to the search terms.
The relevance score is a decimal number, where a higher score indicates a higher relevance of the search result to the query terms and a lower score indicates lower relevance. With BM25 the score is not limited to a fixed range such as 0 to 1, so it is best used to compare results within the same query. This score helps in sorting or filtering results based on how closely they match the user's search intent.
-- Ordering search results by relevance score, where the title should be more relevant than content
SELECT title, search::score(0) * 2 + search::score(1) * 1 AS relevance
FROM book WHERE title @0@ 'linux' AND content @1@ 'ubuntu'
ORDER BY relevance DESC
LIMIT 10;
search::score(0) retrieves the relevance score for the title field, and search::score(1) does the same for the content field. The title score is multiplied by 2 so that a match in the title counts for more than a match in the content.
Results are ordered by the calculated relevance in descending order (ORDER BY relevance DESC), prioritizing books with higher scores, and limited to the 10 most relevant books (LIMIT 10).
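When only one field is searched, the score can be used directly without weighting. A minimal sketch, assuming the book_content index defined earlier; the alias score and the limit of 5 are arbitrary:

```surql
-- Ranking by a single field's relevance score
SELECT title, search::score(0) AS score
FROM book
WHERE content @0@ 'ubuntu'
ORDER BY score DESC
LIMIT 5;
```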