Define a Schema in SurrealDB
When starting a new database project, there are a couple of early decisions to be made, such as creating tables and defining the fields that will be in these tables and also the datatypes of the records, and how the tables you describe relate to each other which includes data sharing or manipulation.
While this might look like many decisions, you are Defining a Schema.
A Schema defines the structure and organisation of data. It dictates how data is stored, organised, and manipulated. Schemas can specify tables, fields (columns), data types, constraints, and relationships between tables.
In summary, a schema is the primary way to ensure your data acts as expected.
There are two main types of schemas:
- Schemafull (Structured)
- Schemaless (Unstructured)
When starting a new project in SurrealDB, you can define your Schema using either method depending on your application's requirements. Since SurrealDB is a multi-model database, you can have both in the same project, depending on your needs.
In this tutorial, you will learn how to define a schema in SurrealDB, what using either schema type means for data retrieval, and how to use what you need as your product grows.
Prerequisites
Before you start, this guide assumes the following:
- Basic knowledge of databases and data modelling.
- SurrealDB is installed on your machine. You can download and install SurrealDB from the installation page.
- A Command line interface (CLI) for interacting with SurrealDB or a Surrealist sandbox.
Schemafull (Structured) Databases
A schemafull database requires the upfront definition of the structure of your data, including collections (tables) and fields (columns). This approach enforces consistency and integrity, making it suitable for applications with well-defined data models.
In surrealDB, the Schemafull approach is is realised through Define statements, that provide instructions on parts of your database, such as authentication access and behaviour, global parameters, table configurations, table events, analyzers, and indexes. You can set a schemafull table in the following steps:
Creating a schemafull table
- Define a Table: To start a schemafull table, specifically use the Define Table statement.
-- Create a schemafull user table.
DEFINE TABLE user SCHEMAFULL;
- Define Fields: Now that the table is schemafull, no fields can be set unless first defined through a
DEFINE FIELD
statement.
-- Define some fields.
DEFINE FIELD firstName ON TABLE user TYPE string;
DEFINE FIELD lastName ON TABLE user TYPE string;
DEFINE FIELD email ON TABLE user TYPE string;
ASSERT string::is::email($value);
In the code above you may notice the ASSERT
clause. This can be used to validate any restrictions you want on a field. In the example above the string::is::email
function is used to check whether the value is an email.
Adding data to a Schemafull table
Now that you have defined all the fields needed, you can start populating them. To do this, you can use the CREATE statement. For example, add a new user:
-- 1: Add a user with all required fields.
CREATE user CONTENT {
firstName: 'John',
lastName: 'Doe',
email: 'JohnDoe@someemail.com',
};
This will return the data in an object. In the case where the email wasn’t a real email for example:
-- Using the CREATE statement to populate the table
CREATE user CONTENT {
firstName: 'John',
lastName: 'Doe',
email: 'JohnDoe.com',
};
The above will return an error because the field must conform to string::is::email($value)
Inserting fields that don’t exist in the schema
In a schemafull table, since the fields need to be defined before you can populate them, If you add a field that doesn’t exist your data will be ignored. For example, in the user table you have only defined the firstname
, lastname
and email
fields. If you introduce a photoURI
field without defining the field in the user
table it will return an error.
-- 2: Add a user with all required fields and an undefined one, 'photoURI'.
CREATE user CONTENT {
firstName: 'John',
lastName: 'Doe',
email: 'JohnDoe@someemail.com',
photoURI: 'photo/yxCFi22Jw2.webp'
};
Schemaless (Unstructured) Databases
A schemaless database does not require predefined structures, allowing for more flexible and dynamic data storage. This approach is ideal for applications with evolving data models or when dealing with diverse and unpredictable data formats.
In SurrealDB there are two ways you can define a schemaless table. You can either use any of the data definition statements such as CREATE
or UPDATE
and that will make a table based on the record ID specified. For example:
-- Using the CREATE statement
CREATE IC_directory:{username: 'johndoe',full_name: 'John Doe'} CONTENT {
username: 'johndoe',
full_name: 'John Doe',
email: 'johndoe@example.com',
date_of_birth: "1990-01-01",
join_date: "2024-05-30",
department: 'Engineering',
role: 'Software Engineer',
skills: ['Python', 'JavaScript', 'surql'],
manager: manager_directory:janesmith,
tags: ['full-time', 'remote']
};
-- Using the UPDATE statement
UPDATE manager_directory:janesmith CONTENT {
username: 'janesmith',
full_name: 'Jane Smith',
email: 'janesmith@example.com',
date_of_birth: "1985-01-01",
join_date: "2019-05-30",
department: 'Engineering',
role: 'Software Engineer Manager',
skills: ['Python', 'JavaScript', 'surql'],
report: IC_directory:{ full_name: 'John Doe', username: 'johndoe' },
tags: ['full-time', 'remote']
};
In the example above, you used the CREATE
statement to make an IC_directory
table and created an Object ID for this table. Record IDs can be specified by you in a large number of formats, defaulting to a random UUID if you don't specify your own format for the ID. . You have also used the UPDATE statement to make a manager_directory
, which has janesmith
as the ID. Learn more about Record IDs in the documentation.
Notice how we have linked these two tables with the manager and report fields in the IC_directory
and manager_directory
tables, respectively.
You can also start defining a Schemaless table using the DEFINE TABLE
statement.
-- Create schemaless user table.
DEFINE TABLE user SCHEMALESS;
-- Define some fields.
DEFINE FIELD firstName ON TABLE user TYPE string;
DEFINE FIELD lastName ON TABLE user TYPE string;
DEFINE FIELD email ON TABLE user TYPE string;
In the example above you have created a schemaless table using the SCHEMALESS
clause.
Inserting fields that don’t exist in the schema
Since a schemaless table doesn’t have any restrictions on the structure if you introduce a new field to an existing table the column will be added to the table even if other records don’t have the value. For example, take the schemafull example of introducing a photoURI
field without defining the field in the user
table:
CREATE user CONTENT {
firstName: 'John',
lastName: 'Doe',
email: 'JohnDoe@someemail.com',
photoURI: 'photo/yxCFi22Jw2.webp'
};
Since the user
table is schemaless the photoURI
field will be added for John
without an error.
Combining Schemafull and Schemaless
Now that you have seen how you can make a table in SurrealDB deciding what to go with depends on the restrictions you want to have on your Schema.
Since SurrealDB is multi-model, If some tables require less restrictions you can make them schemaless and if you want to ensure that the structure remains the same for each entry then go with the schemafull option.
Adding flexible fields in a Schemafull table
In SurrealDB, you can add flexible fields to a schemafull table using the DEFINE FIELD
statement using the FLEXIBLE
clause. This allows you to have schemaless
functionality in a schemafull
table. For example
-- Define a flexible field in a schemafull table.
DEFINE FIELD interactions ON TABLE user FLEXIBLE;
In the example above, you have added a flexible field interactions
to the user
table. This field can store any type of data, making it schemaless within a schemafull table. For example, you can store chat logs, emails, phone call records, or any other unstructured data in this field as a JSON object.
-- Add an interaction to the user table.
UPDATE user:wd99oovq358zfdmajnt7 CONTENT {
firstName: 'John',
lastName: 'Doe',
email: 'JohnDoe@someemail.com',
interactions: {
type: 'email',
subject: 'Welcome to our platform',
body: 'Thank you for joining our platform. We hope you enjoy your experience.',
date: '2024-05-30',
},
};
In the example above, you have added an interaction to the user
table using the UPDATE
statement. The interactions
field stores an email interaction with the user, including the type, subject, body, and date.
Use Case: Customer Relationship Management (CRM) System
A Customer Relationship Management (CRM) system is a prime example of an application that can benefit from both schemafull and schemaless tables.
In this system, schemafull tables are essential for storing structured and consistent data such as customer details, orders, and products.
For instance, a table for customers would include fields like customer ID, first name, last name, email, phone number, and the date they joined, ensuring data integrity and facilitating reliable reporting and analysis.
Similarly, tables for orders and products would maintain strict schemas to track orders accurately and manage product inventories effectively.
On the other hand, schemaless tables offer flexibility for handling unstructured or semi-structured data, which can vary widely. This is particularly useful for storing customer interactions such as emails, chat logs, phone call records, and social media messages, as well as customer feedback and reviews.
These types of data do not fit neatly into a rigid schema due to their diverse formats and content.
By utilising both schemafull and schemaless tables, a CRM system can achieve a balance between maintaining reliable, structured data for critical operations and providing the flexibility to capture and analyse a wide range of customer-related information, enhancing the system's overall adaptability and functionality.
Conclusion
A schema is the structure of tables in your database. The most important consideration when deciding whether to use a schemafull, schemaless, or both approaches is how flexible you want the content to be. Choose schemafull tables for structured data with strict validation requirements. Use schemaless tables for flexible, dynamic data that might evolve.
Combine both approaches to leverage the strengths of each based on your application's needs.
With respect to Performance Optimisation, you can use the DEFINE INDEX
statement to create indexes on either form of table. This can also optimise storage and retrieval.
Learn more about setting up a schema in the DEFINE
statement documentation.