SurrealML

SurrealML is an engine that seeks to do one thing, and one thing well: store and execute trained ML models. SurrealML does not intrude on the training frameworks that are already out there; instead, it works with them to ease the storage, loading, and execution of models. Someone using SurrealML can train a model in their chosen Python framework, save it, and then load and execute it in either Python or Rust. The inference engine is composed of Python bindings interacting with a core written in Rust. This means that the exact same code that the Python client calls will be running on a production node by itself, or in the database. While the SurrealML engine runs in the database, it is developed in a completely separate GitHub repository, giving the user full freedom over how they deploy and interact with the SurrealML engine.

While this is all exciting, let us check to see if SurrealML works on your machine. There is nothing worse than reading about a package only to find that it does not work. The first move is to install the package.

Installation

To install SurrealML, make sure you have Python installed. Then, install the SurrealML library together with PyTorch, sklearn, or both, depending on which framework your model uses. You can install the package with both PyTorch and sklearn support with the command below:

pip install "git+https://github.com/surrealdb/surrealml#egg=surrealml[sklearn,torch]"

If you want to use SurrealML with sklearn you will need the following installation:

pip install "git+https://github.com/surrealdb/surrealml#egg=surrealml[sklearn]"

For PyTorch:

pip install "git+https://github.com/surrealdb/surrealml#egg=surrealml[torch]"
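To confirm that the install worked before moving on, a quick check like the one below can help. It is a minimal sketch that uses only the Python standard library; the helper name is_installed is ours and is not part of SurrealML.

```python
import importlib.util


def is_installed(package: str) -> bool:
    """Return True if `package` can be located on the current Python path."""
    return importlib.util.find_spec(package) is not None


if __name__ == "__main__":
    # Report which of the packages used in this guide are importable.
    for name in ("surrealml", "sklearn", "torch"):
        status = "found" if is_installed(name) else "missing"
        print(f"{name}: {status}")
```

If surrealml is reported as missing, the pip command most likely ran against a different Python environment than the one you are checking from.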

Once the package is installed, you can then train and save your first model using sklearn.

Quick Start with Sklearn

Sklearn models can be converted and stored in SurrealML's .surml format. Because the format does not rely on pickle, developers can load these models in any Python version. Metadata in the file also lets other users run the model out of the box, without having to worry about normalising the data or getting the inputs in the right order. We will cover .surml files in more depth in the storage section. You will also be able to load your sklearn models in Rust and run them, meaning you can use them in your SurrealDB server.

Before we start writing any training code we need to import the following:

from sklearn.linear_model import LinearRegression
from surrealml import SurMlFile, Engine
from surrealml.model_templates.datasets.house_linear import HOUSE_LINEAR

Here we can see that we have imported the standard sklearn LinearRegression model. We then import the SurMlFile object, which will facilitate the saving, loading, and execution of the trained model. We will use the Engine enum to tell our SurMlFile object whether we are using sklearn or torch. Finally, we import a small example dataset called HOUSE_LINEAR. This example dataset is a simple linear correlation between house prices, the square footage of the house, and the number of floors. This dataset is also used in the CI testing pipeline when we push updates.

Now that we have imported everything that we need, we can train our model with the code below:

model = LinearRegression()
model.fit(HOUSE_LINEAR["inputs"], HOUSE_LINEAR["outputs"])

This will give us a trained model. Now we need to save it and this is where SurrealML comes in. First we declare a SurMlFile object instance with the inputs, name, model object, and engine with the following code:

file = SurMlFile(
    model=model,
    name="house-price-prediction",
    inputs=HOUSE_LINEAR["inputs"],
    engine=Engine.SKLEARN
)

file.add_version(version="0.0.1")

The next step is optional, but it is useful to map our inputs to some keys. We must be careful with the order in which we declare our columns, as it has to match the order of the inputs in the vector that our model was trained on. If you look at the definition of HOUSE_LINEAR, you will see the following declaration:

HOUSE_LINEAR = {
    "inputs": inputs,
    "outputs": house_price,
    "squarefoot": squarefoot,
    "num_floors": num_floors,
    "input order": ["squarefoot", "num_floors"],
    "raw_inputs": {
        "squarefoot": raw_squarefoot,
        "num_floors": raw_num_floors,
    },
    "normalised_inputs": {
        "squarefoot": squarefoot,
        "num_floors": num_floors,
    },
    "normalisers": {
        "squarefoot": {
            "type": "z_score",
            "mean": squarefoot.mean(),
            "std": squarefoot.std()
        },
        "num_floors": {
            "type": "z_score",
            "mean": num_floors.mean(),
            "std": num_floors.std()
        }
    },
}

Here we can see that there are some normalisers involved. We can also see that the input order for the model training was ["squarefoot", "num_floors"]. Therefore, we can add the column names to our SurMlFile object instance, in that same order, with the code below:

file.add_column("squarefoot")
file.add_column("num_floors")
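To see why the column order matters, consider what happens when the same two values reach a linear model in the wrong positions. The sketch below is a toy stand-in, not the trained model from this guide: the predict helper, the weights, and the bias are all made up for illustration.

```python
def predict(weights, bias, vector):
    """Toy linear model: bias plus the weighted sum of positional inputs."""
    return bias + sum(w * x for w, x in zip(weights, vector))


# Hypothetical weights, given in the order ["squarefoot", "num_floors"].
weights = [100.0, 5000.0]
bias = 1000.0

right_order = [1500.0, 3.0]  # squarefoot first, then num_floors
wrong_order = [3.0, 1500.0]  # same values, positions swapped

print(predict(weights, bias, right_order))  # 166000.0
print(predict(weights, bias, wrong_order))  # 7501300.0
```

The model has no notion of names, only positions, so the column declarations are what tie each key to the right slot in the input vector.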

The order of the add_column calls is the only ordering that we have to be careful about. Next, we add our normalisers to the SurMlFile object instance with the code below. Here the order does not matter, because each normaliser is mapped to its column:

file.add_normaliser(
    "squarefoot",
    "z_score",
    HOUSE_LINEAR["squarefoot"].mean(),
    HOUSE_LINEAR["squarefoot"].std()
)

file.add_normaliser(
    "num_floors",
    "z_score",
    HOUSE_LINEAR["num_floors"].mean(),
    HOUSE_LINEAR["num_floors"].std()
)
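For reference, a z-score rescales a value by subtracting the mean and dividing by the standard deviation. The sketch below shows that arithmetic in plain Python on made-up numbers. It assumes the dataset columns are NumPy arrays, whose .std() defaults to the population standard deviation, so statistics.pstdev is used to match; the z_score helper is ours, not a SurrealML function.

```python
import statistics


def z_score(value, mean, std):
    """Rescale `value` so the source column has mean 0 and std 1."""
    return (value - mean) / std


# Made-up square footage values, for illustration only.
squarefoot = [1000.0, 1500.0, 2000.0]

mean = statistics.fmean(squarefoot)
std = statistics.pstdev(squarefoot)  # population std, like numpy's default

normalised = [z_score(v, mean, std) for v in squarefoot]
print(normalised)
```

After normalisation the values are centred on zero, which is why the mean and standard deviation have to travel with the model for new inputs to be scaled the same way.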

We are nearly done with adding metadata, with just one item left to add: the output that the model is trying to predict. This can be achieved by the following code:

file.add_output(
    "house_price",
    "z_score",
    HOUSE_LINEAR["outputs"].mean(),
    HOUSE_LINEAR["outputs"].std()
)
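Recording the mean and standard deviation of the output means anyone holding the file can relate a normalised value back to real price units. This guide does not spell out how the engine itself uses these values, so the sketch below shows only the arithmetic of inverting a z-score. The denormalise helper and the statistics are made up.

```python
def denormalise(z, mean, std):
    """Invert a z-score: map a normalised value back to original units."""
    return z * std + mean


# Made-up output statistics, for illustration only.
price_mean = 250000.0
price_std = 50000.0

print(denormalise(1.5, price_mean, price_std))  # 325000.0
```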

And now our file is ready to be saved which is done with the code below:

file.save(path="./linear.surml")

The file is stored in the .surml format, meaning that there is a header with the data that we defined, and the weights are stored in the ONNX format. As a result, the file has no language-specific dependencies. We are now ready to load and perform calculations on our model. We can load our model with the following code:

new_file = SurMlFile.load(path="./linear.surml", engine=Engine.SKLEARN)

If you are confident in what you are doing at this point, you can perform calculations through SurrealML using a raw compute, in which the raw vector of inputs is passed directly into the model, with the code below:

print(new_file.raw_compute(input_vector=[5, 6]))

However, if you want the normalisation to be automatically applied, and inputs mapped via keys, we can use a buffered compute with the following code:

print(
    new_file.buffered_compute(
        value_map={
            "squarefoot": 5,
            "num_floors": 6
        }
    )
)
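Conceptually, a buffered compute has to turn a keyed map into an ordered, normalised vector before the model can run. The sketch below illustrates that idea in plain Python. It is not SurrealML's implementation: the to_vector helper, the (mean, std) tuple layout, and the statistics are all assumptions made for the example.

```python
def to_vector(value_map, columns, normalisers):
    """Order values by column name, then z-score those that have a normaliser."""
    vector = []
    for name in columns:
        value = float(value_map[name])  # raises KeyError if a key is missing
        if name in normalisers:
            mean, std = normalisers[name]
            value = (value - mean) / std
        vector.append(value)
    return vector


columns = ["squarefoot", "num_floors"]
# Made-up (mean, std) pairs, for illustration only.
normalisers = {"squarefoot": (1000.0, 500.0), "num_floors": (2.0, 1.0)}

# The key order in the map does not matter; the column list decides positions.
vector = to_vector({"num_floors": 3, "squarefoot": 1500}, columns, normalisers)
print(vector)  # [1.0, 1.0]
```

A raw compute skips this step entirely, which is why it expects the caller to supply values that are already ordered and scaled.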

Both types of execution run the ML model using the Rust engine under the hood. As a result, the exact same code will be running your model in SurrealDB or your inference server, regardless of whether you choose to build that server in Python or Rust. This .surml file can also be loaded, and used for inference, by either Rust or any Python version that has SurrealML installed.

Now that we are able to load and execute our model locally, how do we deploy our model onto SurrealDB and run it? In the next section, we cover uploading.

Model Deployment

Before we try to upload our model, we need to have a node running. We can do this with the docker-compose.yml file below:

version: '3'
services:
  surrealdb:
    image: surrealdb/surrealdb
    command: start
    environment:
      - SURREAL_USER=root
      - SURREAL_PASS=root
      - SURREAL_LOG=trace
    ports:
      - 8000:8000
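Before uploading anything, it can save time to check that something is actually listening on the published port. The sketch below is a generic TCP reachability check using only the standard library. The is_port_open helper is ours, and 127.0.0.1 is an assumption about where the container is reachable from your machine.

```python
import socket


def is_port_open(host: str, port: int, timeout: float = 1.0) -> bool:
    """Return True if a TCP connection to (host, port) succeeds."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, timeouts, and unreachable hosts.
        return False


if __name__ == "__main__":
    # Port 8000 is the one published in the compose file above.
    print(is_port_open("127.0.0.1", 8000))
```

An open port only shows that a process accepted the connection; it does not prove that the node has finished starting or that the credentials are correct.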

Once our node is running via Docker, we can upload our trained model with the following code:

url = "http://0.0.0.0:8000/ml/import"
SurMlFile.upload(
    path="./linear.surml",
    url=url,
    chunk_size=36864,
    namespace="test",
    database="test",
    username="root",
    password="root"
)
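The chunk_size argument controls how many bytes go into each piece of the upload. As a rough illustration of fixed-size chunking (not SurrealML's actual upload code), the sketch below splits an in-memory byte stream using the same chunk size. The iter_chunks helper and the 100,000-byte payload are made up for the example.

```python
import io


def iter_chunks(stream, chunk_size):
    """Yield successive blocks of at most `chunk_size` bytes from `stream`."""
    while True:
        chunk = stream.read(chunk_size)
        if not chunk:
            break
        yield chunk


# Stand-in for the bytes of a .surml file.
payload = io.BytesIO(b"x" * 100_000)

sizes = [len(chunk) for chunk in iter_chunks(payload, 36864)]
print(sizes)  # [36864, 36864, 26272]
```

Reading in chunks keeps memory use bounded by the chunk size rather than by the size of the model file.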

The upload function will chunk the model and stream it up to a SurrealDB node. We can then perform an execution of the model with the following SurrealQL function:

ml::house-price-prediction<0.0.1>({
    squarefoot: 500.0,
    num_floors: 2.0
})

Here, house-price-prediction is the name of the model. The <0.0.1> is the version of the model. The SurrealQL function above will give us a model output from the inputs that we passed in.

We can now explore how our model can interact with other data with the SurrealQL script below:

CREATE house_listing SET squarefoot_col = 500.0, num_floors_col = 1.0;
CREATE house_listing SET squarefoot_col = 1000.0, num_floors_col = 2.0;
CREATE house_listing SET squarefoot_col = 1500.0, num_floors_col = 3.0;

SELECT * FROM (
    SELECT *,
        ml::house-price-prediction<0.0.1>({
            squarefoot: squarefoot_col,
            num_floors: num_floors_col
        }) AS price_prediction
    FROM house_listing
)
WHERE price_prediction > 177206.21875;

What is happening here is that we are feeding the columns from the table house_listing into a model we uploaded called house-price-prediction with a version of 0.0.1. We then get the results of that trained ML model as the column price_prediction. We then use the calculated predictions to filter the rows giving us the following result:

[
    {
        "id": "house_listing:7bo0f35tl4hpx5bymq5d",
        "num_floors_col": 3,
        "price_prediction": 406534.75,
        "squarefoot_col": 1500
    },
    {
        "id": "house_listing:8k2ttvhp2vh8v7skwyie",
        "num_floors_col": 2,
        "price_prediction": 291870.5,
        "squarefoot_col": 1000
    }
]
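The query works in two stages: the inner SELECT attaches a prediction to every row, and the outer SELECT filters on that computed column. The same shape can be sketched in plain Python. The predict function below is a made-up stand-in for the uploaded model and the threshold is arbitrary, so the numbers will not match the database output above.

```python
def predict(squarefoot, num_floors):
    """Hypothetical stand-in for the uploaded model."""
    return 100.0 * squarefoot + 5000.0 * num_floors


house_listing = [
    {"squarefoot_col": 500.0, "num_floors_col": 1.0},
    {"squarefoot_col": 1000.0, "num_floors_col": 2.0},
    {"squarefoot_col": 1500.0, "num_floors_col": 3.0},
]

# Stage 1 (inner SELECT): add the prediction as a new column on every row.
with_prediction = [
    {**row, "price_prediction": predict(row["squarefoot_col"], row["num_floors_col"])}
    for row in house_listing
]

# Stage 2 (outer SELECT ... WHERE): keep rows above an arbitrary threshold.
threshold = 100000.0
filtered = [row for row in with_prediction if row["price_prediction"] > threshold]

print(filtered)
```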

Having covered everything that we need to get up and running with SurrealML, we should explore some other concepts in more depth to get the most out of SurrealML and be able to troubleshoot problems.