SurrealML
SurrealML is an engine that seeks to do one thing, and one thing well: store and execute trained ML models. SurrealML does not intrude on the training frameworks that are already out there; instead, it works with them to ease the storage, loading, and execution of models. Someone using SurrealML can train their model in a chosen framework in Python, save it, and then load and execute it in either Python or Rust. The inference engine is composed of Python bindings interacting with a core written in Rust. This means that the exact same code that the Python client calls will be running on a standalone production node or in the database. While the SurrealML engine runs in the database, it is developed in a completely isolated GitHub repository, giving the user 100% freedom over how they deploy and interact with the SurrealML engine.
While this is all exciting, let us check that SurrealML works on your machine. There is nothing worse than reading about a package only to find that it does not work. The first step is to install the package.
Installation
To install SurrealML, make sure you have Python installed. Then install the SurrealML library and either PyTorch or sklearn, based on your model choice. You can install the package with both PyTorch and sklearn support with the command below:
pip install "git+https://github.com/surrealdb/surrealml#egg=surrealml[sklearn,torch]"
If you want to use SurrealML with sklearn you will need the following installation:
pip install "git+https://github.com/surrealdb/surrealml#egg=surrealml[sklearn]"
For PyTorch:
pip install "git+https://github.com/surrealdb/surrealml#egg=surrealml[torch]"
Once the package is installed, you can then train and save your first model using sklearn.
Quick Start with Sklearn
Sklearn models can also be converted and stored in SurrealML's .surml format, enabling developers to load them in any Python version, as we are not relying on pickle. Metadata in the file also enables other users of the model to use it out of the box without having to worry about normalising the data or getting the inputs in the right order. We will cover .surml files in more depth in the storage section. You will also be able to load your sklearn models in Rust and run them, meaning you can use them in your SurrealDB server.
Before we start writing any training code we need to import the following:
from sklearn.linear_model import LinearRegression
from surrealml import SurMlFile, Engine
from surrealml.model_templates.datasets.house_linear import HOUSE_LINEAR
Here we can see that we have imported the standard sklearn LinearRegression model. We then import the SurMlFile object, which will facilitate the saving, loading, and execution of the trained model. We will use the Engine enum to tell our SurMlFile object whether we are using sklearn or torch. Finally, we import a small example dataset called HOUSE_LINEAR. This example dataset is a simple linear correlation between house prices, the square footage of the house, and the number of floors. This dataset is also used in the CI testing pipeline when we push updates.
Now that we have imported everything that we need, we can train our model with the code below:
model = LinearRegression()
model.fit(HOUSE_LINEAR["inputs"], HOUSE_LINEAR["outputs"])
This will give us a trained model. Now we need to save it, and this is where SurrealML comes in. First, we declare a SurMlFile object instance with the inputs, name, model object, and engine with the following code:
file = SurMlFile(
    model=model,
    name="house-price-prediction",
    inputs=HOUSE_LINEAR["inputs"],
    engine=Engine.SKLEARN
)
file.add_version(version="0.0.1")
The next step is optional, but it would be nice to map our inputs to some keys. We must be careful with the order in which we declare our columns, as they need to match the order of the inputs in the vector that our model was trained on. If you click on HOUSE_LINEAR you will see the following declaration:
HOUSE_LINEAR = {
    "inputs": inputs,
    "outputs": house_price,
    "squarefoot": squarefoot,
    "num_floors": num_floors,
    "input order": ["squarefoot", "num_floors"],
    "raw_inputs": {
        "squarefoot": raw_squarefoot,
        "num_floors": raw_num_floors,
    },
    "normalised_inputs": {
        "squarefoot": squarefoot,
        "num_floors": num_floors,
    },
    "normalisers": {
        "squarefoot": {
            "type": "z_score",
            "mean": squarefoot.mean(),
            "std": squarefoot.std()
        },
        "num_floors": {
            "type": "z_score",
            "mean": num_floors.mean(),
            "std": num_floors.std()
        }
    },
}
Here we can see that there are some normalisers involved. We can also see that the input order for the model training was ["squarefoot", "num_floors"]. Therefore, we can add the column names to our SurMlFile object instance with the code below:
file.add_column("squarefoot")
file.add_column("num_floors")
The add_column calls are the only place where we have to be careful about order. Next, we add our normalisers to the SurMlFile object instance with the code below, but here we do not have to worry about the order, as the normalisers will be mapped to the columns by default:
file.add_normaliser(
    "squarefoot",
    "z_score",
    HOUSE_LINEAR["squarefoot"].mean(),
    HOUSE_LINEAR["squarefoot"].std()
)
file.add_normaliser(
    "num_floors",
    "z_score",
    HOUSE_LINEAR["num_floors"].mean(),
    HOUSE_LINEAR["num_floors"].std()
)
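For context, a z_score normaliser stores a mean and a standard deviation, and a value is normalised by subtracting the mean and dividing by the standard deviation. The snippet below is a plain-Python illustration of that formula rather than part of the SurrealML API, and the statistics in it are made up for the example:

# Plain-Python illustration of z-score normalisation; not a SurrealML call.
def z_score(value: float, mean: float, std: float) -> float:
    return (value - mean) / std

# Hypothetical statistics, for illustration only.
print(z_score(1500.0, mean=1000.0, std=250.0))  # prints 2.0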
We are nearly done with adding metadata, with just one item left to add: the output that the model is trying to predict. This can be achieved by the following code:
file.add_output(
    "house_price",
    "z_score",
    HOUSE_LINEAR["outputs"].mean(),
    HOUSE_LINEAR["outputs"].std()
)
And now our file is ready to be saved which is done with the code below:
file.save(path="./linear.surml")
The file is stored in the .surml format, meaning that there is a header with the metadata that we defined, and the weights are stored in the ONNX format. This means that there are no language-dependent dependencies. We are now ready to load our model and perform calculations with it. We can load our model with the following code:
new_file = SurMlFile.load(path="./linear.surml", engine=Engine.SKLEARN)
If you are confident in what you are doing at this point, you can choose to perform calculations through SurrealML using a raw compute, in which the raw vector of inputs is passed directly into the model, with the code below:
print(new_file.raw_compute(input_vector=[5, 6]))
However, if you want the normalisation to be applied automatically and the inputs mapped via keys, you can use a buffered compute with the following code:
print(
    new_file.buffered_compute(
        value_map={
            "squarefoot": 5,
            "num_floors": 6
        }
    )
)
Both types of execution run the ML model using the Rust engine under the hood. As a result, the exact same code will be running your model in SurrealDB or on your inference server, regardless of whether you choose to build that server in Python or Rust. This .surml file can also be loaded, and inference executed, in either Rust or any Python version that has SurrealML installed.
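To make the relationship between the two calls concrete, the sketch below normalises the same two values by hand, using the statistics stored in HOUSE_LINEAR, and then passes them to raw_compute. Given the metadata we attached, this should behave much like the buffered_compute call above, but treat it as an illustrative sketch rather than a guaranteed equivalence, since the exact output depends on the trained weights and on how the engine applies the stored normalisers:

# Illustrative sketch: buffered_compute normalises and orders the inputs for us,
# so normalising by hand and calling raw_compute should behave in a similar way.
sqft_stats = HOUSE_LINEAR["normalisers"]["squarefoot"]
floors_stats = HOUSE_LINEAR["normalisers"]["num_floors"]

manual_inputs = [
    (5 - sqft_stats["mean"]) / sqft_stats["std"],      # squarefoot comes first in the input order
    (6 - floors_stats["mean"]) / floors_stats["std"],  # num_floors comes second
]
print(new_file.raw_compute(input_vector=manual_inputs))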
Now that we are able to load and execute our model locally, how do we deploy our model onto SurrealDB and run it? In the next section, we cover uploading.
Model Deployment
Before we try and upload our model, we need to have a node running. We can do this with the docker-compose.yml file below:
version: '3'
services:
  surrealdb:
    image: surrealdb/surrealdb
    command: start
    environment:
      - SURREAL_USER=root
      - SURREAL_PASS=root
      - SURREAL_LOG=trace
    ports:
      - 8000:8000
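Assuming the file above is saved as docker-compose.yml, the node can be started in the background with the standard Docker Compose command:

docker compose up -d

On older installations that ship the standalone binary, the equivalent command is docker-compose up -d.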
Once our node is running via Docker, we can then upload our trained model with the following code:
url = "http://0.0.0.0:8000/ml/import"
SurMlFile.upload(
    path="./linear.surml",
    url=url,
    chunk_size=36864,
    namespace="test",
    database="test",
    username="root",
    password="root"
)
The upload function will chunk the model and stream it up to a SurrealDB node. We can then execute the model with the following SurrealQL function call:
ml::house-price-prediction<0.0.1>({
    squarefoot: 500.0,
    num_floors: 2.0
})
Here, house-price-prediction is the name of the model, and <0.0.1> is the version of the model. The SurrealQL function above will give us a model output from the inputs that we passed in.
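If you would rather trigger this from Python instead of a SurrealQL console, the sketch below uses the surrealdb Python SDK to run the same query. Note that the SDK's API has changed between releases, so the connection and sign-in calls here are an assumption based on the async 0.3-style client; adjust them to match the SDK version you have installed.

# Hedged sketch: calling the uploaded model from Python via the surrealdb SDK.
# The Surreal class and its async methods follow the 0.3-style client API and
# may differ in newer SDK releases.
import asyncio
from surrealdb import Surreal

async def main():
    async with Surreal("ws://0.0.0.0:8000/rpc") as db:
        await db.signin({"user": "root", "pass": "root"})
        await db.use("test", "test")
        result = await db.query(
            "RETURN ml::house-price-prediction<0.0.1>("
            "{squarefoot: 500.0, num_floors: 2.0});"
        )
        print(result)

asyncio.run(main())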
We can now explore how our model can interact with other data with the SurrealQL script below:
CREATE house_listing SET squarefoot_col = 500.0, num_floors_col = 1.0;
CREATE house_listing SET squarefoot_col = 1000.0, num_floors_col = 2.0;
CREATE house_listing SET squarefoot_col = 1500.0, num_floors_col = 3.0;
SELECT * FROM (
    SELECT *,
        ml::house-price-prediction<0.0.1>({
            squarefoot: squarefoot_col,
            num_floors: num_floors_col
        }) AS price_prediction
    FROM house_listing
)
WHERE price_prediction > 177206.21875;
What is happening here is that we are feeding the columns from the table house_listing into the model we uploaded, called house-price-prediction, with a version of 0.0.1. We then get the results of that trained ML model as the column price_prediction, and we use the calculated predictions to filter the rows, giving us the following result:
[
    {
        "id": "house_listing:7bo0f35tl4hpx5bymq5d",
        "num_floors_col": 3,
        "price_prediction": 406534.75,
        "squarefoot_col": 1500
    },
    {
        "id": "house_listing:8k2ttvhp2vh8v7skwyie",
        "num_floors_col": 2,
        "price_prediction": 291870.5,
        "squarefoot_col": 1000
    }
]
Having covered everything that we need to get up and running with SurrealML, we should explore some other concepts in more depth to get the most out of SurrealML and be able to troubleshoot problems.