3.2 Part 2: Deploy with configuration step
3.2.1 Introduction
The previous part introduced the main concepts of the platform and the basics of the SDK. With what you already know, you can put very simple applications into production. But to build more realistic applications, with more complex code and dependencies such as Python libraries, you need to learn more advanced functionalities, in particular how to configure the execution context of a step.
This page walks through the same commands as the previous part while presenting more of the functionalities offered by the platform, using a real Machine Learning use case. We will improve this Machine Learning application later in Part 3 and Part 4.
You can find all the code used in this part and its structure here.
Prerequisites
Python 3.8 or higher installed on your computer.
Having completed the previous part of this tutorial (Part 1: Deploy a simple pipeline).
3.2.2 First ML application
Here we will build a pipeline to train a simple ML model on the iris dataset.
Dataset and use case introduction
The iris dataset describes four features (petal length, petal width, sepal length, sepal width) measured on three species of iris (Setosa, Versicolour, Virginica).
The goal of our application is to classify the flower type based on these four features. In this part we start our new application by building a first pipeline that trains a simple ML model on the iris dataset, and then deploy it.
We will use the following code, which trains a scikit-learn KNN classifier on this dataset and makes predictions on the test set.
from sklearn import datasets
from sklearn.neighbors import KNeighborsClassifier
import numpy as np

def TrainPredictIris():
    # Load the iris dataset as pandas objects
    iris_X, iris_y = datasets.load_iris(return_X_y=True, as_frame=True)

    # Shuffle the samples and split into train (90 samples) and test sets
    np.random.seed(0)
    indices = np.random.permutation(len(iris_X))
    iris_X_train = iris_X.loc[indices[0:90], :]
    iris_y_train = iris_y.loc[indices[0:90]]
    iris_X_test = iris_X.loc[indices[90:], :]
    iris_y_test = iris_y.loc[indices[90:]]

    # Train a KNN classifier and predict on the test set
    knn = KNeighborsClassifier()
    knn.fit(iris_X_train, iris_y_train)
    result = knn.predict(iris_X_test)
    print(result)
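Before pushing this file, you may want to run the function locally as a quick sanity check (this assumes numpy, pandas and scikit-learn are installed in your local environment). Adding a main guard at the end of the file is one harmless way to do it, since it has no effect when the function is imported:
# Optional: append this at the end of src/part-2-irisModel.py and run
#   python src/part-2-irisModel.py
# to execute the function once locally; it is ignored when the file is imported.
if __name__ == "__main__":
    TrainPredictIris()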
3.2.3 Cleaning up objects
Before we really start building our new application, we might want to clean up the objects we created in Part 1 on the platform. To do this, we use the delete function associated with each object: here the endpoint, the pipeline and the step.
Warning
These objects depend on each other, so we have to delete them in a specific order: first the endpoint, then the pipeline, then the step.
sdk.delete_deployment(deployment_name="part-1-hello-world-endpoint")
sdk.delete_pipeline(pipeline_name="part-1-hello-world-pipeline")
sdk.delete_step(step_name="part-1-hello-world-step")
In the rest of this part we will follow the same workflow as in the previous one: create a step, create a pipeline, then deploy and execute it.

3.2.4 In-depth step configuration
Now it is time to use the create_step() method of the craft-ai-sdk object to create a step, as before. This time, though, we will see how to define the step and its execution context more precisely. We are going to focus on two parameters.
3.2.4.1 Python libraries
As you might have noticed, the code above uses external Python libraries (numpy and scikit-learn). In the previous part we conveniently built an application that did not require any external dependency, but this time, for this code to work on the platform, we have to tell it that this step needs some Python libraries to run properly.
To do so, we create a requirements.txt file at the root of the repository, containing the list of Python libraries used in our step function:
scikit-learn==1.2.1
numpy==1.19.5
pandas==1.4.2
In this case we placed it at the root of the repository, but you can put it wherever you want.
As for the code, the platform only sees what is on your GitHub repository, so don't forget to push your requirements file to GitHub.
You can now specify the path of this file in the Libraries & Packages section of your project settings using the web interface, and all the steps created in this project will have these libraries installed and ready to be used.
3.2.4.2 Included folders
When creating your step, you may not need to include all the files of your project repository in the step. You can specify the files and folders to include from the GitHub repository, using the included_folders option, to prevent the step from accessing all the code available in the repository by default.
Once again you can set this option in your project settings page in the web interface.
🎉 Now all the steps created in this project will have the relevant libraries installed, and only the necessary files will be included.

3.2.5 Configuration Hierarchy
When you created your project on the platform (cf. Part 0), you set up different parameters (such as the repository URL, the deploy key or the Python version you are using), and we also set up new parameters in the previous section.
By default, your step will use those parameters during its creation (which is why you didn't need to add any parameters when creating your step in Part 1). However, sometimes you want to define some of them only at the step level and keep the others at the project level. This is the role of the create_step() function's container_config parameter: you can pass, as a dictionary, the set of parameters you want to use for the step creation. It allows you to be really specific about the execution context of your step.
💡 For example, if you need to build a step embedding a function from another repository, you can specify the new GitHub repository URL and deploy key at the step level using the container_config parameter, as sketched below. Your project parameters will remain unchanged.
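A minimal sketch of that case might look like the following. The step name, paths and repository URL are illustrative, and the exact container_config keys assumed here (repository_url, repository_deploy_key) should be checked against the craft-ai-sdk reference for your version:
# Hypothetical example: build this step from a function hosted in another repository,
# overriding the project-level Git settings for this step only.
sdk.create_step(
    step_name="part-2-other-repo-step",          # illustrative name
    function_path="src/other_function.py",       # path inside the other repository
    function_name="other_function",
    container_config={
        "repository_url": "git@github.com:my-org/other-repo.git",          # assumed key name
        "repository_deploy_key": "-----BEGIN OPENSSH PRIVATE KEY-----...",  # assumed key name
    },
)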
Warning
The execution context of a step is non-persistent. This means that as soon as the execution is done, everything written in memory and on disk during the execution is deleted. Therefore, a step is not the right place to store data or any other information. We will learn more about persistence in Part 4.
Here we will specify the requirements_path and the included_folders.
Note: you can also specify them in the user interface at the project level. Just remember that they will be used by default if you don't specify others at the step level.
3.2.5.1 included_folders
To restrict the code made available to the step, we list the folders to include from the repository under the included_folders key; here, only the src folder containing our function.
3.2.5.2 requirements_path
In order for our requirements.txt file to be taken into account, we must also add it to the container_config parameter with the requirements_path key.
sdk.create_step(
    step_name="part-2-irisclassifier-step",
    function_path="src/part-2-irisModel.py",
    function_name="TrainPredictIris",
    description="This function creates a classifier model for iris and makes predictions on the test data set",
    container_config={
        "requirements_path": "requirements.txt",
        "included_folders": ["src"],
    },
)
It may also be useful to describe precisely the steps you create, so that their purpose is easy to understand afterwards. To do so, you can fill in the description parameter during step creation.
🎉 Your step has been created. You can now create your pipeline.
From here, we reproduce the same steps as before with the creation of the pipeline, the endpoint and the execution of the objects we have created.
3.2.6 Create a pipeline
Create a pipeline with the create_pipeline() method of the SDK.
sdk.create_pipeline(
    pipeline_name="part-2-irisclassifier-pipeline",
    step_name="part-2-irisclassifier-step"
)
3.2.7 Deploy your Pipeline through an Endpoint
Now you can deploy your pipeline with an endpoint and execute it with an HTTP call as in Part 1.
Note: the execution ID can also be obtained directly from the endpoint call's return; below we retrieve it with sdk.list_pipeline_executions(), as in Part 1.
endpoint = sdk.create_deployment(
    pipeline_name="part-2-irisclassifier-pipeline",
    deployment_name="part-2-irisclassifier-endpoint",
    execution_rule="endpoint"
)
import requests

endpoint_URL = sdk.base_environment_url + "/endpoints/" + endpoint["name"]
headers = {"Authorization": "EndpointToken " + endpoint["endpoint_token"]}

response = requests.post(endpoint_URL, headers=headers)
response.status_code
pipeline_executions = sdk.list_pipeline_executions(pipeline_name="part-2-irisclassifier-pipeline")
pipeline_executions

logs = sdk.get_pipeline_execution_logs(
    pipeline_name="part-2-irisclassifier-pipeline",
    execution_id=pipeline_executions[-1]['execution_id']
)
print('\n'.join(log["message"] for log in logs))
The output is a list of iris categories:
>> [2 0 2 0 2 2 0 0 2 0 0 0 1 2 2 0 0 0 1 1 0 0 1 0 2 1 2 1 0 2 0 2 0 0 2 0 2
1 1 1 2 2 2 1 0 1 2 2 0 1 1 2 1 0 0 0 2 1 2 0]
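These integers are the class indices used by scikit-learn. If you prefer human-readable labels, you can map them back to species names locally with the dataset's target_names (a small sketch using scikit-learn directly, outside the platform):
from sklearn import datasets

# Class indices -> species names for the iris dataset
target_names = datasets.load_iris().target_names
print(target_names)             # ['setosa' 'versicolor' 'virginica']
print(target_names[[2, 0, 1]])  # ['virginica' 'setosa' 'versicolor']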
Now that we can run more complex code in our steps and know how to parametrize their execution context, we would like to pass them inputs to vary the result, and to retrieve that result easily. For this, we can use the input/output feature offered by the platform.
Next step: Part 3: Deploy with input and output