Pipeline monitoring
For data scientists, the ability to thoroughly understand and monitor machine learning executions is essential for delivering successful and impactful models.
This page gives you the knowledge and tools to track, monitor, and analyze the executions of your machine learning pipelines on the Craft AI platform.
It covers the essential topics: obtaining execution details, tracking inputs and outputs, accessing metrics and logs, and comparing multiple executions. Mastering these techniques lets you make informed decisions, optimize your models, and get the most out of the MLOps platform.
How to find an execution and obtain its details?
When using the platform for experimentation or production, you can find the list of all your executions on the Execution > Execution Tracking page (remember to select a project first).
On this page you will find the list of all executions in all environments of the selected project. All executions are listed, whether they are running, failed, or succeeded, and whether they were triggered by a run, an endpoint, or a CRON.
Warning
Please note that deleting a pipeline or a deployment also deletes all of its attached executions.
Get general information on an execution
Once you are on the Execution Tracking page, select an environment using the selector at the top left. When you hover over an environment, a popup appears on its right with two lists:
- The first contains the pipelines that have runs attached to them.
- The second contains the deployments that have executions attached to them.
Once a pipeline or deployment has been selected, the list of its executions appears in the left-hand column, from most recent to oldest.
Tip
You can click on an environment directly to get all the associated executions.
Info
You can also retrieve all this information with the SDK, using the sdk.get_pipeline_execution(execution_id) function.
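For example, here is a minimal sketch of such a call, assuming the SDK client is instantiated as below (the constructor arguments, typically a token and an environment URL, may differ in your setup) and using a hypothetical execution ID:

```python
# Minimal sketch: fetching the details of one execution via the SDK.
# The instantiation and the execution ID are assumptions.
from craft_ai_sdk import CraftAiSdk

sdk = CraftAiSdk()  # typically configured via environment variables (token, environment URL)

execution_id = "my-training-pipeline-execution-xxxx"  # hypothetical execution ID
execution = sdk.get_pipeline_execution(execution_id)

# The returned object contains the execution metadata (status, pipeline name,
# creation date, ...), i.e. the same information as the web interface.
print(execution)
```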
Track the inputs and outputs of an execution
If you want to see the inputs and outputs of an execution, you can view them in the tab of the same name. The inputs and outputs of the pipeline are displayed in a table with their:
- Name
- Type
- Source/destination type (where the value used for this execution comes from)
- Source/destination value (the value used for this execution)
Info
With the SDK, this information can be obtained using the function mentioned above, sdk.get_pipeline_execution(execution_id).
More detail on a given input or output can be obtained with:
- sdk.get_pipeline_execution_input(execution_id, input_name)
- sdk.get_pipeline_execution_output(execution_id, output_name)
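As an illustration, here is a minimal sketch combining these calls; the input name, output name, and execution ID used here are hypothetical:

```python
# Minimal sketch: inspecting the inputs and outputs of one execution.
# The execution ID, input name and output name are hypothetical.
from craft_ai_sdk import CraftAiSdk

sdk = CraftAiSdk()
execution_id = "my-training-pipeline-execution-xxxx"

# General information, including the inputs/outputs summary shown in the web interface
execution = sdk.get_pipeline_execution(execution_id)

# Detail of a single input and a single output, by name
learning_rate_input = sdk.get_pipeline_execution_input(execution_id, "learning_rate")
model_output = sdk.get_pipeline_execution_output(execution_id, "trained_model")

print(learning_rate_input)
print(model_output)
```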
Get the metrics and logs of an execution
In the Metrics tab, you can retrieve the pipeline metrics if you have defined them in your code.
Note that the simple metrics are shown in a table, while the list metrics are shown as graphs so that you can see how they change during execution. For example, here we follow the evolution of the loss and accuracy over the epochs of our model training.
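As a reminder of what defining metrics in your code can look like, here is a minimal sketch of a training step with a dummy loop; it assumes the SDK exposes record_metric_value and record_list_metric_values helpers for recording metrics (check the SDK reference for the exact names):

```python
# Minimal sketch of a pipeline step that records metrics during training.
# The record_metric_value / record_list_metric_values helpers are assumed to be
# the SDK's metric-recording functions, and the training loop is a dummy.
from craft_ai_sdk import CraftAiSdk

def train_model(learning_rate: float, num_epochs: int, batch_size: int) -> None:
    sdk = CraftAiSdk()

    losses, accuracies = [], []
    for epoch in range(num_epochs):
        # ... the real training code would go here ...
        losses.append(1.0 / (epoch + 1))            # dummy loss value
        accuracies.append(1.0 - 1.0 / (epoch + 2))  # dummy accuracy value

    # List metrics: one value per epoch, displayed as graphs in the Metrics tab
    sdk.record_list_metric_values("loss", losses)
    sdk.record_list_metric_values("accuracy", accuracies)

    # Simple metric: a single value, displayed in the table of the Metrics tab
    sdk.record_metric_value("final_accuracy", accuracies[-1])
```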
Info
It is also possible to retrieve this information with the SDK using the functions sdk.get_metrics(name, pipeline_name, deployment_name, execution_id) and sdk.get_list_metrics(name, pipeline_name, deployment_name, execution_id); see the SDK reference for more information.
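For instance, here is a minimal sketch retrieving both kinds of metrics for one execution; the metric names ("final_accuracy", "loss") and the execution ID are assumptions:

```python
# Minimal sketch: retrieving metrics for a given execution via the SDK.
# The metric names and the execution ID are assumptions.
from craft_ai_sdk import CraftAiSdk

sdk = CraftAiSdk()
execution_id = "my-training-pipeline-execution-xxxx"

# Simple metric (single value per execution)
final_accuracy = sdk.get_metrics(name="final_accuracy", execution_id=execution_id)

# List metric (one value per epoch/step of the execution)
loss_history = sdk.get_list_metrics(name="loss", execution_id=execution_id)

print(final_accuracy)
print(loss_history)
```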
Finally, the execution logs are available in the corresponding tab. Note that the logs, like the other information, are not refreshed automatically in the web interface, hence the arrow buttons for refreshing the page.
Info
Here again, an SDK function is available: sdk.get_pipeline_execution_logs(execution_id, from_datetime, to_datetime, limit).
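For example, here is a minimal sketch fetching the last hour of logs for one execution; the execution ID, time window, and limit are illustrative values:

```python
# Minimal sketch: fetching the logs of an execution over a time window.
# The execution ID, time bounds and limit are illustrative values.
from datetime import datetime, timedelta

from craft_ai_sdk import CraftAiSdk

sdk = CraftAiSdk()
execution_id = "my-training-pipeline-execution-xxxx"

logs = sdk.get_pipeline_execution_logs(
    execution_id,
    from_datetime=datetime.now() - timedelta(hours=1),
    to_datetime=datetime.now(),
    limit=200,
)

for log_line in logs:
    print(log_line)
```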
How to compare multiple executions?
Compare executions with a comparison table
Let's go back to the source code of the pipeline from the beginning. This code trains a deep learning model with three hyperparameters (the learning rate, the number of epochs, and the batch size), which are exposed as inputs of the pipeline.
The pipeline code also records simple metrics and list metrics to track performance during and at the end of training.
Here, we'll vary the hyperparameters over several training runs to find the best values. We'll vary the learning rate and the batch size with the following values, keeping the number of epochs fixed at 10:
| Learning rate | Number of epochs | Batch size |
|---|---|---|
| 0.01 | 10 | 32 |
| 0.01 | 10 | 64 |
| 0.01 | 10 | 128 |
| 0.001 | 10 | 32 |
| 0.001 | 10 | 64 |
| 0.001 | 10 | 128 |
| 0.0001 | 10 | 32 |
| 0.0001 | 10 | 64 |
| 0.0001 | 10 | 128 |
Each row of this table represents an execution with its hyperparameters, and therefore one training run of the model.
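One way to launch these nine executions is to trigger a run per row, as in the sketch below; the pipeline name ("train-model"), the input names, and the use of a run_pipeline(pipeline_name, inputs=...) helper are assumptions to adapt to your own pipeline:

```python
# Minimal sketch: triggering one run per row of the table above.
# The pipeline name, the input names and the run_pipeline helper are assumptions.
from itertools import product

from craft_ai_sdk import CraftAiSdk

sdk = CraftAiSdk()

learning_rates = [0.01, 0.001, 0.0001]
batch_sizes = [32, 64, 128]

for learning_rate, batch_size in product(learning_rates, batch_sizes):
    sdk.run_pipeline(
        "train-model",  # hypothetical pipeline name
        inputs={
            "learning_rate": learning_rate,
            "num_epochs": 10,
            "batch_size": batch_size,
        },
    )
```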
Instead of looking at the executions one by one in Execution Tracking, we go to Executions > Execution Comparison and select the environment to see the table with all the executions.
There are 3 important elements on this page:
- The table, the central element of the page. Each row represents an execution, except for the first row, which is the column header. Each column represents a piece of information about the executions.
- The selectors, which can be used to add or remove information from the table (inputs, metrics, etc.). The more information you select, the more columns the table will have.
- Another tab, used to visualize the list metrics; we'll come back to this later.
To start with, I'm going to select the meta-data, inputs, simple metrics, and list metrics. We don't need the outputs in this case. If I want, I can even select precisely the inputs and metrics I've used in my executions.
Then, in the header of the table, I'll filter the pipeline names so that I only have the executions from the pipeline I've used.
Note
All filter settings are available from the Filters button at the top right of the screen.
Finally, I'll sort by accuracy by clicking on the small arrow in the column header.
That's it, I've sorted my executions to find the parameters that give me the best accuracy for my model. In my case, the best result is obtained with a learning rate of 0.001 and a batch size of 128.
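The same comparison can be rebuilt from a notebook if you prefer. The sketch below assumes you have collected the relevant execution IDs (for example from the Execution Tracking page) and that each run recorded a simple final_accuracy metric; the IDs, the metric name, and the input names are assumptions:

```python
# Minimal sketch: rebuilding a comparison table from the SDK.
# Execution IDs, input names and the metric name are assumptions; depending on
# the SDK version, the calls below may return dictionaries from which the
# numeric values have to be extracted before sorting.
import pandas as pd

from craft_ai_sdk import CraftAiSdk

sdk = CraftAiSdk()
execution_ids = ["exec-1", "exec-2", "exec-3"]  # hypothetical execution IDs

rows = []
for execution_id in execution_ids:
    rows.append(
        {
            "execution_id": execution_id,
            "learning_rate": sdk.get_pipeline_execution_input(execution_id, "learning_rate"),
            "batch_size": sdk.get_pipeline_execution_input(execution_id, "batch_size"),
            "final_accuracy": sdk.get_metrics(name="final_accuracy", execution_id=execution_id),
        }
    )

comparison = pd.DataFrame(rows)
print(comparison)  # once the accuracy values are numeric, sort with comparison.sort_values("final_accuracy")
```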
We could also have sorted by a list metric, the difference being that you have to select the aggregation mode before sorting. Indeed, since the table only displays a single number representing the whole list (the average, the last value, the minimum, etc.), you select this mode by clicking on the tag at the top of the column (here last in the screenshot below):
Compare the list metrics of several executions
If you want to see all the values of the list metrics, you can also display them in their entirety to compare them between executions. To do this, select the executions you want to view using the eye icon on the left of the table, then go to the Visualize tab (top right).
Note
Only executions with list metrics have a selectable eye icon.
On this screen, there is a graph for each available list metric, and each execution is represented by a color, so you can compare the evolution of your metrics between training runs.
Info
You can hide executions by clicking on their names in the legend.
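If you prefer to plot the same comparison yourself, here is a minimal sketch using matplotlib; the execution IDs and the metric name are assumptions, and the structure of the values returned by get_list_metrics may differ between SDK versions:

```python
# Minimal sketch: plotting one list metric for several executions, similar to
# the Visualize tab. Execution IDs and the metric name are assumptions.
import matplotlib.pyplot as plt

from craft_ai_sdk import CraftAiSdk

sdk = CraftAiSdk()
execution_ids = ["exec-1", "exec-2", "exec-3"]  # hypothetical execution IDs

for execution_id in execution_ids:
    metric = sdk.get_list_metrics(name="accuracy", execution_id=execution_id)
    # Depending on the SDK version, each entry may be a dict holding the value
    # or a plain number; adapt this extraction if needed.
    values = [point["value"] if isinstance(point, dict) else point for point in metric]
    plt.plot(range(len(values)), values, label=execution_id)

plt.xlabel("epoch")
plt.ylabel("accuracy")
plt.legend()
plt.show()
```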
How to follow a pipeline in production?
Pipeline monitoring
Let's take the example of a classic prediction machine learning model. Once your model has been trained and selected, we're going to expose it through an endpoint so that it can be used from any application. To do this, we need the source code of an inference pipeline; this is the second Python code given at the beginning. Note that, for simplicity, this code reuses the validation dataset for inference. We can therefore score each prediction and store the result in a score metric (sketched after the list below):
- 1: the prediction is correct
- 0: the prediction is incorrect
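The inference code itself is not reproduced here, but the scoring part might look like the following sketch. The model file, the prediction call, and the record_metric_value helper used to store the score are assumptions:

```python
# Minimal sketch of the scoring part of an inference step; not the original code.
# The model file, the prediction call and the record_metric_value helper are assumptions.
import joblib

from craft_ai_sdk import CraftAiSdk

def predict(features: list, expected_label: int) -> dict:
    sdk = CraftAiSdk()

    # Assumes the trained model was packaged with the step as model.joblib
    model = joblib.load("model.joblib")
    prediction = model.predict([features])[0]

    # 1 if the prediction is correct, 0 otherwise
    score = 1 if prediction == expected_label else 0
    sdk.record_metric_value("score", score)

    return {"prediction": int(prediction), "score": score}
```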
We create the pipeline and the endpoint deployment with the associated inputs/outputs. Your model is now ready to make predictions, and for each prediction we can track the execution using the tools we've already seen.
In addition, you can also monitor your executions more globally over time by going to Monitoring > Pipeline metrics. On this page, you can see the evolution of the simple metrics over time for any selected deployment.
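These values can also be pulled programmatically. The sketch below retrieves the score metric for all executions of a deployment and computes a running accuracy; the deployment name and the structure of the returned entries are assumptions:

```python
# Minimal sketch: aggregating the "score" metric over all executions of a deployment.
# The deployment name and the structure of the returned entries are assumptions.
from craft_ai_sdk import CraftAiSdk

sdk = CraftAiSdk()

scores = sdk.get_metrics(name="score", deployment_name="predict-endpoint")

# Depending on the SDK version, each entry may be a dict holding the value;
# adapt this extraction if needed.
values = [entry["value"] if isinstance(entry, dict) else entry for entry in scores]

if values:
    print(f"{len(values)} predictions, running accuracy: {sum(values) / len(values):.2%}")
```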
In our case, we can track, execution after execution, the prediction score (correct or incorrect) of the model: