Pipelines
A pipeline is a machine learning workflow. Like a regular function, a pipeline is defined by the input it ingests, the code it runs, and the output it returns.
đź’ˇ The pipelines are written in Python with SDK calls and are executed on a Kubernetes pod. This allows easy modification with any code editor, and to have isolation between runs and other processes running on the Platform.
The main objectives of pipelines are:
- Orchestrating end-to-end ML workflows
- Collaboratively managing, tracking, and viewing pipeline definitions
- Deploying to production in a few clicks through several methods (endpoint, scheduler, …)
- Running Python code in production at large scale
- Ensuring efficient use of compute resources
Examples:
🖊️ Training pipeline
- Data preparation and preprocessing: In this stage, raw data is collected, cleaned, and transformed into a format that is suitable for training a machine learning model. This may involve tasks such as filtering out missing or invalid data points, normalizing numerical values, and encoding categorical variables.
- Model training: In this stage, a machine learning model is trained on the prepared dataset. This may involve selecting a model type, tuning hyperparameters, and training the model using an optimization algorithm.
- Model evaluation: Once the model has been trained, it is important to evaluate its performance to determine how well it generalizes to new data. This may involve tasks such as splitting the dataset into a training set and a test set, evaluating the model’s performance on the test set, and comparing the results to a baseline model.
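To make these stages concrete, here is a minimal sketch of a training pipeline written with scikit-learn. The dataset path, the column names, and the model choice are placeholders, not part of the Platform SDK.

```python
# Minimal sketch of the three training stages above, using scikit-learn.
# The CSV path, feature names, and model choice are placeholders.
import joblib
import pandas as pd
from sklearn.compose import ColumnTransformer
from sklearn.dummy import DummyClassifier
from sklearn.ensemble import RandomForestClassifier
from sklearn.impute import SimpleImputer
from sklearn.metrics import accuracy_score
from sklearn.model_selection import train_test_split
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import OneHotEncoder, StandardScaler

# 1. Data preparation and preprocessing
df = pd.read_csv("training_data.csv")      # placeholder path
df = df.dropna(subset=["target"])          # filter out rows with a missing label
numeric_cols = ["age", "income"]           # placeholder feature names
categorical_cols = ["country"]
preprocess = ColumnTransformer([
    ("num", Pipeline([("impute", SimpleImputer()), ("scale", StandardScaler())]), numeric_cols),
    ("cat", OneHotEncoder(handle_unknown="ignore"), categorical_cols),
])
X_train, X_test, y_train, y_test = train_test_split(
    df[numeric_cols + categorical_cols], df["target"], test_size=0.2, random_state=42
)

# 2. Model training
model = Pipeline([("preprocess", preprocess),
                  ("classifier", RandomForestClassifier(n_estimators=100))])
model.fit(X_train, y_train)

# 3. Model evaluation against a trivial baseline
baseline = DummyClassifier(strategy="most_frequent").fit(X_train, y_train)
print("model accuracy:   ", accuracy_score(y_test, model.predict(X_test)))
print("baseline accuracy:", accuracy_score(y_test, baseline.predict(X_test)))

# Persist the trained pipeline so an inference pipeline can load it later
joblib.dump(model, "model.joblib")
```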
🖊️ Inference pipeline
- Model loading: In this stage, a trained machine learning model is loaded from storage and prepared for use. This may involve tasks such as loading the model’s weights and any associated dependencies.
- Data preparation: In this stage, incoming data is cleaned and transformed into a format that is suitable for the model. This may involve tasks such as normalizing numerical values and encoding categorical variables.
- Inference: In this final stage, the model is used to make predictions on the prepared data. This may involve tasks such as passing the data through the model and processing the output to generate a final prediction.
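A matching minimal sketch of an inference pipeline is shown below, assuming the trained model was persisted with joblib as in the training sketch above; the artifact path and the example payload are placeholders.

```python
# Minimal sketch of the three inference stages above.
import joblib
import pandas as pd

# 1. Model loading: restore the trained pipeline artifact (placeholder path)
model = joblib.load("model.joblib")

# 2. Data preparation: shape the incoming payload into the columns the model
#    expects; the fitted preprocessing steps inside the pipeline handle the
#    scaling and encoding.
payload = {"age": 42, "income": 55000, "country": "FR"}   # example request
data = pd.DataFrame([payload])

# 3. Inference: pass the prepared data through the model
prediction = model.predict(data)
print("prediction:", prediction[0])
```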