Create a pipeline with a custom Docker config
Info
- For any additional information on the SDK functions, please check the SDK Documentation.
- For any additional information on Docker & DockerFiles, please check the Docker Documentation.
- This page presents a very simple implementation that you can adapt to every other feature of the Platform, except the ones that are explicitly specified as not working.
- The use of DockerFiles to create pipelines is restricted to the Elastic deployment mode. The low-latency deployment mode is not supported with this feature.
Warning
- If you do not have experience with Docker, or a peer who has experience with Docker, please do not use this feature, as it is a very advanced one.
- If this feature is mandatory and you have trouble using it, feel free to contact us for assistance.
Docker is a Platform-as-a-Service product that allows you to create, deploy, and run applications in containers. Containers are a lightweight alternative to virtual machines that provide a way to package and isolate your application's dependencies and configuration into a single unit. With Docker, you can easily move your application between environments, from development to production, without worrying about dependencies or compatibility issues.
In the context of MLOps or LLMOps, Docker is particularly useful because it allows you to create custom containers for running your machine learning code. This is important because machine-learning models often have complex dependencies that can be difficult to manage. By using Docker, you can create a container with all the necessary dependencies pre-installed, ensuring that your code runs consistently across different configurations.
In our Platform, you can provide a custom DockerFile to configure your pipeline's container (in which the pipeline will be executed). This gives you complete control over the container, allowing you to install any necessary libraries or dependencies and ensure that your code runs consistently and reproducibly. By using Docker in this way, you can create more complex pipelines that require a specific configuration, and ensure that your machine-learning models always run in the correct container configuration for that pipeline.
Create DockerFile
DockerFile & Platform
To configure the pipeline environment, we use a DockerFile in which we specify the steps that will lead to the execution of the code. The steps we will have in the DockerFile are:
- Use a Python base image
- Define the working folder (the /app folder is mandatory here)
- Install the system dependencies and the pip dependencies if needed
- Copy the repository files into the Docker container
- Run the script
FROM python:3.9-slim
# The workdir always needs to be configured at /app
ENV ROOT_DIR /app
WORKDIR ${ROOT_DIR}
# Install system dependencies (replace with the packages you need)
RUN apt-get update && apt-get install -y *your-system-dependencies*
# Install Python dependencies (replace with the packages you need)
RUN pip install *your-python-dependencies*
# Copy the context (the repo root here) inside the docker container
COPY . .
# Launch your Python script
ENTRYPOINT [ "python", "src/otherCode.py" ]
Adapt pipeline code
Explanation of pipeline code changes
In the code of the pipelines we start from a DockerFile, we need to:
- Call the function you want to run directly in the script.
- If you want to define inputs, you will get them in the "sys.argv" object; you don't need to define anything in the DockerFile.
- If you want to define outputs, you need to write the output files in the "/app" folder with the prefix "output-".
The script is called with two input arguments that are serialized JSON:
- The first contains the input values for the non-file inputs (the name of the input as key and its value as value).
- The second contains the path to the file inputs (the name of the input as key and the object containing the path as value).
import json
import sys
import os

def my_function(number, file):
    with open(file["path"]) as f:
        contents = f.readlines()
    # Your code here
    return ["Return pipeline array", "output elem", "other elem"]

if __name__ == "__main__":
    # Pre-processing: Inputs
    parameters_inputs = json.loads(sys.argv[1])
    files_inputs = json.loads(sys.argv[2])
    # Call function
    result = my_function(**parameters_inputs, **files_inputs)
    # Post-processing: Outputs
    output_dir = '/app'
    with open(os.path.join(output_dir, 'output-result_var'), 'w+') as f:
        json.dump(result, f)
    with open(os.path.join(output_dir, 'output-result_file'), 'w') as f:
        f.write('\n'.join(f'* {item}' for item in result))
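To make the argument handling concrete, here is a minimal sketch of how such a script could be invoked locally, the same way the container's ENTRYPOINT does. The script path, the input values, and the local file path are illustrative assumptions; on the Platform the two JSON arguments are built and passed automatically, and note that the script above writes its outputs to /app, which you may need to create or adapt for a local test.

import json
import subprocess

# Build the two serialized JSON arguments: non-file inputs first, file inputs second.
parameters = json.dumps({"number": 3})
files = json.dumps({"file": {"path": "data/my_input.txt"}})  # hypothetical local file

# Call the script exactly as the container's ENTRYPOINT would.
subprocess.run(
    ["python", "src/otherCode.py", parameters, files],
    check=True,
)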
Note
If you want to send or return a file through an endpoint, you can only send or return one file per call. The other inputs and outputs must be mapped differently.
Create a pipeline
Before creating a pipeline, you can add inputs and outputs as for a classic pipeline; more information here. To create pipelines with your DockerFile, you need to call the create_pipeline() function with just the name of the pipeline and the path to the DockerFile inside your repository. You don't have to specify the exact path to the code or the function name, because they will be taken from the DockerFile.
sdk.create_pipeline(
pipeline_name="*your-custom-pipeline-name*",
[description="*text with limit*"],
[container_config = {
[repository_url="*your-git-url*"],
[repository_branch="*your-git-branch* or *your-git-tag*"],
[repository_deploy_key="*your-private_key*"],
[included_folders=["*your-folder-path*", ...]],
[local_folder="*your-local-path*"],
dockerfile_path="*your-dockerfile-path*",
inputs=[Input(...)],
outputs=[Output(...)],
}],
)
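As an illustration, a filled-in call could look like the sketch below. All values (pipeline name, repository URL, branch, folders, DockerFile path) are placeholders to replace with your own, and the sketch assumes container_config is passed as a plain dictionary; check the SDK Documentation for the exact expected format and for how to declare Input and Output objects.

sdk.create_pipeline(
    pipeline_name="my-docker-pipeline",
    description="Pipeline built from a custom DockerFile",
    container_config={
        # Placeholders: point to your own repository and DockerFile
        "repository_url": "git@github.com:your-org/your-repo.git",
        "repository_branch": "main",
        "included_folders": ["src"],
        "dockerfile_path": "Dockerfile",
    },
)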
You can then use and deploy your pipeline as usual.
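For reference, running the pipeline afterwards could look like the following sketch; the exact function name and the way file inputs are passed depend on your SDK version, so treat this as an assumption and check the SDK Documentation.

# Hypothetical call: verify the function name and the file-input format in your SDK.
sdk.run_pipeline(
    pipeline_name="my-docker-pipeline",
    inputs={"number": 3},  # the "file" input would also need to be provided
)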