Connect a Git repository to the platform

The basic way to provide the source code of a pipeline is to specify a local file that contains the script in .py format. However, the platform can also retrieve the script directly from a Git repository (GitHub or GitLab). Connecting a Git repository to the platform offers several advantages over using local files.

This documentation explains the benefits of this integration and the steps required to set up this connection securely and efficiently.

Why connect a Git repository?

Limitations of using local files

Typically, the source code of pipelines is shared with the platform through local files only. This approach has several drawbacks:

  • Lack of versioning: It is difficult to track changes made to the code.
  • Limited sharing: Sharing code with other team members is complex and inconvenient.
  • Poor practice for production: Working with local files is not recommended for production environments due to the lack of controls and security measures.

Proposed solution

To overcome these limitations, we propose reading the code directly from a Git repository. This makes versioning and code sharing easier. This practice is strongly encouraged for production projects, and it is useful for other projects too.

To achieve this:

  • The user must create a deploy key.
  • The public key must be added to the Git repository.
  • The private key must be shared with the platform so it can access the repository.

Let's see how to do this in detail.

How to connect a Git repository?

Providing information to the Git repository

To allow the Git repository to recognize and authorize connections from the platform, you need to provide an SSH key pair.

Step 1: Generating the deploy key

For security reasons, the platform accesses your Git repository with a deploy key, which is an RSA SSH key. A deploy key grants access to one specific repository. It is not the same as the personal keys that users commonly use to access their repositories, although both are SSH keys.

The deploy key has two elements:

  • The public key, which must be added to the repository settings on GitHub or GitLab.
  • The private key, which must be sent to the Craft AI MLOps Platform, so it can access the repository.

First, you will need to generate an SSH key on your computer:

On Linux and macOS

Move to a new directory and run the following command, replacing my-key-filename with a name of your choosing.

ssh-keygen -m PEM -t rsa -b 4096 -f my-key-filename -q -N "" -C ""

This will generate two files: a file named my-key-filename with the “private key” used for creating a pipeline, and a file named my-key-filename.pub with the “public key” used to create the Deploy Key on GitHub.

Warning

These files should not be included in the pipeline’s directory. Only the type of SSH key generated by this command is accepted when creating the pipeline.
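
If you prefer to script this step, the command above can be wrapped in Python. This is a minimal sketch, assuming ssh-keygen is available on your PATH and that the target directory already exists; the helper names build_keygen_command and generate_deploy_key are illustrative and are not part of the platform SDK.

```python
import subprocess
from pathlib import Path


def build_keygen_command(key_path):
    """Return the ssh-keygen arguments used in this guide."""
    return [
        "ssh-keygen",
        "-m", "PEM",   # write the private key in PEM format
        "-t", "rsa",   # RSA key type
        "-b", "4096",  # key size in bits
        "-f", str(key_path),
        "-q",          # quiet mode
        "-N", "",      # empty passphrase
        "-C", "",      # empty comment
    ]


def generate_deploy_key(key_path):
    """Run ssh-keygen and return the (private, public) key file paths."""
    key_path = Path(key_path)
    if key_path.exists():
        # Avoid the interactive overwrite prompt and accidental key loss.
        raise FileExistsError(f"{key_path} already exists; refusing to overwrite it")
    subprocess.run(build_keygen_command(key_path), check=True)
    return key_path, Path(str(key_path) + ".pub")
```

For example, generate_deploy_key("keys/my-key-filename") would create both files in an existing keys directory located outside the pipeline's directory.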

On Windows

  • Check that OpenSSH is installed, and install it if it is not. (Go to Settings > Apps & features > Optional features to see the list of features, then click Add a feature to find and install OpenSSH.)

  • Press the Windows key.

  • Type cmd.

  • Under Best Match, right-click Command Prompt.

  • Click Run as Administrator.

  • If prompted, click Yes in the "Do you want to allow this app to make changes to your device?" pop-up.

  • In the command prompt, type the following command: ssh-keygen

  • By default, the system saves the keys to C:\Users\your_username\.ssh\id_rsa. You can keep the default name or choose a more descriptive one, which helps distinguish between keys if you use multiple key pairs. To keep the default option, press Enter.

  • You’ll be asked to enter a passphrase. Hit Enter to skip this step.

  • The system will generate the key pair, and display the key fingerprint and a randomart image.

  • Open your file browser.

  • Navigate to C:\Users\your_username\.ssh.

  • You should see two files. The private key is saved in the id_rsa file and the public key in the id_rsa.pub file. This is your SSH key pair.
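
Whichever operating system you used, it can help to check the generated files before going further. The sketch below assumes that the platform expects the PEM RSA format produced by the Linux and macOS command (a private key delimited by the BEGIN RSA PRIVATE KEY and END RSA PRIVATE KEY tags) and that the matching public key starts with ssh-rsa. The function names are illustrative.

```python
PEM_RSA_BEGIN = "-----BEGIN RSA PRIVATE KEY-----"
PEM_RSA_END = "-----END RSA PRIVATE KEY-----"


def is_pem_rsa_private_key(text):
    """Check that the text is wrapped in the PEM RSA begin and end tags."""
    stripped = text.strip()
    return stripped.startswith(PEM_RSA_BEGIN) and stripped.endswith(PEM_RSA_END)


def is_rsa_public_key(text):
    """Check that the text looks like an OpenSSH RSA public key line."""
    fields = text.strip().split()
    return len(fields) >= 2 and fields[0] == "ssh-rsa"


def check_key_pair(private_key_path):
    """Return a list of problems found in the key pair (empty if none)."""
    problems = []
    with open(private_key_path, "r") as file:
        if not is_pem_rsa_private_key(file.read()):
            problems.append("private key is not in PEM RSA format")
    with open(str(private_key_path) + ".pub", "r") as file:
        if not is_rsa_public_key(file.read()):
            problems.append("public key is not an ssh-rsa key")
    return problems
```

If check_key_pair reports a problem, regenerating the key with the explicit options shown for Linux and macOS is one way to obtain the expected format.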

Step 2: Adding the public key to the Git repository

Next, add the public key to the Git repository settings as a deploy key.

For GitHub:

  1. Head to the homepage of your repository on GitHub.

  2. Go to the Settings page.

  3. Once there, select the Deploy Keys tab on the left.

  4. Select Add deploy key on the Deploy Keys page.

  5. Enter a name for your deploy key.

  6. Paste the public key (the content of my-key-filename.pub) in the second text box.

  7. Click on “Add key” (you don’t need to allow write access)
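
The same step can also be scripted with the GitHub REST API endpoint for deploy keys (POST /repos/{owner}/{repo}/keys). This is a minimal sketch, assuming you have a GitHub access token allowed to administer the repository; the function name, the token and the account and repository names are placeholders. The function only builds the request, so nothing is sent until you call urllib.request.urlopen on it.

```python
import json
import urllib.request


def build_deploy_key_request(owner, repo, title, public_key, token):
    """Build (without sending) the request that registers a read-only deploy key."""
    payload = {
        "title": title,
        "key": public_key.strip(),
        "read_only": True,  # write access is not needed
    }
    return urllib.request.Request(
        url=f"https://api.github.com/repos/{owner}/{repo}/keys",
        data=json.dumps(payload).encode("utf-8"),
        headers={
            "Accept": "application/vnd.github+json",
            "Authorization": f"Bearer {token}",
            "Content-Type": "application/json",
        },
        method="POST",
    )
```

A typical use would read my-key-filename.pub, pass its content as public_key, and send the request with urllib.request.urlopen(request).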

For GitLab:

  1. Head to the homepage of your repository on GitLab.
  2. Click Settings in the left sidebar, then go to Repository.
  3. Click Expand in the Deploy keys section.
  4. Enter a name for your deploy key.
  5. Paste the public key (the content of my-key-filename.pub) in the second text box.
  6. Click Add key. You do not need to grant write permissions to this key.

Providing information to the platform

There are two methods to integrate the SSH key on the platform side:

  • In the pipeline creation information
  • In the project information

Step 3a: During pipeline creation

The repository parameters can be set each time you create a pipeline, just as you would with a local folder. To do this, add the necessary information (repository URL, branch, private key) directly to the container configuration.

Let's take this file structure as an example:

.
├── sdk-platform-script.py
├── my-git-repo/
│   ├── README.md
│   ├── requirements.txt 
│   └── src/
│       └── my-source-code.py
└── keys/
    ├── my-key-filename
    └── my-key-filename.pub

The script sdk-platform-script.py creates platform objects using the SDK. The file my-source-code.py contains the Python code that will be executed in the pipeline, and the keys directory contains the key pair generated earlier.

When creating a pipeline, you can configure your SDK script in this way to create a pipeline from the code in the Git repo:

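sdk-platform-script.py
# `sdk` is assumed to be an already initialized platform SDK client.

# Read the private deploy key generated in Step 1.
with open("keys/my-key-filename", "r") as file:
    private_key_value = file.read().rstrip()

# Create the pipeline from the code stored in the Git repository.
sdk.create_pipeline(
    function_path="src/my-source-code.py",  # path inside the repository
    function_name="my-function-name",
    pipeline_name="my-pipeline-name",
    container_config={
        "repository_url": "git@github.com:my-account/my-git-repo.git",  # SSH URL
        "repository_deploy_key": private_key_value,
        "repository_branch": "main",
    },
)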
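
To avoid repeating these lines in every script, the three Git parameters can be assembled by a small helper. This is a sketch that assumes the container_config keys used in this guide; build_git_container_config is a hypothetical helper, not an SDK function. It also checks that the URL is an SSH URL starting with git@.

```python
def build_git_container_config(repository_url, private_key_path, branch="main"):
    """Assemble the Git-related container_config entries for create_pipeline."""
    if not repository_url.startswith("git@"):
        raise ValueError("expected an SSH repository URL starting with 'git@'")
    # Read the private deploy key and drop trailing whitespace and newlines.
    with open(private_key_path, "r") as file:
        private_key_value = file.read().rstrip()
    return {
        "repository_url": repository_url,
        "repository_deploy_key": private_key_value,
        "repository_branch": branch,
    }
```

The returned dictionary can then be passed as the container_config argument of sdk.create_pipeline.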

Step 3b: In project information

Alternatively, you can provide the repository information in the project settings. Once configured, these settings are used by default whenever no repository information is provided at pipeline creation.

To do this, go to your project's settings page and enter these 3 parameters:

  1. Repository URL: Enter the SSH URL of your repository.

    How to get my repository URL

    On GitHub, from your repository, click on the “Code” button and choose SSH. The URL must start with “git@”.

    On GitLab, from your repository, click on the "Clone" button and copy the SSH URL. The URL must start with “git@”.

  2. Deploy key: Enter the private key of the deploy key you added to GitHub or GitLab.

    Warning

    Remember to keep the begin and end tags when you copy and paste the key. It should look like this:

    -----BEGIN RSA PRIVATE KEY-----
    MIIJKQIBFSKCAgEAwH/zbeYm3M7elJHIjQTiO2+2QdTOh3ebvZotNQNATJ4UIqVN
    T9P2xN3Xd/27w8/jv9wmGqHzSVyEo53FfnyDm2zlFvqImRZm3znujA9bbp00itB5
    ...
    Bo1gJMJxYJ4npi+0VULc33Ao6FzOfGxSACoTA/gG/q7LHO68c6Zgz+dI/ekDqG7C
    Gx52WhCP26GdneD/EhPgcUh41FzDbgO2BBIboNnrJLQzSQboK8JNrsislPr7
    -----END RSA PRIVATE KEY-----
    

  3. Default branch: Enter the Git branch to use by default for this project. If this field is empty, the platform uses the repository's default branch. You can also choose a different default branch within an environment.

Once this project information is saved, you can set up your pipelines so that the platform will use your Git repository by default:

# No container_config is given: the repository settings saved for the project are used.
sdk.create_pipeline(
    function_path="src/my-source-code.py",
    function_name="my-function-name",
    pipeline_name="my-pipeline-name",
)

Note

When the Git repository information is defined both on the settings page (Step 3b) and in the pipeline creation function, the information passed to the pipeline creation function takes priority.
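
The priority rule can be pictured as a simple merge. The sketch below is only an illustration, not the platform's implementation: it assumes the settings are resolved key by key, which this page does not specify, and resolve_repository_settings is a hypothetical name.

```python
def resolve_repository_settings(saved_settings, pipeline_container_config=None):
    """Merge both levels; values given at pipeline creation override saved ones."""
    resolved = dict(saved_settings)
    for key, value in (pipeline_container_config or {}).items():
        if value is not None:
            resolved[key] = value
    return resolved


saved_settings = {
    "repository_url": "git@github.com:my-account/my-git-repo.git",
    "repository_branch": "main",
}
pipeline_container_config = {"repository_branch": "develop"}

resolved = resolve_repository_settings(saved_settings, pipeline_container_config)
print(resolved["repository_branch"])  # develop
```

Here the branch given at pipeline creation replaces the saved one, while the repository URL still comes from the saved settings.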