Git integration for Azure Machine Learning
Git is a popular version control system that allows you to share and collaborate on your projects.
Azure Machine Learning fully supports Git repositories for tracking work - you can clone repositories directly onto your shared workspace file system, use Git on your local workstation, or use Git from a CI/CD pipeline.
When submitting a job to Azure Machine Learning, if source files are stored in a local git repository then information about the repo is tracked as part of the training process.
Since Azure Machine Learning tracks information from a local git repo, it isn't tied to any specific central repository. Your repository can be cloned from GitHub, GitLab, Bitbucket, Azure DevOps, or any other git-compatible service.
Clone Git repositories into your workspace file system
Azure Machine Learning provides a shared file system for all users in the workspace. To clone a Git repository into this file share, we recommend that you create a compute instance and open a terminal. Once the terminal is open, you have access to a full Git client and can clone and work with Git through the Git CLI.
We recommend that you clone the repository into your user directory so that others don't create conflicts directly on your working branch.
You can clone any Git repository you can authenticate to (GitHub, Azure Repos, Bitbucket, and so on).
For more information about cloning, see the guide on how to use Git CLI.
Authenticate your Git Account with SSH
Generate a new SSH key
Open the terminal window in the Azure Machine Learning Notebook Tab.
Paste the text below, substituting in your email address.
ssh-keygen -t rsa -b 4096 -C "your_email@example.com"
This creates a new SSH key, using the provided email as a label.
> Generating public/private rsa key pair.
When you're prompted to "Enter a file in which to save the key" press Enter. This accepts the default file location.
Verify that the default location is '/home/azureuser/.ssh' and press Enter. Otherwise, specify the location '/home/azureuser/.ssh'.
Tip
Make sure the SSH key is saved in '/home/azureuser/.ssh'. This file is saved on the compute instance and is only accessible by the owner of the compute instance.
> Enter a file in which to save the key (/home/azureuser/.ssh/id_rsa): [Press enter]
- At the prompt, type a secure passphrase. We recommend that you add a passphrase to your SSH key for added security.
> Enter passphrase (empty for no passphrase): [Type a passphrase]
> Enter same passphrase again: [Type passphrase again]
Add the public key to Git Account
- In your terminal window, copy the contents of your public key file. If you renamed the key, replace id_rsa.pub with the public key file name.
cat ~/.ssh/id_rsa.pub
Tip
Copy and paste in the terminal:
- Windows: Ctrl-Insert to copy and use Ctrl-Shift-v or Shift-Insert to paste.
- Mac OS: Cmd-c to copy and Cmd-v to paste.
- Firefox/IE may not support clipboard permissions properly.
- Select and copy the key output to the clipboard.
- Azure DevOps: Start at Step 2.
- Bitbucket: Start at Step 4.
Clone the Git repository with SSH
Copy the SSH Git clone URL from the Git repo.
Paste the URL into the git clone command below to use your SSH Git repo URL. This will look something like:
git clone git@example.com:GitUser/azureml-example.git
Cloning into 'azureml-example'...
You will see a response like:
The authenticity of host 'example.com (192.30.255.112)' can't be established.
RSA key fingerprint is SHA256:nThbg6kXUpJWGl7E1IGOCspRomTxdCARLviKw6E5SY8.
Are you sure you want to continue connecting (yes/no)? yes
Warning: Permanently added 'example.com,192.30.255.112' (RSA) to the list of known hosts.
SSH may display the server's SSH fingerprint and ask you to verify it. You should verify that the displayed fingerprint matches one of the fingerprints on the SSH public keys page.
SSH displays this fingerprint when it connects to an unknown host to protect you from man-in-the-middle attacks. Once you accept the host's fingerprint, SSH will not prompt you again unless the fingerprint changes.
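If you prefer to compare fingerprints programmatically, the following is a minimal sketch of how an OpenSSH-style SHA256 fingerprint can be derived from a public key line, for example one published by your Git provider. The helper name `ssh_fingerprint` is illustrative and not part of Azure Machine Learning; the sketch assumes the usual `<type> <base64-blob> [comment]` layout of a public key line.

```python
import base64
import hashlib


def ssh_fingerprint(public_key_line: str) -> str:
    """Return the OpenSSH-style SHA256 fingerprint of a public key line.

    Assumes the "<type> <base64-blob> [comment]" layout used by files
    such as ~/.ssh/id_rsa.pub.
    """
    fields = public_key_line.split()
    if len(fields) < 2:
        raise ValueError("expected '<type> <base64-blob> [comment]'")
    # The second field is the Base64-encoded key blob.
    blob = base64.b64decode(fields[1])
    digest = hashlib.sha256(blob).digest()
    # OpenSSH prints the digest as Base64 with the trailing '=' padding removed.
    return "SHA256:" + base64.b64encode(digest).decode("ascii").rstrip("=")
```

The returned string has the same shape as the fingerprint in the prompt above, so the two can be compared directly.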
- When you are asked if you want to continue connecting, type yes. Git will clone the repo and set up the origin remote to connect with SSH for future Git commands.
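The clone output above shows Git creating a directory named after the repository. As a rough illustration, the sketch below splits an SSH clone URL into its parts and derives that default directory name. `parse_ssh_clone_url` and `SshCloneUrl` are hypothetical names, and the sketch only handles the `user@host:path` form shown above, not every URL form that Git accepts.

```python
from typing import NamedTuple


class SshCloneUrl(NamedTuple):
    user: str
    host: str
    path: str
    directory: str  # default folder name that `git clone` creates


def parse_ssh_clone_url(url: str) -> SshCloneUrl:
    """Split an scp-style SSH URL such as git@example.com:GitUser/repo.git."""
    user_host, colon, path = url.partition(":")
    user, at, host = user_host.partition("@")
    if not (colon and at and user and host and path):
        raise ValueError(f"not an scp-style SSH URL: {url!r}")
    # Git names the clone directory after the last path segment, minus ".git".
    directory = path.rstrip("/").rsplit("/", 1)[-1]
    if directory.endswith(".git"):
        directory = directory[: -len(".git")]
    return SshCloneUrl(user, host, path, directory)


parsed = parse_ssh_clone_url("git@example.com:GitUser/azureml-example.git")
```

Here `parsed.directory` matches the folder named in the `Cloning into ...` message.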
Track code that comes from Git repositories
When you submit a training run from the Python SDK or Machine Learning CLI, the files needed to train the model are uploaded to your workspace. If the git command is available on your development environment, the upload process uses it to check if the files are stored in a git repository. If so, then information from your git repository is also uploaded as part of the training run. This information is stored in the following properties for the training run:
Property | Git command used to get the value | Description |
---|---|---|
azureml.git.repository_uri | git ls-remote --get-url | The URI that your repository was cloned from. |
mlflow.source.git.repoURL | git ls-remote --get-url | The URI that your repository was cloned from. |
azureml.git.branch | git symbolic-ref --short HEAD | The active branch when the run was submitted. |
mlflow.source.git.branch | git symbolic-ref --short HEAD | The active branch when the run was submitted. |
azureml.git.commit | git rev-parse HEAD | The commit hash of the code that was submitted for the run. |
mlflow.source.git.commit | git rev-parse HEAD | The commit hash of the code that was submitted for the run. |
azureml.git.dirty | git status --porcelain . | True, if the branch/commit is dirty; otherwise, false. |
This information is sent for runs that use an estimator, machine learning pipeline, or script run.
If your training files are not located in a git repository on your development environment, or the git command is not available, then no git-related information is tracked.
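To preview what would be recorded for your working directory, you can run the same git commands yourself. The following is a minimal sketch, not the service's actual implementation: `collect_git_properties` is a hypothetical helper, it only fills the `azureml.git.*` names, and it returns an empty dictionary whenever any command fails (for example, when git is missing, the directory is not a repository, or HEAD is detached).

```python
import subprocess

# Property names and git commands as listed in the table above.
GIT_COMMANDS = {
    "azureml.git.repository_uri": ["git", "ls-remote", "--get-url"],
    "azureml.git.branch": ["git", "symbolic-ref", "--short", "HEAD"],
    "azureml.git.commit": ["git", "rev-parse", "HEAD"],
}


def _git_output(args, repo_dir):
    """Run one git command in repo_dir and return its trimmed stdout."""
    result = subprocess.run(
        args, cwd=repo_dir, capture_output=True, text=True, check=True
    )
    return result.stdout.strip()


def collect_git_properties(repo_dir="."):
    """Hypothetical helper: gather git details for repo_dir, or {} on failure."""
    try:
        props = {
            name: _git_output(cmd, repo_dir) for name, cmd in GIT_COMMANDS.items()
        }
        # Any output from `git status --porcelain .` means uncommitted changes.
        changes = _git_output(["git", "status", "--porcelain", "."], repo_dir)
    except (OSError, subprocess.CalledProcessError):
        # git is not installed, repo_dir is unusable, or a git command failed.
        return {}
    # Stored as a string to mirror the sample JSON; the casing is an assumption.
    props["azureml.git.dirty"] = str(bool(changes))
    return props
```

Calling `collect_git_properties()` from inside a cloned repository returns a dictionary with the four `azureml.git.*` keys. Outside a repository, it returns `{}`, which matches the case where nothing is tracked.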
Tip
To check if the git command is available on your development environment, open a shell session, command prompt, PowerShell or other command line interface and type the following command:
git --version
If installed, and in the path, you receive a response similar to git version 2.4.1. For more information on installing git on your development environment, see the Git website.
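The same check can be scripted, for example at the start of a submission script. This is a small sketch using only the Python standard library; the function name `git_version` is illustrative.

```python
import shutil
import subprocess


def git_version():
    """Return the `git --version` output, or None if git is not on the path."""
    if shutil.which("git") is None:
        return None
    result = subprocess.run(["git", "--version"], capture_output=True, text=True)
    return result.stdout.strip() if result.returncode == 0 else None


version = git_version()
if version is None:
    print("git was not found; no git information will be tracked.")
else:
    print(version)
```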
View the logged information
The git information is stored in the properties for a training run. You can view this information using the Azure portal, Python SDK, and CLI.
Azure portal
- From the studio portal, select your workspace.
- Select Experiments, and then select one of your experiments.
- Select one of the runs from the RUN NUMBER column.
- Select Outputs + logs, and then expand the logs and azureml entries. Select the link that begins with ###_azure.
The logged information contains text similar to the following JSON:
"properties": {
"_azureml.ComputeTargetType": "batchai",
"ContentSnapshotId": "5ca66406-cbac-4d7d-bc95-f5a51dd3e57e",
"azureml.git.repository_uri": "git@github.com:azure/machinelearningnotebooks",
"mlflow.source.git.repoURL": "git@github.com:azure/machinelearningnotebooks",
"azureml.git.branch": "master",
"mlflow.source.git.branch": "master",
"azureml.git.commit": "4d2b93784676893f8e346d5f0b9fb894a9cf0742",
"mlflow.source.git.commit": "4d2b93784676893f8e346d5f0b9fb894a9cf0742",
"azureml.git.dirty": "True",
"AzureML.DerivedImageName": "azureml/azureml_9d3568242c6bfef9631879915768deaf",
"ProcessInfoFile": "azureml-logs/process_info.json",
"ProcessStatusFile": "azureml-logs/process_status.json"
}
Python SDK
After submitting a training run, a Run object is returned. The properties attribute of this object contains the logged git information. For example, the following code retrieves the commit hash:
run.properties['azureml.git.commit']
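Because the properties behave like a dictionary of strings, the git-related entries can be selected by their key prefixes. The sketch below uses a plain dictionary with values copied from the sample JSON earlier in this article, so it runs without a workspace; with a real run you would pass `run.properties` instead. The helper name `git_properties` is illustrative.

```python
GIT_PREFIXES = ("azureml.git.", "mlflow.source.git.")


def git_properties(properties):
    """Return only the git-related entries of a run's properties mapping."""
    return {
        key: value
        for key, value in properties.items()
        if key.startswith(GIT_PREFIXES)
    }


# Sample values copied from the JSON shown earlier; stands in for run.properties.
sample_properties = {
    "_azureml.ComputeTargetType": "batchai",
    "azureml.git.repository_uri": "git@github.com:azure/machinelearningnotebooks",
    "mlflow.source.git.repoURL": "git@github.com:azure/machinelearningnotebooks",
    "azureml.git.branch": "master",
    "mlflow.source.git.branch": "master",
    "azureml.git.commit": "4d2b93784676893f8e346d5f0b9fb894a9cf0742",
    "mlflow.source.git.commit": "4d2b93784676893f8e346d5f0b9fb894a9cf0742",
    "azureml.git.dirty": "True",
    "ProcessInfoFile": "azureml-logs/process_info.json",
}

info = git_properties(sample_properties)
# The dirty flag is stored as a string, so compare against "True".
has_uncommitted_changes = info.get("azureml.git.dirty") == "True"
```

Using `.get()` rather than indexing avoids a `KeyError` for runs that were submitted from outside a git repository.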
CLI
The az ml run CLI command can be used to retrieve the properties from a run. For example, the following command returns the properties for the last run in the experiment named train-on-amlcompute:
az ml run list -e train-on-amlcompute --last 1 -w myworkspace -g myresourcegroup --query '[].properties'
For more information, see the az ml run reference documentation.
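If you capture the output of the command above in a script, it can be parsed as JSON. The following sketch assumes the command prints a JSON array with one properties object per run, which is what the `[].properties` query would produce with JSON output. The function name `commits_from_cli_output` and the sample text are illustrative.

```python
import json


def commits_from_cli_output(cli_output):
    """Return the azureml.git.commit value of each run, or None if absent."""
    runs = json.loads(cli_output)
    return [
        props.get("azureml.git.commit") if isinstance(props, dict) else None
        for props in runs
    ]


# Stand-in for captured CLI output; the commit hash is from the sample JSON above.
sample_output = """
[
  {
    "azureml.git.branch": "master",
    "azureml.git.commit": "4d2b93784676893f8e346d5f0b9fb894a9cf0742",
    "azureml.git.dirty": "True"
  }
]
"""

commits = commits_from_cli_output(sample_output)
```

A `None` entry in the result indicates a run for which no git information was tracked.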