Python package extensibility for prebuilt Docker images (preview)

03/02/2023

The prebuilt Docker images for model inference contain packages for popular machine learning frameworks. There are two methods that can be used to add Python packages without rebuilding the Docker image:

Dynamic installation: This approach uses a requirements file to automatically restore Python packages when the Docker container boots.

Consider this method for rapid prototyping. When the image starts, packages are restored using the requirements.txt file. This method increases the startup time of the image, so you must wait longer before the deployment can handle requests.
Pre-installed Python packages: You provide a directory containing preinstalled Python packages. During deployment, this directory is mounted into the container for your entry script (score.py) to use.

Use this approach for production deployments. Because the directory containing the packages is mounted into the image, it can be used even when your deployments don't have public internet access, for example when deployed into a secured Azure Virtual Network.

Important

Using Python package extensibility for prebuilt Docker images with Azure Machine Learning is currently in preview. Preview functionality is provided "as-is", with no guarantee of support or service level agreement. For more information, see the Supplemental terms of use for Microsoft Azure previews.

Prerequisites

An Azure Machine Learning workspace. For a tutorial on creating a workspace, see Get started with Azure Machine Learning.
Familiarity with using Azure Machine Learning environments.
Familiarity with Where and how to deploy models with Azure Machine Learning.

Dynamic installation

This approach uses a requirements file to automatically restore Python packages when the image starts up.

To extend your prebuilt Docker container image through a requirements.txt file, follow these steps:

Create a requirements.txt file alongside your score.py script.
Add all of your required packages to the requirements.txt file.
Set the AZUREML_EXTRA_REQUIREMENTS_TXT environment variable in your Azure Machine Learning environment to the location of the requirements.txt file.

Once deployed, the packages will automatically be restored for your score script.

Tip

Even while prototyping, we recommend that you pin each package version in requirements.txt. For example, use scipy == 1.2.3 instead of just scipy or even scipy > 1.2.3. If you don't pin an exact version and scipy releases a new version, this can break your scoring script and cause failures during deployment and scaling.
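
The pinning advice in the tip above can be checked before you deploy. The following is a minimal sketch, not part of Azure Machine Learning: the helper name find_unpinned is hypothetical, and the check is deliberately simple (it treats anything other than a plain name==version line, including lines with environment markers, as unpinned).

```python
import re

# Matches "name==version" with optional extras and whitespace, e.g. "scipy == 1.2.3".
# Wildcards such as "1.*" are rejected because they don't pin an exact version.
_PINNED = re.compile(r"^[A-Za-z0-9][A-Za-z0-9._-]*(\[[^\]]*\])?\s*==\s*[^\s=*]+$")


def find_unpinned(requirements_text):
    """Return requirement lines that are not pinned with an exact '==' version."""
    unpinned = []
    for raw_line in requirements_text.splitlines():
        # Drop inline comments and surrounding whitespace.
        line = raw_line.split("#", 1)[0].strip()
        if not line or line.startswith("-"):
            # Skip blank lines and pip options such as "-r other.txt".
            continue
        if not _PINNED.match(line):
            unpinned.append(line)
    return unpinned


if __name__ == "__main__":
    sample = "scipy == 1.2.3\nnumpy\npandas>1.0\n"
    print(find_unpinned(sample))  # ['numpy', 'pandas>1.0']
```

Running a check like this against requirements.txt before deployment surfaces packages that could silently change version when the container restarts or scales out.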

The following example demonstrates setting the AZUREML_EXTRA_REQUIREMENTS_TXT environment variable:

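from azureml.core import Environment

myenv = Environment(name="my_azureml_env")
myenv.docker.enabled = True
# Replace "<MCR-path>" with the path of the prebuilt inference image.
myenv.docker.base_image = "<MCR-path>"
# Use the Python environment that is already in the image instead of building a new one.
myenv.python.user_managed_dependencies = True

# Path to the requirements file, relative to the directory of the score script.
myenv.environment_variables = {
    "AZUREML_EXTRA_REQUIREMENTS_TXT": "requirements.txt"
}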

The following diagram is a visual representation of the dynamic installation process:

Diagram of dynamic installation process

Pre-installed Python packages

This approach mounts a directory that you provide into the image. The Python packages from this directory can then be used by the entry script (score.py).

To extend your prebuilt Docker container image through pre-installed Python packages, follow these steps:

Important

You must use packages compatible with Python 3.7. All current images are pinned to Python 3.7.

Create a virtual environment using virtualenv.
Install your dependencies. If you have a list of dependencies in a requirements.txt file, for example, you can install them with pip install -r requirements.txt, or you can pip install individual dependencies.

When you specify the AZUREML_EXTRA_PYTHON_LIB_PATH environment variable, make sure that you point to the correct site packages directory, which will vary depending on your environment name and Python version. The following code demonstrates setting the path for a virtual environment named myenv and Python 3.7:

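from azureml.core import Environment

myenv = Environment(name='my_azureml_env')
myenv.docker.enabled = True
# Replace "<MCR-path>" with the path of the prebuilt inference image.
myenv.docker.base_image = "<MCR-path>"
# Use the Python environment that is already in the image instead of building a new one.
myenv.python.user_managed_dependencies = True

# Site-packages directory of the virtual environment named myenv (Python 3.7).
myenv.environment_variables = {
    "AZUREML_EXTRA_PYTHON_LIB_PATH": "myenv/lib/python3.7/site-packages"
}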
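
Because the site packages path depends on the environment name and the Python version, it can help to build it rather than type it. The sketch below assumes the POSIX virtualenv layout (lib/pythonX.Y/site-packages); a virtual environment created on Windows uses a different layout. The helper name site_packages_path is hypothetical.

```python
from pathlib import PurePosixPath


def site_packages_path(venv_name, python_version="3.7"):
    """Build the relative site-packages path for a POSIX-style virtual environment."""
    # Keep only major.minor, so "3.7.16" becomes "3.7".
    major_minor = ".".join(python_version.split(".")[:2])
    return str(PurePosixPath(venv_name) / "lib" / f"python{major_minor}" / "site-packages")


if __name__ == "__main__":
    print(site_packages_path("myenv"))  # myenv/lib/python3.7/site-packages
```

The returned string is the kind of value you would assign to AZUREML_EXTRA_PYTHON_LIB_PATH in the example above.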

The following diagram is a visual representation of the pre-installed packages process:

Diagram of the process using preinstalled packages

Common problems

The mounting solution will only work when your myenv site packages directory contains all of your dependencies. If your local environment is using dependencies installed in a different location, they won't be available in the image.

Here are some things that may cause this problem:

virtualenv creates an isolated environment by default. Once you activate the virtual environment, global dependencies cannot be used.
If you have a PYTHONPATH environment variable pointing to your global dependencies, it may interfere with your virtual environment. Run pip list and pip freeze after activating your environment to make sure no unwanted dependencies are in your environment.
Conda and virtualenv environments can interfere with each other. Make sure not to use a Conda environment and virtualenv at the same time.
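
The causes listed above can be spotted with a quick look at the shell environment. This is a heuristic sketch only: it assumes that an activated virtualenv sets VIRTUAL_ENV and that an activated Conda environment sets CONDA_PREFIX, which is what their standard activation scripts do. The function name environment_warnings is hypothetical.

```python
import os


def environment_warnings(environ):
    """Return warnings for settings that can leak packages into a virtualenv."""
    warnings = []
    if environ.get("PYTHONPATH"):
        # Directories on PYTHONPATH are importable regardless of the active environment.
        warnings.append("PYTHONPATH is set: " + environ["PYTHONPATH"])
    if environ.get("VIRTUAL_ENV") and environ.get("CONDA_PREFIX"):
        warnings.append("A virtualenv and a Conda environment are both active.")
    if not environ.get("VIRTUAL_ENV"):
        warnings.append("No virtual environment appears to be active.")
    return warnings


if __name__ == "__main__":
    for message in environment_warnings(os.environ):
        print(message)
```

An empty result doesn't prove the environment is clean, so still review the output of pip list and pip freeze.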

Limitations

Model.package()

The Model.package() method lets you create a model package in the form of a Docker image or Dockerfile build context. Using Model.package() with prebuilt inference Docker images triggers an intermediate image build that changes the non-root user to the root user.
We encourage you to use our Python package extensibility solutions. If other dependencies are required (such as apt packages), create your own Dockerfile extending from the inference image.

Frequently asked questions

In the requirements.txt extensibility approach, is it mandatory for the file name to be requirements.txt?

No. Set AZUREML_EXTRA_REQUIREMENTS_TXT to the name of your pip requirements file:

myenv.environment_variables = {
    "AZUREML_EXTRA_REQUIREMENTS_TXT": "name of your pip requirements file goes here"
}

Can you summarize the requirements.txt approach versus the mounting approach?

Start prototyping with the requirements.txt approach. After some iteration, when you're confident about which packages (and versions) you need for a successful model deployment, switch to the mounting approach.

Here's a detailed comparison.

Compared item	Requirements.txt (dynamic installation)	Package Mount
Solution	Create a `requirements.txt` file. The specified packages are installed when the container starts.	Create a local Python environment with all of the dependencies. Mount this directory into the container at runtime.
Package Installation	No extra installation (assuming pip is already installed).	Virtual environment or conda environment installation.
Virtual environment Setup	No extra setup of a virtual environment is required, as users can pull the current local user environment with pip freeze as needed to create the `requirements.txt`.	You need to set up a clean virtual environment, which may take extra steps depending on your current local environment.
Debugging	Easy to set up and debug the server, since dependencies are clearly listed.	An unclean virtual environment could cause problems when debugging the server. For example, it may not be clear if errors come from the environment or user code.
Consistency during scaling out	Not consistent, because it depends on external PyPI packages and on users pinning their dependencies. These external downloads could be flaky.	Relies solely on the user environment, so no consistency issues.

Why are my requirements.txt and mounted dependencies directory not found in the container?

Locally, verify the environment variables are set properly. Next, verify the paths that are specified are spelled properly and exist. Check if you have set your source directory correctly in the inference config constructor.
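
The path check described above can be scripted locally. The sketch below assumes that the score script sits at the top level of your source directory, so that both variables resolve relative to that directory (see the 2021-07-26 note under Bug Fixes). The helper name missing_extensibility_paths is hypothetical and isn't part of the Azure Machine Learning SDK.

```python
import tempfile
from pathlib import Path

EXTENSIBILITY_VARIABLES = (
    "AZUREML_EXTRA_REQUIREMENTS_TXT",
    "AZUREML_EXTRA_PYTHON_LIB_PATH",
)


def missing_extensibility_paths(source_directory, environment_variables):
    """Return {variable: resolved path} for configured paths that don't exist locally."""
    missing = {}
    for name in EXTENSIBILITY_VARIABLES:
        value = environment_variables.get(name)
        if value is None:
            continue  # This variable isn't used by the deployment.
        resolved = Path(source_directory) / value
        if not resolved.exists():
            missing[name] = str(resolved)
    return missing


if __name__ == "__main__":
    # Demonstration with a temporary source directory that has no requirements.txt.
    with tempfile.TemporaryDirectory() as source_dir:
        variables = {"AZUREML_EXTRA_REQUIREMENTS_TXT": "requirements.txt"}
        print(missing_extensibility_paths(source_dir, variables))
```

Pass the same dictionary that you assign to myenv.environment_variables and the source directory that you give to the inference config.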

Can I override Python package dependencies in prebuilt inference docker image?

Yes. If you want to use a different version of a Python package that is already installed in an inference image, our extensibility solution will respect your version. Make sure there are no conflicts between the two versions.

Best Practices

Refer to the Load registered model docs. When you register a model directory, don't include your scoring script, your mounted dependencies directory, or requirements.txt within that directory.
For more information on how to load a registered or local model, see Where and how to deploy.
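
As a rough aid for the practice above, the following sketch lists deployment files that have ended up at the top level of a model directory. The default names (score.py, requirements.txt, and a myenv directory) follow the examples in this article and are assumptions; adjust them to your project. The function name misplaced_entries is hypothetical.

```python
import tempfile
from pathlib import Path


def misplaced_entries(model_directory,
                      deployment_entries=("score.py", "requirements.txt", "myenv")):
    """Return deployment files or folders found at the top level of the model directory."""
    present = {entry.name for entry in Path(model_directory).iterdir()}
    # Preserve the order of deployment_entries in the result.
    return [name for name in deployment_entries if name in present]


if __name__ == "__main__":
    # Demonstration with a temporary model directory containing a stray score.py.
    with tempfile.TemporaryDirectory() as model_dir:
        (Path(model_dir) / "model.pkl").write_bytes(b"")
        (Path(model_dir) / "score.py").write_text("")
        print(misplaced_entries(model_dir))  # ['score.py']
```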

Bug Fixes

2021-07-26

AZUREML_EXTRA_REQUIREMENTS_TXT and AZUREML_EXTRA_PYTHON_LIB_PATH are now always relative to the directory of the score script. For example, if both requirements.txt and the score script are in my_folder, set AZUREML_EXTRA_REQUIREMENTS_TXT to requirements.txt, not to my_folder/requirements.txt.

Next steps

To learn more about deploying a model, see How to deploy a model.

To learn how to troubleshoot prebuilt docker image deployments, see how to troubleshoot prebuilt Docker image deployments.