Deploying spark-nlp model using custom docker image fails in Azure Machine Learning

Priyanka Shah 1 Reputation point MVP
2021-06-17T06:32:03.677+00:00

Issue while deploying spark-nlp model to AML

I am trying to deploy a Spark NLP trained model from here to Azure Machine Learning (using a Python environment).

While deploying, I am using a custom Docker image provided by the spark-nlp documentation here.
This is because when I try to use an existing curated image such as
env = Environment.get(ws, name='AzureML-PySpark-MmlSpark-0.15')
I get errors when loading the model in the scoring script, as the spark-nlp libraries are not found in that AzureML-PySpark image.

So now I am using a custom Dockerfile, as below:

dockerfile = r"""

FROM ubuntu:18.04

ENV NB_USER yuefeng
ENV NB_UID 1000
ENV HOME /home/${NB_USER}

ENV PYSPARK_PYTHON=python3
ENV PYSPARK_DRIVER_PYTHON=python3

RUN apt-get update && apt-get install -y \
    tar \
    wget \
    bash \
    rsync \
    gcc \
    libfreetype6-dev \
    libhdf5-serial-dev \
    libpng-dev \
    libzmq3-dev \
    python3 \
    python3-dev \
    python3-pip \
    unzip \
    pkg-config \
    software-properties-common \
    graphviz

RUN adduser --disabled-password \
    --gecos "Default user" \
    --uid ${NB_UID} \
    ${NB_USER}

RUN apt-get update && \
    apt-get install -y openjdk-8-jdk && \
    apt-get install -y ant && \
    apt-get clean

RUN apt-get update && \
    apt-get install -y ca-certificates-java && \
    apt-get clean && \
    update-ca-certificates -f

ENV JAVA_HOME /usr/lib/jvm/java-8-openjdk-amd64/

RUN echo "export JAVA_HOME=/usr/lib/jvm/java-8-openjdk-amd64/" >> ~/.bashrc

RUN apt-get clean && rm -rf /var/lib/apt/lists/* /tmp/* /var/tmp/*

RUN pip3 install --upgrade pip
RUN pip3 install --no-cache-dir notebook==5.* numpy pyspark==2.4.4 spark-nlp==2.5.1 azureml-sdk azureml-core pandas mlflow Keras scikit-spark scikit-learn scipy matplotlib pydot tensorflow graphviz

USER root
RUN chown -R ${NB_UID} ${HOME}
USER ${NB_USER}

WORKDIR ${HOME}

CMD ["jupyter", "notebook", "--ip", "0.0.0.0"]
"""

My environment configuration is as below:

from azureml.core import Environment
from azureml.core.model import InferenceConfig

env = Environment("myenv")
env.docker.base_image = None
env.docker.base_dockerfile = dockerfile

env.inferencing_stack_version = 'latest'

inf_config = InferenceConfig(environment=env, entry_script="score.py")

The entry script, score.py, looks like below:

%%writefile score.py
import json
import pyspark

import azureml.core
from azureml.core.model import Model
from azureml.core import Workspace
from pyspark.ml import PipelineModel
from pyspark.context import SparkContext
from pyspark.sql.session import SparkSession
import sys, glob, os

trainedModel = None
spark = None

def init():
    # Declare the globals inside init() so that assignments here are
    # visible to run(); at module level the statements would be no-ops.
    global trainedModel, spark
    sys.path.extend(glob.glob(os.path.join(os.path.expanduser("~"), ".ivy2/jars/*.jar")))
    spark = SparkSession.builder \
        .appName("Spark NLP") \
        .master("local[4]") \
        .config("spark.driver.memory", "16G") \
        .config("spark.driver.maxResultSize", "2G") \
        .config("spark.kryoserializer.buffer.max", "2000M") \
        .config("spark.jars.packages", "com.johnsnowlabs.nlp:spark-nlp_2.11:2.6.1,com.google.cloud.spark:spark-bigquery-with-dependencies_2.11:0.15.1-beta") \
        .getOrCreate()
    model_name = "nlp_test-register"  # interpolated
    model_path = Model.get_model_path(model_name)
    trainedModel = PipelineModel.load(model_path)

def run(input_json):
    if isinstance(trainedModel, Exception):
        return json.dumps({"trainedModel": str(trainedModel)})

    try:
        sc = spark.sparkContext
        input_list = json.loads(input_json)
        input_rdd = sc.parallelize(input_list)
        input_df = spark.read.json(input_rdd)

        prediction = trainedModel.transform(input_df)

        # select() must run on the DataFrame before collect();
        # collect() returns a plain list of Rows.
        preds = [str(x['ntokens']) for x in prediction.select('ntokens').collect()]
        result = ",".join(preds)

        return result
    except Exception as e:
        return str(e)

The spark-nlp trained model is registered successfully in the workspace:

from azureml.core.model import Model

# Register model
registered_model = Model.register(ws, model_name="nlp_test-register", model_path="./test.mml")

Now, when trying to deploy the model as a local webservice:

from azureml.core.webservice import LocalWebservice

deployment_config = LocalWebservice.deploy_configuration(port=6789)
service = Model.deploy(ws, "myservice", [model], inf_config, deployment_config, overwrite=True)
service.wait_for_deployment(show_output=True)

I get an error while the environment image is being built:

Step 32/45 : RUN if dpkg --compare-versions conda --version | grep -oE '[^ ]+$' lt 4.4.11; then conda install conda==4.4.11; fi
---> Running in 1d06aaf8f181
/bin/sh: 1: conda: not found
dpkg: error: --compare-versions takes three arguments: <version> <relation> <version>

Type dpkg --help for help about installing and deinstalling packages [*];
Use 'apt' or 'aptitude' for user-friendly package management;
Type dpkg -Dhelp for a list of dpkg debug flag values;
Type dpkg --force-help for a list of forcing options;
Type dpkg-deb --help for help about manipulating *.deb files;
...
Step 33/45 : COPY azureml-environment-setup/mutated_conda_dependencies.yml azureml-environment-setup/mutated_conda_dependencies.yml
---> efe6235c07d2
Step 34/45 : RUN ldconfig /usr/local/cuda/lib64/stubs && conda env create -p /azureml-envs/azureml_da3e97fcb51801118b8e80207f3e01ad -f azureml-environment-setup/mutated_conda_dependencies.yml && rm -rf "$HOME/.cache/pip" && conda clean -aqy && CONDA_ROOT_DIR=$(conda info --root) && rm -rf "$CONDA_ROOT_DIR/pkgs" && find "$CONDA_ROOT_DIR" -type d -name pycache -exec rm -rf {} + && ldconfig
---> Running in 5382859f6b89
/bin/sh: 1: conda: not found
The command '/bin/sh -c ldconfig /usr/local/cuda/lib64/stubs && conda env create -p /azureml-envs/azureml_da3e97fcb51801118b8e80207f3e01ad -f azureml-environment-setup/mutated_conda_dependencies.yml && rm -rf "$HOME/.cache/pip" && conda clean -aqy && CONDA_ROOT_DIR=$(conda info --root) && rm -rf "$CONDA_ROOT_DIR/pkgs" && find "$CONDA_ROOT_DIR" -type d -name pycache -exec rm -rf {} + && ldconfig' returned a non-zero code: 127
2021/06/17 02:47:36 Container failed during run: acb_step_0. No retries remaining.
failed to run step ID: acb_step_0: exit status 127

Run ID: ccx failed after 13m28s. Error: failed during run, err: exit status 1
Package creation Failed

Does this mean conda is not available in the Docker image? How do I install conda in the image, and what commands should go in the Dockerfile?
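Yes, the build log shows it: the base Dockerfile above never installs conda, but the steps AML appends to the image (Step 32/45 onward) assume a `conda` binary on PATH, which is why they fail with `conda: not found` and exit code 127. One way to fix it is to install Miniconda in the custom Dockerfile before AML's steps run; a minimal sketch (the Miniconda download URL and `/opt/miniconda` install path are conventional defaults, not something AML mandates):

```dockerfile
# Install Miniconda so the conda steps AML appends to this image can run.
RUN wget --quiet https://repo.anaconda.com/miniconda/Miniconda3-latest-Linux-x86_64.sh -O /tmp/miniconda.sh && \
    bash /tmp/miniconda.sh -b -p /opt/miniconda && \
    rm /tmp/miniconda.sh

# Make conda (and its python/pip) visible to the build steps AML adds.
ENV PATH=/opt/miniconda/bin:$PATH
```

Alternatively, starting FROM one of Microsoft's AzureML base images, which already ship with conda, avoids the manual install entirely.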
<img width="960" alt="error" src="https://user-images.githubusercontent.com/50163025/122332624-4d47d100-cf69-11eb-92d2-ce025500e410.PNG">


1 answer

  1. Ramr-msft 17,616 Reputation points
    2021-06-17T14:53:21.397+00:00

    @Priyanka Shah Thanks for the question. Yes, instead of using an existing curated environment, you can create your own environment with the required package version dependencies specified in a requirements file or conda YAML configuration.
    Please follow the document below for the same.
    https://learn.microsoft.com/en-us/azure/machine-learning/how-to-use-environments#use-conda-dependencies-or-pip-requirements-files
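    As a sketch of that approach, the pip packages from the Dockerfile in the question could be expressed as a conda specification and loaded with `Environment.from_conda_specification("myenv", "conda.yml")`. The file name, environment name, and `python=3.7` pin below are assumptions; the package versions are copied from the question, and `azureml-defaults` is the package that brings the inferencing server dependencies for scoring:

    ```yaml
    name: sparknlp-env
    channels:
      - conda-forge
    dependencies:
      - python=3.7        # assumed version; pick one compatible with pyspark 2.4.x
      - pip
      - pip:
          - pyspark==2.4.4
          - spark-nlp==2.5.1
          - azureml-defaults   # required for AML inferencing endpoints
    ```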
