Question

AndrewBlance-8766 asked · ramr-msft commented

Getting 500 errors after model deployment

Hello,

I am trying to deploy a model using InferenceConfig. It deploys successfully, both locally and to ACI, but whenever I make a request to it I get a <Response [500]> error. My code is based on the example here. I believe the init and run parts of my entry script are correct, since when I call the run function directly, outside of a deployment, the data goes through it correctly. Inspecting the endpoint in the Azure portal also looks fine: it reports a "healthy" status.

Is there any advice on how to fix this, or how I can work around it? I have attached my code below.

These are my deployment logs:

 2021-09-30T08:55:33,882785000+00:00 - iot-server/run 
 2021-09-30T08:55:33,893848100+00:00 - gunicorn/run 
 Dynamic Python package installation is disabled.
 Starting HTTP server
 2021-09-30T08:55:33,903826800+00:00 - rsyslog/run 
 2021-09-30T08:55:33,932288900+00:00 - nginx/run 
 EdgeHubConnectionString and IOTEDGE_IOTHUBHOSTNAME are not set. Exiting...
 2021-09-30T08:55:34,629656500+00:00 - iot-server/finish 1 0
 2021-09-30T08:55:34,631347500+00:00 - Exit code 1 is normal. Not restarting iot-server.
 Starting gunicorn 20.1.0
 Listening at: http://127.0.0.1:31311 (63)
 Using worker: sync
 worker timeout is set to 300
 Booting worker with pid: 87
 SPARK_HOME not set. Skipping PySpark Initialization.
 Initializing logger
 2021-09-30 08:55:36,257 | root | INFO | Starting up app insights client
 logging socket was found. logging is available.
 logging socket was found. logging is available.
 2021-09-30 08:55:36,258 | root | INFO | Starting up request id generator
 2021-09-30 08:55:36,260 | root | INFO | Starting up app insight hooks
 2021-09-30 08:55:36,260 | root | INFO | Invoking user's init function
 2021-09-30 08:55:36,423 | root | INFO | Users's init has completed successfully
 2021-09-30 08:55:36,430 | root | INFO | Skipping middleware: dbg_model_info as it's not enabled.
 2021-09-30 08:55:36,430 | root | INFO | Skipping middleware: dbg_resource_usage as it's not enabled.
 2021-09-30 08:55:36,432 | root | INFO | Scoring timeout is found from os.environ: 60000 ms
 2021-09-30 08:56:19,493 | root | INFO | Swagger file not present
 2021-09-30 08:56:19,493 | root | INFO | 404
 127.0.0.1 - - [30/Sep/2021:08:56:19 +0000] "GET /swagger.json HTTP/1.0" 404 19 "-" "Go-http-client/1.1"
 2021-09-30 08:56:19,594 | root | INFO | Swagger file not present
 2021-09-30 08:56:19,595 | root | INFO | 404
 127.0.0.1 - - [30/Sep/2021:08:56:19 +0000] "GET /swagger.json HTTP/1.0" 404 19 "-" "Go-http-client/1.1"
 2021-09-30 08:56:23,093 | root | INFO | Swagger file not present
 2021-09-30 08:56:23,093 | root | INFO | 404
 127.0.0.1 - - [30/Sep/2021:08:56:23 +0000] "GET /swagger.json HTTP/1.0" 404 19 "-" "Go-http-client/1.1"
 2021-09-30 08:56:24,193 | root | INFO | Swagger file not present
 2021-09-30 08:56:24,194 | root | INFO | 404
 127.0.0.1 - - [30/Sep/2021:08:56:24 +0000] "GET /swagger.json HTTP/1.0" 404 19 "-" "Go-http-client/1.1"
 2021-09-30 08:56:47,726 | root | INFO | Scoring Timer is set to 60.0 seconds
 2021-09-30 08:56:47,727 | root | ERROR | Encountered Exception: Traceback (most recent call last):
   File "/var/azureml-server/synchronous/routes.py", line 65, in run_scoring
     response, time_taken_ms = invoke_user_with_timer(service_input, request_headers)
   File "/var/azureml-server/synchronous/routes.py", line 110, in invoke_user_with_timer
     result, time_taken_ms = capture_time_taken(user_main.run)(**params)
   File "/var/azureml-server/synchronous/routes.py", line 92, in timer
     result = func(*args, **kwargs)
 TypeError: run() got an unexpected keyword argument 'input'
    
 During handling of the above exception, another exception occurred:
    
 Traceback (most recent call last):
   File "/azureml-envs/azureml_be9a4db270db0ae2ca6059a059402ecf/lib/python3.6/site-packages/flask/app.py", line 1832, in full_dispatch_request
     rv = self.dispatch_request()
   File "/azureml-envs/azureml_be9a4db270db0ae2ca6059a059402ecf/lib/python3.6/site-packages/flask/app.py", line 1818, in dispatch_request
     return self.view_functions[rule.endpoint](**req.view_args)
   File "/var/azureml-server/synchronous/routes.py", line 44, in score_realtime
     return run_scoring(service_input, request.headers, request.environ.get('REQUEST_ID', '00000000-0000-0000-0000-000000000000'))
   File "/var/azureml-server/synchronous/routes.py", line 74, in run_scoring
     raise RunFunctionException(str(exc))
 run_function_exception.RunFunctionException
    
 2021-09-30 08:56:47,728 | root | INFO | 500
 127.0.0.1 - - [30/Sep/2021:08:56:47 +0000] "POST /score HTTP/1.0" 500 48 "-" "python-requests/2.25.1"
 2021-09-30 08:57:25,014 | root | INFO | Swagger file not present
 2021-09-30 08:57:25,014 | root | INFO | 404
 127.0.0.1 - - [30/Sep/2021:08:57:25 +0000] "GET /swagger.json HTTP/1.0" 404 19 "-" "Go-http-client/1.1"
 Exception in worker process
 Traceback (most recent call last):
   File "/azureml-envs/azureml_be9a4db270db0ae2ca6059a059402ecf/lib/python3.6/site-packages/gunicorn/arbiter.py", line 589, in spawn_worker
     worker.init_process()
   File "/azureml-envs/azureml_be9a4db270db0ae2ca6059a059402ecf/lib/python3.6/site-packages/gunicorn/workers/base.py", line 142, in init_process
     self.run()
   File "/azureml-envs/azureml_be9a4db270db0ae2ca6059a059402ecf/lib/python3.6/site-packages/gunicorn/workers/sync.py", line 125, in run
     self.run_for_one(timeout)
   File "/azureml-envs/azureml_be9a4db270db0ae2ca6059a059402ecf/lib/python3.6/site-packages/gunicorn/workers/sync.py", line 84, in run_for_one
     self.wait(timeout)
   File "/azureml-envs/azureml_be9a4db270db0ae2ca6059a059402ecf/lib/python3.6/site-packages/gunicorn/workers/sync.py", line 36, in wait
     ret = select.select(self.wait_fds, [], [], timeout)
   File "/var/azureml-server/routes_common.py", line 153, in alarm_handler
     raise TimeoutException(error_message)
 timeout_exception.TimeoutException
 Worker exiting (pid: 87)
 worker timeout is set to 300
 Booting worker with pid: 145
 SPARK_HOME not set. Skipping PySpark Initialization.
 Initializing logger
 2021-09-30 08:57:48,727 | root | INFO | Starting up app insights client
 logging socket was found. logging is available.
 logging socket was found. logging is available.
 2021-09-30 08:57:48,732 | root | INFO | Starting up request id generator
 2021-09-30 08:57:48,732 | root | INFO | Starting up app insight hooks
 2021-09-30 08:57:48,733 | root | INFO | Invoking user's init function
 2021-09-30 08:57:48,827 | root | INFO | Users's init has completed successfully
 2021-09-30 08:57:48,833 | root | INFO | Skipping middleware: dbg_model_info as it's not enabled.
 2021-09-30 08:57:48,834 | root | INFO | Skipping middleware: dbg_resource_usage as it's not enabled.
 2021-09-30 08:57:48,835 | root | INFO | Scoring timeout is found from os.environ: 60000 ms
 2021-09-30 08:58:10,161 | root | INFO | Swagger file not present
 2021-09-30 08:58:10,162 | root | INFO | 404
 127.0.0.1 - - [30/Sep/2021:08:58:10 +0000] "GET /swagger.json HTTP/1.0" 404 19 "-" "Go-http-client/1.1"
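The TypeError in the traceback above is the immediate cause of the 500: the scoring server calls the entry script as run(**params), so the keyword names it passes must match the parameter names that run() declares. A minimal reproduction of that failure mode, plain Python with no Azure involved:

```python
# The scoring server invokes the user's run() with keyword arguments
# (see invoke_user_with_timer in the traceback), so a mismatch between
# the names it passes and the names run() declares raises TypeError.
def run(raw_data, session=None):
    return raw_data

params = {"input": '{"data": [[1.0, 2.0]]}'}  # server passed 'input', not 'raw_data'
try:
    run(**params)
except TypeError as e:
    print(e)  # run() got an unexpected keyword argument 'input'
```

If the copy of the script actually deployed (the one picked up from source_dir), or a stale container image, declares a different parameter name than the version shown below, that mismatch would explain why the direct local call works while the deployed endpoint returns 500.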

If I run the command azmlinfsrv --model_dir . --entry_script entry_script.py to test the entry script locally, I get this output:

 Azure ML Inferencing HTTP server v0.4.1
    
    
 Server Settings
 ---------------
 Entry Script Name: entry_script.py
 Model Directory: ./
 Worker Count: 1
 Server Port: 5001
 Application Insights Enabled: false
 Application Insights Key: None
    
    
 Server Routes
 ---------------
 Liveness Probe: GET   127.0.0.1:5001/
 Score:          POST  127.0.0.1:5001/score
    
 Starting gunicorn 20.1.0
 Connection in use: ('0.0.0.0', 5001)
 Retrying in 1 second.
 Connection in use: ('0.0.0.0', 5001)
 Retrying in 1 second.
 Connection in use: ('0.0.0.0', 5001)
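The "Connection in use: ('0.0.0.0', 5001)" loop means something is already listening on port 5001, often a previous server instance that never exited. A quick way to confirm from Python before retrying (whether your azmlinfsrv version accepts a flag to change the port is an assumption; check azmlinfsrv --help):

```python
import socket

def port_in_use(port: int, host: str = "127.0.0.1") -> bool:
    """Return True if something is already accepting connections on host:port."""
    with socket.socket(socket.AF_INET, socket.SOCK_STREAM) as s:
        s.settimeout(1.0)
        # connect_ex returns 0 on a successful connection, an errno otherwise
        return s.connect_ex((host, port)) == 0

if port_in_use(5001):
    print("Port 5001 is taken; stop the stale server or start azmlinfsrv on another port")
```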

entry_script.py:

 import json
 import numpy as np
 import os
 import onnxruntime
    
 # Called when the service is loaded
 def init():
     # Get the path to the deployed model file and load it
     global sess
     sess = onnxruntime.InferenceSession(
         os.path.join(os.getenv("AZUREML_MODEL_DIR"), "model.onnx")
     )
    
 def run(raw_data, session = None):
     if session != None: sess = session
     try:
         # Get the input data as a numpy array
         data = np.array(json.loads(raw_data)['data'], dtype=np.float32)
         # Get a prediction from the model
    
         first_input_name = sess.get_inputs()[0].name
         first_output_name = sess.get_outputs()[0].name
    
         test = sess.run(
             [first_output_name], {first_input_name: data}
         )
         result = test[0].tolist()
    
         # Return the predictions as JSON
         return json.dumps({"result":result})
     except Exception as e:
         result = str(e)
         return {"error": result}
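Separately from the 500, the run() above has a subtle Python bug: assigning to sess anywhere inside the function makes sess local to the whole function, so when session is None the global session loaded in init() is never seen and the except branch returns an UnboundLocalError message instead of a prediction. A sketch of the fix, using a hypothetical stand-in session instead of onnxruntime so it runs anywhere:

```python
import json

sess = None  # in the real script this is set by init()

def run(raw_data, session=None):
    # Bind to a new name instead of assigning to `sess`, so the global
    # session loaded in init() stays visible when no override is given.
    active = session if session is not None else sess
    try:
        data = json.loads(raw_data)["data"]
        result = active.run(data)  # stand-in for the onnxruntime call
        return json.dumps({"result": result})
    except Exception as e:
        return {"error": str(e)}

class FakeSession:  # hypothetical, for local testing only
    def run(self, data):
        return [sum(row) for row in data]

print(run('{"data": [[1.0, 2.0]]}', session=FakeSession()))  # {"result": [3.0]}
```

The alternative is declaring global sess at the top of run(), but rebinding a global from the request path is easy to get wrong once the server runs multiple workers.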

Deploy Code:

 service_env = Environment(name='service-env')
 python_packages = ['numpy', 'onnxruntime']
 for package in python_packages:
     service_env.python.conda_dependencies.add_pip_package(package)
 inference_config = InferenceConfig(source_directory="./source_dir",
                                    entry_script="./entry_script.py",
                                    environment=service_env)
    
 deployment_config = AciWebservice.deploy_configuration(
     cpu_cores=0.5, memory_gb=1, auth_enabled=True
 )
    
 service = Model.deploy(
     ws,
     "myservice",
     [model],
     inference_config,
     deployment_config,
     overwrite=True,
 )
 service.wait_for_deployment(show_output=True)
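When the deployment reports healthy but scoring returns 500, two SDK calls usually surface the real traceback faster than raw requests. A sketch assuming the same service object and SDK (azureml-core v1) as above; the payload shape is illustrative, not taken from the model:

```python
import json

# Build the same JSON body the entry script's run() parses.
payload = json.dumps({"data": [[0.1, 0.2, 0.3, 0.4]]})  # shape is illustrative

# In the notebook that created `service` above:
# print(service.get_logs())               # full container log, incl. the scoring traceback
# print(service.run(input_data=payload))  # exercises /score through the SDK
print(payload)
```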





Tags: azure-machine-learning, azure-machine-learning-inference


1 Answer

ramr-msft answered · ramr-msft commented

@AndrewBlance-8766 Thanks for the question. Could you please share more detail about the error you are getting?

Here is a link to the documentation for Troubleshooting remote model deployment.




Thanks for the link! I've tried out a few of the things there and have posted the results.

ramr-msft replied to AndrewBlance-8766:

@AndrewBlance-8766 Thanks for the details. Could you please share the details of your experiment and the issue from the ml.azure.com portal so that a service engineer can look up the issue from the back end? This option is available in the top right corner of the portal by clicking the smiley face. Please select the option allowing Microsoft to email you about the feedback, and include a screenshot, so our service team can look it up and advise through email.
