AksWebservice Class

Represents a machine learning model deployed as a web service endpoint on Azure Kubernetes Service.

A deployed service is created from a model, a script, and associated files. The resulting web service is a load-balanced HTTP endpoint with a REST API. You can send data to this API and receive the prediction returned by the model.
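As a rough illustration of sending data to the REST API, the sketch below builds (but does not send) a scoring request with the Python standard library. The URI, the key placeholder, and the payload shape are hypothetical; the payload your service accepts depends on its entry script. The sketch also assumes that a key or token is passed as a Bearer credential in the Authorization header.

```python
import json
import urllib.request


def build_scoring_request(scoring_uri, payload, credential=None):
    """Build (but do not send) a JSON POST request for a scoring endpoint."""
    headers = {"Content-Type": "application/json"}
    if credential is not None:
        # Assumption: the auth key or token is sent as a Bearer credential.
        headers["Authorization"] = f"Bearer {credential}"
    body = json.dumps(payload).encode("utf-8")
    return urllib.request.Request(
        scoring_uri, data=body, headers=headers, method="POST"
    )


# Hypothetical URI, key placeholder, and payload shape, for illustration only.
request = build_scoring_request(
    "http://203.0.113.10:80/api/v1/service/my-aks-service/score",
    {"data": [[1.0, 2.0, 3.0, 4.0]]},
    credential="<primary-key>",
)

# To actually call the service:
# response_body = urllib.request.urlopen(request).read()
```

In practice the scoring URI would come from the scoring_uri variable of the deployed service rather than being hard-coded.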

AksWebservice deploys a single service to one endpoint. To deploy multiple services to one endpoint, use the AksEndpoint class.

For more information, see Deploy a model to an Azure Kubernetes Service cluster.

Inheritance
AksWebservice

Constructor

AksWebservice(workspace, name)

Remarks

The recommended deployment pattern is to create a deployment configuration object with the deploy_configuration method and then use it with the deploy method of the Model class as shown below.


   from azureml.core.webservice import AksWebservice

   # Set the web service configuration (using defaults here)
   aks_config = AksWebservice.deploy_configuration()

   # Alternatively, enable token auth and disable key auth on the web service
   # aks_config = AksWebservice.deploy_configuration(token_auth_enabled=True, auth_enabled=False)

The full sample is available at https://github.com/Azure/MachineLearningNotebooks/blob/master/how-to-use-azureml/deployment/production-deploy-to-aks/production-deploy-to-aks.ipynb
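The recommended pattern described above, passing a deployment configuration to the deploy method of Model, might look like the following sketch. It assumes the azureml-core package is installed, a workspace config.json is available, and that the model, environment, and AKS cluster already exist. All parameter names of the helper are hypothetical. The SDK imports sit inside the function so that defining it does not require the SDK.

```python
def deploy_to_aks(service_name, model_name, cluster_name, entry_script, environment_name):
    """Deploy a registered model to an existing AKS compute target (sketch)."""
    # Local imports: the SDK is only needed when the function is called.
    from azureml.core import Environment, Workspace
    from azureml.core.compute import AksCompute
    from azureml.core.model import InferenceConfig, Model
    from azureml.core.webservice import AksWebservice

    ws = Workspace.from_config()  # assumes a config.json for the workspace
    model = Model(ws, name=model_name)
    environment = Environment.get(ws, name=environment_name)
    inference_config = InferenceConfig(entry_script=entry_script, environment=environment)
    aks_target = AksCompute(ws, cluster_name)

    # Default deployment configuration, as in the snippet above.
    aks_config = AksWebservice.deploy_configuration()

    service = Model.deploy(
        workspace=ws,
        name=service_name,
        models=[model],
        inference_config=inference_config,
        deployment_config=aks_config,
        deployment_target=aks_target,
    )
    service.wait_for_deployment(show_output=True)
    return service
```

A call such as deploy_to_aks("my-aks-service", "my-model", "my-aks-cluster", "score.py", "my-environment") uses made-up names; substitute the names registered in your own workspace.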

There are several ways to deploy a model as a web service, including the:

  • deploy method of Model, for models already registered in the workspace.

  • deploy_from_image method of Webservice.

  • deploy_from_model method of Webservice, for models already registered in the workspace. This method creates an image.

  • deploy method of Webservice, which registers a model and creates an image.

For information on working with webservices, see

The Variables section lists attributes of a local representation of the cloud AksWebservice object. These variables should be considered read-only. Changing their values will not be reflected in the corresponding cloud object.
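To make the read-only point concrete, the sketch below copies a few attributes into a plain dictionary and then changes one locally. It uses a stand-in object with made-up values instead of a real AksWebservice, and snapshot_settings is a hypothetical helper, not part of the SDK.

```python
from types import SimpleNamespace


def snapshot_settings(service, names=("compute_name", "namespace", "num_replicas", "scoring_uri")):
    """Copy selected attributes of a service object into a plain dict."""
    return {name: getattr(service, name, None) for name in names}


# Stand-in with made-up values; a real object would be obtained with
# AksWebservice(workspace, name).
local_view = SimpleNamespace(
    compute_name="my-aks-cluster",
    namespace="default",
    num_replicas=2,
    scoring_uri="http://203.0.113.10:80/api/v1/service/my-aks-service/score",
)
settings = snapshot_settings(local_view)

# Assigning to the attribute changes only the local object ...
local_view.num_replicas = 5
# ... a change to the deployed service has to go through update(), for example:
# service.update(num_replicas=5)
```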

Variables

enable_app_insights
bool

Whether or not AppInsights logging is enabled for the Webservice.

autoscaler
AutoScaler

The Autoscaler object for the Webservice.

compute_name
str

The name of the ComputeTarget that the Webservice is deployed to.

container_resource_requirements
ContainerResourceRequirements

The container resource requirements for the Webservice.

liveness_probe_requirements
LivenessProbeRequirements

The liveness probe requirements for the Webservice.

data_collection
DataCollection

The DataCollection object for the Webservice.

max_concurrent_requests_per_container
int

The maximum number of concurrent requests per container for the Webservice.

max_request_wait_time
int

The maximum request wait time for the Webservice, in milliseconds.

num_replicas
int

The number of replicas for the Webservice. Each replica corresponds to an AKS pod.

scoring_timeout_ms
int

The scoring timeout for the Webservice, in milliseconds.

scoring_uri
str

The scoring endpoint for the Webservice.

is_default
bool

Whether the Webservice is the default version for the parent AksEndpoint.

traffic_percentile
int

The percentage of traffic to route to the Webservice in the parent AksEndpoint.

version_type
VersionType

The version type for the Webservice in the parent AksEndpoint.

token_auth_enabled
bool

Whether or not token auth is enabled for the Webservice.

environment
Environment

The Environment object that was used to create the Webservice.

models
list[Model]

A list of Models deployed to the Webservice.

deployment_status
str

The deployment status of the Webservice.

namespace
str

The AKS namespace of the Webservice.

swagger_uri
str

The swagger endpoint for the Webservice.

Methods

add_properties

Add key value pairs to this Webservice's properties dictionary.

add_tags

Add key value pairs to this Webservice's tags dictionary.

Raises a WebserviceException.

deploy_configuration

Create a configuration object for deploying to an AKS compute target.

get_access_token

Retrieve auth token for this Webservice.

get_token

DEPRECATED. Use get_access_token method instead.

Retrieve auth token for this Webservice.

remove_tags

Remove the specified keys from this Webservice's dictionary of tags.

run

Call this Webservice with the provided input.

serialize

Convert this Webservice into a JSON serialized dictionary.

update

Update the Webservice with provided properties.

Values left as None will remain unchanged in this Webservice.

add_properties

Add key value pairs to this Webservice's properties dictionary.

add_properties(properties)

Parameters

properties
dict[str, str]

The dictionary of properties to add.

add_tags

Add key value pairs to this Webservice's tags dictionary.

Raises a WebserviceException.

add_tags(tags)

Parameters

tags
dict[str, str]

The dictionary of tags to add.

Exceptions

WebserviceException

deploy_configuration

Create a configuration object for deploying to an AKS compute target.

deploy_configuration(autoscale_enabled=None, autoscale_min_replicas=None, autoscale_max_replicas=None, autoscale_refresh_seconds=None, autoscale_target_utilization=None, collect_model_data=None, auth_enabled=None, cpu_cores=None, memory_gb=None, enable_app_insights=None, scoring_timeout_ms=None, replica_max_concurrent_requests=None, max_request_wait_time=None, num_replicas=None, primary_key=None, secondary_key=None, tags=None, properties=None, description=None, gpu_cores=None, period_seconds=None, initial_delay_seconds=None, timeout_seconds=None, success_threshold=None, failure_threshold=None, namespace=None, token_auth_enabled=None, compute_target_name=None, cpu_cores_limit=None, memory_gb_limit=None)

Parameters

autoscale_enabled
bool
default value: None

Whether or not to enable autoscaling for this Webservice. Defaults to True if num_replicas is None.

autoscale_min_replicas
int
default value: None

The minimum number of containers to use when autoscaling this Webservice. Defaults to 1.

autoscale_max_replicas
int
default value: None

The maximum number of containers to use when autoscaling this Webservice. Defaults to 10.

autoscale_refresh_seconds
int
default value: None

How often (in seconds) the autoscaler should attempt to scale this Webservice. Defaults to 1.

autoscale_target_utilization
int
default value: None

The target utilization (in percent out of 100) the autoscaler should attempt to maintain for this Webservice. Defaults to 70.

collect_model_data
bool
default value: None

Whether or not to enable model data collection for this Webservice. Defaults to False.

auth_enabled
bool
default value: None

Whether or not to enable key auth for this Webservice. Defaults to True.

cpu_cores
float
default value: None

The number of CPU cores to allocate for this Webservice. Can be a decimal. Defaults to 0.1. Corresponds to the pod core request, not the limit, in Azure Kubernetes Service.

memory_gb
float
default value: None

The amount of memory (in GB) to allocate for this Webservice. Can be a decimal. Defaults to 0.5. Corresponds to the pod memory request, not the limit, in Azure Kubernetes Service.

enable_app_insights
bool
default value: None

Whether or not to enable Application Insights logging for this Webservice. Defaults to False.

scoring_timeout_ms
int
default value: None

A timeout, in milliseconds, to enforce for scoring calls to this Webservice. Defaults to 60000.

replica_max_concurrent_requests
int
default value: None

The maximum number of concurrent requests per replica to allow for this Webservice. Defaults to 1. Do not change this setting from the default value of 1 unless instructed by Microsoft Technical Support or a member of the Azure Machine Learning team.

max_request_wait_time
int
default value: None

The maximum amount of time a request will stay in the queue (in milliseconds) before returning a 503 error. Defaults to 500.

num_replicas
int
default value: None

The number of containers to allocate for this Webservice. There is no default; if this parameter is not set, the autoscaler is enabled.

primary_key
str
default value: None

A primary auth key to use for this Webservice.

secondary_key
str
default value: None

A secondary auth key to use for this Webservice.

tags
dict[str, str]
default value: None

Dictionary of key value tags to give this Webservice.

properties
dict[str, str]
default value: None

Dictionary of key value properties to give this Webservice. These properties cannot be changed after deployment, but new key value pairs can be added.

description
str
default value: None

A description to give this Webservice.

gpu_cores
int
default value: None

The number of GPU cores to allocate for this Webservice. Defaults to 0.

period_seconds
int
default value: None

How often (in seconds) to perform the liveness probe. Defaults to 10 seconds. Minimum value is 1.

initial_delay_seconds
int
default value: None

The number of seconds after the container has started before liveness probes are initiated. Defaults to 310.

timeout_seconds
int
default value: None

The number of seconds after which the liveness probe times out. Defaults to 2 seconds. Minimum value is 1.

success_threshold
int
default value: None

The minimum consecutive successes for the liveness probe to be considered successful after having failed. Defaults to 1. Minimum value is 1.

failure_threshold
int
default value: None

When a Pod starts and the liveness probe fails, Kubernetes will try failureThreshold times before giving up. Defaults to 3. Minimum value is 1.

namespace
str
default value: None

The Kubernetes namespace in which to deploy this Webservice: up to 63 lowercase alphanumeric ('a'-'z', '0'-'9') and hyphen ('-') characters. The first and last characters cannot be hyphens.

token_auth_enabled
bool
default value: None

Whether or not to enable Token auth for this Webservice. If this is enabled, users can access this Webservice by fetching an access token using their Azure Active Directory credentials. Defaults to False.

compute_target_name
str
default value: None

The name of the compute target to deploy to.

cpu_cores_limit
float
default value: None

The maximum number of CPU cores this Webservice is allowed to use. Can be a decimal.

memory_gb_limit
float
default value: None

The maximum amount of memory (in GB) this Webservice is allowed to use. Can be a decimal.
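Several of the parameters above constrain each other or have a fixed format. The sketch below checks a few of them before building keyword arguments for deploy_configuration. The helper names are hypothetical. The check that a limit is not lower than its request follows the general Kubernetes rule for requests and limits; whether the SDK performs the same check is not stated here.

```python
import re

# Up to 63 lowercase alphanumerics or hyphens, with no hyphen at either end.
_NAMESPACE_PATTERN = re.compile(r"[a-z0-9](?:[a-z0-9-]{0,61}[a-z0-9])?")


def is_valid_namespace(namespace):
    """Check a namespace against the naming rule given for the namespace parameter."""
    return _NAMESPACE_PATTERN.fullmatch(namespace) is not None


def build_deploy_kwargs(cpu_cores=0.1, memory_gb=0.5, cpu_cores_limit=None,
                        memory_gb_limit=None, namespace=None, num_replicas=None):
    """Validate a few settings and return keyword arguments for deploy_configuration."""
    if cpu_cores_limit is not None and cpu_cores_limit < cpu_cores:
        raise ValueError("cpu_cores_limit must not be lower than cpu_cores")
    if memory_gb_limit is not None and memory_gb_limit < memory_gb:
        raise ValueError("memory_gb_limit must not be lower than memory_gb")
    if namespace is not None and not is_valid_namespace(namespace):
        raise ValueError(f"invalid namespace: {namespace!r}")
    kwargs = {
        "cpu_cores": cpu_cores,
        "memory_gb": memory_gb,
        "cpu_cores_limit": cpu_cores_limit,
        "memory_gb_limit": memory_gb_limit,
        "namespace": namespace,
        # Leaving num_replicas unset keeps autoscaling enabled by default.
        "num_replicas": num_replicas,
    }
    # Drop unset values so the documented defaults apply.
    return {name: value for name, value in kwargs.items() if value is not None}


deploy_kwargs = build_deploy_kwargs(cpu_cores=0.5, cpu_cores_limit=1.0, namespace="team-a")
# With the SDK installed:
# aks_config = AksWebservice.deploy_configuration(**deploy_kwargs)
```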

Returns

A configuration object to use when deploying an AksWebservice.

Return type

Exceptions

get_access_token

Retrieve auth token for this Webservice.

get_access_token()

Returns

An object describing the auth token for this Webservice.

Return type
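Because an access token is valid only for a limited time, callers often cache it and fetch a new one once it is due for refresh. The sketch below shows one way to do that. It assumes the object returned by get_access_token exposes access_token and refresh_after attributes, and that refresh_after is a naive local datetime; verify both against your SDK version. TokenCache is a hypothetical helper, not part of the SDK.

```python
from datetime import datetime


class TokenCache:
    """Reuse an access token until its refresh time has passed."""

    def __init__(self, fetch_token, now=datetime.now):
        # fetch_token is any zero-argument callable, e.g. service.get_access_token.
        self._fetch_token = fetch_token
        # now is injectable so a timezone-aware clock (or a test clock) can be used.
        self._now = now
        self._token = None

    def get(self):
        """Return the cached token string, fetching a new token when needed."""
        # Assumed attribute names: refresh_after and access_token.
        if self._token is None or self._now() >= self._token.refresh_after:
            self._token = self._fetch_token()
        return self._token.access_token


# With a deployed service that has token auth enabled:
# cache = TokenCache(service.get_access_token)
# bearer_token = cache.get()
```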

get_token

DEPRECATED. Use get_access_token method instead.

Retrieve auth token for this Webservice.

get_token()

Returns

The auth token for this Webservice and when to refresh it.

Return type

remove_tags

Remove the specified keys from this Webservice's dictionary of tags.

remove_tags(tags)

Parameters

tags
list[str]

The list of keys to remove.

run

Call this Webservice with the provided input.

run(input_data)

Parameters

input_data
<xref:varies>

The input to call the Webservice with.

Returns

The result of calling the Webservice.

Return type

Exceptions
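A request that waits in the queue longer than max_request_wait_time is answered with a 503 error, so clients that call the scoring URI directly may want to retry. The sketch below retries with exponential backoff when a call made through urllib raises HTTP 503. The helper name and its defaults are hypothetical, and it does not cover errors raised by the run method of the SDK itself.

```python
import time
import urllib.error


def call_with_retry(send, attempts=3, base_delay=0.5, sleep=time.sleep):
    """Call send() and retry with exponential backoff when it raises HTTP 503."""
    for attempt in range(attempts):
        try:
            return send()
        except urllib.error.HTTPError as error:
            # Re-raise anything other than 503, and the last 503 as well.
            if error.code != 503 or attempt == attempts - 1:
                raise
            sleep(base_delay * 2 ** attempt)


# With a prepared urllib.request.Request object named request:
# body = call_with_retry(lambda: urllib.request.urlopen(request).read())
```

Keeping attempts small avoids adding load to a service that is already saturated.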

serialize

Convert this Webservice into a JSON serialized dictionary.

serialize()

Returns

The JSON representation of this Webservice.

Return type

update

Update the Webservice with provided properties.

Values left as None will remain unchanged in this Webservice.

update(image=None, autoscale_enabled=None, autoscale_min_replicas=None, autoscale_max_replicas=None, autoscale_refresh_seconds=None, autoscale_target_utilization=None, collect_model_data=None, auth_enabled=None, cpu_cores=None, memory_gb=None, enable_app_insights=None, scoring_timeout_ms=None, replica_max_concurrent_requests=None, max_request_wait_time=None, num_replicas=None, tags=None, properties=None, description=None, models=None, inference_config=None, gpu_cores=None, period_seconds=None, initial_delay_seconds=None, timeout_seconds=None, success_threshold=None, failure_threshold=None, namespace=None, token_auth_enabled=None, cpu_cores_limit=None, memory_gb_limit=None)

Parameters

image
Image
default value: None

A new Image to deploy to the Webservice.

autoscale_enabled
bool
default value: None

Enable or disable autoscaling of this Webservice.

autoscale_min_replicas
int
default value: None

The minimum number of containers to use when autoscaling this Webservice.

autoscale_max_replicas
int
default value: None

The maximum number of containers to use when autoscaling this Webservice.

autoscale_refresh_seconds
int
default value: None

How often (in seconds) the autoscaler should attempt to scale this Webservice.

autoscale_target_utilization
int
default value: None

The target utilization (in percent out of 100) the autoscaler should attempt to maintain for this Webservice.

collect_model_data
bool
default value: None

Enable or disable model data collection for this Webservice.

auth_enabled
bool
default value: None

Whether or not to enable auth for this Webservice.

cpu_cores
float
default value: None

The number of CPU cores to allocate for this Webservice. Can be a decimal.

memory_gb
float
default value: None

The amount of memory (in GB) to allocate for this Webservice. Can be a decimal.

enable_app_insights
bool
default value: None

Whether or not to enable Application Insights logging for this Webservice.

scoring_timeout_ms
int
default value: None

A timeout, in milliseconds, to enforce for scoring calls to this Webservice.

replica_max_concurrent_requests
int
default value: None

The maximum number of concurrent requests per replica to allow for this Webservice.

max_request_wait_time
int
default value: None

The maximum amount of time a request will stay in the queue (in milliseconds) before returning a 503 error.

num_replicas
int
default value: None

The number of containers to allocate for this Webservice.

tags
dict[str, str]
default value: None

Dictionary of key value tags to give this Webservice. Will replace existing tags.

properties
dict[str, str]
default value: None

Dictionary of key value properties to add to the existing properties dictionary.

description
str
default value: None

A description to give this Webservice.

models
list[Model]
default value: None

A list of Model objects to package with the updated service.

inference_config
InferenceConfig
default value: None

An InferenceConfig object used to provide the required model deployment properties.

gpu_cores
int
default value: None

The number of GPU cores to allocate for this Webservice.

period_seconds
int
default value: None

How often (in seconds) to perform the liveness probe. Defaults to 10 seconds. Minimum value is 1.

initial_delay_seconds
int
default value: None

Number of seconds after the container has started before liveness probes are initiated.

timeout_seconds
int
default value: None

Number of seconds after which the liveness probe times out. Defaults to 1 second. Minimum value is 1.

success_threshold
int
default value: None

Minimum consecutive successes for the liveness probe to be considered successful after having failed. Defaults to 1. Minimum value is 1.

failure_threshold
int
default value: None

When a Pod starts and the liveness probe fails, Kubernetes will try failureThreshold times before giving up. Defaults to 3. Minimum value is 1.

namespace
str
default value: None

The Kubernetes namespace in which to deploy this Webservice: up to 63 lowercase alphanumeric ('a'-'z', '0'-'9') and hyphen ('-') characters. The first and last characters cannot be hyphens.

token_auth_enabled
bool
default value: None

Whether or not to enable Token auth for this Webservice. If this is enabled, users can access this Webservice by fetching an access token using their Azure Active Directory credentials. Defaults to False.

cpu_cores_limit
float
default value: None

The maximum number of CPU cores this Webservice is allowed to use. Can be a decimal.

memory_gb_limit
float
default value: None

The maximum amount of memory (in GB) this Webservice is allowed to use. Can be a decimal.

Exceptions
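As a sketch of how update might be used, the example below collects only the settings that should change and passes them on, then waits for the update to finish. The helper names and the chosen values are hypothetical. Passing None for a parameter has the same effect as omitting it, so the filtering step only makes the intended change set explicit.

```python
def settings_to_change(**settings):
    """Keep only the settings that were given a value (None means leave as is)."""
    return {name: value for name, value in settings.items() if value is not None}


def apply_changes(service, changes):
    """Send the changes to the service and block until the update finishes."""
    service.update(**changes)
    service.wait_for_deployment(show_output=True)


changes = settings_to_change(
    autoscale_enabled=True,
    autoscale_min_replicas=2,
    autoscale_max_replicas=6,
    num_replicas=None,  # not set: stays as currently deployed
    description="Scaled for peak traffic",
)

# With an existing AksWebservice object named service:
# apply_changes(service, changes)
```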