AksEndpoint Class

Note

This is an experimental class, and may change at any time. Please see https://aka.ms/azuremlexperimental for more information.

Represents a collection of web service versions behind the same endpoint running on Azure Kubernetes Service.

Whereas an AksWebservice deploys a single service with a single scoring endpoint, the AksEndpoint class enables you to deploy multiple web service versions behind the same scoring endpoint. Each web service version can be configured to serve a percentage of the traffic so you can deploy models in a controlled fashion, for example, for A/B testing. Like AksWebservice, AksEndpoint allows deployment from a model object.
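To make the traffic split concrete, here is a minimal, SDK-free sketch of percentage-based routing. The function name `pick_version`, the version names, and the rule that unallocated traffic falls through to a default version are assumptions made for illustration; this is not the service's actual routing implementation.

```python
import random


def pick_version(traffic, default_version, rng=random):
    """Choose a version name for one request.

    traffic: dict mapping version name -> percentage of traffic (0-100).
    default_version: name used for any share of traffic not allocated
        in `traffic` (an assumption made for this sketch).
    rng: object with a uniform(a, b) method; injectable for testing.
    """
    total = sum(traffic.values())
    if total > 100:
        raise ValueError(f"traffic percentages sum to {total}, expected <= 100")

    roll = rng.uniform(0, 100)
    cumulative = 0.0
    for name, percent in traffic.items():
        cumulative += percent
        if roll < cumulative:
            return name
    # Anything not claimed by an explicit percentage goes to the default.
    return default_version


if __name__ == "__main__":
    rng = random.Random(0)
    counts = {"version-a": 0, "version-b": 0}
    for _ in range(10_000):
        counts[pick_version({"version-b": 10}, "version-a", rng)] += 1
    print(counts)  # roughly a 90/10 split
```

In an A/B test, the smaller share would typically go to the new model while the existing version keeps the rest.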

Initialize the Webservice instance.

The Webservice constructor retrieves a cloud representation of a Webservice object associated with the provided workspace. It will return an instance of a child class corresponding to the specific type of the retrieved Webservice object.

Inheritance: AksWebservice -> AksEndpoint

Constructor

AksEndpoint(workspace, name)

Parameters

Name	Description
workspace Required	Workspace The workspace object containing the Webservice object to retrieve.
name Required	str The name of the Webservice object to retrieve.

Variables

Name	Description
versions	dict[str, AksWebservice] A dictionary of version name to version object. Contains all of the versions deployed as a part of this Endpoint.

Methods

create_version	Add a new version in an Endpoint with provided properties.
delete_version	Delete a version in an Endpoint.
deploy_configuration	Create a configuration object for deploying to an AKS compute target.
serialize	Convert this Webservice into a JSON serialized dictionary.
update	Update the Endpoint with provided properties. Values left as None will remain unchanged in this Endpoint.
update_version	Update an existing version in an Endpoint with provided properties. Values left as None will remain unchanged in this version.

create_version

Add a new version in an Endpoint with provided properties.

create_version(version_name, autoscale_enabled=None, autoscale_min_replicas=None, autoscale_max_replicas=None, autoscale_refresh_seconds=None, autoscale_target_utilization=None, collect_model_data=None, cpu_cores=None, memory_gb=None, scoring_timeout_ms=None, replica_max_concurrent_requests=None, max_request_wait_time=None, num_replicas=None, tags=None, properties=None, description=None, models=None, inference_config=None, gpu_cores=None, period_seconds=None, initial_delay_seconds=None, timeout_seconds=None, success_threshold=None, failure_threshold=None, traffic_percentile=None, is_default=None, is_control_version_type=None, cpu_cores_limit=None, memory_gb_limit=None)

Parameters

Name	Description
version_name Required	str The name of the version to add in an endpoint.
autoscale_enabled	bool Whether or not to enable autoscaling for this version in an Endpoint. Defaults to True if `num_replicas` is None. default value: None
autoscale_min_replicas	int The minimum number of containers to use when autoscaling this version in an Endpoint. Defaults to 1. default value: None
autoscale_max_replicas	int The maximum number of containers to use when autoscaling this version in an Endpoint. Defaults to 10. default value: None
autoscale_refresh_seconds	int How often (in seconds) the autoscaler should attempt to scale this version in an Endpoint. Defaults to 1. default value: None
autoscale_target_utilization	int The target utilization (in percent out of 100) the autoscaler should attempt to maintain for this version in an Endpoint. Defaults to 70. default value: None
collect_model_data	bool Whether or not to enable model data collection for this version in an Endpoint. Defaults to False. default value: None
cpu_cores	float The number of CPU cores to allocate for this version in an Endpoint. Can be a decimal. Defaults to 0.1. default value: None
memory_gb	float The amount of memory (in GB) to allocate for this version in an Endpoint. Can be a decimal. Defaults to 0.5. default value: None
scoring_timeout_ms	int A timeout (in milliseconds) to enforce for scoring calls to this version in an Endpoint. Defaults to 60000. default value: None
replica_max_concurrent_requests	int The maximum number of concurrent requests per replica to allow for this version in an Endpoint. Defaults to 1. Do not change this setting from the default value of 1 unless instructed by Microsoft Technical Support or a member of the Azure Machine Learning team. default value: None
max_request_wait_time	int The maximum amount of time a request will stay in the queue (in milliseconds) before returning a 503 error. Defaults to 500. default value: None
num_replicas	int The number of containers to allocate for this version in an Endpoint. No default. If this parameter is not set, the autoscaler is enabled by default. default value: None
tags	dict[str, str] Dictionary of key value tags to give this Endpoint. default value: None
properties	dict[str, str] Dictionary of key value properties to give this Endpoint. These properties cannot be changed after deployment; however, new key value pairs can be added. default value: None
description	str A description to give this Endpoint. default value: None
models	list[Model] A list of Model objects to package with the updated service. default value: None
inference_config	InferenceConfig An InferenceConfig object used to provide the required model deployment properties. default value: None
gpu_cores	int The number of GPU cores to allocate for this version in an Endpoint. Defaults to 0. default value: None
period_seconds	int How often (in seconds) to perform the liveness probe. Defaults to 10 seconds. Minimum value is 1. default value: None
initial_delay_seconds	int The number of seconds after the container has started before liveness probes are initiated. Defaults to 310. default value: None
timeout_seconds	int The number of seconds after which the liveness probe times out. Defaults to 2 seconds. Minimum value is 1. default value: None
success_threshold	int The minimum consecutive successes for the liveness probe to be considered successful after having failed. Defaults to 1. Minimum value is 1. default value: None
failure_threshold	int When a Pod starts and the liveness probe fails, Kubernetes will try failureThreshold times before giving up. Defaults to 3. Minimum value is 1. default value: None
traffic_percentile	float The percentage of traffic this version receives in an Endpoint. default value: None
is_default	bool Whether or not to make this version the default version in an Endpoint. Defaults to False. default value: None
is_control_version_type	bool Whether or not to make this version the control version in an Endpoint. Defaults to False. default value: None
cpu_cores_limit	float The max number of cpu cores this Webservice is allowed to use. Can be a decimal. default value: None
memory_gb_limit	float The max amount of memory (in GB) this Webservice is allowed to use. Can be a decimal. default value: None

Exceptions

Type	Description
WebserviceException
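As a sketch of how a call to `create_version` might look, the helper below adds a second version that receives a small share of traffic. It uses only keyword arguments listed in the table above. The helper name `add_challenger_version`, the range check on `traffic_percentile`, and the `_RecordingEndpoint` stand-in are hypothetical; the stand-in exists only so the example runs without the SDK or an Azure subscription.

```python
def add_challenger_version(endpoint, version_name, models, inference_config,
                           traffic_percentile=10):
    """Add a new version that receives a small share of traffic.

    `endpoint` is expected to expose create_version() with the keyword
    arguments documented above (for example, an AksEndpoint instance).
    """
    # Assumption for this sketch: the value is a percentage in [0, 100].
    if not 0 <= traffic_percentile <= 100:
        raise ValueError("traffic_percentile must be between 0 and 100")
    endpoint.create_version(
        version_name=version_name,
        models=models,
        inference_config=inference_config,
        traffic_percentile=traffic_percentile,
        description=f"Challenger version {version_name}",
    )


class _RecordingEndpoint:
    """Hypothetical stand-in that records calls instead of contacting Azure."""

    def __init__(self):
        self.calls = []

    def create_version(self, version_name, **kwargs):
        self.calls.append((version_name, kwargs))


fake = _RecordingEndpoint()
add_challenger_version(fake, "version-b", models=["<Model>"],
                       inference_config="<InferenceConfig>")
print(fake.calls[0][0])  # version-b
```

With the real SDK, `endpoint` would be an `AksEndpoint(workspace, name)` instance, and `models` and `inference_config` would be actual `Model` and `InferenceConfig` objects rather than placeholder strings.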

delete_version

Delete a version in an Endpoint.

delete_version(version_name)

Parameters

Name	Description
version_name Required	str The name of the version in an endpoint to delete.

Exceptions

Type	Description
WebserviceException

deploy_configuration

Create a configuration object for deploying to an AKS compute target.

static deploy_configuration(autoscale_enabled=None, autoscale_min_replicas=None, autoscale_max_replicas=None, autoscale_refresh_seconds=None, autoscale_target_utilization=None, collect_model_data=None, auth_enabled=None, cpu_cores=None, memory_gb=None, enable_app_insights=None, scoring_timeout_ms=None, replica_max_concurrent_requests=None, max_request_wait_time=None, num_replicas=None, primary_key=None, secondary_key=None, tags=None, properties=None, description=None, gpu_cores=None, period_seconds=None, initial_delay_seconds=None, timeout_seconds=None, success_threshold=None, failure_threshold=None, namespace=None, token_auth_enabled=None, version_name=None, traffic_percentile=None, compute_target_name=None, cpu_cores_limit=None, memory_gb_limit=None)

Parameters

Name	Description
autoscale_enabled	bool Whether or not to enable autoscaling for this version in an Endpoint. Defaults to True if `num_replicas` is None. default value: None
autoscale_min_replicas	int The minimum number of containers to use when autoscaling this version in an Endpoint. Defaults to 1. default value: None
autoscale_max_replicas	int The maximum number of containers to use when autoscaling this version in an Endpoint. Defaults to 10. default value: None
autoscale_refresh_seconds	int How often the autoscaler should attempt to scale this version in an Endpoint. Defaults to 1. default value: None
autoscale_target_utilization	int The target utilization (in percent out of 100) the autoscaler should attempt to maintain for this version in an Endpoint. Defaults to 70. default value: None
collect_model_data	bool Whether or not to enable model data collection for this version in an Endpoint. Defaults to False. default value: None
auth_enabled	bool Whether or not to enable key auth for this version in an Endpoint. Defaults to True. default value: None
cpu_cores	float The number of CPU cores to allocate for this version in an Endpoint. Can be a decimal. Defaults to 0.1. default value: None
memory_gb	float The amount of memory (in GB) to allocate for this version in an Endpoint. Can be a decimal. Defaults to 0.5. default value: None
enable_app_insights	bool Whether or not to enable Application Insights logging for this version in an Endpoint. Defaults to False. default value: None
scoring_timeout_ms	int A timeout (in milliseconds) to enforce for scoring calls to this version in an Endpoint. Defaults to 60000. default value: None
replica_max_concurrent_requests	int The maximum number of concurrent requests per replica to allow for this version in an Endpoint. Defaults to 1. Do not change this setting from the default value of 1 unless instructed by Microsoft Technical Support or a member of the Azure Machine Learning team. default value: None
max_request_wait_time	int The maximum amount of time a request will stay in the queue (in milliseconds) before returning a 503 error. Defaults to 500. default value: None
num_replicas	int The number of containers to allocate for this version in an Endpoint. No default. If this parameter is not set, the autoscaler is enabled by default. default value: None
primary_key	str A primary auth key to use for this Endpoint. default value: None
secondary_key	str A secondary auth key to use for this Endpoint. default value: None
tags	dict[str, str] Dictionary of key value tags to give this Endpoint. default value: None
properties	dict[str, str] Dictionary of key value properties to give this Endpoint. These properties cannot be changed after deployment; however, new key value pairs can be added. default value: None
description	str A description to give this Endpoint. default value: None
gpu_cores	int The number of GPU cores to allocate for this version in an Endpoint. Defaults to 0. default value: None
period_seconds	int How often (in seconds) to perform the liveness probe. Defaults to 10 seconds. Minimum value is 1. default value: None
initial_delay_seconds	int Number of seconds after the container has started before liveness probes are initiated. Defaults to 310. default value: None
timeout_seconds	int Number of seconds after which the liveness probe times out. Defaults to 2 seconds. Minimum value is 1. default value: None
success_threshold	int Minimum consecutive successes for the liveness probe to be considered successful after having failed. Defaults to 1. Minimum value is 1. default value: None
failure_threshold	int When a Pod starts and the liveness probe fails, Kubernetes will try `failureThreshold` times before giving up. Defaults to 3. Minimum value is 1. default value: None
namespace	str The Kubernetes namespace in which to deploy this Endpoint: up to 63 lowercase alphanumeric ('a'-'z', '0'-'9') and hyphen ('-') characters. The first and last characters cannot be hyphens. default value: None
token_auth_enabled	bool Whether or not to enable Token auth for this Endpoint. If this is enabled, users can access this Endpoint by fetching an access token using their Azure Active Directory credentials. Defaults to False. default value: None
version_name	str The name of the version in an endpoint. default value: None
traffic_percentile	float The percentage of traffic this version receives in an Endpoint. default value: None
compute_target_name	str The name of the compute target to deploy to. default value: None
cpu_cores_limit	float The max number of cpu cores this Webservice is allowed to use. Can be a decimal. default value: None
memory_gb_limit	float The max amount of memory (in GB) this Webservice is allowed to use. Can be a decimal. default value: None
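The `namespace` rule above (up to 63 lowercase alphanumeric or hyphen characters, with no hyphen in the first or last position) can be checked locally before building a configuration. The following is a minimal sketch; the helper name `is_valid_namespace` is hypothetical and is not part of the SDK.

```python
import re

# Up to 63 characters from [a-z0-9-]; the first and last must not be hyphens.
_NAMESPACE_RE = re.compile(r"[a-z0-9](?:[a-z0-9-]{0,61}[a-z0-9])?")


def is_valid_namespace(namespace):
    """Return True if `namespace` satisfies the rule described for the
    `namespace` parameter above."""
    return (isinstance(namespace, str)
            and _NAMESPACE_RE.fullmatch(namespace) is not None)


print(is_valid_namespace("ml-endpoints"))   # True
print(is_valid_namespace("-ml-endpoints"))  # False
print(is_valid_namespace("a" * 64))         # False
```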

Returns

Type	Description
AksEndpointDeploymentConfiguration

Exceptions

Type	Description
WebserviceException
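The interaction between `autoscale_enabled` and `num_replicas` described above can be sketched as a small resolver. The defaults (1 and 10 replicas) come from the parameter table; the function name `resolve_scaling`, the returned dictionary shape, and the decision to reject conflicting or incomplete settings are assumptions made for this illustration, not documented SDK behavior.

```python
def resolve_scaling(autoscale_enabled=None, num_replicas=None,
                    autoscale_min_replicas=None, autoscale_max_replicas=None):
    """Work out the effective scaling mode from possibly-unset parameters."""
    # Documented rule: autoscaling defaults to True if num_replicas is None.
    if autoscale_enabled is None:
        autoscale_enabled = num_replicas is None

    if autoscale_enabled:
        # Assumption: a fixed replica count conflicts with autoscaling.
        if num_replicas is not None:
            raise ValueError("num_replicas set while autoscaling is enabled")
        min_replicas = 1 if autoscale_min_replicas is None else autoscale_min_replicas
        max_replicas = 10 if autoscale_max_replicas is None else autoscale_max_replicas
        if min_replicas > max_replicas:
            raise ValueError("autoscale_min_replicas exceeds autoscale_max_replicas")
        return {"mode": "autoscale",
                "min_replicas": min_replicas,
                "max_replicas": max_replicas}

    # Assumption: with autoscaling off, a fixed replica count is required.
    if num_replicas is None:
        raise ValueError("num_replicas is required when autoscaling is disabled")
    return {"mode": "fixed", "num_replicas": num_replicas}


print(resolve_scaling())
# {'mode': 'autoscale', 'min_replicas': 1, 'max_replicas': 10}
print(resolve_scaling(num_replicas=3))
# {'mode': 'fixed', 'num_replicas': 3}
```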

serialize

Convert this Webservice into a JSON serialized dictionary.

serialize()

Returns

Type	Description
dict	The JSON representation of this Webservice.

Exceptions

Type	Description
WebserviceException

update

Update the Endpoint with provided properties.

Values left as None will remain unchanged in this Endpoint.

update(auth_enabled=None, token_auth_enabled=None, enable_app_insights=None, description=None, tags=None, properties=None)

Parameters

Name	Description
auth_enabled	bool Whether or not to enable key auth for this version in an Endpoint. Defaults to True. default value: None
token_auth_enabled	bool Whether or not to enable Token auth for this Endpoint. If this is enabled, users can access this Endpoint by fetching an access token using their Azure Active Directory credentials. Defaults to False. default value: None
enable_app_insights	bool Whether or not to enable Application Insights logging for this version in an Endpoint. Defaults to False. default value: None
description	str A description to give this Endpoint. default value: None
tags	dict[str, str] Dictionary of key value tags to give this Endpoint. default value: None
properties	dict[str, str] Dictionary of key value properties to give this Endpoint. These properties cannot be changed after deployment; however, new key value pairs can be added. default value: None

Exceptions

Type	Description
WebserviceException
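The "values left as None remain unchanged" convention can be illustrated with a plain dictionary. This is only a sketch of the semantics: the function name `apply_update` and the dictionary representation of endpoint settings are hypothetical and do not reflect how the SDK stores state.

```python
def apply_update(current, **changes):
    """Return a copy of `current` with only the non-None changes applied."""
    updated = dict(current)
    for key, value in changes.items():
        # None means "leave this setting alone"; False is a real change.
        if value is not None:
            updated[key] = value
    return updated


current = {
    "auth_enabled": True,
    "token_auth_enabled": False,
    "enable_app_insights": False,
    "description": "initial deployment",
}
updated = apply_update(current, enable_app_insights=True, description=None)
print(updated["enable_app_insights"], updated["description"])
# True initial deployment
```

Note that passing `False` differs from passing `None`: under this convention, `auth_enabled=False` would turn key auth off, while `auth_enabled=None` would leave it as is.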

update_version

Update an existing version in an Endpoint with provided properties.

Values left as None will remain unchanged in this version.

update_version(version_name, autoscale_enabled=None, autoscale_min_replicas=None, autoscale_max_replicas=None, autoscale_refresh_seconds=None, autoscale_target_utilization=None, collect_model_data=None, cpu_cores=None, memory_gb=None, scoring_timeout_ms=None, replica_max_concurrent_requests=None, max_request_wait_time=None, num_replicas=None, tags=None, properties=None, description=None, models=None, inference_config=None, gpu_cores=None, period_seconds=None, initial_delay_seconds=None, timeout_seconds=None, success_threshold=None, failure_threshold=None, traffic_percentile=None, is_default=None, is_control_version_type=None, cpu_cores_limit=None, memory_gb_limit=None)

Parameters

Name	Description
version_name Required	str The name of the version in an endpoint.
autoscale_enabled	bool Whether or not to enable autoscaling for this version in an Endpoint. Defaults to True if num_replicas is None. default value: None
autoscale_min_replicas	int The minimum number of containers to use when autoscaling this version in an Endpoint. Defaults to 1. default value: None
autoscale_max_replicas	int The maximum number of containers to use when autoscaling this version in an Endpoint. Defaults to 10. default value: None
autoscale_refresh_seconds	int How often (in seconds) the autoscaler should attempt to scale this version in an Endpoint. Defaults to 1. default value: None
autoscale_target_utilization	int The target utilization (in percent out of 100) the autoscaler should attempt to maintain for this version in an Endpoint. Defaults to 70. default value: None
collect_model_data	bool Whether or not to enable model data collection for this version in an Endpoint. Defaults to False. default value: None
cpu_cores	float The number of CPU cores to allocate for this version in an Endpoint. Can be a decimal. Defaults to 0.1. default value: None
memory_gb	float The amount of memory (in GB) to allocate for this version in an Endpoint. Can be a decimal. Defaults to 0.5. default value: None
scoring_timeout_ms	int A timeout (in milliseconds) to enforce for scoring calls to this version in an Endpoint. Defaults to 60000. default value: None
replica_max_concurrent_requests	int The maximum number of concurrent requests per replica to allow for this version in an Endpoint. Defaults to 1. Do not change this setting from the default value of 1 unless instructed by Microsoft Technical Support or a member of the Azure Machine Learning team. default value: None
max_request_wait_time	int The maximum amount of time a request will stay in the queue (in milliseconds) before returning a 503 error. Defaults to 500. default value: None
num_replicas	int The number of containers to allocate for this version in an Endpoint. No default. If this parameter is not set, the autoscaler is enabled by default. default value: None
tags	dict[str, str] Dictionary of key value tags to give this Endpoint. default value: None
properties	dict[str, str] Dictionary of key value properties to give this Endpoint. These properties cannot be changed after deployment; however, new key value pairs can be added. default value: None
description	str A description to give this Endpoint. default value: None
models	list[Model] A list of Model objects to package with the updated service. default value: None
inference_config	InferenceConfig An InferenceConfig object used to provide the required model deployment properties. default value: None
gpu_cores	int The number of GPU cores to allocate for this version in an Endpoint. Defaults to 0. default value: None
period_seconds	int How often (in seconds) to perform the liveness probe. Defaults to 10 seconds. Minimum value is 1. default value: None
initial_delay_seconds	int The number of seconds after the container has started before liveness probes are initiated. Defaults to 310. default value: None
timeout_seconds	int The number of seconds after which the liveness probe times out. Defaults to 2 seconds. Minimum value is 1. default value: None
success_threshold	int The minimum consecutive successes for the liveness probe to be considered successful after having failed. Defaults to 1. Minimum value is 1. default value: None
failure_threshold	int When a Pod starts and the liveness probe fails, Kubernetes will try failureThreshold times before giving up. Defaults to 3. Minimum value is 1. default value: None
traffic_percentile	float The percentage of traffic this version receives in an Endpoint. default value: None
is_default	bool Whether or not to make this version the default version in an Endpoint. Defaults to False. default value: None
is_control_version_type	bool Whether or not to make this version the control version in an Endpoint. Defaults to False. default value: None
cpu_cores_limit	float The max number of cpu cores this Webservice is allowed to use. Can be a decimal. default value: None
memory_gb_limit	float The max amount of memory (in GB) this Webservice is allowed to use. Can be a decimal. default value: None

Exceptions

Type	Description
WebserviceException
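A gradual rollout can be expressed as a series of `update_version` calls that change only `traffic_percentile` (and, at the end, `is_default`), leaving every other parameter as None so it stays unchanged. The sketch below assumes an object exposing `update_version` with the keyword arguments documented above; the helper name `shift_traffic`, the rollout percentages, and the `_RecordingEndpoint` stand-in are hypothetical and exist only so the example runs without the SDK.

```python
def shift_traffic(endpoint, version_name, traffic_percentile, make_default=False):
    """Change the traffic share of one version, optionally promoting it.

    Only the fields passed here are sent; everything else is left as None
    and therefore remains unchanged in the version.
    """
    kwargs = {"traffic_percentile": traffic_percentile}
    if make_default:
        kwargs["is_default"] = True
    endpoint.update_version(version_name=version_name, **kwargs)


class _RecordingEndpoint:
    """Hypothetical stand-in that records calls instead of contacting Azure."""

    def __init__(self):
        self.calls = []

    def update_version(self, version_name, **kwargs):
        self.calls.append((version_name, kwargs))


fake = _RecordingEndpoint()
for percent in (10, 25):
    shift_traffic(fake, "version-b", percent)
shift_traffic(fake, "version-b", 50, make_default=True)
print(fake.calls[-1])
# ('version-b', {'traffic_percentile': 50, 'is_default': True})
```

In practice each step would be followed by a check of the new version's behavior before moving on to the next percentage.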
