AksEndpoint Class

Represents a collection of web service versions behind the same endpoint running on Azure Kubernetes Service.

Whereas an AksWebservice deploys a single service with a single scoring endpoint, the AksEndpoint class enables you to deploy multiple web service versions behind the same scoring endpoint. Each web service version can be configured to serve a percentage of the traffic so you can deploy models in a controlled fashion, for example, for A/B testing. The AksEndpoint allows deployment from a model object similar to AksWebservice.

Inheritance
AksEndpoint

Constructor

AksEndpoint(workspace, name)

Variables

versions
dict[str, AksWebservice]

A dictionary of version name to version object. Contains all of the versions deployed as a part of this Endpoint.
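As a rough sketch of how the versions dictionary might be inspected, the helper below relies only on an object that exposes a versions mapping as described above. The helper name version_names and the endpoint name in the commented usage are hypothetical, and the usage assumes a local workspace configuration file exists.

```python
from typing import List


def version_names(endpoint) -> List[str]:
    """Return the names of all versions deployed behind the endpoint, sorted."""
    # `versions` maps version name -> version object.
    return sorted(endpoint.versions.keys())


# Hypothetical usage against a real workspace (not executed here):
#   from azureml.core import Workspace
#   from azureml.core.webservice import AksEndpoint
#   ws = Workspace.from_config()
#   endpoint = AksEndpoint(ws, "my-endpoint")  # placeholder endpoint name
#   print(version_names(endpoint))
```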

Methods

create_version

Add a new version in an Endpoint with provided properties.

delete_version

Delete a version in an Endpoint.

deploy_configuration

Create a configuration object for deploying to an AKS compute target.

serialize

Convert this Webservice into a JSON serialized dictionary.

update

Update the Endpoint with provided properties.

Values left as None will remain unchanged in this Endpoint.

update_version

Update an existing version in an Endpoint with provided properties.

Values left as None will remain unchanged in this version.

create_version

Add a new version in an Endpoint with provided properties.

create_version(version_name, autoscale_enabled=None, autoscale_min_replicas=None, autoscale_max_replicas=None, autoscale_refresh_seconds=None, autoscale_target_utilization=None, collect_model_data=None, cpu_cores=None, memory_gb=None, scoring_timeout_ms=None, replica_max_concurrent_requests=None, max_request_wait_time=None, num_replicas=None, tags=None, properties=None, description=None, models=None, inference_config=None, gpu_cores=None, period_seconds=None, initial_delay_seconds=None, timeout_seconds=None, success_threshold=None, failure_threshold=None, traffic_percentile=None, is_default=None, is_control_version_type=None, cpu_cores_limit=None, memory_gb_limit=None)

Parameters

version_name
str

The name of the version to add in an endpoint.

autoscale_enabled
bool
default value: None

Whether or not to enable autoscaling for this version in an Endpoint. Defaults to True if num_replicas is None.

autoscale_min_replicas
int
default value: None

The minimum number of containers to use when autoscaling this version in an Endpoint. Defaults to 1.

autoscale_max_replicas
int
default value: None

The maximum number of containers to use when autoscaling this version in an Endpoint. Defaults to 10.

autoscale_refresh_seconds
int
default value: None

How often (in seconds) the autoscaler should attempt to scale this version in an Endpoint. Defaults to 1.

autoscale_target_utilization
int
default value: None

The target utilization (in percent out of 100) the autoscaler should attempt to maintain for this version in an Endpoint. Defaults to 70.

collect_model_data
bool
default value: None

Whether or not to enable model data collection for this version in an Endpoint. Defaults to False.

cpu_cores
float
default value: None

The number of CPU cores to allocate for this version in an Endpoint. Can be a decimal. Defaults to 0.1.

memory_gb
float
default value: None

The amount of memory (in GB) to allocate for this version in an Endpoint. Can be a decimal. Defaults to 0.5.

scoring_timeout_ms
int
default value: None

A timeout (in milliseconds) to enforce for scoring calls to this version in an Endpoint. Defaults to 60000.

replica_max_concurrent_requests
int
default value: None

The maximum number of concurrent requests per replica to allow for this version in an Endpoint. Defaults to 1. Do not change this setting from the default value of 1 unless instructed by Microsoft Technical Support or a member of the Azure Machine Learning team.

max_request_wait_time
int
default value: None

The maximum amount of time a request will stay in the queue (in milliseconds) before returning a 503 error. Defaults to 500.

num_replicas
int
default value: None

The number of containers to allocate for this version in an Endpoint. No default; if this parameter is not set, the autoscaler is enabled by default.

tags
dict[str, str]
default value: None

Dictionary of key value tags to give this Endpoint.

properties
dict[str, str]
default value: None

Dictionary of key-value properties to give this Endpoint. These properties cannot be changed after deployment; however, new key-value pairs can be added.

description
str
default value: None

A description to give this Endpoint.

models
list[Model]
default value: None

A list of Model objects to package with this version.

inference_config
InferenceConfig
default value: None

An InferenceConfig object used to provide the required model deployment properties.

gpu_cores
int
default value: None

The number of GPU cores to allocate for this version in an Endpoint. Defaults to 0.

period_seconds
int
default value: None

How often (in seconds) to perform the liveness probe. Defaults to 10 seconds. Minimum value is 1.

initial_delay_seconds
int
default value: None

The number of seconds after the container has started before liveness probes are initiated. Defaults to 310.

timeout_seconds
int
default value: None

The number of seconds after which the liveness probe times out. Defaults to 2 seconds. Minimum value is 1.

success_threshold
int
default value: None

The minimum consecutive successes for the liveness probe to be considered successful after having failed. Defaults to 1. Minimum value is 1.

failure_threshold
int
default value: None

When a Pod starts and the liveness probe fails, Kubernetes will try failureThreshold times before giving up. Defaults to 3. Minimum value is 1.

traffic_percentile
float
default value: None

The percentage of traffic the version takes in an endpoint.

is_default
bool
default value: None

Whether or not to make this version the default version in an Endpoint. Defaults to False.

is_control_version_type
bool
default value: None

Whether or not to make this version the control version in an Endpoint. Defaults to False.

cpu_cores_limit
float
default value: None

The maximum number of CPU cores this Webservice is allowed to use. Can be a decimal.

memory_gb_limit
float
default value: None

The max amount of memory (in GB) this Webservice is allowed to use. Can be a decimal.
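A minimal sketch of adding a version to an already deployed endpoint, under stated assumptions: the helper name add_version is hypothetical, and wait_for_deployment is assumed to be available on the endpoint through its Webservice base class. Only the parameters being set are passed; everything else stays at None and takes the defaults listed above.

```python
def add_version(endpoint, version_name, models, inference_config,
                traffic_percentile, wait=True):
    """Add a version to an existing endpoint and optionally block until it is ready."""
    endpoint.create_version(
        version_name=version_name,
        models=models,                      # list of Model objects
        inference_config=inference_config,  # InferenceConfig for this version
        traffic_percentile=traffic_percentile,
    )
    if wait:
        # Assumption: inherited from the Webservice base class.
        endpoint.wait_for_deployment(show_output=True)
    return endpoint


# Hypothetical usage (names are placeholders, not executed here):
#   add_version(endpoint, "versionb", [model], inference_config, traffic_percentile=10)
```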

Exceptions

delete_version

Delete a version in an Endpoint.

delete_version(version_name)

Parameters

version_name
str

The name of the version in an endpoint to delete.
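One way to guard a deletion, sketched under the assumption that the local versions dictionary is up to date: check for the name first, then call delete_version. The helper name delete_version_if_present is hypothetical.

```python
def delete_version_if_present(endpoint, version_name: str) -> bool:
    """Delete the named version if the endpoint lists it; report whether a delete was issued."""
    # Note: `versions` reflects the locally cached state of the endpoint object.
    if version_name not in endpoint.versions:
        return False
    endpoint.delete_version(version_name=version_name)
    return True
```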

Exceptions

deploy_configuration

Create a configuration object for deploying to an AKS compute target.

deploy_configuration(autoscale_enabled=None, autoscale_min_replicas=None, autoscale_max_replicas=None, autoscale_refresh_seconds=None, autoscale_target_utilization=None, collect_model_data=None, auth_enabled=None, cpu_cores=None, memory_gb=None, enable_app_insights=None, scoring_timeout_ms=None, replica_max_concurrent_requests=None, max_request_wait_time=None, num_replicas=None, primary_key=None, secondary_key=None, tags=None, properties=None, description=None, gpu_cores=None, period_seconds=None, initial_delay_seconds=None, timeout_seconds=None, success_threshold=None, failure_threshold=None, namespace=None, token_auth_enabled=None, version_name=None, traffic_percentile=None, compute_target_name=None, cpu_cores_limit=None, memory_gb_limit=None)

Parameters

autoscale_enabled
bool
default value: None

Whether or not to enable autoscaling for this version in an Endpoint. Defaults to True if num_replicas is None.

autoscale_min_replicas
int
default value: None

The minimum number of containers to use when autoscaling this version in an Endpoint. Defaults to 1.

autoscale_max_replicas
int
default value: None

The maximum number of containers to use when autoscaling this version in an Endpoint. Defaults to 10.

autoscale_refresh_seconds
int
default value: None

How often (in seconds) the autoscaler should attempt to scale this version in an Endpoint. Defaults to 1.

autoscale_target_utilization
int
default value: None

The target utilization (in percent out of 100) the autoscaler should attempt to maintain for this version in an Endpoint. Defaults to 70.

collect_model_data
bool
default value: None

Whether or not to enable model data collection for this version in an Endpoint. Defaults to False.

auth_enabled
bool
default value: None

Whether or not to enable key auth for this version in an Endpoint. Defaults to True.

cpu_cores
float
default value: None

The number of CPU cores to allocate for this version in an Endpoint. Can be a decimal. Defaults to 0.1.

memory_gb
float
default value: None

The amount of memory (in GB) to allocate for this version in an Endpoint. Can be a decimal. Defaults to 0.5.

enable_app_insights
bool
default value: None

Whether or not to enable Application Insights logging for this version in an Endpoint. Defaults to False.

scoring_timeout_ms
int
default value: None

A timeout (in milliseconds) to enforce for scoring calls to this version in an Endpoint. Defaults to 60000.

replica_max_concurrent_requests
int
default value: None

The maximum number of concurrent requests per replica to allow for this version in an Endpoint. Defaults to 1. Do not change this setting from the default value of 1 unless instructed by Microsoft Technical Support or a member of the Azure Machine Learning team.

max_request_wait_time
int
default value: None

The maximum amount of time a request will stay in the queue (in milliseconds) before returning a 503 error. Defaults to 500.

num_replicas
int
default value: None

The number of containers to allocate for this version in an Endpoint. No default; if this parameter is not set, the autoscaler is enabled by default.

primary_key
str
default value: None

A primary auth key to use for this Endpoint.

secondary_key
str
default value: None

A secondary auth key to use for this Endpoint.

tags
dict[str, str]
default value: None

Dictionary of key value tags to give this Endpoint.

properties
dict[str, str]
default value: None

Dictionary of key-value properties to give this Endpoint. These properties cannot be changed after deployment; however, new key-value pairs can be added.

description
str
default value: None

A description to give this Endpoint.

gpu_cores
int
default value: None

The number of GPU cores to allocate for this version in an Endpoint. Defaults to 0.

period_seconds
int
default value: None

How often (in seconds) to perform the liveness probe. Defaults to 10 seconds. Minimum value is 1.

initial_delay_seconds
int
default value: None

Number of seconds after the container has started before liveness probes are initiated. Defaults to 310.

timeout_seconds
int
default value: None

Number of seconds after which the liveness probe times out. Defaults to 2 seconds. Minimum value is 1.

success_threshold
int
default value: None

Minimum consecutive successes for the liveness probe to be considered successful after having failed. Defaults to 1. Minimum value is 1.

failure_threshold
int
default value: None

When a Pod starts and the liveness probe fails, Kubernetes will try failureThreshold times before giving up. Defaults to 3. Minimum value is 1.

namespace
str
default value: None

The Kubernetes namespace in which to deploy this Endpoint: up to 63 lowercase alphanumeric ('a'-'z', '0'-'9') and hyphen ('-') characters. The first and last characters cannot be hyphens.

token_auth_enabled
bool
default value: None

Whether or not to enable token auth for this Endpoint. If this is enabled, users can access this Endpoint by fetching an access token using their Azure Active Directory credentials. Defaults to False.

version_name
str
default value: None

The name of the version in an endpoint.

traffic_percentile
float
default value: None

The percentage of traffic the version takes in an endpoint.

compute_target_name
str
default value: None

The name of the compute target to deploy to.

cpu_cores_limit
float
default value: None

The maximum number of CPU cores this Webservice is allowed to use. Can be a decimal.

memory_gb_limit
float
default value: None

The max amount of memory (in GB) this Webservice is allowed to use. Can be a decimal.
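The namespace rule above (up to 63 lowercase alphanumeric or hyphen characters, with no hyphen at either end) can be checked locally before a configuration is built. The following is a sketch only: is_valid_namespace and make_endpoint_config are hypothetical helper names, the resource values are illustrative, and the SDK import is deferred so the validator can run without azureml-core installed.

```python
import re

# Up to 63 characters from [a-z0-9-]; the first and last must not be hyphens.
_NAMESPACE_RE = re.compile(r"[a-z0-9](?:[a-z0-9-]{0,61}[a-z0-9])?")


def is_valid_namespace(namespace: str) -> bool:
    """Check a string against the namespace rule described for deploy_configuration."""
    return _NAMESPACE_RE.fullmatch(namespace) is not None


def make_endpoint_config(version_name, traffic_percentile, namespace=None):
    """Build a deployment configuration for the first version of an endpoint."""
    if namespace is not None and not is_valid_namespace(namespace):
        raise ValueError(f"invalid Kubernetes namespace: {namespace!r}")
    # Imported lazily; requires the azureml-core package at call time.
    from azureml.core.webservice import AksEndpoint
    return AksEndpoint.deploy_configuration(
        cpu_cores=0.1,   # illustrative values matching the documented defaults
        memory_gb=0.5,
        version_name=version_name,
        traffic_percentile=traffic_percentile,
        namespace=namespace,
    )
```

Validating before the SDK call keeps a malformed namespace from surfacing only as a service-side deployment failure.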

Return type

Exceptions

serialize

Convert this Webservice into a JSON serialized dictionary.

serialize()

Returns

The JSON representation of this Webservice.
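A small sketch of turning the serialized dictionary into text, for example for logging. The helper name endpoint_to_json is hypothetical, and passing default=str is a defensive assumption in case the dictionary holds values that the json module cannot encode natively.

```python
import json


def endpoint_to_json(endpoint) -> str:
    """Render the dictionary returned by serialize() as indented JSON text."""
    return json.dumps(endpoint.serialize(), indent=2, sort_keys=True, default=str)
```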

Return type

Exceptions

update

Update the Endpoint with provided properties.

Values left as None will remain unchanged in this Endpoint.

update(auth_enabled=None, token_auth_enabled=None, enable_app_insights=None, description=None, tags=None, properties=None)

Parameters

auth_enabled
bool
default value: None

Whether or not to enable key auth for this version in an Endpoint. Defaults to True.

token_auth_enabled
bool
default value: None

Whether or not to enable token auth for this Endpoint. If this is enabled, users can access this Endpoint by fetching an access token using their Azure Active Directory credentials. Defaults to False.

enable_app_insights
bool
default value: None

Whether or not to enable Application Insights logging for this version in an Endpoint. Defaults to False.

description
str
default value: None

A description to give this Endpoint.

tags
dict[str, str]
default value: None

Dictionary of key value tags to give this Endpoint.

properties
dict[str, str]
default value: None

Dictionary of key-value properties to give this Endpoint. These properties cannot be changed after deployment; however, new key-value pairs can be added.
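The "values left as None remain unchanged" rule can be modeled in a few lines. This is an illustration of the semantics only, not the SDK's own implementation; apply_update and the sample settings are hypothetical.

```python
def apply_update(current: dict, **changes) -> dict:
    """Return a copy of `current` with every non-None change applied."""
    merged = dict(current)
    for key, value in changes.items():
        if value is not None:  # None means "leave this setting as it is"
            merged[key] = value
    return merged


before = {"description": "first rollout", "auth_enabled": True, "tags": {"team": "ml"}}
after = apply_update(before, description="second rollout", auth_enabled=None, tags=None)
```

Note that False is not None: passing auth_enabled=False in this model changes the setting, while omitting it or passing None does not.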

Exceptions

update_version

Update an existing version in an Endpoint with provided properties.

Values left as None will remain unchanged in this version.

update_version(version_name, autoscale_enabled=None, autoscale_min_replicas=None, autoscale_max_replicas=None, autoscale_refresh_seconds=None, autoscale_target_utilization=None, collect_model_data=None, cpu_cores=None, memory_gb=None, scoring_timeout_ms=None, replica_max_concurrent_requests=None, max_request_wait_time=None, num_replicas=None, tags=None, properties=None, description=None, models=None, inference_config=None, gpu_cores=None, period_seconds=None, initial_delay_seconds=None, timeout_seconds=None, success_threshold=None, failure_threshold=None, traffic_percentile=None, is_default=None, is_control_version_type=None, cpu_cores_limit=None, memory_gb_limit=None)

Parameters

version_name
str

The name of the version in an endpoint.

autoscale_enabled
bool
default value: None

Whether or not to enable autoscaling for this version in an Endpoint. Defaults to True if num_replicas is None.

autoscale_min_replicas
int
default value: None

The minimum number of containers to use when autoscaling this version in an Endpoint. Defaults to 1.

autoscale_max_replicas
int
default value: None

The maximum number of containers to use when autoscaling this version in an Endpoint. Defaults to 10.

autoscale_refresh_seconds
int
default value: None

How often (in seconds) the autoscaler should attempt to scale this version in an Endpoint. Defaults to 1.

autoscale_target_utilization
int
default value: None

The target utilization (in percent out of 100) the autoscaler should attempt to maintain for this version in an Endpoint. Defaults to 70.

collect_model_data
bool
default value: None

Whether or not to enable model data collection for this version in an Endpoint. Defaults to False.

cpu_cores
float
default value: None

The number of CPU cores to allocate for this version in an Endpoint. Can be a decimal. Defaults to 0.1.

memory_gb
float
default value: None

The amount of memory (in GB) to allocate for this version in an Endpoint. Can be a decimal. Defaults to 0.5.

scoring_timeout_ms
int
default value: None

A timeout (in milliseconds) to enforce for scoring calls to this version in an Endpoint. Defaults to 60000.

replica_max_concurrent_requests
int
default value: None

The maximum number of concurrent requests per replica to allow for this version in an Endpoint. Defaults to 1. Do not change this setting from the default value of 1 unless instructed by Microsoft Technical Support or a member of the Azure Machine Learning team.

max_request_wait_time
int
default value: None

The maximum amount of time a request will stay in the queue (in milliseconds) before returning a 503 error. Defaults to 500.

num_replicas
int
default value: None

The number of containers to allocate for this version in an Endpoint. No default; if this parameter is not set, the autoscaler is enabled by default.

tags
dict[str, str]
default value: None

Dictionary of key value tags to give this Endpoint.

properties
dict[str, str]
default value: None

Dictionary of key-value properties to give this Endpoint. These properties cannot be changed after deployment; however, new key-value pairs can be added.

description
str
default value: None

A description to give this Endpoint.

models
list[Model]
default value: None

A list of Model objects to package with the updated service.

inference_config
InferenceConfig
default value: None

An InferenceConfig object used to provide the required model deployment properties.

gpu_cores
int
default value: None

The number of GPU cores to allocate for this version in an Endpoint. Defaults to 0.

period_seconds
int
default value: None

How often (in seconds) to perform the liveness probe. Defaults to 10 seconds. Minimum value is 1.

initial_delay_seconds
int
default value: None

The number of seconds after the container has started before liveness probes are initiated. Defaults to 310.

timeout_seconds
int
default value: None

The number of seconds after which the liveness probe times out. Defaults to 2 seconds. Minimum value is 1.

success_threshold
int
default value: None

The minimum consecutive successes for the liveness probe to be considered successful after having failed. Defaults to 1. Minimum value is 1.

failure_threshold
int
default value: None

When a Pod starts and the liveness probe fails, Kubernetes will try failureThreshold times before giving up. Defaults to 3. Minimum value is 1.

traffic_percentile
float
default value: None

The percentage of traffic the version takes in an endpoint.

is_default
bool
default value: None

Whether or not to make this version the default version in an Endpoint. Defaults to False.

is_control_version_type
bool
default value: None

Whether or not to make this version the control version in an Endpoint. Defaults to False.

cpu_cores_limit
float
default value: None

The maximum number of CPU cores this Webservice is allowed to use. Can be a decimal.

memory_gb_limit
float
default value: None

The max amount of memory (in GB) this Webservice is allowed to use. Can be a decimal.
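A sketch of promoting an existing version during a rollout, under stated assumptions: promote_version is a hypothetical helper, and wait_for_deployment is assumed to be inherited from the Webservice base class. Because only three arguments are passed, every other setting stays at None and is therefore left unchanged.

```python
def promote_version(endpoint, version_name, traffic_percentile):
    """Raise a version's traffic share and mark it as the default version."""
    endpoint.update_version(
        version_name=version_name,
        traffic_percentile=traffic_percentile,
        is_default=True,
    )
    # Assumption: inherited from the Webservice base class; blocks until the update completes.
    endpoint.wait_for_deployment(show_output=True)
    return endpoint


# Hypothetical usage (placeholder names, not executed here):
#   promote_version(endpoint, "versionb", traffic_percentile=40)
```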

Exceptions