AksEndpoint Class
Note
This is an experimental class, and may change at any time. Please see https://aka.ms/azuremlexperimental for more information.
Represents a collection of web service versions behind the same endpoint running on Azure Kubernetes Service.
Whereas an AksWebservice deploys a single service with a single scoring endpoint, the AksEndpoint class enables you to deploy multiple web service versions behind the same scoring endpoint. Each web service version can be configured to serve a percentage of the traffic so you can deploy models in a controlled fashion, for example, for A/B testing. The AksEndpoint allows deployment from a model object, similar to AksWebservice.
Initialize the Webservice instance.
The Webservice constructor retrieves a cloud representation of a Webservice object associated with the provided workspace. It will return an instance of a child class corresponding to the specific type of the retrieved Webservice object.
- Inheritance
-
AksEndpoint
Constructor
AksEndpoint(workspace, name)
Parameters
Variables
- versions
- dict[str, AksWebservice]
A dictionary of version name to version object. Contains all of the versions deployed as a part of this Endpoint.
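A minimal sketch of retrieving an existing endpoint and listing its versions. It assumes the `azureml-core` package is installed and that you already hold a `Workspace` object; the helper names `load_endpoint` and `describe_versions` and the endpoint name `"my-endpoint"` are hypothetical and used only for illustration.

```python
def describe_versions(endpoint):
    """Return the sorted version names deployed behind an endpoint.

    Works with any object exposing a `versions` dict, as AksEndpoint does.
    """
    return sorted(endpoint.versions)


def load_endpoint(workspace, name):
    """Retrieve an existing AksEndpoint by name (requires azureml-core)."""
    # Imported lazily so describe_versions() can be used without the SDK.
    from azureml.core.webservice import AksEndpoint

    return AksEndpoint(workspace, name)


# Usage (assumes `ws` is an azureml.core.Workspace you created elsewhere):
# endpoint = load_endpoint(ws, "my-endpoint")
# print(describe_versions(endpoint))
```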
Methods
create_version | Add a new version in an Endpoint with provided properties.
delete_version | Delete a version in an Endpoint.
deploy_configuration | Create a configuration object for deploying to an AKS compute target.
serialize | Convert this Webservice into a JSON serialized dictionary.
update | Update the Endpoint with provided properties. Values left as None will remain unchanged in this Endpoint.
update_version | Update an existing version in an Endpoint with provided properties. Values left as None will remain unchanged in this version.
create_version
Add a new version in an Endpoint with provided properties.
create_version(version_name, autoscale_enabled=None, autoscale_min_replicas=None, autoscale_max_replicas=None, autoscale_refresh_seconds=None, autoscale_target_utilization=None, collect_model_data=None, cpu_cores=None, memory_gb=None, scoring_timeout_ms=None, replica_max_concurrent_requests=None, max_request_wait_time=None, num_replicas=None, tags=None, properties=None, description=None, models=None, inference_config=None, gpu_cores=None, period_seconds=None, initial_delay_seconds=None, timeout_seconds=None, success_threshold=None, failure_threshold=None, traffic_percentile=None, is_default=None, is_control_version_type=None, cpu_cores_limit=None, memory_gb_limit=None)
Parameters
- autoscale_enabled
- bool
Whether or not to enable autoscaling for this version in an Endpoint. Defaults to True if num_replicas is None.
- autoscale_min_replicas
- int
The minimum number of containers to use when autoscaling this version in an Endpoint. Defaults to 1.
- autoscale_max_replicas
- int
The maximum number of containers to use when autoscaling this version in an Endpoint. Defaults to 10.
- autoscale_refresh_seconds
- int
How often (in seconds) the autoscaler should attempt to scale this version in an Endpoint. Defaults to 1.
- autoscale_target_utilization
- int
The target utilization (in percent out of 100) the autoscaler should attempt to maintain for this version in an Endpoint. Defaults to 70.
- collect_model_data
- bool
Whether or not to enable model data collection for this version in an Endpoint. Defaults to False.
- cpu_cores
- float
The number of CPU cores to allocate for this version in an Endpoint. Can be a decimal. Defaults to 0.1.
- memory_gb
- float
The amount of memory (in GB) to allocate for this version in an Endpoint. Can be a decimal. Defaults to 0.5.
- scoring_timeout_ms
- int
A timeout (in milliseconds) to enforce for scoring calls to this version in an Endpoint. Defaults to 60000.
- replica_max_concurrent_requests
- int
The number of maximum concurrent requests per replica to allow for this version in an Endpoint. Defaults to 1. Do not change this setting from the default value of 1 unless instructed by Microsoft Technical Support or a member of Azure Machine Learning team.
- max_request_wait_time
- int
The maximum amount of time a request will stay in the queue (in milliseconds) before returning a 503 error. Defaults to 500.
- num_replicas
- int
The number of containers to allocate for this version in an Endpoint. There is no default; if this parameter is not set, the autoscaler is enabled.
- properties
- dict[str, str]
A dictionary of key-value properties to give this Endpoint. These properties cannot be changed after deployment; however, new key-value pairs can be added.
- inference_config
- InferenceConfig
An InferenceConfig object used to provide the required model deployment properties.
- gpu_cores
- int
The number of GPU cores to allocate for this version in an Endpoint. Defaults to 0.
- period_seconds
- int
How often (in seconds) to perform the liveness probe. Defaults to 10 seconds. Minimum value is 1.
- initial_delay_seconds
- int
The number of seconds after the container has started before liveness probes are initiated. Defaults to 310.
- timeout_seconds
- int
The number of seconds after which the liveness probe times out. Defaults to 2 seconds. Minimum value is 1.
- success_threshold
- int
The minimum consecutive successes for the liveness probe to be considered successful after having failed. Defaults to 1. Minimum value is 1.
- failure_threshold
- int
When a Pod starts and the liveness probe fails, Kubernetes will try failureThreshold times before giving up. Defaults to 3. Minimum value is 1.
- traffic_percentile
- float
The amount of traffic the version takes in an endpoint.
- is_default
- bool
Whether or not to make this version the default version in an Endpoint. Defaults to False.
- is_control_version_type
- bool
Whether or not to make this version the control version in an Endpoint. Defaults to False.
- cpu_cores_limit
- float
The max number of cpu cores this Webservice is allowed to use. Can be a decimal.
- memory_gb_limit
- float
The max amount of memory (in GB) this Webservice is allowed to use. Can be a decimal.
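A minimal sketch of adding a second version that takes a share of the traffic. The wrapper `add_version` is a hypothetical helper, and the 0 to 100 range check is an assumption based on `traffic_percentile` being described as a percentage; the service may apply its own validation. `endpoint` is expected to be an `AksEndpoint`, `models` a list of registered `Model` objects, and `inference_config` an `InferenceConfig`.

```python
def add_version(endpoint, version_name, models, inference_config,
                traffic_percentile, description=None):
    """Add a version that should receive `traffic_percentile` percent of traffic."""
    # Assumed sanity check; not taken from the SDK itself.
    if not 0 <= traffic_percentile <= 100:
        raise ValueError("traffic_percentile must be between 0 and 100")

    # All other create_version parameters are left as None (service defaults).
    endpoint.create_version(
        version_name=version_name,
        models=models,
        inference_config=inference_config,
        traffic_percentile=traffic_percentile,
        description=description,
    )


# Usage (assumes `endpoint`, `model`, and `inference_config` already exist):
# add_version(endpoint, "versionb", [model], inference_config, 10,
#             description="my second version")
```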
Exceptions
delete_version
Delete a version in an Endpoint.
delete_version(version_name)
Parameters
Exceptions
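A minimal sketch of removing a version only when it is present. The helper `delete_version_if_present` is hypothetical, and it relies on the `versions` dictionary held by the local object, which may be out of date if the endpoint was changed elsewhere.

```python
def delete_version_if_present(endpoint, version_name):
    """Delete `version_name` if the endpoint lists it; return True if a delete was issued."""
    if version_name not in endpoint.versions:
        return False
    endpoint.delete_version(version_name=version_name)
    return True


# Usage (assumes `endpoint` is an AksEndpoint):
# delete_version_if_present(endpoint, "versionb")
```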
deploy_configuration
Create a configuration object for deploying to an AKS compute target.
static deploy_configuration(autoscale_enabled=None, autoscale_min_replicas=None, autoscale_max_replicas=None, autoscale_refresh_seconds=None, autoscale_target_utilization=None, collect_model_data=None, auth_enabled=None, cpu_cores=None, memory_gb=None, enable_app_insights=None, scoring_timeout_ms=None, replica_max_concurrent_requests=None, max_request_wait_time=None, num_replicas=None, primary_key=None, secondary_key=None, tags=None, properties=None, description=None, gpu_cores=None, period_seconds=None, initial_delay_seconds=None, timeout_seconds=None, success_threshold=None, failure_threshold=None, namespace=None, token_auth_enabled=None, version_name=None, traffic_percentile=None, compute_target_name=None, cpu_cores_limit=None, memory_gb_limit=None)
Parameters
- autoscale_enabled
- bool
Whether or not to enable autoscaling for this version in an Endpoint. Defaults to True if num_replicas is None.
- autoscale_min_replicas
- int
The minimum number of containers to use when autoscaling this version in an Endpoint. Defaults to 1.
- autoscale_max_replicas
- int
The maximum number of containers to use when autoscaling this version in an Endpoint. Defaults to 10.
- autoscale_refresh_seconds
- int
How often (in seconds) the autoscaler should attempt to scale this version in an Endpoint. Defaults to 1.
- autoscale_target_utilization
- int
The target utilization (in percent out of 100) the autoscaler should attempt to maintain for this version in an Endpoint. Defaults to 70.
- collect_model_data
- bool
Whether or not to enable model data collection for this version in an Endpoint. Defaults to False.
- auth_enabled
- bool
Whether or not to enable key auth for this version in an Endpoint. Defaults to True.
- cpu_cores
- float
The number of CPU cores to allocate for this version in an Endpoint. Can be a decimal. Defaults to 0.1.
- memory_gb
- float
The amount of memory (in GB) to allocate for this version in an Endpoint. Can be a decimal. Defaults to 0.5.
- enable_app_insights
- bool
Whether or not to enable Application Insights logging for this version in an Endpoint. Defaults to False.
- scoring_timeout_ms
- int
A timeout (in milliseconds) to enforce for scoring calls to this version in an Endpoint. Defaults to 60000.
- replica_max_concurrent_requests
- int
The number of maximum concurrent requests per replica to allow for this version in an Endpoint. Defaults to 1. Do not change this setting from the default value of 1 unless instructed by Microsoft Technical Support or a member of Azure Machine Learning team.
- max_request_wait_time
- int
The maximum amount of time a request will stay in the queue (in milliseconds) before returning a 503 error. Defaults to 500.
- num_replicas
- int
The number of containers to allocate for this version in an Endpoint. There is no default; if this parameter is not set, the autoscaler is enabled.
- properties
- dict[str, str]
A dictionary of key-value properties to give this Endpoint. These properties cannot be changed after deployment; however, new key-value pairs can be added.
- gpu_cores
- int
The number of GPU cores to allocate for this version in an Endpoint. Defaults to 0.
- period_seconds
- int
How often (in seconds) to perform the liveness probe. Defaults to 10 seconds. Minimum value is 1.
- initial_delay_seconds
- int
Number of seconds after the container has started before liveness probes are initiated. Defaults to 310.
- timeout_seconds
- int
The number of seconds after which the liveness probe times out. Defaults to 2 seconds. Minimum value is 1.
- success_threshold
- int
Minimum consecutive successes for the liveness probe to be considered successful after having failed. Defaults to 1. Minimum value is 1.
- failure_threshold
- int
When a Pod starts and the liveness probe fails, Kubernetes will try failureThreshold times before giving up. Defaults to 3. Minimum value is 1.
- namespace
- str
The Kubernetes namespace in which to deploy this Endpoint: up to 63 lowercase alphanumeric ('a'-'z', '0'-'9') and hyphen ('-') characters. The first and last characters cannot be hyphens.
- token_auth_enabled
- bool
Whether or not to enable token auth for this Endpoint. If this is enabled, users can access this Endpoint by fetching an access token using their Azure Active Directory credentials. Defaults to False.
- traffic_percentile
- float
The amount of traffic the version takes in an endpoint.
- cpu_cores_limit
- float
The max number of cpu cores this Webservice is allowed to use. Can be a decimal.
- memory_gb_limit
- float
The max amount of memory (in GB) this Webservice is allowed to use. Can be a decimal.
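A minimal sketch of building a deployment configuration for the first version of an endpoint. The namespace check encodes the rule stated for `namespace` above; the helper names `is_valid_namespace` and `build_endpoint_config` are hypothetical, and the call into the SDK assumes `azureml-core` is installed.

```python
import re

# Rule as stated for `namespace`: up to 63 lowercase alphanumeric or hyphen
# characters, with no hyphen in the first or last position.
_NAMESPACE_RE = re.compile(r"[a-z0-9](?:[a-z0-9-]{0,61}[a-z0-9])?")


def is_valid_namespace(namespace):
    """Return True if `namespace` satisfies the documented naming rule."""
    return _NAMESPACE_RE.fullmatch(namespace) is not None


def build_endpoint_config(version_name, traffic_percentile, namespace=None):
    """Validate the namespace locally, then build the deployment configuration."""
    if namespace is not None and not is_valid_namespace(namespace):
        raise ValueError(f"invalid Kubernetes namespace: {namespace!r}")

    # Imported lazily so the validation above works without the SDK installed.
    from azureml.core.webservice import AksEndpoint

    return AksEndpoint.deploy_configuration(
        version_name=version_name,
        traffic_percentile=traffic_percentile,
        namespace=namespace,
    )


# Usage (values are illustrative):
# config = build_endpoint_config("versiona", 20, namespace="my-team")
```

The resulting configuration object would typically be passed as the deployment configuration to `Model.deploy`, together with the workspace, endpoint name, models, inference configuration, and AKS compute target.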
Return type
Exceptions
serialize
Convert this Webservice into a JSON serialized dictionary.
serialize()
Returns
The JSON representation of this Webservice.
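A minimal sketch of rendering the serialized endpoint as readable JSON text, for example for logging. The helper `endpoint_to_json` is hypothetical; passing `default=str` is a defensive assumption in case the dictionary holds a value that the `json` module cannot encode natively.

```python
import json


def endpoint_to_json(endpoint):
    """Render the dictionary returned by serialize() as indented JSON text."""
    return json.dumps(endpoint.serialize(), indent=2, sort_keys=True, default=str)


# Usage (assumes `endpoint` is an AksEndpoint):
# print(endpoint_to_json(endpoint))
```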
Return type
Exceptions
update
Update the Endpoint with provided properties.
Values left as None will remain unchanged in this Endpoint.
update(auth_enabled=None, token_auth_enabled=None, enable_app_insights=None, description=None, tags=None, properties=None)
Parameters
- auth_enabled
- bool
Whether or not to enable key auth for this version in an Endpoint. Defaults to True.
- token_auth_enabled
- bool
Whether or not to enable token auth for this Endpoint. If this is enabled, users can access this Endpoint by fetching an access token using their Azure Active Directory credentials. Defaults to False.
- enable_app_insights
- bool
Whether or not to enable Application Insights logging for this version in an Endpoint. Defaults to False.
- properties
- dict[str, str]
A dictionary of key-value properties to give this Endpoint. These properties cannot be changed after deployment; however, new key-value pairs can be added.
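A minimal sketch of changing endpoint-level settings. Because values left as None remain unchanged, the hypothetical wrapper `update_endpoint` simply drops None values and rejects names that are not in the `update` signature shown above.

```python
_UPDATE_ARGS = {
    "auth_enabled", "token_auth_enabled", "enable_app_insights",
    "description", "tags", "properties",
}


def update_endpoint(endpoint, **changes):
    """Call endpoint.update() with only the settings that were actually given."""
    unknown = set(changes) - _UPDATE_ARGS
    if unknown:
        raise TypeError(f"unsupported update arguments: {sorted(unknown)}")

    # None means "leave unchanged", so omitting those keys is equivalent.
    kwargs = {key: value for key, value in changes.items() if value is not None}
    endpoint.update(**kwargs)
    return kwargs


# Usage (assumes `endpoint` is an AksEndpoint):
# update_endpoint(endpoint, enable_app_insights=True, description="canary endpoint")
```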
Exceptions
update_version
Update an existing version in an Endpoint with provided properties.
Values left as None will remain unchanged in this version.
update_version(version_name, autoscale_enabled=None, autoscale_min_replicas=None, autoscale_max_replicas=None, autoscale_refresh_seconds=None, autoscale_target_utilization=None, collect_model_data=None, cpu_cores=None, memory_gb=None, scoring_timeout_ms=None, replica_max_concurrent_requests=None, max_request_wait_time=None, num_replicas=None, tags=None, properties=None, description=None, models=None, inference_config=None, gpu_cores=None, period_seconds=None, initial_delay_seconds=None, timeout_seconds=None, success_threshold=None, failure_threshold=None, traffic_percentile=None, is_default=None, is_control_version_type=None, cpu_cores_limit=None, memory_gb_limit=None)
Parameters
- autoscale_enabled
- bool
Whether or not to enable autoscaling for this version in an Endpoint. Defaults to True if num_replicas is None.
- autoscale_min_replicas
- int
The minimum number of containers to use when autoscaling this version in an Endpoint. Defaults to 1.
- autoscale_max_replicas
- int
The maximum number of containers to use when autoscaling this version in an Endpoint. Defaults to 10.
- autoscale_refresh_seconds
- int
How often (in seconds) the autoscaler should attempt to scale this version in an Endpoint. Defaults to 1.
- autoscale_target_utilization
- int
The target utilization (in percent out of 100) the autoscaler should attempt to maintain for this version in an Endpoint. Defaults to 70.
- collect_model_data
- bool
Whether or not to enable model data collection for this version in an Endpoint. Defaults to False.
- cpu_cores
- float
The number of CPU cores to allocate for this version in an Endpoint. Can be a decimal. Defaults to 0.1.
- memory_gb
- float
The amount of memory (in GB) to allocate for this version in an Endpoint. Can be a decimal. Defaults to 0.5.
- scoring_timeout_ms
- int
A timeout (in milliseconds) to enforce for scoring calls to this version in an Endpoint. Defaults to 60000.
- replica_max_concurrent_requests
- int
The number of maximum concurrent requests per replica to allow for this version in an Endpoint. Defaults to 1. Do not change this setting from the default value of 1 unless instructed by Microsoft Technical Support or a member of Azure Machine Learning team.
- max_request_wait_time
- int
The maximum amount of time a request will stay in the queue (in milliseconds) before returning a 503 error. Defaults to 500.
- num_replicas
- int
The number of containers to allocate for this version in an Endpoint. There is no default; if this parameter is not set, the autoscaler is enabled.
- properties
- dict[str, str]
A dictionary of key-value properties to give this Endpoint. These properties cannot be changed after deployment; however, new key-value pairs can be added.
- inference_config
- InferenceConfig
An InferenceConfig object used to provide the required model deployment properties.
- gpu_cores
- int
The number of GPU cores to allocate for this version in an Endpoint. Defaults to 0.
- period_seconds
- int
How often (in seconds) to perform the liveness probe. Defaults to 10 seconds. Minimum value is 1.
- initial_delay_seconds
- int
The number of seconds after the container has started before liveness probes are initiated. Defaults to 310.
- timeout_seconds
- int
The number of seconds after which the liveness probe times out. Defaults to 2 seconds. Minimum value is 1.
- success_threshold
- int
The minimum consecutive successes for the liveness probe to be considered successful after having failed. Defaults to 1. Minimum value is 1.
- failure_threshold
- int
When a Pod starts and the liveness probe fails, Kubernetes will try failureThreshold times before giving up. Defaults to 3. Minimum value is 1.
- traffic_percentile
- float
The amount of traffic the version takes in an endpoint.
- is_default
- bool
Whether or not to make this version the default version in an Endpoint. Defaults to False.
- is_control_version_type
- bool
Whether or not to make this version the control version in an Endpoint. Defaults to False.
- cpu_cores_limit
- float
The max number of cpu cores this Webservice is allowed to use. Can be a decimal.
- memory_gb_limit
- float
The max amount of memory (in GB) this Webservice is allowed to use. Can be a decimal.
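A minimal sketch of shifting traffic to a version in steps, one way to carry out the controlled rollout described at the top of this page. The helpers `ramp_plan` and `ramp_up` are hypothetical, and the step sizes are illustrative. In practice you would likely wait for each update to finish and check service health before applying the next step.

```python
def ramp_plan(start, target, step):
    """Return the traffic percentages to apply in order, ending exactly at `target`."""
    if step <= 0:
        raise ValueError("step must be positive")
    if not 0 <= start <= target <= 100:
        raise ValueError("expected 0 <= start <= target <= 100")

    plan = []
    current = start
    while current < target:
        current = min(current + step, target)
        plan.append(current)
    return plan


def ramp_up(endpoint, version_name, start, target, step):
    """Apply each planned percentage with update_version; other settings stay unchanged."""
    for percent in ramp_plan(start, target, step):
        endpoint.update_version(version_name=version_name,
                                traffic_percentile=percent)


# Usage (assumes `endpoint` is an AksEndpoint with a version named "versionb"):
# ramp_up(endpoint, "versionb", start=10, target=40, step=15)
```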
Exceptions