ARM template for Azure Stack failing with CSE script error

Boopathy, Elango 106 Reputation points
2020-10-07T10:35:01.727+00:00

Hi All,

We have a requirement to create a Kubernetes cluster with AKS Engine in an Azure Stack environment, so we followed the tutorial below. Running the command produced an output folder containing ARM templates and other files.

https://github.com/Azure/aks-engine/blob/master/docs/tutorials/quickstart.md

The input to the aks-engine command line tool is a cluster definition JSON file, which contains the subscription, VNET, subnet, master count, node count, etc.

aks-engine generates ARM templates, SSH keys, and a kubeconfig as outputs and persists them as local files under the _output/ directory.

We planned to use these ARM templates as a reference and to automate Kubernetes cluster creation in higher environments with Terraform. However, when we change some of the parameters, such as firstconsecutiveIP (the CIDR or IP ranges for the master and worker IPs), VNET, and subnet, and then run the ARM template, it creates the public IP, VM, disk, availability set, etc. without any issues but fails on the Custom Script Extension, and only for the master VM. We are not sure what exactly the error means, but a VM extension is failing with the exception below:

{
  "id": "/subscriptions/xxxxx-42a4-4f90-xxxxxxxxxxxxxx/resourceGroups/aks-deployment-mdc-terraform-testing/providers/Microsoft.Resources/deployments/acctesttemplate-05/operations/C0BB1EA96DC78E2C",
  "operationId": "C0BB1EA96DC78E2C",
  "properties": {
    "provisioningOperation": "Create",
    "provisioningState": "Failed",
    "timestamp": "2020-09-24T15:29:00.4713997Z",
    "duration": "PT22M39.6171713S",
    "trackingId": "a713f23a-cbd5-490d-a9a3-d4baa0ea66d6",
    "statusCode": "Conflict",
    "statusMessage": {
      "status": "Failed",
      "error": {
        "code": "ResourceDeploymentFailure",
        "message": "The resource operation completed with terminal provisioning state 'Failed'.",
        "details": [
          {
            "code": "VMExtensionProvisioningError",
            "message": "VM has reported a failure when processing extension 'cse-master-0'. Error message: Enable failed: failed to execute command: command terminated with exit status=15\n[stdout]\nThu Sep 24 15:07:06 UTC 2020,k8s-master-43022074-0\n\n[stderr]\n"
          }
        ]
      }
    },
    "targetResource": {
      "id": "/subscriptions/xxxxx-42a4-4f90-xxxxxxxxxxxxxx/resourceGroups/aks-deployment-mdc-terraform-testing/providers/Microsoft.Compute/virtualMachines/k8s-master-43022074-0/extensions/cse-master-0",
      "resourceType": "Microsoft.Compute/virtualMachines/extensions",
      "resourceName": "k8s-master-43022074-0/cse-master-0"
    }
  }
}
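
When a deployment has many operations, it can help to pull the failing extension name and its exit status out of the operation JSON programmatically. The following is a minimal Python sketch, assuming the error message keeps the wording shown in the operation above; the function name `extract_cse_failures` is made up for illustration, and the embedded JSON is a trimmed copy of that operation.

```python
import json
import re

# Trimmed copy of the failed deployment operation shown above.
# A raw string keeps the JSON "\n" escapes intact for json.loads.
operation_json = r'''
{
  "properties": {
    "provisioningState": "Failed",
    "statusMessage": {
      "status": "Failed",
      "error": {
        "code": "ResourceDeploymentFailure",
        "details": [
          {
            "code": "VMExtensionProvisioningError",
            "message": "VM has reported a failure when processing extension 'cse-master-0'. Error message: Enable failed: failed to execute command: command terminated with exit status=15\n[stdout]\nThu Sep 24 15:07:06 UTC 2020,k8s-master-43022074-0\n\n[stderr]\n"
          }
        ]
      }
    }
  }
}
'''


def extract_cse_failures(operation):
    """Return (extension name, exit status) pairs found in the error details.

    Assumes the message wording matches the sample above; entries that do
    not match both patterns are skipped.
    """
    error = operation["properties"]["statusMessage"]["error"]
    failures = []
    for detail in error.get("details", []):
        message = detail.get("message", "")
        extension = re.search(r"processing extension '([^']+)'", message)
        status = re.search(r"exit status=(\d+)", message)
        if extension and status:
            failures.append((extension.group(1), int(status.group(1))))
    return failures


failures = extract_cse_failures(json.loads(operation_json))
print(failures)  # [('cse-master-0', 15)]
```

The exit status extracted this way is the value to look up in the AKS Engine CSE exit code list.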

Steps To Reproduce
Both ARM templates, azuredeploy.json and azuredeploy.parameters.json, are attached for your review (as txt files). Kindly have a look and help us sort this out.

I have masked the Client ID, Client Secret, VNET, subnet, etc. in the azuredeploy.parameters.json file. Change them if you would like to reproduce this locally.

Expected behavior
We want to use these ARM templates as a reference: change some of the parameters and run the same templates for different environments.
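
The per-environment parameter change described here could be scripted rather than edited by hand. Below is a minimal Python sketch, assuming the standard ARM parameters file layout in which each entry under `parameters` is an object with a `value` key. The parameter names and addresses in the sample are hypothetical placeholders; check them against the generated azuredeploy.parameters.json.

```python
import copy
import json


def override_parameters(parameters_doc, overrides):
    """Return a copy of an ARM parameters document with selected values replaced.

    Raises KeyError for names that are not already in the file, so a typo
    does not silently add a parameter the template never reads.
    """
    result = copy.deepcopy(parameters_doc)
    params = result["parameters"]
    for name, value in overrides.items():
        if name not in params:
            raise KeyError(f"parameter {name!r} is not in the parameters file")
        params[name]["value"] = value
    return result


# Hypothetical stand-in for the generated parameters file; in practice this
# would be loaded with json.load() from _output/<cluster>/azuredeploy.parameters.json.
base_parameters = {
    "contentVersion": "1.0.0.0",
    "parameters": {
        "firstConsecutiveStaticIP": {"value": "10.0.0.5"},
        "masterSubnet": {"value": "10.0.0.0/24"},
    },
}

# Hypothetical values for a second environment.
test_env = override_parameters(
    base_parameters,
    {"firstConsecutiveStaticIP": "10.1.0.5", "masterSubnet": "10.1.0.0/24"},
)
print(json.dumps(test_env, indent=2))
```

Keeping the generated file as an untouched base and writing one overridden copy per environment makes it easier to see which values differ between a working and a failing deployment.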

AKS Engine version
v0.43.1
Kubernetes version
1.14

We would also like to know whether there is a better alternative for installing AKS on Azure Stack.

30580-azuredeploy.txt
30620-azuredeploy-parameters.txt

Azure Kubernetes Service (AKS)
Azure Stack Hub

1 answer

  1. prmanhas-MSFT 17,886 Reputation points Microsoft Employee
    2020-10-07T12:47:07.853+00:00

    @Boopathy, Elango CSE exit code 35 is ERR_CONTAINER_IMG_PULL_TIMEOUT (see for more info).

    It means that the image was missing from the custom image you used and that CSE tried to pull it from source and failed.

    Can you please log on to k8s-agentpool and look at /var/log/azure/cluster-provision.log to check exactly which image pull failed? Then please try pulling the image from that VM manually to see if you can reproduce the image pull failure.

    Same issue has been reported here as well.

    You can refer to this GitHub thread for more information too.

    This article might be helpful as well.

    Hope it helps!!!

    Please 'Accept as answer' if it helped, so that it can help others in the community looking for help on similar topics
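
The log check suggested in the answer above can be narrowed down by filtering the provisioning log before reading it in full. This is a minimal Python sketch, assuming the log is plain text; the keyword list is a guess at what failing lines might contain, not a documented log format, and should be adjusted after looking at the real file.

```python
from pathlib import Path

# Log path taken from the answer above.
LOG_PATH = Path("/var/log/azure/cluster-provision.log")

# Assumed keywords; the actual log wording may differ.
KEYWORDS = ("pull", "timeout", "exit")


def find_suspect_lines(lines, keywords=KEYWORDS):
    """Return (line number, text) for lines containing any keyword, case-insensitively."""
    hits = []
    for number, line in enumerate(lines, start=1):
        lowered = line.lower()
        if any(keyword in lowered for keyword in keywords):
            hits.append((number, line.rstrip("\n")))
    return hits


if __name__ == "__main__":
    if LOG_PATH.exists():
        # errors="replace" avoids a crash on stray non-UTF-8 bytes.
        with LOG_PATH.open(errors="replace") as log_file:
            for number, text in find_suspect_lines(log_file):
                print(f"{number}: {text}")
    else:
        print(f"{LOG_PATH} not found; run this on the cluster VM")
```

Run it on the VM itself with permission to read the log. The line numbers in the output make it easy to jump to the surrounding context in the full file.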