Troubleshoot AKS engine on Azure Stack Hub

You may encounter an issue when deploying or working with AKS engine on Azure Stack Hub. This article walks through the steps to troubleshoot your AKS engine deployment: collect information about your AKS engine, collect Kubernetes logs, and review custom script extension error codes. You can also open a GitHub issue for AKS engine.

Note

For AKSe version 0.75.3 and above, the aks-engine commands below will begin with aks-engine-azurestack rather than aks-engine.
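The version rule in this note can be sketched as a small shell helper. The function name aks_engine_bin is hypothetical, and the sketch assumes plain dotted version strings (such as 0.75.3) and a sort that supports -V (GNU coreutils).

```shell
# Print the CLI name to use for a given AKS engine version.
# Assumption: versions are plain dotted numbers; an optional leading "v" is stripped.
aks_engine_bin() {
    local version="${1#v}"
    local threshold="0.75.3"
    local lowest
    # sort -V orders version strings; if the threshold sorts first (or is equal),
    # the given version is 0.75.3 or later.
    lowest="$(printf '%s\n%s\n' "$threshold" "$version" | sort -V | head -n 1)"
    if [ "$lowest" = "$threshold" ]; then
        echo "aks-engine-azurestack"
    else
        echo "aks-engine"
    fi
}

aks_engine_bin 0.75.3   # prints: aks-engine-azurestack
```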

Troubleshoot the AKS engine install

If your previous installation steps failed, you can install AKS engine using the GoFish package manager. GoFish describes itself as a cross-platform Homebrew.

You can find instructions for using GoFish to install the AKS engine here.

Collect node and cluster logs

You can find the instructions on collecting node and cluster logs at Retrieving Node and Cluster Logs.

Prerequisites

This guide assumes you've already downloaded the Azure CLI and the AKS engine.

This guide also assumes that you've deployed a cluster using AKS engine. For more information, see Deploy a Kubernetes cluster with AKS engine on Azure Stack Hub.

Retrieving logs

The aks-engine get-logs command can be useful to troubleshoot issues with your cluster. The command produces, collects, and downloads a set of files to your workstation. The files include node configuration, cluster state and configuration, and setup log files.

At a high level, the command works by establishing an SSH session into each node, executing a log collection script that collects and zips relevant files, and downloading the .ZIP file to your local computer.

SSH authentication

You'll need a valid SSH private key to establish an SSH session to the cluster Linux nodes. Windows credentials are stored in the API model and will be loaded from there. Set windowsProfile.sshEnabled to true to enable SSH on your Windows nodes.

Upload logs to a storage account container

After the cluster logs are successfully retrieved, AKS engine can save them to an Azure Storage account container if the optional parameter --upload-sas-url is set. AKS engine expects the container name to be part of the provided SAS URL. The expected format is https://{blob-service-uri}/{container-name}?{sas-token}.
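Before passing a URL to --upload-sas-url, it can help to check that it has the expected shape. The following is a minimal sketch, not part of AKS engine: the function name is_container_sas_url is hypothetical, and the pattern is a loose check for "https://host/container?token" only. It does not validate the token itself.

```shell
# Return 0 if the URL looks like https://{blob-service-uri}/{container-name}?{sas-token}.
# Assumption: the blob service URI is a bare host name and the container is a single path segment.
is_container_sas_url() {
    printf '%s\n' "$1" | grep -Eq '^https://[^/?]+/[^/?]+\?.+$'
}

# Example with placeholder values (not a real account or token):
if is_container_sas_url 'https://myaccount.blob.example.com/logs?sv=placeholder&sig=placeholder'; then
    echo "URL has the expected container SAS format"
fi
```

A URL that omits the container name, such as https://myaccount.blob.example.com/?sv=placeholder, fails the check.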

Note

Storage accounts on custom clouds using the AD FS identity provider are not yet supported.

Nodes unable to join the cluster

By default, aks-engine get-logs collects logs from nodes that successfully joined the cluster. To collect logs from VMs that weren't able to join the cluster, set the --vm-names flag:

--vm-names k8s-pool-01,k8s-pool-02
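If the VM names come from a script or a list, the comma-separated value can be assembled as in the sketch below. The helper name join_vm_names is hypothetical, and the VM names are the same example names used above.

```shell
# Join VM names into the comma-separated value that --vm-names expects.
join_vm_names() {
    # "$*" joins the arguments using the first character of IFS.
    local IFS=','
    printf '%s\n' "$*"
}

vm_names="$(join_vm_names k8s-pool-01 k8s-pool-02)"
echo "--vm-names $vm_names"   # prints: --vm-names k8s-pool-01,k8s-pool-02

# The value is then passed alongside the usual get-logs parameters, for example:
#   aks-engine get-logs \
#       --location <location> \
#       --api-model _output/<dnsPrefix>/apimodel.json \
#       --ssh-host <dnsPrefix>.<location>.cloudapp.azure.com \
#       --linux-ssh-private-key ~/.ssh/id_rsa \
#       --vm-names "$vm_names"
```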

Usage for aks-engine get-logs

Assuming that you have a cluster deployed and the API model originally used to deploy that cluster is stored at _output/<dnsPrefix>/apimodel.json, you can collect logs by running a command like:

aks-engine get-logs \
    --location <location> \
    --api-model _output/<dnsPrefix>/apimodel.json \
    --ssh-host <dnsPrefix>.<location>.cloudapp.azure.com \
    --linux-ssh-private-key ~/.ssh/id_rsa

Parameters

Parameter                 Required  Description
--location                Yes       Azure location of the cluster's resource group.
--api-model               Yes       Path to the generated API model for the cluster.
--ssh-host                Yes       FQDN, or IP address, of an SSH listener that can reach all nodes in the cluster.
--linux-ssh-private-key   Yes       Path to an SSH private key that can be used to create a remote session on the cluster Linux nodes.
--output-directory        No        Output directory, derived from --api-model if missing.
--control-plane-only      No        Only collect logs from control plane nodes.
--vm-names                No        Only collect logs from the specified VMs (comma-separated names).
--upload-sas-url          No        Azure Storage Account SAS URL to upload the collected logs.
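Two of the required parameters are local file paths, so a quick check before running the command can rule out a mistyped path. This is a minimal sketch; the function name preflight_get_logs is hypothetical and is not part of AKS engine.

```shell
# Check that the local files passed to --api-model and --linux-ssh-private-key exist.
# Returns 0 when both checks pass, 1 otherwise; problems are reported on stderr.
preflight_get_logs() {
    local api_model="$1"
    local ssh_key="$2"
    local status=0
    if [ ! -f "$api_model" ]; then
        echo "API model not found: $api_model" >&2
        status=1
    fi
    if [ ! -r "$ssh_key" ]; then
        echo "SSH private key not readable: $ssh_key" >&2
        status=1
    fi
    return "$status"
}

# Usage (paths follow the example command in this section):
#   preflight_get_logs "_output/<dnsPrefix>/apimodel.json" "$HOME/.ssh/id_rsa" && \
#       echo "Inputs look fine; run aks-engine get-logs next."
```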

Review custom script extension error codes

The AKS engine produces a script for each Ubuntu Server as a resource for the custom script extension (CSE) to perform deployment tasks. If the script throws an error, it logs the error in /var/log/azure/cluster-provision.log. The errors are also displayed in the portal. The error code may be helpful in figuring out the cause of the problem. For more information about the CSE exit codes, see cse_helpers.sh.
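To look at the most recent entries in the provisioning log on a node, a small helper such as the following can be used. It is a sketch: the function name show_provision_log and the default of 50 lines are assumptions, and it makes no assumption about the format of the log entries.

```shell
# Print the last lines of the CSE provisioning log.
# Arguments: optional log path (defaults to the path named above), optional line count.
show_provision_log() {
    local log_file="${1:-/var/log/azure/cluster-provision.log}"
    local lines="${2:-50}"
    if [ ! -r "$log_file" ]; then
        echo "Cannot read $log_file (the file may be missing or require sudo)" >&2
        return 1
    fi
    tail -n "$lines" "$log_file"
}

# Usage on a cluster node:
#   show_provision_log                  # last 50 lines of the default log
#   show_provision_log /path/to/log 20  # last 20 lines of another copy of the log
```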

Providing Kubernetes logs to a Microsoft support engineer

If after collecting and examining logs you still can't resolve your issue, you may want to start the process of creating a support ticket and provide the logs that you collected.

Your operator may combine the logs you produced with other system logs that Microsoft support may need. The operator may then make them available to Microsoft.

You can provide Kubernetes logs in several ways:

  • You can contact your Azure Stack Hub operator. Your operator uses the information from the logs stored in the .ZIP file to create the support case.
  • If you have the SAS URL for a storage account where you can upload your Kubernetes logs, you can include the following command and flag with the SAS URL to save the logs to the storage account:
    aks-engine get-logs --upload-sas-url <SAS-URL>
    
    For instructions, see Upload logs to a storage account container.
  • If you're a cloud operator, you can:

Open GitHub issues

If you're unable to resolve your deployment error, you can open a GitHub Issue.

  1. Open a GitHub Issue in the AKS engine repository.

  2. Add a title using the following format: CSE error: exit code <INSERT_YOUR_EXIT_CODE>.

  3. Include the following information in the issue:

    • The cluster configuration file, apimodel.json, used to deploy the cluster. Remove all secrets and keys before posting it on GitHub.

    • The output of the kubectl get nodes command.

    • The content of /var/log/azure/cluster-provision.log from an unhealthy node.

Next steps