
Troubleshoot OpenShift Container Platform 3.11 deployment in Azure

If the OpenShift cluster doesn't deploy successfully, the Azure portal will provide error output. The output can be difficult to read, which makes it hard to identify the problem. Quickly scan this output for exit code 3, 4, or 5. The following provides information on these three exit codes:

  • Exit code 3: Your Red Hat Subscription User Name / Password or Organization ID / Activation Key is incorrect
  • Exit code 4: Your Red Hat Pool ID is incorrect or there are no entitlements available
  • Exit code 5: Unable to provision Docker Thin Pool Volume
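Because the portal's error output can run to many screens, it is often faster to save it to a file and search for the known codes. A minimal sketch, assuming the output was saved to a file named deployment-error.log (the file name is a placeholder):

```shell
# Hypothetical: the portal error output was saved to deployment-error.log.
# Search it for the three known exit codes, printing matching line numbers.
grep -nE "exit code (3|4|5)" deployment-error.log
```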

For all other exit codes, connect to the host(s) via SSH to view the log files.

OpenShift Container Platform 3.11

SSH to the ansible playbook host. For the template or the Marketplace offer, use the bastion host. From the bastion, you can SSH to all other nodes in the cluster (master, infra, CNS, compute). You'll need to be root to view the log files. Root is disabled for SSH access by default, so don't use root to SSH to other nodes.
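In practice this is a two-hop connection. A sketch, where the bastion address, node name, and admin user name are placeholders for your own values:

```shell
# Hypothetical host names and admin user -- substitute your own.
# Hop through the bastion to a cluster node in one command...
ssh -J azureuser@bastion.example.com azureuser@mycluster-master-0
# ...then become root on the node to read the log files.
sudo su -
```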

OKD

SSH to the ansible playbook host. For the OKD template (version 3.9 and earlier), use the master-0 host. For the OKD template (version 3.10 and later), use the bastion host. From the ansible playbook host, you can SSH to all other nodes in the cluster (master, infra, CNS, compute). You'll need to be root (sudo su -) to view the log files. Root is disabled for SSH access by default, so don't use root to SSH to other nodes.

Log files

The log files (stderr and stdout) for the host preparation scripts are located in /var/lib/waagent/custom-script/download/0 on all hosts. If an error occurred during the preparation of the host, view these log files to determine the error.

If the preparation scripts ran successfully, examine the log files in the /var/lib/waagent/custom-script/download/1 directory of the ansible playbook host. If the error occurred during the actual installation of OpenShift, the stdout file will display the error. Use this information to contact Support for further assistance.
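The two directories above can be walked with ordinary tools once you are root on the host. A sketch of the inspection sequence:

```shell
# Run as root on the host being inspected.
# Host-preparation script logs (present on all hosts):
ls -l /var/lib/waagent/custom-script/download/0/
cat /var/lib/waagent/custom-script/download/0/stderr
# Installer logs (ansible playbook host only); errors show up in stdout:
tail -n 100 /var/lib/waagent/custom-script/download/1/stdout
```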

Example output

TASK [openshift_storage_glusterfs : Load heketi topology] **********************
fatal: [mycluster-master-0]: FAILED! => {"changed": true, "cmd": ["oc", "--config=/tmp/openshift-glusterfs-ansible-IbhnUM/admin.kubeconfig", "rsh", "--namespace=glusterfs", "deploy-heketi-storage-1-d9xl5", "heketi-cli", "-s", "http://localhost:8080", "--user", "admin", "--secret", "VuoJURT0/96E42Vv8+XHfsFpSS8R20rH1OiMs3OqARQ=", "topology", "load", "--json=/tmp/openshift-glusterfs-ansible-IbhnUM/topology.json", "2>&1"], "delta": "0:00:21.477831", "end": "2018-05-20 02:49:11.912899", "failed": true, "failed_when_result": true, "rc": 0, "start": "2018-05-20 02:48:50.435068", "stderr": "", "stderr_lines": [], "stdout": "Creating cluster ... ID: 794b285745b1c5d7089e1c5729ec7cd2\n\tAllowing file volumes on cluster.\n\tAllowing block volumes on cluster.\n\tCreating node mycluster-cns-0 ... ID: 45f1a3bfc20a4196e59ebb567e0e02b4\n\t\tAdding device /dev/sdd ... OK\n\t\tAdding device /dev/sde ... OK\n\t\tAdding device /dev/sdf ... OK\n\tCreating node mycluster-cns-1 ... ID: 596f80d7bbd78a1ea548930f23135131\n\t\tAdding device /dev/sdc ... Unable to add device: Unable to execute command on glusterfs-storage-4zc42:   Device /dev/sdc excluded by a filter.\n\t\tAdding device /dev/sde ... OK\n\t\tAdding device /dev/sdd ... OK\n\tCreating node mycluster-cns-2 ... ID: 42c0170aa2799559747622acceba2e3f\n\t\tAdding device /dev/sde ... OK\n\t\tAdding device /dev/sdf ... OK\n\t\tAdding device /dev/sdd ... OK", "stdout_lines": ["Creating cluster ... ID: 794b285745b1c5d7089e1c5729ec7cd2", "\tAllowing file volumes on cluster.", "\tAllowing block volumes on cluster.", "\tCreating node mycluster-cns-0 ... ID: 45f1a3bfc20a4196e59ebb567e0e02b4", "\t\tAdding device /dev/sdd ... OK", "\t\tAdding device /dev/sde ... OK", "\t\tAdding device /dev/sdf ... OK", "\tCreating node mycluster-cns-1 ... ID: 596f80d7bbd78a1ea548930f23135131", "\t\tAdding device /dev/sdc ... 
Unable to add device: Unable to execute command on glusterfs-storage-4zc42:   Device /dev/sdc excluded by a filter.", "\t\tAdding device /dev/sde ... OK", "\t\tAdding device /dev/sdd ... OK", "\tCreating node mycluster-cns-2 ... ID: 42c0170aa2799559747622acceba2e3f", "\t\tAdding device /dev/sde ... OK", "\t\tAdding device /dev/sdf ... OK", "\t\tAdding device /dev/sdd ... OK"]}

PLAY RECAP *********************************************************************
mycluster-cns-0       : ok=146  changed=57   unreachable=0    failed=0   
mycluster-cns-1       : ok=146  changed=57   unreachable=0    failed=0   
mycluster-cns-2       : ok=146  changed=57   unreachable=0    failed=0   
mycluster-infra-0     : ok=143  changed=55   unreachable=0    failed=0   
mycluster-infra-1     : ok=143  changed=55   unreachable=0    failed=0   
mycluster-infra-2     : ok=143  changed=55   unreachable=0    failed=0   
mycluster-master-0    : ok=502  changed=198  unreachable=0    failed=1   
mycluster-master-1    : ok=348  changed=140  unreachable=0    failed=0   
mycluster-master-2    : ok=348  changed=140  unreachable=0    failed=0   
mycluster-node-0      : ok=143  changed=55   unreachable=0    failed=0   
mycluster-node-1      : ok=143  changed=55   unreachable=0    failed=0   
localhost                  : ok=13   changed=0    unreachable=0    failed=0   

INSTALLER STATUS ***************************************************************
Initialization             : Complete (0:00:39)
Health Check               : Complete (0:00:24)
etcd Install               : Complete (0:01:24)
Master Install             : Complete (0:14:59)
Master Additional Install  : Complete (0:01:10)
Node Install               : Complete (0:10:58)
GlusterFS Install          : In Progress (0:03:33)
    This phase can be restarted by running: playbooks/openshift-glusterfs/config.yml

Failure summary:

  1. Hosts:    mycluster-master-0
     Play:     Configure GlusterFS
     Task:     Load heketi topology
     Message:  Failed without returning a message.

The most common errors during installation are:

  1. Private key has a passphrase
  2. Key vault secret with the private key wasn't created correctly
  3. Service principal credentials were entered incorrectly
  4. Service principal doesn't have contributor access to the resource group

Private key has a passphrase

You'll see an error that permission was denied for SSH. SSH to the ansible playbook host to check for a passphrase on the private key.
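One way to check is to ask ssh-keygen to derive the public key from the private key: a key protected by a passphrase will prompt for it. A sketch, assuming the key is at the default path used by the deployment:

```shell
# If this prompts "Enter passphrase", the key is passphrase-protected.
ssh-keygen -y -f ~/.ssh/id_rsa >/dev/null && echo "no passphrase"
# To strip an existing passphrase in place (prompts for the old one),
# then update the Key Vault secret with the new key material:
ssh-keygen -p -N '' -f ~/.ssh/id_rsa
```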

Key vault secret with the private key wasn't created correctly

The private key is copied into the ansible playbook host at ~/.ssh/id_rsa. Confirm this file is correct. Test by opening an SSH session to one of the cluster nodes from the ansible playbook host.
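A quick sanity check is to compare the key's fingerprint against the key pair you generated locally, then attempt a real connection. A sketch; the node name and user are placeholders:

```shell
# Fingerprint of the key the installer copied onto the playbook host;
# it should match the key you uploaded to Key Vault.
ssh-keygen -lf ~/.ssh/id_rsa
# Confirm the key actually authenticates against a cluster node:
ssh azureuser@mycluster-master-0 hostname
```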

Service principal credentials were entered incorrectly

When providing the input to the template or Marketplace offer, the incorrect information was provided. Make sure you use the correct appId (clientId) and password (clientSecret) for the service principal. Verify by issuing the following Azure CLI command:

az login --service-principal -u <client id> -p <client secret> -t <tenant id>

Service principal doesn't have contributor access to the resource group

If the Azure cloud provider is enabled, the service principal used must have contributor access to the resource group. Verify by issuing the following Azure CLI command:

az group update -g <openshift resource group> --set tags.sptest=test

Additional tools

For some errors, you can also use the following commands to get more information:

  1. systemctl status <service>
  2. journalctl -xe
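For example, on a node that fails to come up, the container runtime and node services are common starting points. A sketch; the unit names below are illustrative and vary by OpenShift release:

```shell
# Check whether the Docker daemon is healthy on the node:
systemctl status docker
# Follow recent journal entries for a specific unit with full detail:
journalctl -xe -u atomic-openshift-node
```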