
Microservices architecture on Azure Kubernetes Service (AKS)

Azure Active Directory
Container Registry
Kubernetes Service
Load Balancer
Monitor
Pipelines

This reference architecture shows a microservices application deployed to Azure Kubernetes Service (AKS). It describes a basic AKS configuration that can be the starting point for most deployments. This article assumes basic knowledge of Kubernetes. It focuses mainly on the infrastructure and DevOps considerations of running a microservices architecture on AKS. For guidance on how to design microservices, see Building microservices on Azure.

A reference implementation of this architecture is available on GitHub.

AKS reference architecture

Download a Visio file of this architecture.

If you would prefer to see a more advanced microservices example that is built on the AKS baseline architecture, see Advanced Azure Kubernetes Service (AKS) microservices architecture.

Components

The architecture consists of the following components.

Azure Kubernetes Service (AKS). AKS is a managed Kubernetes cluster hosted in the Azure cloud. With AKS, Azure manages the Kubernetes API service, and you only need to manage the agent nodes.

Virtual network. By default, AKS creates a virtual network and connects the agent nodes to it. For more advanced scenarios, you can create the virtual network first, which lets you control settings such as subnet configuration, on-premises connectivity, and IP addressing. For more information, see Configure advanced networking in Azure Kubernetes Service (AKS).

Ingress. An ingress server exposes HTTP(S) routes to services inside the cluster. For more information, see the API gateway section below.

Azure Load Balancer. Once an AKS cluster is created, the cluster can use the load balancer. When the NGINX service is deployed, the load balancer is configured with a new public IP address that fronts your ingress controller. In this way, the load balancer routes internet traffic to the ingress.

External data stores. Microservices are typically stateless and write state to external data stores, such as Azure SQL Database or Azure Cosmos DB.

Azure Active Directory. AKS uses an Azure Active Directory (Azure AD) identity to create and manage other Azure resources, such as Azure load balancers. Azure AD is also recommended for user authentication in client applications.

Azure Container Registry. Use Container Registry to store private Docker images, which are deployed to the cluster. AKS can authenticate with Container Registry by using its Azure AD identity. AKS doesn't require Azure Container Registry; you can use other container registries, such as Docker Hub.

Azure Pipelines. Azure Pipelines is part of Azure DevOps Services and runs automated builds, tests, and deployments. You can also use third-party CI/CD solutions such as Jenkins.

Helm. Helm is a package manager for Kubernetes: a way to bundle and generalize Kubernetes objects into a single unit that can be published, deployed, versioned, and updated.

Azure Monitor. Azure Monitor collects and stores metrics and logs, application telemetry, and platform metrics for Azure services. Use this data to monitor the application, set up alerts and dashboards, and perform root cause analysis of failures. Azure Monitor integrates with AKS to collect metrics from controllers, nodes, and containers.

Design considerations

This reference architecture focuses on microservices architectures, although many of the recommended practices apply to other workloads running on AKS.

Microservices

A microservice is a loosely coupled, independently deployable unit of code. Microservices typically communicate through well-defined APIs and are discoverable through some form of service discovery. A service should always be reachable, even when its pods move around. The Kubernetes Service object is a natural way to model microservices in Kubernetes.

API gateway

API gateways are a general microservices design pattern. An API gateway sits between external clients and the microservices. It acts as a reverse proxy, routing requests from clients to microservices. It may also perform various cross-cutting tasks such as authentication, SSL termination, and rate limiting. For more information, see:

In Kubernetes, the functionality of an API gateway is primarily handled by an Ingress controller. The considerations are described in the Ingress section.

Data storage

In a microservices architecture, services should not share data storage solutions. Each service should manage its own data set to avoid hidden dependencies among services. Data separation helps avoid unintentional coupling between services, which can happen when services share the same underlying data schemas. Also, when services manage their own data stores, they can use the right data store for their particular requirements.

For more information, see Designing microservices: Data considerations.

Avoid storing persistent data in local cluster storage, because that ties the data to the node. Instead, use an external service such as Azure SQL Database or Azure Cosmos DB. Another option is to mount a persistent data volume into the solution by using Azure Disks or Azure Files.

For more information, see Storage options for applications in Azure Kubernetes Service.

Service object

The Kubernetes Service object provides a set of capabilities that match the microservices requirements for service discoverability:

  • IP address. The Service object provides a static internal IP address for a group of pods (ReplicaSet). As pods are created or moved around, the service is always reachable at this internal IP address.

  • Load balancing. Traffic sent to the service's IP address is load balanced across the pods.

  • Service discovery. The Kubernetes DNS service assigns internal DNS entries to services. That means the API gateway can call a backend service by using its DNS name. The same mechanism can be used for service-to-service communication. The DNS entries are organized by namespace, so if your namespaces correspond to bounded contexts, the DNS name for a service maps naturally to the application domain.
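As a sketch of the Service object described above, the following Python builds a ClusterIP Service manifest as a plain dictionary and derives the DNS name that cluster DNS assigns to it. The service name `delivery`, namespace `shipping`, label, and ports are hypothetical, and `cluster.local` is assumed as the cluster domain (it is the Kubernetes default). Because kubectl accepts JSON manifests, the printed output could be saved to a file and applied with `kubectl apply -f`.

```python
import json


def make_service(name: str, namespace: str, app_label: str,
                 port: int, target_port: int) -> dict:
    """Build a ClusterIP Service that load balances across pods labeled app=<app_label>."""
    return {
        "apiVersion": "v1",
        "kind": "Service",
        "metadata": {"name": name, "namespace": namespace},
        "spec": {
            "type": "ClusterIP",             # stable, cluster-internal virtual IP
            "selector": {"app": app_label},  # which pods receive the traffic
            "ports": [{"port": port, "targetPort": target_port, "protocol": "TCP"}],
        },
    }


def service_dns_name(name: str, namespace: str, cluster_domain: str = "cluster.local") -> str:
    """Fully qualified DNS name that cluster DNS assigns to a Service."""
    return f"{name}.{namespace}.svc.{cluster_domain}"


if __name__ == "__main__":
    # Hypothetical service, namespace, and ports for illustration only.
    print(json.dumps(make_service("delivery", "shipping", "delivery", 80, 8080), indent=2))
    print(service_dns_name("delivery", "shipping"))
```

Pods in the same namespace can usually reach the service by its short name (`delivery`), and pods in other namespaces by `delivery.shipping`; the fully qualified form removes any ambiguity.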

The following diagram shows the conceptual relationship between services and pods. The actual mapping to endpoint IP addresses and ports is done by kube-proxy, the Kubernetes network proxy.

Services and pods

Ingress

In Kubernetes, the Ingress controller might implement the API gateway pattern. In that case, Ingress and the Ingress controller work together to provide these features:

  • Route client requests to the right backend services. This provides a single endpoint for clients and helps to decouple clients from services.

  • Aggregate multiple requests into a single request, to reduce chattiness between the client and the backend.

  • Offload functionality from the backend services, such as SSL termination, authentication, IP restrictions, or client rate limiting (throttling).
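Standard Ingress resources only route requests, so request aggregation is usually implemented in code, for example in a gateway or backend-for-frontend service. The following is a minimal sketch of that idea using Python's `asyncio`, assuming two hypothetical backend calls (`get_order` and `get_delivery`) that stand in for HTTP requests to in-cluster services.

```python
import asyncio


# Hypothetical backends; a real gateway would issue HTTP requests to
# in-cluster Service DNS names instead of sleeping.
async def get_order(order_id: str) -> dict:
    await asyncio.sleep(0.01)  # simulate network latency
    return {"id": order_id, "status": "shipped"}


async def get_delivery(order_id: str) -> dict:
    await asyncio.sleep(0.01)  # simulate network latency
    return {"orderId": order_id, "eta": "2h"}


async def get_order_summary(order_id: str) -> dict:
    """One client request fans out to two backends concurrently and returns a combined body."""
    order, delivery = await asyncio.gather(get_order(order_id), get_delivery(order_id))
    return {"order": order, "delivery": delivery}


if __name__ == "__main__":
    print(asyncio.run(get_order_summary("42")))
```

Because `asyncio.gather` runs both calls concurrently, the client waits roughly as long as the slower backend rather than the sum of both, and makes one round trip instead of two.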

Ingress abstracts the configuration settings for a proxy server. You also need an Ingress controller, which provides the underlying implementation of the Ingress. There are Ingress controllers for Nginx, HAProxy, Traefik, and Azure Application Gateway, among others.

Different technologies can fulfill the Ingress resource. Whichever one you choose is deployed as the Ingress controller inside the cluster, where it operates as the edge router or reverse proxy. A reverse proxy server is a potential bottleneck or single point of failure, so always deploy at least two replicas for high availability.
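The following sketch builds an Ingress manifest (API version `networking.k8s.io/v1`) that routes URL path prefixes to backend Services, assuming an NGINX Ingress controller registered under the ingress class `nginx`. The host `api.contoso.com`, the service names, and the ports are hypothetical; the backend Services must live in the same namespace as the Ingress.

```python
from __future__ import annotations

import json


def make_ingress(name: str, namespace: str, host: str,
                 routes: dict[str, tuple[str, int]],
                 ingress_class: str = "nginx") -> dict:
    """Build a networking.k8s.io/v1 Ingress mapping path prefixes to (service, port)."""
    paths = [
        {
            "path": prefix,
            "pathType": "Prefix",
            "backend": {"service": {"name": service, "port": {"number": port}}},
        }
        for prefix, (service, port) in routes.items()
    ]
    return {
        "apiVersion": "networking.k8s.io/v1",
        "kind": "Ingress",
        "metadata": {"name": name, "namespace": namespace},
        "spec": {
            "ingressClassName": ingress_class,  # selects which controller implements it
            "rules": [{"host": host, "http": {"paths": paths}}],
        },
    }


if __name__ == "__main__":
    # Hypothetical routes for illustration only.
    ingress = make_ingress(
        "api-gateway", "shipping", "api.contoso.com",
        {"/api/orders": ("orders", 80), "/api/deliveries": ("delivery", 80)},
    )
    print(json.dumps(ingress, indent=2))
```

The Ingress resource only declares routes; the controller that implements them runs as its own Deployment. If you install the community ingress-nginx Helm chart, for example, its `controller.replicaCount` value is one way to run the two or more replicas recommended above.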

Configuring a proxy server often requires complex files, which can be hard to tune if you aren't an expert. The Ingress controller provides a useful abstraction over them. The Ingress controller also has access to the Kubernetes API, so it can make intelligent decisions about routing and load balancing. For example, the Nginx ingress controller bypasses the kube-proxy network proxy.

On the other hand, if you need complete control over the settings, you may want to bypass this abstraction and configure the proxy server manually. For more information, see Deploying Nginx or HAProxy to Kubernetes.

For AKS, you can also use Azure Application Gateway through the Application Gateway Ingress Controller. This option requires Azure CNI networking to be enabled when you configure the AKS cluster, because Application Gateway is deployed into a subnet of the AKS virtual network. Azure Application Gateway can perform layer-7 routing and SSL termination. It also has built-in support for a web application firewall (WAF).

For information about load-balancing services in Azure, see Overview of load-balancing options in Azure.

TLS/SSL encryption

In common implementations, the Ingress controller is used for SSL termination. So, as part of deploying the Ingress controller, you need to create a TLS certificate. Use self-signed certificates only for dev/test purposes. For more information, see Create an HTTPS ingress controller and use your own TLS certificates on Azure Kubernetes Service (AKS).
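The Ingress controller reads the certificate and private key from a Kubernetes Secret of type `kubernetes.io/tls`, which stores them base64-encoded under the keys `tls.crt` and `tls.key`; the Ingress then references that Secret in its `tls` section. The sketch below assumes PEM-encoded bytes are already available; the secret name, namespace, and host are hypothetical.

```python
import base64


def make_tls_secret(name: str, namespace: str, cert_pem: bytes, key_pem: bytes) -> dict:
    """Build a kubernetes.io/tls Secret; Secret `data` values must be base64-encoded."""
    return {
        "apiVersion": "v1",
        "kind": "Secret",
        "type": "kubernetes.io/tls",
        "metadata": {"name": name, "namespace": namespace},
        "data": {
            "tls.crt": base64.b64encode(cert_pem).decode("ascii"),
            "tls.key": base64.b64encode(key_pem).decode("ascii"),
        },
    }


def add_tls_to_ingress(ingress: dict, host: str, secret_name: str) -> dict:
    """Tell the Ingress to terminate TLS for `host` using the certificate in `secret_name`."""
    ingress["spec"].setdefault("tls", []).append({"hosts": [host], "secretName": secret_name})
    return ingress
```

The equivalent CLI step is `kubectl create secret tls <name> --cert=<cert-file> --key=<key-file>`. Either way, keep the private key out of source control.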

For production workloads, get signed certificates from trusted certificate authorities (CAs). For information about generating and configuring Let's Encrypt certificates, see Create an ingress controller with a static public IP address in Azure Kubernetes Service (AKS).

You may also need to rotate your certificates according to your organization's policies. For more information, see Rotate certificates in Azure Kubernetes Service (AKS).

Namespaces

Use namespaces to organize services within the cluster. Most objects in a Kubernetes cluster belong to a namespace (cluster-scoped objects such as nodes are the exception). By default, when you create a new object, it goes into the default namespace. But it's a good practice to create more descriptive namespaces to help organize the resources in the cluster.

First, namespaces help prevent naming collisions. When multiple teams deploy microservices into the same cluster, possibly hundreds of them, the cluster becomes hard to manage if they all go into the same namespace. In addition, namespaces allow you to:

  • Apply resource constraints to a namespace, so that the total set of pods assigned to that namespace cannot exceed the namespace's resource quota.

  • Apply policies at the namespace level, including RBAC and security policies.

For a microservices architecture, consider organizing the microservices into bounded contexts and creating a namespace for each bounded context. For example, all microservices related to the "Order Fulfillment" bounded context could go into the same namespace. Alternatively, create a namespace for each development team.
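As a small illustration of one namespace per bounded context, the following sketch derives a valid namespace name from a bounded-context name such as "Order Fulfillment" and builds a Namespace manifest. The `bounded-context` label is a hypothetical convention that makes namespaces easy to select; Kubernetes does not require it.

```python
import re

# Namespace names must be RFC 1123 DNS labels: lowercase alphanumerics or '-',
# starting and ending with an alphanumeric character, at most 63 characters.
_DNS_LABEL = re.compile(r"[a-z0-9]([-a-z0-9]*[a-z0-9])?")


def namespace_for_context(bounded_context: str) -> dict:
    """Derive a Namespace manifest from a bounded-context name."""
    # Collapse every run of non-alphanumeric characters into a single hyphen.
    name = re.sub(r"[^a-z0-9]+", "-", bounded_context.lower()).strip("-")
    if len(name) > 63 or not _DNS_LABEL.fullmatch(name):
        raise ValueError(f"cannot derive a valid namespace name from {bounded_context!r}")
    return {
        "apiVersion": "v1",
        "kind": "Namespace",
        # Hypothetical labeling convention, not a Kubernetes requirement.
        "metadata": {"name": name, "labels": {"bounded-context": name}},
    }
```

For example, `namespace_for_context("Order Fulfillment")` yields a namespace named `order-fulfillment`.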

Place utility services into their own separate namespace. For example, you might deploy Elasticsearch or Prometheus for cluster monitoring, or Tiller for Helm 2 (Helm 3 no longer uses Tiller).

Health probes

Kubernetes defines two types of health probe that a pod can expose:

  • Readiness probe: Tells Kubernetes whether the pod is ready to accept requests.

  • Liveness probe: Tells Kubernetes whether a pod should be removed and a new instance started.

When thinking about probes, it's useful to recall how a service works in Kubernetes. A service has a label selector that matches a set of zero or more pods. Kubernetes load balances traffic to the pods that match the selector. Only pods that started successfully and are healthy receive traffic. If a container crashes, Kubernetes kills the pod and schedules a replacement.

Sometimes a pod may not be ready to receive traffic, even though it started successfully. For example, there may be initialization tasks, where the application running in the container loads data into memory or reads configuration data. To indicate that a pod is healthy but not ready to receive traffic, define a readiness probe.

Liveness probes handle the case where a pod is still running but is unhealthy and should be recycled. For example, suppose that a container is serving HTTP requests but hangs for some reason. The container doesn't crash, but it has stopped serving any requests. If you define an HTTP liveness probe, the probe stops responding, which tells Kubernetes to restart the container.
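The following sketch shows a container spec, as a Python dictionary, with both probe types configured as HTTP checks. The endpoint paths `/ready` and `/healthz`, the image, and all timing values are assumptions for illustration. The application itself must serve those paths, with the readiness endpoint reporting failure until initialization finishes.

```python
def container_with_probes(name: str, image: str, port: int) -> dict:
    """Container spec with an HTTP readiness probe and an HTTP liveness probe."""
    return {
        "name": name,
        "image": image,
        "ports": [{"containerPort": port}],
        # Readiness: while this fails, the pod is left out of the Service's endpoints.
        "readinessProbe": {
            "httpGet": {"path": "/ready", "port": port},  # hypothetical endpoint
            "periodSeconds": 5,
            "failureThreshold": 3,
        },
        # Liveness: if this keeps failing (for example, a hung process), the container is restarted.
        "livenessProbe": {
            "httpGet": {"path": "/healthz", "port": port},  # hypothetical endpoint
            "initialDelaySeconds": 30,  # give a slow startup time before the first check
            "periodSeconds": 10,
            "failureThreshold": 3,
        },
    }
```

Keeping the two endpoints separate lets a pod report "alive but not ready" during initialization without being restarted.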

Here are some considerations when designing probes:

  • If your code has a long startup time, there is a danger that a liveness probe will report failure before startup completes. To prevent this, use the initialDelaySeconds setting, which delays the start of the probe.

  • A liveness probe doesn't help unless restarting the pod is likely to restore it to a healthy state. You can use a liveness probe to mitigate memory leaks or unexpected deadlocks, but there's no point in restarting a pod that's going to fail again immediately.

  • Sometimes readiness probes are used to check dependent services. For example, if a pod depends on a database, the probe might check the database connection. However, this approach can create unexpected problems. An external service might be temporarily unavailable for some reason. That causes the readiness probe to fail for all the pods in your service, so all of them are removed from load balancing, which creates cascading failures upstream. A better approach is to implement retry handling within your service, so that it can recover correctly from transient failures.
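Here is a minimal sketch of the retry handling recommended above, using capped exponential backoff with random jitter. `TransientError` is a hypothetical exception type; in a real service you would map specific timeouts or status codes to it. The `sleep` parameter is injectable so the function can be tested without waiting.

```python
import random
import time
from typing import Callable, TypeVar

T = TypeVar("T")


class TransientError(Exception):
    """Hypothetical marker for failures worth retrying (timeouts, HTTP 503, and so on)."""


def retry_with_backoff(
    call: Callable[[], T],
    max_attempts: int = 5,
    base_delay: float = 0.1,
    max_delay: float = 5.0,
    sleep: Callable[[float], None] = time.sleep,
) -> T:
    """Invoke `call`, retrying on TransientError with capped exponential backoff and full jitter."""
    if max_attempts < 1:
        raise ValueError("max_attempts must be at least 1")
    for attempt in range(max_attempts):
        try:
            return call()
        except TransientError:
            if attempt == max_attempts - 1:
                raise  # out of attempts: let the caller see the failure
            # The delay ceiling doubles each attempt (0.1, 0.2, 0.4, ...) up to max_delay;
            # waiting a random time below it keeps many pods from retrying in lockstep.
            ceiling = min(max_delay, base_delay * 2 ** attempt)
            sleep(random.uniform(0, ceiling))
    raise AssertionError("unreachable")
```

Unlike a readiness probe that checks the database, this keeps a brief outage of a dependency local to the calls that need it, instead of removing every pod from load balancing.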

Resource constraints

Resource contention can affect the availability of a service. Define resource constraints for containers, so that a single container cannot overwhelm the cluster resources (memory and CPU). For non-container resources, such as threads or network connections, consider using the Bulkhead pattern to isolate resources.

Use resource quotas to limit the total resources allowed for a namespace. That way, the front end can't starve the backend services of resources, or vice versa.
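The sketch below shows both levels of control as Python dictionaries: per-container `requests` and `limits`, and a namespace-wide `ResourceQuota`. All quantities and names are illustrative values, not sizing recommendations.

```python
def container_resources(cpu_request: str, mem_request: str,
                        cpu_limit: str, mem_limit: str) -> dict:
    """Per-container requests (used for scheduling) and limits (hard caps)."""
    return {
        "requests": {"cpu": cpu_request, "memory": mem_request},
        "limits": {"cpu": cpu_limit, "memory": mem_limit},
    }


def make_resource_quota(namespace: str, cpu_requests: str, mem_requests: str,
                        cpu_limits: str, mem_limits: str, max_pods: int) -> dict:
    """Cap the total resources that all pods in one namespace can claim."""
    return {
        "apiVersion": "v1",
        "kind": "ResourceQuota",
        "metadata": {"name": f"{namespace}-quota", "namespace": namespace},
        "spec": {
            "hard": {
                "requests.cpu": cpu_requests,
                "requests.memory": mem_requests,
                "limits.cpu": cpu_limits,
                "limits.memory": mem_limits,
                "pods": str(max_pods),
            }
        },
    }
```

Once a quota sets CPU or memory totals for a namespace, Kubernetes rejects new pods there that don't declare the corresponding requests or limits, unless a LimitRange supplies defaults. That is one more reason to set container resources everywhere.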

Role-based access control (RBAC)

Kubernetes and Azure both have mechanisms for role-based access control (RBAC):

  • Azure RBAC controls access to resources in Azure, including the ability to create new Azure resources. Permissions can be assigned to users, groups, or service principals. (A service principal is a security identity used by applications.)

  • Kubernetes RBAC controls permissions to the Kubernetes API. For example, creating pods and listing pods are actions that can be authorized (or denied) to a user through Kubernetes RBAC. To assign Kubernetes permissions to users, you create roles and role bindings:

    • A Role is a set of permissions that apply within a namespace. Permissions are defined as verbs (get, update, create, delete) on resources (pods, deployments, and so on).

    • A RoleBinding assigns users or groups to a Role.

    • There is also a ClusterRole object, which is like a Role but applies to the entire cluster, across all namespaces. To assign users or groups to a ClusterRole, create a ClusterRoleBinding.
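As an example of namespace-scoped Kubernetes RBAC, the following sketch builds a Role that allows read-only access to pods and a RoleBinding that grants it to a group. The namespace and role names are hypothetical. With Azure AD integration in AKS, a group subject is identified by the Azure AD group's object ID, shown here only as a placeholder value.

```python
RBAC_API = "rbac.authorization.k8s.io"


def make_pod_reader_role(namespace: str, name: str = "pod-reader") -> dict:
    """A namespaced Role allowing read-only access to pods."""
    return {
        "apiVersion": f"{RBAC_API}/v1",
        "kind": "Role",
        "metadata": {"name": name, "namespace": namespace},
        "rules": [{
            "apiGroups": [""],  # "" is the core API group, where pods live
            "resources": ["pods"],
            "verbs": ["get", "list", "watch"],
        }],
    }


def bind_group_to_role(namespace: str, role_name: str, group_id: str) -> dict:
    """A RoleBinding that grants `role_name` to a group within one namespace."""
    return {
        "apiVersion": f"{RBAC_API}/v1",
        "kind": "RoleBinding",
        "metadata": {"name": f"{role_name}-binding", "namespace": namespace},
        "subjects": [{
            "kind": "Group",
            "apiGroup": RBAC_API,
            "name": group_id,  # placeholder: the Azure AD group object ID
        }],
        "roleRef": {"kind": "Role", "name": role_name, "apiGroup": RBAC_API},
    }
```

Because both objects carry a namespace, the group gets read access to pods in that namespace only, which matches the advice below to prefer Roles and RoleBindings over their cluster-wide counterparts.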

AKS integrates these two RBAC mechanisms. When you create an AKS cluster, you can configure it to use Azure AD for user authentication. For details on how to set this up, see Integrate Azure Active Directory with Azure Kubernetes Service.

Once this is configured, a user who wants to access the Kubernetes API (for example, through kubectl) must sign in by using their Azure AD credentials.

By default, an Azure AD user has no access to the cluster. To grant access, the cluster administrator creates RoleBindings that refer to Azure AD users or groups. If a user doesn't have permissions for a particular operation, the operation fails.

If users have no access by default, how does the cluster admin get permission to create the role bindings in the first place? An AKS cluster actually has two types of credentials for calling the Kubernetes API server: cluster user and cluster admin. The cluster admin credentials grant full access to the cluster. The Azure CLI command az aks get-credentials --admin downloads the cluster admin credentials and saves them into your kubeconfig file. The cluster administrator can use this kubeconfig to create roles and role bindings.

Because the cluster admin credentials are so powerful, use Azure RBAC to restrict access to them:

  • The "Azure Kubernetes Service Cluster Admin Role" has permission to download the cluster admin credentials. Only cluster administrators should be assigned to this role.

  • The "Azure Kubernetes Service Cluster User Role" has permission to download the cluster user credentials. Non-admin users can be assigned to this role. This role does not give any particular permissions on Kubernetes resources inside the cluster; it just allows a user to connect to the API server.

When you define your RBAC policies (both Kubernetes and Azure), think about the roles in your organization:

  • Who can create or delete an AKS cluster and download the admin credentials?
  • Who can administer a cluster?
  • Who can create or update resources within a namespace?

It's a good practice to scope Kubernetes RBAC permissions by namespace, using Roles and RoleBindings rather than ClusterRoles and ClusterRoleBindings.

Finally, there is the question of what permissions the AKS cluster has to create and manage Azure resources, such as load balancers, networking, or storage. To authenticate itself with Azure APIs, the cluster uses an Azure AD service principal. If you don't specify a service principal when you create the cluster, one is created automatically. However, it's a good security practice to create the service principal first and assign it the minimal RBAC permissions. For more information, see Service principals with Azure Kubernetes Service.

Secrets management and application credentials

Applications and services often need credentials that allow them to connect to external services such as Azure Storage or SQL Database. The challenge is to keep these credentials safe and not leak them.

For Azure resources, one option is to use managed identities. The idea of a managed identity is that an application or service has an identity stored in Azure AD and uses this identity to authenticate with an Azure service. The application or service has a service principal created for it in Azure AD and authenticates by using OAuth 2.0 tokens. The executing process calls a local endpoint to get the token. That way, you don't need to store any passwords or connection strings. You can use managed identities in AKS by assigning identities to individual pods with the aad-pod-identity project.
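To make the token exchange concrete, here is a sketch that requests a managed-identity token over plain HTTP with Python's standard library. It assumes the Azure Instance Metadata Service (IMDS) token endpoint at `169.254.169.254`, which is the local endpoint on Azure VMs and the one aad-pod-identity is designed to intercept on behalf of pods; the API version and resource URI are example values. Production code would more typically use the Azure Identity client library than raw HTTP.

```python
import json
import urllib.parse
import urllib.request

# Assumed IMDS token endpoint; only reachable from inside Azure.
IMDS_TOKEN_URL = "http://169.254.169.254/metadata/identity/oauth2/token"


def build_token_request(resource: str, api_version: str = "2018-02-01") -> urllib.request.Request:
    """Build (but do not send) a managed-identity token request."""
    query = urllib.parse.urlencode({"api-version": api_version, "resource": resource})
    # IMDS rejects token requests that lack the Metadata header.
    return urllib.request.Request(f"{IMDS_TOKEN_URL}?{query}", headers={"Metadata": "true"})


def fetch_token(resource: str, timeout: float = 5.0) -> str:
    """Send the request and return the OAuth 2.0 access token from the JSON response."""
    with urllib.request.urlopen(build_token_request(resource), timeout=timeout) as resp:
        return json.load(resp)["access_token"]
```

The returned token is then sent as a bearer token to the target Azure service; no password or connection string is stored in the pod.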

Currently, not all Azure services support authentication by using managed identities. For a list, see Azure services that support Azure AD authentication.

Even with managed identities, you'll probably need to store some credentials or other application secrets, whether for Azure services that don't support managed identities, third-party services, API keys, and so on. Here are some options for storing secrets securely:

  • Azure Key Vault. In AKS, you can mount one or more secrets from Key Vault as a volume. The volume reads the secrets from Key Vault. The pod can then read the secrets just like a regular volume. For more information, see the secrets-store-csi-driver-provider-azure project on GitHub.

    The pod authenticates itself by using either a pod identity (described above) or an Azure AD service principal along with a client secret. Using pod identities is recommended because no client secret is needed in that case.

  • HashiCorp Vault. Kubernetes applications can authenticate with HashiCorp Vault by using Azure AD managed identities. See HashiCorp Vault speaks Azure Active Directory. You can deploy Vault itself to Kubernetes, but consider running it in a dedicated cluster that is separate from your application cluster.

  • Kubernetes secrets. Another option is simply to use Kubernetes secrets. This option is the easiest to configure but has some challenges. Secrets are stored in etcd, which is a distributed key-value store. AKS encrypts etcd at rest, and Microsoft manages the encryption keys.
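When secrets are exposed through a volume (a Key Vault CSI volume or a Kubernetes Secret volume), the application sees each secret as a file named after the secret. The following sketch reads one. The mount path `/mnt/secrets-store` is hypothetical and must match the `volumeMount` in your pod spec; the local demo simulates the volume with a temporary directory.

```python
import tempfile
from pathlib import Path

# Hypothetical mount path; it must match the volumeMount in the pod spec.
SECRETS_DIR = Path("/mnt/secrets-store")


def read_secret(name: str, secrets_dir: Path = SECRETS_DIR) -> str:
    """Read one secret exposed as a file in a mounted secrets volume."""
    # Reject names that could escape the secrets directory.
    if not name or "/" in name or name in (".", ".."):
        raise ValueError(f"invalid secret name: {name!r}")
    return (secrets_dir / name).read_text(encoding="utf-8")


if __name__ == "__main__":
    # Local demo only: simulate a mounted volume with a temporary directory.
    with tempfile.TemporaryDirectory() as tmp:
        (Path(tmp) / "db-password").write_text("example-value", encoding="utf-8")
        print(read_secret("db-password", Path(tmp)))
```

Reading the file when the value is needed, rather than once at startup, gives the application a chance to pick up updated content if the volume is refreshed.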

Using a system like HashiCorp Vault or Azure Key Vault provides several advantages, such as:

  • Centralized control of secrets.
  • Ensuring that all secrets are encrypted at rest.
  • Centralized key management.
  • Access control of secrets.
  • Auditing.

Container and orchestrator security

These are recommended practices for securing your pods and containers:

  • Threat monitoring: Monitor for threats by using Azure Defender for container registries and Azure Defender for Kubernetes (or third-party capabilities). If you are hosting containers on a VM, use Azure Defender for servers or a third-party capability. Additionally, you can integrate logs from the Container Monitoring solution in Azure Monitor with Azure Sentinel or an existing SIEM.

  • Vulnerability monitoring: Continuously monitor images and running containers for known vulnerabilities by using Azure Security Center or a third-party solution available through Azure Marketplace.

  • Automate image patching by using ACR Tasks, a feature of Azure Container Registry. A container image is built up from layers. The base layers include the OS image and application framework images, such as ASP.NET Core or Node.js. The base images are typically created upstream from the application developers and are maintained by other project maintainers. When these images are patched upstream, it's important to update, test, and redeploy your own images, so that you don't leave any known security vulnerabilities in place. ACR Tasks can help to automate this process.

  • Store images in a trusted private registry such as Azure Container Registry or Docker Trusted Registry. Use a validating admission webhook in Kubernetes to ensure that pods can only pull images from the trusted registry.

  • Apply the principle of least privilege:

    • Don't run containers in privileged mode. Privileged mode gives a container access to all devices on the host.
    • When possible, avoid running processes as root inside containers. Containers do not provide complete isolation from a security standpoint, so it's better to run a container process as a non-privileged user.
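A validating admission webhook receives an `AdmissionReview` object from the API server and answers whether the request is allowed. The following sketch implements only the decision logic for a Pod request, combining the trusted-registry rule with the least-privilege rules above (no privileged containers, `runAsNonRoot` required). The registry prefix `myregistry.azurecr.io/` is hypothetical, and the HTTPS server and `ValidatingWebhookConfiguration` that a real webhook needs are out of scope. Policy engines such as the Azure Policy add-on for AKS can provide similar checks without custom code.

```python
# Hypothetical allow-list; the trailing "/" stops look-alike hosts such as
# "myregistry.azurecr.io.example.com" from matching.
TRUSTED_REGISTRIES = ("myregistry.azurecr.io/",)


def review_pod(admission_review: dict) -> dict:
    """Build an admission.k8s.io/v1 AdmissionReview response for a Pod request."""
    request = admission_review["request"]
    pod_spec = request["object"].get("spec", {})
    # A container-level securityContext setting overrides the pod-level one.
    pod_non_root = pod_spec.get("securityContext", {}).get("runAsNonRoot", False)

    problems = []
    for c in pod_spec.get("initContainers", []) + pod_spec.get("containers", []):
        name = c.get("name", "<unnamed>")
        sc = c.get("securityContext", {})
        if not c.get("image", "").startswith(TRUSTED_REGISTRIES):
            problems.append(f"{name}: image {c.get('image')!r} is not from a trusted registry")
        if sc.get("privileged", False):
            problems.append(f"{name}: privileged mode is not allowed")
        if not sc.get("runAsNonRoot", pod_non_root):
            problems.append(f"{name}: runAsNonRoot must be true")

    response = {"uid": request["uid"], "allowed": not problems}
    if problems:
        response["status"] = {"code": 403, "message": "; ".join(problems)}
    return {"apiVersion": "admission.k8s.io/v1", "kind": "AdmissionReview", "response": response}
```

Echoing the request `uid` back is required so the API server can match the response to its request.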

DevOps considerations

This reference architecture provides an Azure Resource Manager template for provisioning the cloud resources and their dependencies. With Azure Resource Manager templates, you can use Azure DevOps Services to provision different environments in minutes, for example to replicate production scenarios. This lets you save costs by provisioning a load-testing environment only when it's needed.

Consider structuring your ARM templates according to workload isolation criteria. A workload is typically defined as an arbitrary unit of functionality; for example, you could have one template for the cluster and others for the dependent services. Workload isolation enables DevOps to perform continuous integration and continuous delivery (CI/CD), because each workload is associated with, and managed by, its corresponding DevOps team.

Deployment (CI/CD) considerations

Here are some goals of a robust CI/CD process for a microservices architecture:

  • Each team can build and deploy the services that it owns independently, without affecting or disrupting other teams.
  • Before a new version of a service is deployed to production, it gets deployed to dev/test/QA environments for validation. Quality gates are enforced at each stage.
  • A new version of a service can be deployed side by side with the previous version.
  • Sufficient access control policies are in place.
  • For containerized workloads, you can trust the container images that are deployed to production.

To learn more about the challenges, see CI/CD for microservices architectures.

For specific recommendations and best practices, see CI/CD for microservices on Kubernetes.

Cost considerations

Use the Azure pricing calculator to estimate costs. Other considerations are described in the Cost section of the Microsoft Azure Well-Architected Framework.

Here are some points to consider for some of the services used in this architecture.

Azure Kubernetes Service (AKS)

There are no costs associated with AKS for deploying, managing, and operating the Kubernetes cluster. You pay only for the virtual machine instances, storage, and networking resources consumed by your Kubernetes cluster.

To estimate the cost of the required resources, see the Container Services calculator.

Azure Load Balancer

You are charged only for the number of configured load-balancing and outbound rules. Inbound NAT rules are free. There is no hourly charge for the Standard Load Balancer when no rules are configured.

For more information, see Azure Load Balancer pricing.

Azure Pipelines

This reference architecture uses only Azure Pipelines. Azure offers Azure Pipelines as an individual service. You get one free Microsoft-hosted job with 1,800 minutes per month for CI/CD and one self-hosted job with unlimited minutes per month; extra jobs are charged. For more information, see Azure DevOps Services pricing.

Azure Monitor

For Azure Monitor Log Analytics, you are charged for data ingestion and retention. For more information, see Azure Monitor pricing.

Deploy the solution

To deploy the reference implementation for this architecture, follow the steps in the GitHub repo.

Next steps