Resiliency patterns

Resiliency is the ability of a system to gracefully handle and recover from failures, both inadvertent and malicious.

Cloud-hosted applications are often multi-tenant, use shared platform services, compete for resources and bandwidth, communicate over the internet, and run on commodity hardware. This increases the likelihood that both transient and more permanent faults will arise. The connected nature of the internet and the rise in the sophistication and volume of attacks also increase the likelihood of a security disruption.

Detecting failures and recovering from them quickly and efficiently is necessary to maintain resiliency.

| Pattern | Summary |
| --- | --- |
| Bulkhead | Isolate elements of an application into pools so that if one fails, the others continue to function. |
| Circuit Breaker | Handle faults that might take a variable amount of time to fix when connecting to a remote service or resource. |
| Compensating Transaction | Undo the work performed by a series of steps, which together define an eventually consistent operation. |
| Health Endpoint Monitoring | Implement functional checks in an application that external tools can access through exposed endpoints at regular intervals. |
| Leader Election | Coordinate the actions performed by a collection of collaborating task instances in a distributed application by electing one instance as the leader that assumes responsibility for managing the other instances. |
| Queue-Based Load Leveling | Use a queue that acts as a buffer between a task and a service that it invokes in order to smooth intermittent heavy loads. |
| Retry | Enable an application to handle anticipated, temporary failures when it tries to connect to a service or network resource by transparently retrying an operation that previously failed. |
| Scheduler Agent Supervisor | Coordinate a set of actions across a distributed set of services and other remote resources. |
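
To make two of the summaries above more concrete, the following is a minimal Python sketch of the Retry and Circuit Breaker ideas. It is an illustration under stated assumptions, not an Azure API or a reference implementation: `TransientError`, `CircuitOpenError`, `retry`, `CircuitBreaker`, and all parameter names and default values are hypothetical, and a production system would more likely rely on an established resilience library.

```python
import time


class TransientError(Exception):
    """Hypothetical marker for failures that are worth retrying."""


def retry(operation, attempts=3, base_delay=0.1, sleep=time.sleep):
    """Call operation(), retrying on TransientError with exponential backoff."""
    for attempt in range(attempts):
        try:
            return operation()
        except TransientError:
            if attempt == attempts - 1:
                raise  # out of attempts: surface the failure to the caller
            sleep(base_delay * 2 ** attempt)  # wait 1x, 2x, 4x, ... base_delay


class CircuitOpenError(Exception):
    """Raised when the breaker rejects a call without trying the operation."""


class CircuitBreaker:
    """Fail fast after repeated failures, then allow a trial call later."""

    def __init__(self, failure_threshold=3, reset_timeout=30.0,
                 clock=time.monotonic):
        self.failure_threshold = failure_threshold  # assumed default
        self.reset_timeout = reset_timeout          # seconds, assumed default
        self.clock = clock
        self.failures = 0
        self.opened_at = None  # None means the circuit is closed

    def call(self, operation):
        if self.opened_at is not None:
            if self.clock() - self.opened_at < self.reset_timeout:
                raise CircuitOpenError("circuit is open; failing fast")
            # Timeout elapsed: half-open, so let one trial call through.
        try:
            result = operation()
        except Exception:
            self.failures += 1
            half_open = self.opened_at is not None
            if half_open or self.failures >= self.failure_threshold:
                self.opened_at = self.clock()  # open (or re-open) the circuit
            raise
        # Success closes the circuit and resets the failure count.
        self.failures = 0
        self.opened_at = None
        return result
```

The two pieces address different situations: `retry` suits short-lived faults that are likely to clear within a few attempts, while `CircuitBreaker` stops callers from repeatedly invoking a dependency that may take an unknown amount of time to recover. The `sleep` and `clock` parameters exist only so that the timing behavior can be replaced in tests.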

Security resiliency

Achieving security resilience requires a combination of preventive measures to block attacks, responsive measures to detect and quickly remediate active attacks, and governance to ensure consistent application of best practices.

For a more detailed discussion, see the Cybersecurity Resilience module in the CISO workshop.