Resiliency is the ability of a system to gracefully handle and recover from failures, both inadvertent and malicious.
The nature of cloud hosting, where applications are often multi-tenant, use shared platform services, compete for resources and bandwidth, communicate over the Internet, and run on commodity hardware means there is an increased likelihood that both transient and more permanent faults will arise. The connected nature of the internet and the rise in sophistication and volume of attacks increase the likelihood of a security disruption.
Detecting failures and recovering quickly and efficiently, is necessary to maintain resiliency.
|Bulkhead||Isolate elements of an application into pools so that if one fails, the others will continue to function.|
|Circuit Breaker||Handle faults that might take a variable amount of time to fix when connecting to a remote service or resource.|
|Compensating Transaction||Undo the work performed by a series of steps, which together define an eventually consistent operation.|
|Health Endpoint Monitoring||Implement functional checks in an application that external tools can access through exposed endpoints at regular intervals.|
|Leader Election||Coordinate the actions performed by a collection of collaborating task instances in a distributed application by electing one instance as the leader that assumes responsibility for managing the other instances.|
|Queue-Based Load Leveling||Use a queue that acts as a buffer between a task and a service that it invokes in order to smooth intermittent heavy loads.|
|Retry||Enable an application to handle anticipated, temporary failures when it tries to connect to a service or network resource by transparently retrying an operation that's previously failed.|
|Scheduler Agent Supervisor||Coordinate a set of actions across a distributed set of services and other remote resources.|
Achieving security resilience requires a combination of preventive measures to block attacks, responsive measures detect and quickly remediate active attacks, and governance to ensure consistent application of best practices.
- Security strategy should include lessons learned described in security strategy guidance
- Azure security configurations should align to the best practices and controls in the Azure Security Benchmark (ASB). Security configurations for Azure services should align to the Security baselines for Azure in the ASB
- Azure architectures should integrate native security capabilities to protect and monitor workloads including Azure Defender, Azure DDoS protection, Azure Firewall, and Azure Web Application Firewall (WAF)
For a more detailed dsicussion, see the Cybersecurity Resilience module in the CISO workshop