question

WangJane-5356 asked TravisCragg-MSFT commented

chaos engineering in Azure

I have the following hypotheses to test in Azure. The aim is to build the team's confidence in the applications as well as in the Azure environments. The ultimate goal is to shift left and run chaos tests continuously with each release.
1. HA - inject a failure into one component in Azure and observe the recovery. This can be a PaaS- or IaaS-level component.
2. DR - inject a local or zone-level failure in an Azure subscription and test the business continuity plan.
3. Service dependencies - inject pod failures or latency into services to test system resiliency.

What are the options for injecting Azure SaaS/PaaS/IaaS-level failures and latency? What can be used to bring Azure chaos engineering into CI/CD?



azure-virtual-machines azure-webapps

1 Answer

TravisCragg-MSFT answered TravisCragg-MSFT commented

@WangJane-5356 Azure does not have an integrated way to support chaos engineering, but you can script the fault injection yourself or perform it manually.

You can stop a VM or Web App at any time using the Azure Portal. This can also be automated, for example with the Azure CLI, Azure PowerShell, or the REST API.
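As a rough illustration of automating the VM case, here is a minimal Python sketch that builds Azure CLI commands and runs them through `subprocess`. It assumes the `az` CLI is installed and logged in; the resource group and VM names are hypothetical placeholders, and the `dry_run` switch is my own addition so the commands can be reviewed before anything is actually stopped.

```python
import subprocess

# Azure CLI VM power actions used in this sketch.
# Each takes --resource-group and --name.
VM_ACTIONS = {"stop", "deallocate", "start", "restart"}


def vm_command(action, resource_group, vm_name):
    """Build the Azure CLI argument list for a VM power action."""
    if action not in VM_ACTIONS:
        raise ValueError(f"unsupported VM action: {action}")
    return ["az", "vm", action,
            "--resource-group", resource_group,
            "--name", vm_name]


def run(command, dry_run=True):
    """Print the command; execute it only when dry_run is False."""
    print(" ".join(command))
    if not dry_run:
        subprocess.run(command, check=True)


if __name__ == "__main__":
    # Hypothetical names; replace with your own resources.
    run(vm_command("deallocate", "my-rg", "my-vm"))  # inject the failure
    run(vm_command("start", "my-rg", "my-vm"))       # restore afterwards
```

A Web App can be handled the same way with `az webapp stop` and `az webapp start`, which take the same `--resource-group` and `--name` arguments.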

For a regional component, the easiest approach is to apply a firewall rule that blocks incoming requests to that region, and then check whether the system recovers properly.
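One possible way to do this, assuming the regional component sits behind a network security group (NSG), is to add a temporary deny rule and remove it after the test. The sketch below only builds the Azure CLI commands; the rule name `chaos-deny-inbound` and the default priority are hypothetical choices of mine, not anything Azure prescribes.

```python
import subprocess


def deny_inbound_rule(resource_group, nsg_name,
                      rule_name="chaos-deny-inbound", priority=100):
    """Build the command that adds a deny-all inbound rule to an NSG."""
    # NSG rule priorities must be in the range 100-4096.
    if not 100 <= priority <= 4096:
        raise ValueError("NSG rule priority must be between 100 and 4096")
    return [
        "az", "network", "nsg", "rule", "create",
        "--resource-group", resource_group,
        "--nsg-name", nsg_name,
        "--name", rule_name,
        "--priority", str(priority),
        "--direction", "Inbound",
        "--access", "Deny",
        "--protocol", "*",
        "--source-address-prefixes", "*",
        "--destination-port-ranges", "*",
    ]


def remove_rule(resource_group, nsg_name, rule_name="chaos-deny-inbound"):
    """Build the command that deletes the temporary rule again."""
    return ["az", "network", "nsg", "rule", "delete",
            "--resource-group", resource_group,
            "--nsg-name", nsg_name,
            "--name", rule_name]


if __name__ == "__main__":
    # Hypothetical names; printed only, nothing is executed here.
    print(" ".join(deny_inbound_rule("my-rg", "my-region-nsg")))
    print(" ".join(remove_rule("my-rg", "my-region-nsg")))
    # To execute for real: subprocess.run(command, check=True)
```

Be aware that a deny-all rule at a low priority number also blocks management traffic such as SSH or RDP, and that the priority must not collide with an existing rule in the same direction, so you may want a narrower port range or a different priority.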

Service dependencies can be a bit harder. You will most likely need to manually put a VM or service under load to introduce the additional latency. If this is an autoscaling service (VMSS or Kubernetes), try scaling the service to well below the needed capacity to see how it handles delays or intermittent failures. You will likely need to be creative for this one.
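The scale-down idea could be scripted roughly as follows. This is a sketch under assumptions: it uses `az vmss scale` for a scale set and `kubectl scale` for a Kubernetes deployment, and all resource names are hypothetical. If autoscale rules or a Horizontal Pod Autoscaler are active, they may scale the service back up during the test, so you may need to pause them first.

```python
import subprocess


def vmss_scale_command(resource_group, vmss_name, capacity):
    """Build the Azure CLI command that sets a scale set's instance count."""
    if capacity < 0:
        raise ValueError("capacity must be non-negative")
    return ["az", "vmss", "scale",
            "--resource-group", resource_group,
            "--name", vmss_name,
            "--new-capacity", str(capacity)]


def k8s_scale_command(deployment, replicas, namespace="default"):
    """Build the kubectl command that sets a deployment's replica count."""
    if replicas < 0:
        raise ValueError("replicas must be non-negative")
    return ["kubectl", "scale", f"deployment/{deployment}",
            f"--replicas={replicas}",
            "--namespace", namespace]


if __name__ == "__main__":
    # Hypothetical names and sizes: drop from a normal capacity to 1,
    # observe the system, then restore the original capacity.
    for command in (
        vmss_scale_command("my-rg", "my-vmss", 1),
        vmss_scale_command("my-rg", "my-vmss", 5),
        k8s_scale_command("my-api", 1, namespace="prod"),
    ):
        print(" ".join(command))
        # To execute for real: subprocess.run(command, check=True)
```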

If you need more information on better ways to do any of this, please let me know.


Thank you @TravisCragg-MSFT. I would like to test the Azure IaaS and PaaS layers, including the local and zone-level network. The aim is to build confidence in the platform and the Azure environments. Examples:
1. HA testing of local and zone availability sets
2. local, zone, and global network-level failover
3. latency, monitoring, and log insights for the failover.
Is that possible with Azure? Does Azure do something similar at the platform management level?






For #1 and #2, Azure does not have any native tests, but you can easily simulate it using the methods I described above.

For latency and monitoring insights, use Azure Monitor. Every service has its own pre-built monitoring, and you can also add custom metrics.
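Alongside Azure Monitor, it can help to time the failover from the client side as well. The sketch below is not an Azure Monitor API; it is a plain polling loop that measures how long a health endpoint takes to respond again after a fault is injected. The function names and the example URL are hypothetical.

```python
import time
import urllib.request


def http_ok(url, timeout_s=5.0):
    """Return True if the URL answers with a 2xx status, False on network errors."""
    try:
        with urllib.request.urlopen(url, timeout=timeout_s) as resp:
            return 200 <= resp.status < 300
    except OSError:  # covers URLError, HTTPError, and socket timeouts
        return False


def measure_recovery(probe, timeout_s=300.0, interval_s=1.0,
                     clock=time.monotonic, sleep=time.sleep):
    """Poll probe() until it returns True.

    Returns the elapsed seconds, or None if timeout_s passes first.
    clock and sleep are injectable so the loop can be tested without waiting.
    """
    start = clock()
    while True:
        if probe():
            return clock() - start
        if clock() - start >= timeout_s:
            return None
        sleep(interval_s)


if __name__ == "__main__":
    # Hypothetical endpoint; start this right after injecting the failure.
    url = "https://example.invalid/health"
    elapsed = measure_recovery(lambda: http_ok(url), timeout_s=10.0)
    print("did not recover in time" if elapsed is None
          else f"recovered after {elapsed:.1f}s")
```

The measured value could then be recorded next to the Azure Monitor data for the same time window to compare the client view with the platform view.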

I suggest you look into the capabilities of the individual service to see what is possible.

