TestOps: it really happens here
It’s more of an evolution of what I have already been talking about around testing services and Testing in Production. I call it TestOps.
Testers need to move out of the mindset of writing tests – running tests – evaluating results. Instead of using test results from a daily run as your quality signal, use the big data pipe coming off of your product (generally this refers to services) as your quality signal. This includes system data like CPU, API requests, system response time as well as (properly anonymized) user data. It also includes data emitted from synthetic transactions that you can run constantly in production. These are indeed test cases, but instead of getting just a daily red/green, you get constant availability and performance. This is the technology, but it also necessarily will change our software engineering organizations. The questions of roles and specialization versus generalization will need to be answered to meet each organization’s needs. And the emergence of the Data Scientist as part of the engineering team is an exciting outcome of the TestOps approach.
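To make the idea of a synthetic transaction as a continuous quality signal concrete, here is a minimal sketch in Python. It is only an illustration: the names `ProbeResult`, `run_probe` and `availability` are made up for this example, and the transaction is any callable you supply (for instance a login followed by a small query). A real system would run this on a schedule and ship the results into the same data pipe as the other telemetry.

```python
import time
from dataclasses import dataclass


@dataclass
class ProbeResult:
    """Outcome of one synthetic transaction (hypothetical structure)."""
    ok: bool
    latency_seconds: float


def run_probe(transaction):
    """Run one synthetic transaction and record success and latency.

    `transaction` is any zero-argument callable; raising an exception
    counts as a failed transaction.
    """
    start = time.perf_counter()
    try:
        transaction()
        ok = True
    except Exception:
        ok = False
    return ProbeResult(ok=ok, latency_seconds=time.perf_counter() - start)


def availability(results):
    """Fraction of probes that succeeded, between 0.0 and 1.0."""
    if not results:
        raise ValueError("need at least one probe result")
    return sum(1 for r in results if r.ok) / len(results)
```

Instead of one red/green result per day, every probe adds a data point, so availability and latency can be tracked continuously.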
In recent years, the concept of “DevOps” has become very popular both inside and outside Microsoft. In our team, our manager frequently promotes “DevOps” so that our engineers work more closely with the livesite environment. For our testers, I will give a couple of examples of how we have changed.
For me, during the last year, I still focused on delivering features. However, the way I test or verify a feature is different from traditional testing. I mainly focus on defining, collecting and measuring metrics. For example, during our upgrade orchestration improvement, our goal was to reduce the outage during upgrade. So I measured the average, 95th percentile, 99th percentile, 99.9th percentile and maximum outage during upgrade. In addition, we developed a synthetic workload that runs during the upgrade, so that we can measure the user experience in terms of login failures and transaction throughput. In this context, our test automation automatically collects metrics during the upgrade and generates an Excel file after the test finishes. We look at the result of each run and use the metrics above to see where the problem is. In addition, detailed metric drill-down and troubleshooting scripts are embedded, so that people can find all relevant information with one click. With these two sets of metrics (outage duration and user experience), we can sign off our feature with high confidence. We also use the same technique to monitor the upgrade experience in production. So the takeaway is that, as a tester, you can use a metric-driven approach to lead your testing effort. In our team, we have several examples of test members who have started to collect metrics, build KPIs, write alerts and make decisions based on the data.
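The outage summary described above can be sketched in a few lines of Python. This is not our actual automation, only an illustration under stated assumptions: the function names are hypothetical, outage durations are assumed to be in seconds, and the percentile uses the simple nearest-rank definition (other definitions interpolate and give slightly different numbers).

```python
import math
from statistics import mean


def percentile(samples, pct):
    """Nearest-rank percentile: the smallest sample such that at least
    pct percent of all samples are less than or equal to it."""
    if not samples:
        raise ValueError("need at least one sample")
    if not 0 < pct <= 100:
        raise ValueError("pct must be in (0, 100]")
    ordered = sorted(samples)
    rank = math.ceil(pct * len(ordered) / 100)
    return ordered[rank - 1]


def summarize_outages(outage_seconds):
    """Summarize outage durations (assumed to be seconds) for one upgrade run."""
    return {
        "avg": mean(outage_seconds),
        "p95": percentile(outage_seconds, 95),
        "p99": percentile(outage_seconds, 99),
        "p99.9": percentile(outage_seconds, 99.9),
        "max": max(outage_seconds),
    }
```

Looking at the tail percentiles next to the average matters here, because a few very long outages can hide behind a good average.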
Recently, I started the SQL Azure Dashboard project, which monitors our service 24x7 and fires alerts. As you can see, as a tester, I have really switched my role from traditional testing to TestOps. I do not own any functional tests, but I own our monitoring system. I spend about 80% of my time on it, and I am also able to jump in on livesite issues. I provide data to support the whole team, and implement new data collection and new alerts on demand. In summary, owning monitoring and alerting is one career path for our testers. As a side note, you need to work very closely with the Ops and Engineering teams. In my case, I work on the team that owns the core of our service, so I have enough domain knowledge and team support for what I do. If I were not on such a team, I am pretty sure that my system would not have such an impact.
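As a rough illustration of what "writing an alert" can mean, here is a minimal Python sketch of one common rule: fire only when a metric stays above a threshold for several consecutive samples, so that a single spike does not page anyone. The function name, the parameters and the rule itself are assumptions for this example, not a description of how the SQL Azure Dashboard works.

```python
def should_alert(samples, threshold, consecutive=3):
    """Return True if `samples` contains at least `consecutive` values
    in a row that are strictly above `threshold`."""
    if consecutive < 1:
        raise ValueError("consecutive must be at least 1")
    streak = 0
    for value in samples:
        # Extend the streak on a breach, reset it otherwise.
        streak = streak + 1 if value > threshold else 0
        if streak >= consecutive:
            return True
    return False
```

For example, with a threshold of 4 and three consecutive samples required, the series 1, 5, 5, 5, 1 fires an alert, while 5, 1, 5, 1, 5 does not.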
There is another tester in our team who is totally different from me, but I can tell that his contribution and impact on the team are huge. He is a tester for another core team of our system. As a typical tester, he knows everything about the system: where the weak areas are and what to test. However, the big difference is that he works on livesite all day voluntarily. No one asked him to do so, and no one feels that he is doing something he should not do. He mentors and helps other team members solve livesite issues, he collects and analyzes trends from production, and he submits CAB requests to deploy fixes in production ASAP. He has gained a lot of experience from livesite and uses it in his testing. As you can guess, his tests also find important bugs, and whenever he runs tests against our dogfood cluster, he finds ship stoppers. He has gained respect from all of us. He shows us another path as TestOps. Dear reader, would you like such a team member in your team? Do you want to become such a tester? Dear manager, do you encourage your people to become TestOps, and does your team have a good environment to grow people as non-traditional testers? When you do review calibration, do you show your people’s strengths to others, and how?
Again, a healthy and respectful team environment is the key to growing people.