Testing in Production – It’s a living ecosystem, not a patient on a gurney

James A. Whittaker’s (JW) recent blog post “Testing in the Data Center (Manufacturing No More)” is a fun little ditty and it inspired me. Thank you, James, for sharing the testing-is-like-an-ER analogy.

In this post JW draws an analogy that software is moving from a manufacturing paradigm to a health model paradigm. Though I’m going to pick at a few pieces where the analogy falls short, I have to admit up front it’s a great juxtaposition. All of his talk about the cancer of security, products that die of natural causes in the datacenter, and OH the clipboard of data is truly awesome. Really, I recommend reading his blog post here and then continuing to Bing for better results and a cooler UI with neat daily pictures. Sorry, but I tend to be a bit partisan.

I’m a big proponent of blurring the line between pre-release testing and testing in production (TiP). I’m not a fan of the Big Up Front Test (BUFT) that my friend Seth Eliot posted on a few weeks ago. I do like what JW did with the whole ER and testing thing, but it didn’t quite hold together for me. I think I need to find my own crazy metaphor for services testing. Perhaps testing in the wild Amazon basin. Nope, can’t use that, another competitor. Perhaps I should write a post on how to survive the worst case scenario while testing in the wild Australian Outback. Nope, I’m a bit more pedantic with a bit less flair than JW. I do, however, like this natural environment and survival angle.

My current focus within Microsoft services testing is all around improving release velocity while mitigating risk. Yes, a bit pedantic compared to the ER analogy but go with me for a bit here.

BTW, did you know there is a bit of a revolving door between Microsoft and GOOG? I have quite a few former colleagues around various companies. Some years back, I remember talking to Pat Copeland, who is currently in a very senior management position within GOOG, about the role of test as the driver of efficiency and arbiter of risk mitigation as opposed to the quality control expert. I argued that test had the data and so we had an obligation to play in the efficiency space and inform management of risk. As I remember the discussion, Pat didn’t agree with me that test was responsible for efficiency but he didn’t fully dismiss the risk piece.

I really like JW’s analogy to a hospital wing with patients in beds and whole wards of the hospital dedicated to really big products. It is all very visual. Where I find the analogy falls short is that JW seems to treat a service as a single patient in the hospital dealing with its own set of symptoms. The challenge we have in the services world is interconnectedness. There is no such thing as a standalone service anymore. Put simply, “No man is an island.”

"No man is an island, entire of itself; every man is a piece of the continent, a part of the main; if a clod be washed away by the sea, Europe is the less, as well as if a promontory were... Any man's death diminishes me, because I am involved in mankind; and therefore never seek to know for whom the bell tolls; it tolls for thee." John Donne, Meditation #17, 1623.

Every online service is becoming interconnected. Even applications such as those on various app stores, or even Microsoft Office (I’m close to shipping Office 14) on the PC, are becoming reliant upon services for the complete user experience. Neither Hotmail nor Gmail can function successfully and cost-effectively without a host of other services such as federated authentication, advertising, and personalization as integrated pieces of the total experience.

No service is an island, or a single patient lying on a gurney. They are interconnected. As certainly as the beating wings of a butterfly (see butterfly effect) can cause a tornado halfway around the world, each and every service can have an impact on the next.

So, if not the singular patient metaphor, then what? Let’s go environmental, shall we?

I think perhaps we should apply an environmental analogy and, given my encounter with Pat Copeland all those years ago, I might change risk mitigation to environmental protectionism.

We often have clusters of services that are more interconnected than others. We could consider these clusters to be like little ecosystems. They are heavily dependent upon each other, but they are not isolated from the impact of other services. I love to point out how mashups build upon services in ways the developers never intended. Work by one group is consumed and distributed by another entity. No, this is not quite the circle of life, but I am talking about layers and interconnectedness.

We know for certain that even the tide pools that surface at low tide do not function in complete isolation. Each tide cycle some fish come and go, sand washes in and out, and rather importantly the oxygen within the tide pool is replenished.  The tide pool ecosystem therefore cannot exist without some level of interaction with other ecosystems.  The beach grasslands may house predators and the coral reefs just a bit further out to sea may shelter fish when they are not in the pools.  No service is an island unto itself.

Each summer my family and I go to the Oregon Coast and stay near Haystack Rock in Cannon Beach. Each year my wife, my kids and I visit the tide pools at Haystack Rock. Every year we stop off and visit the folks in the pickup truck who set up the telescope and bring the plastic bins with some of the sea life in them so we can see it up close and touch a few creatures. There is always a jar out for donations to preserve the Haystack Rock natural preserve, and every year we take what cash we have and let the kids make a donation.

It is an awesome vacation, but I will tell you that every year I go there, I do notice changes. This year the amount of seaweed was larger than in any previous year I could remember. The tide pools, though a strong and vibrant ecosystem, are ever changing and evolving. More sand comes in and a pool completely fills; where there used to be a dozen starfish, now there is a solid wall of them hanging onto the underside of a large rock; and sometimes where we had seen a floor of sea anemones we now see a barren hole filled with salt water.

Software as a Service is alive and ever changing. It requires an amount of protection and stewardship or it will decay and fail. It requires vigilance, or one element of the system, such as sea lions, may find an easy meal by hunting wild salmon trapped up against the door of a lock. The sea lions will let their ravenous gluttony get carried away to the point that there are not enough salmon left in the run to sustain them. Similarly, one service may consume too many resources of another service and cause it irreparable harm.

In my world we still have clipboards filled with data, but they are ones that hold our record counts, our observations from the field, and our assessments of risk, both for my tide pools at Haystack Rock and for our service: the risk within the service itself and the potential risk due to encroachment or failures of other interconnected systems.

Yes, I’m confident that in test we still have a major role to play in the risk assessment business. It is just that in the world of services, with fast-paced changes, we also need to show up every day and evaluate the real world, the data center and the software within, for its health. We need to run our tests against production, check our pH balances, count our fish, and assess our production telemetry. At the end of the day we file our report and get ready to do it again tomorrow, because the ecosystem never stops changing.
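For the more code-minded, here is a minimal sketch of what that daily clipboard round might look like. It is only an illustration under my own assumptions: the `Reading` type, the `daily_report` function, the metric names, and every threshold are hypothetical and made up for this post, not taken from any real service or monitoring tool. The idea is simply to compare each production measurement, including one from a neighboring service we depend on, against an expected range and file a report.

```python
from dataclasses import dataclass
from typing import List, Tuple


@dataclass
class Reading:
    """One hypothetical production measurement and its expected range."""
    name: str
    value: float
    low: float
    high: float

    def healthy(self) -> bool:
        # A reading is healthy when it falls inside its expected range.
        return self.low <= self.value <= self.high


def daily_report(readings: List[Reading]) -> Tuple[bool, List[str]]:
    """Return (everything_ok, report_lines) for one day's readings."""
    lines = []
    for r in readings:
        status = "ok" if r.healthy() else "AT RISK"
        lines.append(f"{r.name}: {r.value} (expected {r.low}..{r.high}) {status}")
    return all(r.healthy() for r in readings), lines


if __name__ == "__main__":
    # Made-up sample values; a real check would pull these from telemetry.
    todays_readings = [
        Reading("our_service_error_rate_pct", 0.4, 0.0, 1.0),
        Reading("federated_auth_latency_ms", 950.0, 0.0, 500.0),
    ]
    ok, report = daily_report(todays_readings)
    for line in report:
        print(line)
    print("ecosystem healthy" if ok else "ecosystem needs attention")
```

Note that the second reading belongs to someone else's service. That is the point: the report covers the neighbors in the tide pool, not just our own rock.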

Thanks JW.