A Problem Resolution Method
[Written on October 25, 2005]
When resolving a problem, what method should a support engineer use? What method do people engaged in troubleshooting actually use? Many approaches are possible, but most would agree that a method close to the scientific method produces the best results and also makes the resolution reproducible for future occurrences of the same problem. I have been thinking about how the scientific method is used in practice.
The scientific method is generally described in five phases: define the problem, generate hypotheses, collect data, analyze, and report findings. I am certain that, when resolving problems, the best support engineers follow some methodology closely related to the scientific method. Let's discuss each of these steps in more detail.
Defining the Problem
A well-defined problem is halfway solved. Or, as credited to Albert Einstein, "the formulation of a problem is often more essential than its solution." If that holds true in scientific research, it is no less essential in technical support, where a poorly defined problem may keep the resolution process from finding the solution at all, or make it take an unreasonable amount of time. In this phase the resolution agents usually try to define and scope the problem by exploring it, separating what is fact from what is not, and narrowing the case to a single issue to be fixed.
Given the importance of this phase, time spent looking for a sound problem definition pays off in the next phases, because it provides the means to resolve the problem in a timely manner. The participation of the customer in this phase is essential, since it is he or she who brings the problem statement to the support engineer, who will guide the resolution process. However, more often than one would wish, the problem statement is vague or describes only a symptom instead of the real problem that needs to be defined. A documentation standard that helps to define the problem is important in this phase because it allows the problem to be explored in a structured way.
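As an illustration of what such a documentation standard could look like, here is a minimal sketch in Python. The class name, the field names, and the rule for when a statement counts as "well defined" are all assumptions made for this example; they are not taken from any particular support tool or standard.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class ProblemStatement:
    """Hypothetical template for a structured problem definition."""

    symptom: str            # what the customer observes
    expected_behavior: str  # what should happen instead
    first_observed: str     # when the problem was first seen
    scope: str              # which systems or users are affected
    facts: List[str] = field(default_factory=list)        # verified observations
    assumptions: List[str] = field(default_factory=list)  # claims not yet verified

    def missing_fields(self) -> List[str]:
        """Return the names of the required text fields that are still empty."""
        required = ("symptom", "expected_behavior", "first_observed", "scope")
        return [name for name in required if not getattr(self, name).strip()]

    def is_well_defined(self) -> bool:
        """Assumed rule: every required field is filled in and at least
        one fact has been verified."""
        return not self.missing_fields() and bool(self.facts)
```

Keeping facts and assumptions in separate lists mirrors the idea of separating what is fact from what is not, and the empty-field check gives the engineer a simple prompt for what to ask the customer next.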
There is some data gathering in this stage, but only the minimum needed to define and scope the problem. Before spending time on real data collection, it is better to generate sound hypotheses about the scoped problem that will direct the data collection; this way, the process is kept leaner and unnecessary activities are left out of the troubleshooting process.
Generating Hypotheses
“… A hypothesis is a statement whose truth is temporarily assumed, whose meaning is beyond all doubt… ” – Albert Einstein.
After a well-done problem definition, a knowledgeable support engineer will be able to formulate hypotheses that make sense for the problem at stake. A hypothesis may also be seen as an “educated guess”, where the adjective “educated” refers to established knowledge about the problem. Without an understanding of the matter, acquired from either training or experience, the hypothesis will look more like a “wild guess” than an educated one.
The purpose of using a scientific method is to be able to generate hypotheses once the problem has been formulated, and then to look for ways to either confirm or disprove them. It can sound somewhat naive to say “prove or disprove”, because it is really tough to prove a hypothesis; what we can do is look for information that supports a hypothesis to a degree of certainty that justifies investing effort in pursuing it as a likely resolution. In the social sciences, a hypothesis is always either confirmed or rejected. That is why, instead of saying that a hypothesis is proved, it is usually said that the hypothesis is confirmed; it can be disproved, though, by scientific evidence that it is false.
Good hypotheses are a consequence of a well-formulated problem and of knowledge about the discipline the issue relates to; deeper knowledge leads to better hypotheses. Engaging other subject matter experts during this phase may help to generate hypotheses that would otherwise be left aside; collaboration is an important part of this method, especially because one cannot assume that all knowledge fits inside a single brain.
There are two categories of hypotheses: propositions and empirical generalizations. Propositions have a causal order, in which one thing causes another; empirical generalizations merely state that one phenomenon is related to another. For problem resolution, even though a causal relationship is usually what allows the resolution, empirical generalizations also help to identify correlated areas where the problem’s cause may lie.
Collecting Data
Knowing what and how much to collect may be challenging; it is another critical step of problem resolution. Once more, previous steps performed with excellence will make this step more efficient. If the problem was well defined and allowed the formulation of good hypotheses, selecting what to collect will be less challenging.
Knowledge about the technology will also be very helpful in selecting what to collect, which tools to use, and how to configure them so that data collection is at the same time complete and objective.
Collecting too much data may be too time-consuming, lengthening the problem resolution beyond what the customer finds reasonable or acceptable. Too little data may not be enough to confirm or disprove any hypothesis, so more data will be needed; that causes additional cycles of interaction for data collection, which is painful for both the support engineer and the customer and also increases resolution time.
It is important to validate the collected data at the source, so that no time is wasted transferring files from the customer to the support center only to find that the data was either corrupt or wrongly collected. Valuable data collection tools include engines that test the validity of the data, both for corruption and for logical soundness.
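One way to picture validation at the source is the sketch below, written in Python with only the standard library. It assumes that a collected file is logically sound when it is non-empty and contains an expected marker near its beginning; that rule, the marker itself, and the function names are hypothetical and would differ for each collection tool. The checksum computed at the customer site can later be compared at the support center to detect corruption during transfer.

```python
import hashlib
from pathlib import Path


def sha256_of(path: Path) -> str:
    """Return the SHA-256 hex digest of a file, reading it in chunks."""
    digest = hashlib.sha256()
    with path.open("rb") as handle:
        for chunk in iter(lambda: handle.read(65536), b""):
            digest.update(chunk)
    return digest.hexdigest()


def validate_at_source(path: Path, required_marker: bytes) -> str:
    """Check a collected file before it is sent and return its checksum.

    Raises ValueError if the file is empty or if the (hypothetical)
    marker is not found in the first 4096 bytes.
    """
    if path.stat().st_size == 0:
        raise ValueError(f"{path} is empty; the collection probably failed")
    with path.open("rb") as handle:
        head = handle.read(4096)
    if required_marker not in head:
        raise ValueError(f"{path} does not look like the expected data")
    return sha256_of(path)


def verify_after_transfer(path: Path, expected_checksum: str) -> bool:
    """Return True if the received file matches the checksum taken at the source."""
    return sha256_of(path) == expected_checksum
```

The customer side would run `validate_at_source` and send the returned checksum along with the file; the support center would then call `verify_after_transfer` before starting any analysis.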
Analyzing the Data
If specific technical knowledge is very important in all previous phases, for analyzing the data it is worth double. Lack of knowledge about how the product operates internally, how the pieces work with each other to produce the end result, and which tools to use in the data analysis may simply block progress. Other people with deeper knowledge will certainly become involved in this phase, and in many cases the support incident may need to be escalated to a higher level or to another specialty to be analyzed.
Good use of tools is essential. Network analyzers, debugging skills, traces, and an understanding of the internals of the product will make the difference. In data analysis, it is important to look for data that relates to the formulated hypotheses and to how they can be confirmed or rejected. Looking for evidence to confirm or reject hypotheses is a hard task, as some data may support more than one hypothesis. The engineer must stay focused on both confirming and rejecting them.
A hypothesis may be held true until something strong enough to reject it is found in the collected data. If no data is able to disprove the hypothesis, empirical tests and experiments may be necessary to assess whether it is confirmed or rejected. Reproducing the problem, either in the customer’s environment or in house, may be necessary to see the problem happening and to run the tests needed to validate the hypothesis.
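The bookkeeping described here can be sketched as a small ledger in Python. The names and the two-state rule ("held" until any refuting evidence is recorded, then "rejected") are assumptions chosen for illustration; a real analysis would weigh how strong each piece of evidence is rather than counting any single item as decisive.

```python
from dataclasses import dataclass, field
from typing import Iterable, List


@dataclass
class Hypothesis:
    """A candidate explanation together with the evidence found for and against it."""

    statement: str
    supporting: List[str] = field(default_factory=list)
    refuting: List[str] = field(default_factory=list)

    def status(self) -> str:
        """Assumed rule: held true until some evidence refutes it."""
        return "rejected" if self.refuting else "held"


def record_evidence(
    evidence: str,
    supports: Iterable[Hypothesis] = (),
    refutes: Iterable[Hypothesis] = (),
) -> None:
    """Attach one piece of evidence to every hypothesis it bears on.

    The same evidence may support more than one hypothesis.
    """
    for hypothesis in supports:
        hypothesis.supporting.append(evidence)
    for hypothesis in refutes:
        hypothesis.refuting.append(evidence)


def candidates_for_experiment(hypotheses: Iterable[Hypothesis]) -> List[Hypothesis]:
    """Return the hypotheses the data did not reject; these are the ones
    that may still need a test or a problem reproduction."""
    return [h for h in hypotheses if h.status() != "rejected"]
```

Recording evidence against every hypothesis it touches makes it visible when one observation supports several explanations at once, which is exactly the case where an experiment is needed to tell them apart.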
Reporting Findings
The scientific method seeks to understand some phenomenon and then to find likely explanations, with a degree of certainty based on the accuracy of the data collected and analyzed. In problem resolution, though, the troubleshooter is looking for likely causes of a problem that can then be confirmed and addressed, allowing the problem to be fixed.
During this phase, the findings of the research are reported, indicating which hypotheses are supported by the data and therefore confirmed as the most viable scientific explanation. In a troubleshooting framework, this is the phase in which a resolution action plan is shared with the customer, so that he or she can finally apply it to the environment and evaluate the results.
The support engineer is instrumental in this phase, helping the customer to understand how the action plan should be performed and answering any questions the customer may have about it; he or she may even need to perform the steps of the action plan personally to make sure it is implemented correctly. Again, sound technical knowledge of the matter will make the difference in helping the customer with an implementation that is effective and as painless as possible.