5 - Getting Good Feedback
In the last chapter, the Trey Research team automated their release pipeline. They've come a long way. From a simple, largely manual pipeline that, in many ways, hampered their efforts rather than helped them, they now have a pipeline that ensures the quality of their software and has made releases, which were often chaotic and stressful, a far more predictable and repeatable process. Their pipeline has automated testing and automated deployments. It adheres to best practices for continuous delivery, such as using a single build that's deployed to many environments. The team knows that things are better. Their lives are easier and they're not staying at work until midnight, but they have no way of proving that the improvements they've made, which took time and money, are actually benefiting the business. Here's their situation.
This chapter is about how to really know how well your project is doing.
The Importance of Feedback
Many projects are managed without any concrete data that can help people make good decisions. A team might think that their work yields good results in a reasonable amount of time, but they don't have any actual information to prove it. Other teams spend valuable time tracking and analyzing either the wrong metrics, or metrics that give only a partial view of a situation.
The importance of good information that gives insight into a project can't be overemphasized. In general, we'll call this information feedback. Feedback is the most powerful tool a software development team can use to ensure that a project is progressing as it should.
The faster and more frequently feedback is available, the better a team can adapt to changes and anticipate problems. Feedback that is easy to generate, collect and act on can have a direct impact on a business. It helps you to focus your efforts in the right places if there are problems, and to create the new services and applications that your customers want.
The Importance of Good Communication
There are many ways to gather good feedback, and this chapter discusses some of them. However, the most important is to make sure that the people on the team talk to each other. Agile processes (for example, Scrum) prescribe periodic meetings such as standups where team members gather daily to learn what everyone's doing and where they can discuss issues. Agile processes also advocate retrospectives, where team members evaluate a project they've just completed to discuss what went well and what could be improved.
The Importance of Visibility
Agile processes stress good communication, not just in its verbal form, but also by using visual aids that encapsulate the status of a project. These aids are in a public place, where all team members can see them, and are easily understood. In Chapter 1 we discussed information radiators and used a traffic light as an example that many teams have adopted. The light quickly lets people know the status of the current build.
Another visual aid that is used in this guidance is the Kanban board. The Trey Research team uses these boards to understand the tasks that need to be done and their statuses. Examples of their Kanban boards are in several chapters of this guidance.
Feedback, DevOps, and Continuous Delivery
As you can imagine, both continuous delivery and DevOps rely on fast feedback to succeed. For DevOps, with its emphasis on collaboration, feedback is critical. DevOps stresses that everyone involved in a project must constantly communicate with each other.
However, because the focus of this guidance is the continuous delivery release pipeline, we're going to concentrate on how to get feedback from the pipeline itself. The three activities we'll discuss are:
- Generating feedback in the pipeline.
- Gathering feedback from the pipeline.
- Using metrics to evaluate the release process.
A fourth activity, acting on the feedback, depends very much on the situation you're in, and the type of feedback you're getting. This guidance shows how the Trey Research team reacts to various forms of feedback, much of it negative. Examples of acting on feedback include adding new features to the backlog, changing the way the team does code reviews, or even canceling a project. Although it's not possible to tell you how to act on the feedback you get, it is important to understand that acting on feedback as promptly and effectively as possible is the whole reason for generating and gathering it at all.
You can generate feedback from each stage of the pipeline. Generating feedback automatically is far better than trying to do it manually. After you've configured your pipeline, you collect data every time an instance of the pipeline runs, without having to do anything.
At first glance, gathering feedback might seem trivial. It turns out, however, that it can often be difficult to retrieve, filter, manage, and even uncover all the information that the pipeline generates. Two ways to gather information are to:
- Monitor the pipeline itself for information about running instances and the results from each stage.
- Monitor the application as it runs in the different environments.
Another way to gather feedback is to use metrics. Metrics are so useful that they deserve a section of their own.
Continuous delivery pipelines use fast feedback loops to ensure that the code that is built works properly and is delivered promptly. They also use feedback to validate that the code that you actually build is, among all the possibilities for what could be built, the best choice.
To validate such a choice, the most useful metrics for continuous delivery are those that assess the economic impact your choices have on the business. Many teams focus on metrics that actually aren't meaningful or that give only a partial view of a situation. For example, some teams measure the number of bugs they find in a project as a way to measure quality. However, if it takes three months to release a fix, simply knowing that the bug exists isn't that useful. The four metrics discussed in this chapter help organizations understand how effective their process is in its entirety, how often defects are discovered in the software, and how long it takes to remove those defects. These are the metrics.
- Cycle time, which is how long it takes between when you decide to make a change and when you deliver it. Cycle time is the most important metric for continuous delivery. This metric gives you a global view of your release process. It measures how the pipeline functions as a whole, and doesn't focus on the efforts of a particular discipline or organizational silo.
- Mean Time to Recovery (MTTR), which is the average time between when a problem is found in the production environment and when it is fixed.
- Mean Time Between Failures (MTBF), which is the average time between one failure in the production environment and the next failure.
- The defect rate, which is closely related to the MTBF and is the number of defects that are found per unit of time.
While these metrics aren't directly available in TFS, the information you need to calculate them is available and easy to retrieve. There are also other useful metrics that are directly available in TFS. Later in the chapter you'll find a brief discussion of these as well.
Patterns and Practices for Getting Good Feedback
This section discusses some patterns and practices that you can use to inform your approach to generating and gathering feedback from the pipeline, and for using metrics.
Automate the Generation and Gathering of Feedback
Just as with deployments and testing, automation makes it easier and more efficient to gather feedback from the pipeline. The pipeline can be configured to:
- Automatically generate information about running stages and the steps within them.
- Automatically gather that information and prepare it so that it is easily comprehensible.
Other than certain steps in the commit stage, such as the ones that build the binaries, and certain steps in the release stage, such as the ones that deploy to the production environment, everything else in the pipeline is there to provide some type of feedback. For example, you can get feedback about how a change to the code or to the configuration affects the application. The pipeline also tells you if the new code is ready to be delivered to your users.
If there is any information that you think will be useful, think about how you can configure the pipeline to generate that data. You may add a specific step to a stage, such as the code analysis step in the Trey Research pipeline. You can also add a new stage that generates a particular type of feedback. The acceptance test stage in the Trey Research pipeline is an example. It assesses whether the code behaves as expected after any change.
Of course, as always, if you can't automate at the moment, then run a manual step inside the pipeline. However, always make sure that you generate data that you can gather and present in an easily comprehensible way.
Design Software with the Operations Group in Mind
The way you architect, design, and code your software affects the quantity and quality of the feedback. Operations people need information about the applications they're responsible for maintaining and running. Software should be designed from the outset to provide information about an application's health, its status, and potential and immediate problems. From a DevOps perspective, involving operations people in the development process in order to include the correct instrumentation encourages collaboration between groups that frequently never communicate with each other.
The Design for Operations website on CodePlex provides both a tool and guidance for creating highly manageable applications. One practice it advocates is to make business-related metrics available as well as information about an application's health and status. Business-related metrics might include the volume of data submitted to the application by web services at any moment, or the amount of money transferred as a result of financial transactions that occur within the application. Examples of information about the application itself are performance metrics, and the use of computing and network resources.
Another best practice is to use standard, well-known instrumentation mechanisms that are generally familiar to operations people. They should be able to manage these mechanisms with standard monitoring tools such as System Center Operations Manager (SCOM). Examples of these instrumentation mechanisms include:
- Windows performance counters
- Windows event logs
- Windows Management Instrumentation (WMI)
- Trace and log files
For more information about SCOM, see System Center Operations – 2012 Operations Manager.
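As a rough illustration of the trace and log file mechanism, the following sketch uses Python's standard logging module rather than a Windows-specific API. It emits one business-related metric and one health metric for each completed transaction. The logger name, the metric names, and the one-second threshold are assumptions invented for this example, and an in-memory stream stands in for a real log file.

```python
import io
import logging

# An in-memory stream stands in for a trace or log file.
log_stream = io.StringIO()
handler = logging.StreamHandler(log_stream)
handler.setFormatter(logging.Formatter("%(levelname)s %(name)s %(message)s"))

# Hypothetical logger name, chosen only for illustration.
logger = logging.getLogger("treyresearch.orders")
logger.setLevel(logging.INFO)
logger.addHandler(handler)
logger.propagate = False  # keep the example's output in log_stream only


def record_transfer(amount, elapsed_ms):
    """Emit a business metric and a health metric for one completed transfer."""
    logger.info("metric=amount_transferred value=%.2f", amount)
    logger.info("metric=request_duration_ms value=%d", elapsed_ms)
    if elapsed_ms > 1000:  # assumed threshold for a potential problem
        logger.warning("metric=slow_request value=%d", elapsed_ms)


record_transfer(250.0, 1200)
log_lines = log_stream.getvalue().splitlines()
```

Writing metrics as simple key=value pairs is one way to keep the output easy for a monitoring tool to parse; the format itself is a design choice, not a requirement.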
Monitor the Pipeline
All the feedback that is generated and gathered is useless if you can't access it. Monitoring each pipeline instance allows you to track the status of each change that is made and, in turn, the status of the application and the project as a whole.
Monitoring involves using a tool such as Build Explorer, which is available in Team Explorer. In terms of pipeline instances, the tool should provide information about:
- The instances of the pipeline that are running at any moment.
- The instances of the pipeline that have been completed.
For each of these instances, the tool should provide information about:
- The status of each stage, such as whether it is running, has failed, has partially succeeded, or has succeeded entirely.
- The status of each step.
- Which manual stages and steps are ready to be run by a team member.
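The information listed above can be pictured as a small data model. The following Python sketch is only an illustration of the kind of data a monitoring tool exposes; it is not how Build Explorer represents it. The class names, the status values, the change identifier, and the choice of Release as a manual stage are all assumptions.

```python
from dataclasses import dataclass
from enum import Enum


class Status(Enum):
    NOT_STARTED = "not started"
    READY = "ready"  # a manual stage waiting for a team member to trigger it
    RUNNING = "running"
    FAILED = "failed"
    PARTIALLY_SUCCEEDED = "partially succeeded"
    SUCCEEDED = "succeeded"


@dataclass
class Stage:
    name: str
    status: Status
    manual: bool = False


@dataclass
class PipelineInstance:
    change: str  # hypothetical identifier of the change that triggered the run
    stages: list

    def ready_manual_stages(self):
        """Names of the manual stages that a team member can run now."""
        return [s.name for s in self.stages if s.manual and s.status is Status.READY]

    def summary(self):
        """One line per stage, suitable for a simple status display."""
        return [f"{s.name}: {s.status.value}" for s in self.stages]


instance = PipelineInstance(
    change="changeset-101",
    stages=[
        Stage("Commit", Status.SUCCEEDED),
        Stage("Acceptance test", Status.SUCCEEDED),
        Stage("Release", Status.READY, manual=True),
    ],
)
```

Here `instance.ready_manual_stages()` returns the stages that are waiting for someone to act, which is the question a team member most often asks of a monitoring view.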
Monitor the Application
Monitoring the running application as changes are introduced across different environments is another way to obtain useful information. The feedback you get will be better if you have prepared your application according to the practices outlined in Design for Operations. However, even if you haven't, monitoring the application will alert you to potential and immediate issues. You can use this information to help you decide what needs to be improved or where to focus future development efforts.
Again, the amount and quality of the feedback is improved if you can automate how you monitor an application. Tools such as SCOM, once configured, can not only do the monitoring for you, but also generate alerts as a result of specific conditions. You can even provide this information to the team if you synchronize the alerts with Team Foundation Server. For more information, see How to Synchronize Alerts with TFS in System Center 2012 SP1.
Monitor the Work Queues
Long work queues, where tasks are inactive and wait for long periods of time before someone can address them, can cause many problems in a project. For example, cycle times can grow because tasks that must be performed before a feature is released to customers aren't being completed in a timely manner. A side effect is that it's also possible for people to begin to sacrifice quality in order to keep cycle times low.
Another issue is that long work queues can decrease the value of the feedback you're gathering because it isn't current. Timely feedback is only useful if tasks are closed quickly. Finally, tasks that sit in a queue for long periods of time can become outdated. The work they describe may no longer be applicable, or it may even have been invalidated.
Long queues can have negative effects on the team. For example, one commonly used metric is capacity utilization, which measures if people are working at full capacity. However, overemphasizing this metric can make queues longer. If people have no free time, then new tasks, which are probably at the end of the queue, don't merit immediate attention. Also, there's no incentive for people to complete tasks quickly if they're only judged by how busy they are. Measuring capacity utilization can discourage people from leaving some of their time unscheduled in order to react quickly to changes in the project. In addition, having to show that they're working all the time can put people under pressure, which is when they are most likely to make mistakes. A healthy project needs a balance between queue length and capacity utilization.
The problem is that it's difficult to make sensible tradeoffs if there's no information about the length of the work queues. Just like the pipeline, they need to be monitored. The first step is to decide what to track. Here are some important pieces of data.
Work in Progress
Work in progress is the first priority. If your team is working on an item, it belongs in a work queue.
Blocked work also belongs in a queue. These are tasks that are waiting to be addressed by the team and have not yet been started, or items for which work has already begun but has been halted because of a blocking issue. Blocked tasks can have long-term harmful effects. They need to be explicitly included in the queue.
Hidden work is very important. Hidden queues of work form when a team accepts tasks that are not explicitly tracked inside the queues used for regular project management. One very big problem with hidden work is that even though it can consume a great deal of a team's time, there's no way to make these efforts visible to management. A hidden task might be a special, urgent order from the CEO. It might be something that doesn’t, initially, look like work but that keeps the team busy later. As soon as you detect hidden work, you should add it to the queue so that it's now visible and monitored.
How To Monitor Queues
Once you know what to track and your queues contain those tasks, you can track queue length and status. With TFS, you can use work items to create the queues and use work item queries to check the status and length of the queues. You can also create queues dedicated to a particular purpose, such as new features. If you use the MSF for Agile process template, then new features are User Story work items. Another example is a queue for defects. These are Bug work items in the MSF for Agile template. General work is described with Task work items.
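To make the idea of queue length concrete, here is a minimal Python sketch that counts the items still waiting in each queue. It assumes the results of a work item query have already been exported into a list of dictionaries; the sample data and the state names are hypothetical, and only the work item types come from the MSF for Agile template described above.

```python
from collections import Counter

# Hypothetical query results. The work item types follow the MSF for Agile
# template named in the text; the state names are assumptions.
work_items = [
    {"id": 1, "type": "User Story", "state": "Active"},
    {"id": 2, "type": "User Story", "state": "Closed"},
    {"id": 3, "type": "Bug", "state": "Active"},
    {"id": 4, "type": "Bug", "state": "Active"},
    {"id": 5, "type": "Task", "state": "Resolved"},
]


def queue_lengths(items, done_states=("Closed",)):
    """Count the items still in each queue, keyed by (type, state)."""
    return Counter(
        (item["type"], item["state"])
        for item in items
        if item["state"] not in done_states
    )


lengths = queue_lengths(work_items)
```

With this sample data, `lengths[("Bug", "Active")]` is 2, which is the length of the defect queue that is waiting for attention.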
In TFS, the Product Backlog work item query in the MSF for Agile process template is an example of a tool that helps you monitor and manage queues. The following screenshot shows an example of the results of such a query.
A convenient way to manage and monitor queues is the Cumulative Flow Diagram (CFD). The CFD provides a great deal of information in a single graph. It shows the number of items in the queue at each point in time, and differentiates them by their statuses. A CFD not only illustrates the status of the queues, but also gives insight into the cause of some problems, and can even help you to discover potential issues. Another good point about a CFD is that the information is up to date because the queues are shown in their current state.
The following screenshot shows the CFD for the Trey Research project.
Here are some of the things you can learn by analyzing a CFD. Numbers in the following text correspond to numbers in the graph.
- The rate at which items enter and leave queues, as well as variations to that rate. The slope of the curves provides this information. For example, if you compare the slope of lines (1) and (2), you can conclude that during the first week of the project, as many as 20 new items were added to the backlog. Later, around the fifth week of the project, no new items were being added. Again, by examining the slopes, you can see that, by the last week, the team was delivering items at a higher pace (4) than four weeks earlier (3).
- The time items spend in each queue (or sets of queues). The width of the stripes provides this information. One of the most important times to track is the lead time (5), which represents the total time spent on all activities until the item is done. You can see that, at the beginning of the project, the lead time was about five weeks. This means that it would take a new item approximately five weeks to be released. The cycle time (6), which tracks the time spent by the development team on new items, was around one week at the point marked in the graph.
- The number of items in each queue (or in sets of queues). The height of the stripes provides this information. For example, you can see the backlog size at the beginning of the third week (7), and how it decreases later. You can also see that the amount of unfinished work (8) has been steady for the duration of the project. This is also true of the amount of work in progress (9). These are probably consequences of there being work-in-progress limits that are enforced by the Kanban board so that the team doesn't accept more tasks than it can handle. You can also track specific queues. You can see that the number of items waiting in the Ready for coding queue (10) began to decrease around the ninth week, after being unusually high during the preceding weeks. It's possible that the team, by constantly analyzing the CFD and the Kanban board, took some action that addressed the problem.
All this information is easily generated simply by keeping the work items in TFS up to date.
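The data behind a CFD is simply a count of items per state for each day. The following Python sketch shows one way such counts could be derived from the state history of each work item. The item names, state names, and dates are hypothetical, and the sketch does not reflect how TFS builds its own report.

```python
from collections import Counter
from datetime import date, timedelta

# Hypothetical state histories: (date, new state) transitions, in order.
histories = {
    "story-1": [
        (date(2013, 9, 2), "Backlog"),
        (date(2013, 9, 4), "In progress"),
        (date(2013, 9, 6), "Done"),
    ],
    "story-2": [(date(2013, 9, 3), "Backlog"), (date(2013, 9, 6), "In progress")],
    "story-3": [(date(2013, 9, 5), "Backlog")],
}


def state_on(transitions, day):
    """State of an item at the end of `day`, or None if it did not exist yet."""
    current = None
    for when, state in transitions:
        if when <= day:
            current = state
    return current


def cumulative_flow(histories, start, end):
    """Map each day in [start, end] to a Counter of item counts per state."""
    flow = {}
    day = start
    while day <= end:
        states = (state_on(t, day) for t in histories.values())
        flow[day] = Counter(s for s in states if s is not None)
        day += timedelta(days=1)
    return flow


flow = cumulative_flow(histories, date(2013, 9, 2), date(2013, 9, 6))
```

Stacking the per-state counts for each day produces the stripes of the diagram: the height of a stripe on a given day is the count for that state, and the horizontal distance between two boundaries approximates the time items spend in between.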
William Thomson, the 1st Baron Kelvin, is best known for his work on the laws of thermodynamics, and for determining the value of absolute zero. His statement on the importance of measurement is also applicable to software development.
I often say that when you can measure what you are speaking about, and express it in numbers, you know something about it; but when you cannot express it in numbers, your knowledge is of a meagre and unsatisfactory kind; it may be the beginning of knowledge, but you have scarcely, in your thoughts, advanced to the stage of science, whatever the matter may be.
This section discusses four important metrics for continuous delivery: cycle time, MTTR, MTBF and the defect rate.
What Is Cycle Time
Cycle time is the development (or implementation) time. This is in contrast to lead time, which includes all the activities that occur until an item is completed. For example, lead time includes the time spent by stakeholders deciding if a feature should be implemented.
Why Track and Analyze Cycle Time
There are many reasons why tracking and analyzing cycle time is a valuable activity. Here are some of them.
- Cycle time measures the effectiveness of the entire pipeline.
- Cycle time helps you to identify wasteful activities and eliminate them, or to improve the way activities are currently performed.
- Cycle time allows you to uncover bottlenecks in the release process.
- Cycle time gives insight into how changes to the release process and even specific decisions affect the time it takes to deliver the software to users.
- Cycle time helps you to measure the predictability of the release process, and, after the process stabilizes, can help you to make better forecasts about when you can deliver your software to users.
How Do You Measure Cycle Time
In order to measure cycle time, you need to record when the implementation starts, and when the change is available to your users. The amount of time between these two moments is the cycle time for that change.
Again, cycle time is a way to measure and improve the implementation process. What implementation means can differ from one organization to another but, in simple terms, it would be fair to say that it is the set of activities performed by the development team.
The units to use for measuring cycle time depend on how long it takes, on average, for a team to deliver its new software. Many teams find that the day is a good unit to use.
You can plot the cycle time of each change on a graph so that you can see the trend and also identify and investigate edge cases. When a project has been active for a long time, or if there's a large number of changes that are released, it can be useful to group these changes. For example, you might want to obtain the average cycle time over some range of weeks, in order to see how the metric changes.
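The measurement and grouping just described can be sketched in a few lines of Python. The sample timestamps are hypothetical, the unit is the day, and grouping by the ISO week in which a change was delivered is only one of several reasonable choices.

```python
from collections import defaultdict
from datetime import datetime
from statistics import mean

# Hypothetical changes: (implementation started, available to users).
changes = [
    (datetime(2013, 9, 2, 9), datetime(2013, 9, 5, 9)),
    (datetime(2013, 9, 3, 9), datetime(2013, 9, 8, 9)),
    (datetime(2013, 9, 9, 9), datetime(2013, 9, 11, 9)),
    (datetime(2013, 9, 10, 9), datetime(2013, 9, 14, 9)),
]


def cycle_time_days(started, delivered):
    """Cycle time of one change, in days."""
    return (delivered - started).total_seconds() / 86400


def weekly_average(changes):
    """Average cycle time per ISO week, grouped by delivery week."""
    by_week = defaultdict(list)
    for started, delivered in changes:
        year, week, _ = delivered.isocalendar()
        by_week[(year, week)].append(cycle_time_days(started, delivered))
    return {key: mean(values) for key, values in sorted(by_week.items())}


cycle_times = [cycle_time_days(s, d) for s, d in changes]
averages = weekly_average(changes)
```

Plotting `cycle_times` shows each change individually, which helps with edge cases, while `averages` gives the week-by-week trend.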
How Can You Use Cycle Time to Improve the Release Process
As a rule of thumb, the shorter the cycle times the better, so the trend line should be descending. This means that you are eliminating wasted efforts and bottlenecks and your release times are therefore growing shorter. Make sure that minimizing the cycle time doesn’t become a goal in and of itself. The goal should always be to deliver valuable software that is good for your customers. If reducing cycle time becomes the objective, it can come at the cost of sacrificing the quality of the software or by taking shortcuts. The way to lower cycle times is to optimize the release process by automating repetitive tasks and by getting rid of unnecessary ones.
After spending some time improving the release process, cycle times become more predictable and the values will probably fluctuate around a fixed range. This might mean that you should look for improvements in new areas. Of course, you should try to maintain this stable pattern and not have the values increase again.
There are situations where increasing cycle times are acceptable. For example, if you're focusing on improving the release process itself, perhaps by implementing new stages for the pipeline or by adding automation, your cycle times will probably increase.
A disadvantage of cycle time is that it's a lagging metric, in contrast, for example, to the CFD, which shows data in its current state. You can’t measure the cycle time for an item until the work is complete. It's a good idea to track cycle time in conjunction with the CFD.
What Is MTTR
MTTR is the average time between the moment a problem is found in the production environment and the moment that the problem is fixed. Production bugs are the focus because bugs found and fixed during the development process don't have a direct impact on the business.
MTTR is also known as Mean Time to Resolve and Mean Time to Repair. Within the Information Technology Infrastructure Library (ITIL) it's named Mean Time to Restore Service (MTRS). (ITIL is a set of practices widely used for IT service management.)
Why Track and Analyze MTTR
MTTR is similar to cycle time. Tracking this metric yields the same types of benefits, such as the ability to identify and remove bottlenecks. The difference is that MTTR is related to the ability to resolve defects and deliver fixes rather than implement and deliver new features. You can think of MTTR as a special case of cycle time.
Measuring MTTR independently of cycle time is a good idea because most teams are particularly interested in bugs. For example, some teams have a zero defect policy, where any bug is either resolved immediately or discarded. In this situation, it's very useful to know the average time needed to fix a problem. A low MTTR also points to a better experience for end users and stakeholders. Generally, a low MTTR means that customers encounter quick resolutions to problems. For stakeholders, the sooner a bug is fixed, the less impact it has on the business.
How Do You Measure MTTR
To measure MTTR, you do the same as you would for cycle time, but use defects as the basis of the measurement instead of new features. For each defect, you record when the problem was found and when it was fixed, in the production environment. The amount of time between these two moments is the time to recover for that defect. The MTTR represents the average time that it takes to recover from a defect.
The units for measuring MTTR depend on how long it takes, on average, for a team to fix production bugs. This can depend on the policy of the organization about defects, and the number of critical bugs as opposed to those of low importance. For some teams, it might take days, but for other teams a better unit is the hour.
As with cycle time, you can plot MTTR on a graph, so that you can see the trend and also identify and investigate edge cases. When a project has been active for a long time, or if there is a large number of fixed bugs, it can be useful to group them. For example, you might want to obtain the average MTTR over some range of weeks, in order to see how the metric changes.
To improve the development process, calculate the MTTR using only the time actually spent fixing bugs. Don't include other activities such as triaging bugs. To improve the overall process of fixing bugs, calculate the MTTR and do include other, related activities such as triaging bugs.
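The two ways of calculating MTTR differ only in where the clock starts. The following Python sketch illustrates this with hypothetical production defects; the field names and timestamps are assumptions, and the unit is the hour.

```python
from datetime import datetime
from statistics import mean

# Hypothetical production defects: when each was found, when a developer
# started working on the fix, and when the fix reached production.
defects = [
    {
        "found": datetime(2013, 9, 2, 8),
        "fix_started": datetime(2013, 9, 2, 12),
        "fixed": datetime(2013, 9, 2, 20),
    },
    {
        "found": datetime(2013, 9, 5, 10),
        "fix_started": datetime(2013, 9, 6, 10),
        "fixed": datetime(2013, 9, 6, 14),
    },
]


def mean_hours(defects, start_key, end_key="fixed"):
    """Average number of hours between two recorded moments of each defect."""
    return mean(
        (d[end_key] - d[start_key]).total_seconds() / 3600 for d in defects
    )


# Includes triage and any other waiting time: the overall bug-fixing process.
mttr_overall = mean_hours(defects, "found")
# Counts only the time spent fixing: the development process.
mttr_fixing_only = mean_hours(defects, "fix_started")
```

A large gap between the two values, as in this sample data, suggests that most of the recovery time is spent waiting rather than fixing.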
How Can You Use MTTR to Improve the Release Process
Once again, the answer is similar to cycle time. A low MTTR is better, so the trend should be descending. A larger MTTR is acceptable in some situations, such as when you're improving the release process but (hopefully) not negatively affecting the business.
What Is MTBF and the Defect Rate
MTBF is the average time between the moment one problem is found in the production environment and the moment that the next problem is found in the production environment. As with MTTR, MTBF only includes production bugs because they are the bugs that can have a direct impact on the business. MTBF is closely related to the defect rate. They are inverses of each other, so MTBF = 1 / Defect rate. In other words, the defect rate is the number of defects found for each unit of time.
Why Track and Analyze MTBF and the Defect Rate
These metrics help you to keep track of the quality of your software. If MTBF decreases (or the defect rate increases), it can signal a quality control policy that is too lax or is being ignored. Poor quality control can have a direct impact on the business. When customers keep finding bugs they become frustrated and lose confidence. It also means that the application isn't functioning properly, which can have a negative economic effect on the business.
The MTBF, the defect rate, and the cycle time are closely related. Typically, more defects mean there is less time to spend on new features, so you may see an increase in cycle times if the MTBF decreases.
There is a close relationship between MTBF and MTTR as well. Together, these two metrics indicate the overall availability of the application.
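One way to see the relationship is a rough availability estimate. The following Python sketch assumes that MTBF is measured from one failure to the next, as it is in this chapter, that each failure is resolved before the next one is found, and that the application counts as unavailable for the whole time a failure is open. Under those assumptions the average uptime per failure is MTBF minus MTTR. The function name and the sample values are hypothetical.

```python
def availability(mtbf_hours, mttr_hours):
    """Estimate the fraction of time with no unresolved production failure.

    Assumes MTBF is measured from one failure to the next, that repairs do
    not overlap, and that an open failure means the application is down.
    """
    if mtbf_hours <= 0 or not 0 <= mttr_hours <= mtbf_hours:
        raise ValueError("expected MTBF > 0 and 0 <= MTTR <= MTBF")
    return (mtbf_hours - mttr_hours) / mtbf_hours


# Hypothetical values: a failure every 4 days, 20 hours to recover.
estimate = availability(96, 20)
```

When MTBF is instead measured as the operating time between failures, the same idea is usually written as MTBF / (MTBF + MTTR). Either way, availability improves when MTBF grows or MTTR shrinks.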
How Do You Measure MTBF and the Defect Rate
To measure MTBF, you record when each defect is found and calculate the average time between defects. The units of measurement depend on how often bugs are found, but for most teams, either the day or the hour should be suitable.
For the defect rate, you count the number of defects found during a period and divide by the length of that period. The unit of measurement is the number of defects per unit of time (for example, the number of defects per day or per week).
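As a small worked example, the following Python sketch calculates the MTBF from the moments at which defects were found, and then derives the defect rate as its inverse. The dates are hypothetical and the unit is the day.

```python
from datetime import datetime
from statistics import mean

# Hypothetical moments at which production defects were found.
found_times = [
    datetime(2013, 9, 2),
    datetime(2013, 9, 4),
    datetime(2013, 9, 8),
    datetime(2013, 9, 14),
]


def mtbf_days(found_times):
    """Average number of days between one defect and the next."""
    ordered = sorted(found_times)
    gaps = [
        (later - earlier).total_seconds() / 86400
        for earlier, later in zip(ordered, ordered[1:])
    ]
    return mean(gaps)


mtbf = mtbf_days(found_times)  # gaps of 2, 4, and 6 days
defect_rate = 1 / mtbf         # defects per day
```

With this sample data the MTBF is 4 days, so the defect rate is 0.25 defects per day. Note that the function needs at least two defects, because a single defect has no gap to measure.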
Plotting the MTBF and the defect rate lends a better understanding of these metrics. You can see the trends and examine edge cases. Even though one metric can be derived from the other, it's still valuable to track them independently. MTBF and the defect rate provide different views of the same information, so they complement each other.
When a project has been active for a long time, or if there is a large number of bugs, it can be useful to group the bugs. For example, you might want to obtain the average MTBF over some range of weeks, in order to see how the metric changes.
How Can You Use MTBF and the Defect Rate to Improve the Release Process
MTBF should be as large as possible. The more time that passes between one defect and the next one, the better. The defect rate should be as small as possible. A smaller MTBF (or a larger defect rate) indicates that something is wrong with how quality is assessed in the development process.
Other Useful Metrics
There are many other metrics that are useful for providing feedback about a project. Every time you create a new team project, the New Team Project wizard generates a set of standard reports, such as velocity and burndown rates, depending on the process template that you select. These reports are available from Team Explorer. Every team project has a Reports node, where you will find the reports that have been generated. For more information about standard TFS reports, see Create, Customize, and Manage Reports for Visual Studio ALM. You can use tools such as Build Explorer to get information about automated builds. If you are interested in data about your code, you can use the Analyze menu that is included with Visual Studio Premium or Visual Studio Ultimate. In terms of continuous delivery, however, while these are certainly useful metrics, they aren't mandatory.
The Trey Research Implementation
Now let's take a look at how Trey Research is implementing these patterns and practices. When we left the team, they'd finished automating their pipeline, and now had a fully functional continuous delivery release pipeline. They still have some unsolved problems, though. Here are the ones they're going to address in this iteration.
Issue: For each change to the code, they don't have an easy way to know if the change meets all the conditions that make it eligible for release to production.
Cause: They don't monitor the pipeline in a way that makes it easy to know what happens to a change in each pipeline stage.
Solution: Use Build Explorer and the corresponding Build section inside the TFS team project Web Access site.
Issue: They don't have enough information to help them make good decisions during the development process.
Cause: They are missing some key metrics.
Solution: Start to track cycle time, MTBF, and MTTR.
The first solution involves using tools that they already have as part of TFS. The second solution involves learning how to create custom reports. The following pipeline diagram includes these activities.
The rest of this chapter explains what the Trey Research team did. As a general description, we can say that they focused on improving transparency and visibility. They've learned how to monitor the pipeline and to use automation in order to gather feedback. They've learned how to use metrics, and how to present the feedback they've gathered. In particular, they've begun to track cycle time, MTTR, and MTBF so that they can evaluate how the changes they're making impact the quality of their software and the business. First, let's get back to Jin's story.
Here are Jin's feelings at the outset of the iteration.
Monday, September 2, 2013
Everyone on the team is euphoric after we got our pipeline working. Sure, we deserve a party, but we're not close to being finished. My problem is that we can't measure what we've done, and if we can't do that how do we prove to management it was all worth it? So, we're going to make sure we get feedback from the pipeline and we're going to start using metrics.
Another good thing is that the team is really feeling involved. Iselda's happy so many of the tests are automated, and she's even suggesting we add an exploratory testing stage. Raymond is happy because releases aren't keeping him up until 03:00. Paulus is happy because he's getting quick feedback on his new features, and isn't chasing bugs that don't exist. Even Zachary is happy. The app works, people like it, and we're making money. Hey, we're all happy.
Here's the Trey Research backlog for iteration 4.
Jin's next entry tells us how things turned out at the end.
Friday, September 13, 2013
We implemented the new metrics. We're tracking cycle time, MTBF, and MTTR. Cycle time's still unpredictable because of all the time we spent on the pipeline but the good news is that all the trend lines are heading in the right direction and that's something we can show to management.
Here's the Trey Research cycle time, over a series of weeks.
Here's their CFD.
For more information about how to generate and interpret the Trey Research reports, see Lab 4.2: Metrics for Continuous Delivery in TFS.
How Did Trey Research Add Monitoring and Metrics
This section discusses how Trey Research implemented monitoring and metrics, and the best patterns and practices that they followed. For a step-by-step description, see the group of labs included in Lab04-Monitoring.
How Is Trey Research Automating the Generation and Gathering of Feedback
When they designed the pipeline, the Trey Research team made sure that stages and/or steps were in place for the types of feedback they needed to ensure the quality of their software. They use different mechanisms to generate and gather feedback. Some of it is done by using the logging and tracing mechanisms that the pipeline provides. The following screenshot shows the WriteBuildMessage workflow activity the team uses to generate some feedback about the stages that the pipeline is about to trigger.
The automated tests that the team uses also provide feedback in the form of test results that can be read and analyzed after each testing stage has finished. There are also some steps in some stages that generate specific types of feedback. An example is the code analysis that is always performed in the commit stage.
How Is Trey Research Designing for Operations
The team has been so busy implementing the orchestration, automation, and monitoring of their release pipeline that they haven't done this. It's in their backlog and when they have time, they'll address this issue.
How Is Trey Research Monitoring the Pipeline
The Trey Research team uses Build Explorer and the corresponding area inside the TFS team project Web Access site.
As well as using Build Explorer, the Trey Research team uses the alerts system in TFS to receive prompt notification about important events that occur within the pipeline. The following screenshot shows an example of how they've set up an alert in TFS that uses email to warn them if any of the stages in a pipeline instance fail.
How Is Trey Research Monitoring the Application
Just as with the designing for operations, the team hasn't had the chance to set up application monitoring. Again, this is on their backlog.
How Is Trey Research Monitoring the Work Queues
The team uses TFS to enter all their tasks as work items, to track the work items, and to get information about them by using queries. For more information, see Process Guides and Process Templates for Team Foundation Server.
They also use the CFD that is available through the TFS team project Web Access site, as well as the TFS Kanban board. For more information about Kanban boards, see Manage your backlog with the Kanban board.
How Is Trey Research Tracking and Analyzing Cycle Time
The team uses TFS work items to list and manage features to be implemented. They use the activated and closed dates of the User Story work items to calculate the cycle time for each work item. They prepare a custom TFS report that shows the trend over time. (Although cycle time is shown in the CFD, the custom report is easy to prepare and it provides a more detailed view.)
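The calculation behind such a report can be sketched in a few lines of Python. This is only an illustration, not the team's actual report: the sample work items, the `cycle_times` and `weekly_trend` function names, and the choice to group by the ISO week of the closed date are all assumptions made for the example. In practice the dates would come from a TFS work item query or the TFS warehouse.

```python
from collections import defaultdict
from datetime import datetime
from statistics import mean

# Hypothetical sample data: (work item id, activated date, closed date).
# A closed date of None means the User Story is still in progress.
USER_STORIES = [
    (101, "2013-08-05", "2013-08-16"),
    (102, "2013-08-07", "2013-08-14"),
    (103, "2013-08-19", "2013-08-23"),
    (104, "2013-08-20", None),
]


def parse(day):
    """Convert a YYYY-MM-DD string into a date."""
    return datetime.strptime(day, "%Y-%m-%d").date()


def cycle_times(stories):
    """Return {work item id: cycle time in days} for closed stories only."""
    result = {}
    for item_id, activated, closed in stories:
        if closed is None:
            continue  # no cycle time until the work item is closed
        result[item_id] = (parse(closed) - parse(activated)).days
    return result


def weekly_trend(stories):
    """Average cycle time in days, keyed by (ISO year, ISO week) of the closed date."""
    by_week = defaultdict(list)
    for _, activated, closed in stories:
        if closed is None:
            continue
        year, week, _ = parse(closed).isocalendar()
        by_week[(year, week)].append((parse(closed) - parse(activated)).days)
    return {week: mean(days) for week, days in sorted(by_week.items())}


if __name__ == "__main__":
    print(cycle_times(USER_STORIES))
    print(weekly_trend(USER_STORIES))
```

Plotting the weekly averages over several iterations gives the kind of trend line that the custom report shows.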
How Is Trey Research Tracking and Analyzing MTTR
The team uses TFS work items to list and manage production bugs. They use the activated and closed dates of the Bug work items to calculate the MTTR. They prepare a custom TFS report that shows the trend over time. For complete details on how to track the MTTR, generate a custom report, and interpret the results, see Lab 4.2: Metrics for Continuous Delivery in TFS. In the future, the Trey Research team plans to improve how they track this metric by distinguishing between production bugs and other types of bugs and to filter using the appropriate classification.
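As a rough sketch of the arithmetic, MTTR can be computed as the average of closed date minus activated date across the Bug work items. The sample bugs and the `mttr_hours` function name below are hypothetical, and the choice of hours as the unit is an assumption. The lab describes the actual report.

```python
from datetime import datetime

# Hypothetical production bugs: (id, activated, closed).
# A closed value of None means the bug has not been fixed yet.
BUGS = [
    (201, "2013-09-02 09:00", "2013-09-02 15:00"),
    (202, "2013-09-04 10:00", "2013-09-05 10:00"),
    (203, "2013-09-10 08:00", None),
]


def parse(stamp):
    """Convert a 'YYYY-MM-DD HH:MM' string into a datetime."""
    return datetime.strptime(stamp, "%Y-%m-%d %H:%M")


def mttr_hours(bugs):
    """Mean time to recovery in hours, using closed bugs only.

    Returns None when no bug has been closed yet.
    """
    repair_times = [
        (parse(closed) - parse(activated)).total_seconds() / 3600
        for _, activated, closed in bugs
        if closed is not None
    ]
    if not repair_times:
        return None
    return sum(repair_times) / len(repair_times)


if __name__ == "__main__":
    print(mttr_hours(BUGS))  # 6 hours and 24 hours average to 15.0
```

Open bugs are left out because they have no closed date yet. A report that needs to reflect long-running open bugs would have to treat them differently.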
How Is Trey Research Tracking and Analyzing MTBF and the Defect Rate
The team uses TFS work items to list and manage production bugs. They use the activated and closed dates of the Bug work items to calculate the MTBF, and they only use bugs that are already closed to make sure they don't include bugs that are invalid. They prepare a custom TFS report that shows the trend over time. They haven't started explicitly tracking the defect rate, but because this is a standard TFS report, they plan on using it soon.
For complete details on how to track the MTBF, generate a custom report, and interpret the results, see Lab 4.2: Metrics for Continuous Delivery in TFS. In the future, the Trey Research team plans to improve how they track this metric by distinguishing between production bugs and other types of bugs and to filter using the appropriate classification.
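The following Python sketch shows one way the calculation might look. It assumes that MTBF is the average number of days between the activated dates of consecutive closed bugs. That definition, the sample data, and the function names are assumptions made for illustration and are not taken from the lab. A second function shows the variant that measures from the moment one defect is fixed to the moment the next one appears.

```python
from datetime import date

# Hypothetical production bugs: (id, activated date, closed date).
# Bug "D" is still open, so it is excluded from the calculation.
BUGS = [
    ("A", date(2013, 8, 1), date(2013, 8, 2)),
    ("B", date(2013, 8, 11), date(2013, 8, 13)),
    ("C", date(2013, 8, 31), date(2013, 9, 1)),
    ("D", date(2013, 9, 5), None),
]


def closed_bugs(bugs):
    """Keep only closed bugs, ordered by activated date."""
    return sorted((b for b in bugs if b[2] is not None), key=lambda b: b[1])


def mtbf_days(bugs):
    """Average days between the activations of consecutive closed bugs."""
    closed = closed_bugs(bugs)
    gaps = [(nxt[1] - prev[1]).days for prev, nxt in zip(closed, closed[1:])]
    return sum(gaps) / len(gaps) if gaps else None


def mtbf_days_from_fix(bugs):
    """Variant: average days from one bug's fix to the next bug's activation.

    Overlapping bugs (next one found before the previous fix) count as 0 days.
    """
    closed = closed_bugs(bugs)
    gaps = [max((nxt[1] - prev[2]).days, 0) for prev, nxt in zip(closed, closed[1:])]
    return sum(gaps) / len(gaps) if gaps else None


if __name__ == "__main__":
    print(mtbf_days(BUGS))           # gaps of 10 and 20 days average to 15.0
    print(mtbf_days_from_fix(BUGS))  # gaps of 9 and 18 days average to 13.5
```

With fewer than two closed bugs there is no interval to average, so both functions return None rather than a misleading zero.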
The team also uses the standard Bug Trends Report to help track the rate at which they discover and resolve bugs. For more information, see Bug Trends Report.
The Revised Value Stream Map
After the team spent some time gathering data, they completed the value stream map they created in Chapter 3 by replacing the placeholders with times. They used the CFD and cycle time reports to get the approximate values. Here is what the new value stream map looks like.
The team didn't include the UAT time in their calculations because it depends on when end users can dedicate some time to testing the application. Also, they now have high confidence in their acceptance tests, so they release the application at the same time that it enters the UAT stage.
After all their efforts, they found that the cycle time is almost half what it was when they began. Their lead time has decreased by 1.5 weeks. Orchestrating and automating the pipeline, as well as following the best practices for continuous delivery, has helped them to dramatically reduce both their value-added time and their wait time. The most extreme example is the wait time between the code and acceptance test activities. In fact, there is no longer a wait time because the transition between the two activities is done automatically, inside the pipeline, when the commit stage automatically triggers the acceptance test stage.
This chapter discussed ways to get good feedback about your project. There are some patterns and practices you can follow, no matter what technology you use. These include using automation and monitoring the pipeline, the application, and the work queues. You also learned to interpret a CFD, which encapsulates a great deal of information about a project. The chapter also stressed the importance of metrics. In particular, the success of a continuous delivery pipeline is best shown by tracking cycle time, MTTR, and MTBF.
There are a number of resources listed in text throughout the book. These resources will provide additional background, bring you up to speed on various technologies, and so forth. For your convenience, there is a bibliography online that contains all the links so that these resources are just a click away. You can find the bibliography at: http://msdn.microsoft.com/library/dn449954.aspx.
The book Principles of Product Development Flow by Donald G. Reinertsen has a great deal of information about how to monitor and manage queues. Although it covers all types of product development, the principles it discusses also apply to software development. For more information, see the Reinertsen & Associates website at http://www.reinertsenassociates.com/.
There is another approach to calculating MTBF that uses a slightly different method. It measures the time between the moment when a defect is fixed to the moment when a new defect appears. By defining the metric this way, you learn the average time the application is available. For more information, see the Wikipedia article about mean time between failures at http://en.wikipedia.org/wiki/Mean_time_between_failures.
The Design for Operations website at http://dfo.codeplex.com/ provides both a tool and guidance for creating highly manageable applications.
For information about standard TFS reports, see Create, Customize, and Manage Reports for Visual Studio ALM at http://msdn.microsoft.com/library/bb649552.aspx.
For information about SCOM, see System Center 2012 – Operations Manager at http://technet.microsoft.com/systemcenter/hh285243.
For information about how to use alerts with TFS, see How to Synchronize Alerts with TFS in System Center 2012 SP1 at http://technet.microsoft.com/library/jj614615.aspx.
For information about managing work items, see Process Guides and Process Templates for Team Foundation Server at http://msdn.microsoft.com/library/hh533801.aspx.
For information about using TFS Kanban boards to manage your backlog, see Manage your backlog with the Kanban board at http://msdn.microsoft.com/library/vstudio/jj838789.aspx.
For information about using standard bug trend reports, see Bug Trends Report at http://msdn.microsoft.com/library/dd380674.aspx.
The hands-on labs that accompany this guidance are available on the Microsoft Download Center at http://go.microsoft.com/fwlink/p/?LinkID=317536.