Visual Studio 2012 New Features: Code Clone Analysis
In my travels across the country, with my fellow Evangelist, Clint Edmonson, talking about Visual Studio we often come across great stories to tell. One of our favorite true stories is of a customer that had a web application running very slow. We ran code metrics against it and, sure enough, the Page_Load event had 9,000 lines of code in it.
Naturally we were curious so we opened it up to see that it was basically the same if statement copied over and over. Apparently they needed to find out who was coming into the website in order to show customized content and the solution they came up with was this massive set of statements.
For better or worse we have all had code that gets copied throughout our solutions. Until now there was no tool to tell us there were copies and, instead, we had to rely on other metrics to hopefully reveal any code smells that lead us to duplicates. Now, however, we have the new Code Clone Detection (aka Code Clone Analysis) feature.
According to the documentation:
“Code clones are separate fragments of code that are very similar. They are a common phenomenon in an application that has been under development for some time. Clones make it hard to change your application because you have to find and update more than one fragment. Visual Studio can help you find code clones so that you can refactor them.”
You can find clones of specific code by selecting the segment you are interested then right click on the selection to choose Find Matching Clones in Solution from the context menu:
Visual Studio will search for code clones and produce the result in the new Code Clone Search Results window:
The original line of code is put in a group on its own and then all the matches are put into a different group. You can expand the groups to see the specific locations of the matches:
Also, you can double click on any entry in the list to go to the selection in your code file:
Besides looking for specific clones you can also look for code clones for the entire solution. To use this feature go to Analyze | Analyze Solution for Code Clones:
This creates a result set for the entire solution:
By default it groups and sorts the results by the strength of the match. Exact matches come first then those matches that may be close but not exact come next and so on. The terms you may see are Exact, Strong, Medium, and Weak.
Once you have the result set, there are a couple of ways you can compare them against each other.
If you have a comparison tool configured you can Right-click on any item and select Compare from the shortcut menu:
You would know if you have this feature available by going to Tools | Options | Source Control | Team Foundation Server and click on Configure User Tools.
If you don’t have a comparison tool you can do manual comparisons between two entries in the list. If the clones are in different files then you can just double-click each entry and it will open the file as well as highlight the entry that is duplicated as mentioned earlier.
What Is Found
You are probably curious as to what is found by this tool. The heuristics for finding clones will find duplicates even if the following changes have happened:
· Renamed identifiers
· Insert and delete statements added
· Rearranged statements
What Is Not Found
There are some rules for what is not found as well. I have taken this list from the documentation pretty much verbatim.
· Type declarations are not compared. For example, if you have two classes with very similar sets of field declarations, they will not be reported as clones. Only statements in methods and property definitions are compared.
· Analyze Solution for Code Clones will not find clones that are less than 10 statements long. However, you can apply Find matching clones in solution to shorter fragments.
· Fragments with more than 40% changed tokens.
· If a project contains a .codeclonesettings file, code elements that are defined in that project will not be searched if they are named in the Exclusions section of the .codeclonesettings file.
· Some kinds of generated code are excluded:
· *.designer.cs, *.designer.vb
· InitializeComponent methods
Code Clone Settings and Exclusions
A settings file is available to configure this feature at the project level. Currently we have only announced the ability to do exclusions in the file but there will most likely be other elements that are added later on. The file is just XML with a .CODECLONESETTINGS extension. The only requirement for use is that the file exists in the top level directory of the project.
The base elements consist of a CodeCloneSettings element with an Exclusions child:
Within the Exclusions element you can have the following children:
This element is used to indicate files that should be excluded from analysis. Path names can be absolute or relative and you can use wildcards as well. So, for example, to ignore all the C# text template files that have been put in their own directory (called MyTextTemplates) you might have the following:
<Namespace>, <Type>, and <FunctionName>
You can also exclude namespaces, types, and functions. Just like files these items can use absolute names or names with wildcards in them. Here is an example of what it might look like:
Exclusion File Example
In the Tailspin Toys sample there is some generated code in the TailSpin.SimpleSqlRepository project that is the bulk of the duplications:
When I run code analysis, this is the result:
Code clone analysis doesn’t automatically know to ignore text templates so I would create an XML file called TailSpinRepository.codeclonesettings and insert an entry like this:
Now if I run clone analysis again here is what I get:
As you can see the results are significantly less than the first time the analysis ran. It’s common to create several exclusions in different projects to weed out noise in the analysis results.