Visual Studio 11 Developer Preview: Code Clone Detection (aka Code Clone Analysis)

[NOTE: This post has been deprecated; you can find the updated post here:]



Versions: Visual Studio 11 Developer Preview


Make sure to get your copy of the new book from Sara and me:

Coding Faster: Getting More Productive with Microsoft Visual Studio




Note: As always with pre-release software, some of the features may not make it into the final version or may change significantly before RTM. Also, although I will only show features that are publicly available, I may be using a slightly older or newer version of the build than you are so there may be slight differences in the feature set I show and the feature set you currently have.




In my travels across the country with my fellow Evangelist, Clint Edmonson, talking about Visual Studio, we often come across great stories to tell. One of our favorite true stories is of a customer whose web application was running very slowly. We ran code metrics against it and, sure enough, the Page_Load event handler had 9,000 lines of code in it. Naturally we were curious, so we opened it up and saw that it was basically the same if statement copied over and over. Apparently they needed to find out who was coming into the website in order to show customized content, and the solution they came up with was this massive set of statements.


For better or worse we have all had code that gets copied throughout our solutions. Until now there was no tool to tell us there were copies and, instead, we had to rely on other metrics such as lines of code to hopefully reveal any code smells. Now, however, we have the new Code Clone Detection (aka Code Clone Analysis) feature.


According to the documentation:

“Code clones are separate fragments of code that are very similar. They are a common phenomenon in an application that has been under development for some time. Clones make it hard to change your application because you have to find and update more than one fragment. Visual Studio can help you find code clones so that you can refactor them.”



Specific Clones

You can find clones of specific code by selecting the segment you are interested in:



Then right-click on the selection and choose Find Matching Clones in Solution from the context menu:



Visual Studio will search for code clones and produce the result in the new Code Clone Search Results window:



The original line of code is put in a group on its own and then all the matches are put into a different group. You can expand the groups to see the specific locations of the matches:





Solution Clones

Besides looking for specific clones, you can also search the entire solution for duplicate code. To use this feature go to Analyze | Analyze Solution for Code Clones:



This creates a result set for the entire solution:



By default it groups and sorts the results by the strength of the match. Exact matches come first, followed by matches that are close but not exact. As you can see in this example, the other match strengths are Strong, Medium, and Weak.




Reviewing Matches

Once you have the result set, there are a couple of ways you can compare them against each other.


Comparison Tools

Although I don’t show it here, if you have a comparison tool configured you can right-click on any two items and select Compare from the shortcut menu. To check whether you have a tool configured, go to Tools | Options | Source Control | Team Foundation Server and click Configure User Tools.



Manual Comparison

If you don’t have a comparison tool, you can manually compare two entries in the list. If the clones are in different files, just double-click each one and each file opens in its own tab so you can compare them:




When it comes to comparisons in the same file I’ve only found one good way to accomplish this so far. Granted, I have just started playing with this feature so there might be something coming or something I missed that makes this easier. Here is a series of steps to compare two items in the same file.


First, find the first entry you want to look at and double-click on it to open a file tab and highlight the code segment:



Now make a copy of the current code window by going to Window | New Window:



Next, go to the second entry you are interested in and double-click it. The result should be one code segment on each tab so you can compare the two:



You can do this for as many entries in the list as you like. Just repeat these steps for each entry you want to compare.




What Is Found

You are probably curious as to what is found by this tool. The heuristics for finding clones will find duplicates even if the following changes have happened:

· Renamed identifiers.

· Inserted and deleted statements.

· Rearranged statements.




What Is Not Found

There are some rules for what is not found as well. I have taken this list from the documentation pretty much verbatim.

· Type declarations are not compared. For example, if you have two classes with very similar sets of field declarations, they will not be reported as clones. Only statements in methods and property definitions are compared.

· Analyze Solution for Code Clones will not find clones that are less than 10 statements long. However, you can apply Find Matching Clones in Solution to shorter fragments.

· Fragments with more than 40% changed tokens.

· If a project contains a .codeclonesettings file, code elements that are defined in that project will not be searched if they are named in the Exclusions section of the .codeclonesettings file.

· Some kinds of generated code are excluded:

    · *.designer.cs, *.designer.vb

    · InitializeComponent methods

  However, this does not automatically apply to all generated code. For example, if you use text templates, you might want to exclude the generated files by naming them in a .codeclonesettings file.




Code Clone Settings and Exclusions

A settings file is available to configure this feature at the project level. I tried to use it at the solution level but it didn’t work, so this is definitely a per-project activity. Currently we have only announced the ability to do exclusions in the file, but other elements will most likely be added later on. The file is just XML with a .codeclonesettings extension. The only requirement is that the file exists in the top-level directory of the project.


The base elements consist of a CodeCloneSettings element with an Exclusions child:
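
As a minimal sketch, assuming the element names given in the documentation, an otherwise empty settings file would look something like this:

```xml
<CodeCloneSettings>
  <Exclusions>
    <!-- Exclusion entries go here. -->
  </Exclusions>
</CodeCloneSettings>
```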



Within the Exclusions element you can have the following children:


<File>

This element is used to indicate files that should be excluded from analysis. Path names can be absolute or relative, and you can use wildcards as well. So, for example, to ignore all the C# text template files that have been put in their own directory (called MyTextTemplates) you might have the following:
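
A sketch of that exclusion might look like the following. It assumes the MyTextTemplates folder sits directly under the project directory and that the generated files end in .cs, so adjust the path to match your own layout:

```xml
<CodeCloneSettings>
  <Exclusions>
    <!-- Assumed layout: MyTextTemplates is a folder directly under the project. -->
    <File>MyTextTemplates\*.cs</File>
  </Exclusions>
</CodeCloneSettings>
```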




<Namespace>, <Type>, and <FunctionName>

You can also exclude namespaces, types, and functions. Just like files, these items can use fully qualified names or names with wildcards in them. Here is an example of what it might look like:
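
For instance, a sketch along these lines, where every namespace, type, and method name is a made-up placeholder rather than anything from a real project:

```xml
<CodeCloneSettings>
  <Exclusions>
    <!-- All names below are hypothetical placeholders. -->
    <Namespace>MyCompany.MyProject.Generated</Namespace>
    <Type>MyCompany.MyProject.MyClass1</Type>
    <FunctionName>MyCompany.MyProject.MyClass2.MyMethod</FunctionName>
    <!-- Names may also contain wildcards. -->
    <Type>*.AnotherClass*</Type>
  </Exclusions>
</CodeCloneSettings>
```

You can mix these with File entries in the same Exclusions element.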



Example Scenario

In the Tailspin Toys sample that I have, generated code in the TailSpin.SimpleSqlRepository project accounts for the bulk of the duplications:



Code clone analysis doesn’t automatically know to ignore text templates so I have created an XML file called TailSpinRepository.codeclonesettings and inserted an entry like this:



Now if I run clone analysis here is the result:



As you can see, there are significantly fewer results than the first time the analysis ran. It’s common to create several exclusions in different projects to weed out noise in the analysis results.





Code Clone Detection is a great new tool to add to your arsenal for improving code quality. Combined with Code Analysis and Code Metrics, it will help you quickly find potential issues.