Protecting Intellectual Property with .NET Obfuscation
By Eran Dror
Getting paid for what you deliver is the basis for a long-term profitable business. Yet software vendors continue to lose about 41 percent of their income through software piracy. According to BSA (Business Software Alliance), the worldwide piracy rate went up from 38 percent in 2007 to 41 percent in 2008; the global rate rose for the second year in a row. The monetary value of unlicensed software (losses to software vendors) grew by more than $5.1 billion (11 percent) to $53.0 billion from 2007 to 2008.
Software vendors looking to increase their revenue stream should find ways to stop piracy of their software. Application source code constitutes a company’s software intellectual property (IP) and is vital for maintaining its competitive advantage and revenue stream.
In this article I review obfuscation as a means for providing IP protection. However, first I'd like to make a distinction between two different terms that are often mixed up when used in conjunction with the term "obfuscation."
The first term is "IP protection," which refers to the means used by a company to protect its intangible assets, created by its employees, often embedded in software code. This intellectual property is the company's competitive advantage.
The second term is "copy protection," which is the technology used by a company to prevent the reproduction of its software.
Occasionally, obfuscation is argued for as a means for copy protection, aiming to prevent distribution of unauthorized copies of software. In reality, obfuscation isn't suited for that goal. In fact, copy protection can be easily circumvented even if your code is obfuscated.
The need for obfuscation comes from the fact that .NET applications are compiled into an Intermediate Language (IL), and it is possible for users to decompile deployed applications, which makes your source code easily accessible.
Developers are encouraged to write code that is as clear and easy to understand as possible. The rationale is that clear code is easier for you to maintain and lets others quickly understand it and refactor it when needed.
Obfuscation, however, is quite the opposite. It literally means making something less clear and harder to understand. It takes your code and makes it more difficult to read when opened in a disassembler, while keeping the application's behavior the same as the original.
Now we’ll review some of the obfuscation techniques used by software companies today to protect their code. Before we cover each individual method we'll set criteria to grade against. This will help us to measure the pros and cons of each different method. Depending on your specific situation you may prefer one method over the other, taking into account the following:
- Readability: ability for humans to read and understand the code.
- Reversibility: ability of a tool to undo transformation.
- Performance Impact: impact on code execution.
- Maintenance/Support: impact on ability to support transformed assemblies.
The first method we're going to discuss is entity renaming. As its name suggests, this method renames metadata entities stored in an assembly.
This includes class names, method names, parameters, fields, events, and properties.
In terms of readability, the code becomes harder to understand. The example above shows that entity renaming is limited to the code that is under the developer's control and doesn't include external calls to third-party libraries or to the standard .NET libraries. This is because the .NET Framework uses method names to resolve them at runtime.
Because this is a one-way transformation, it is impossible to infer the original names from the obfuscated assembly. No tool can undo the renaming, so the method scores well on the reversibility criterion.
Performance impact is negligible, because the instructions handed to the JIT compiler (the "jitter") remain exactly the same and the complexity of the code is unchanged.
The downside of entity renaming is the burden it adds to maintenance. It makes it difficult to debug your code in a production environment. Exceptions generated and reported by a user will typically include the obfuscated method and class names, making it almost impossible to trace them back to the exact locations in the source code.
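The one-way nature of renaming can be illustrated with a short sketch. Python is used here only to show the idea; a real .NET obfuscator rewrites assembly metadata rather than a list of strings. The function `rename_entities` and the sample entity names are hypothetical and do not come from any particular product.

```python
import itertools
import string


def short_names():
    """Yield meaningless identifiers: a, b, ..., z, aa, ab, ..."""
    for length in itertools.count(1):
        for letters in itertools.product(string.ascii_lowercase, repeat=length):
            yield "".join(letters)


def rename_entities(names):
    """Map each distinct name to the next meaningless identifier.

    The mapping exists only while the obfuscator runs. The shipped
    output carries the new names alone, so the originals cannot be
    inferred from it.
    """
    generator = short_names()
    mapping = {}
    for name in names:
        if name not in mapping:
            mapping[name] = next(generator)
    return mapping


# Hypothetical entities from code under the developer's control.
original = ["LicenseManager", "ValidateSerialNumber", "trialDaysRemaining"]
mapping = rename_entities(original)
shipped = [mapping[name] for name in original]
print(shipped)  # ['a', 'b', 'c']
```

Nothing in `shipped` hints at the original names, which is why a tool working only from the obfuscated output has nothing to reverse.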
Entity renaming often breaks application code that uses the reflection API. Some .NET practices, such as XML serialization, LINQ, and web services, rely on reflection, so you should be careful when using this obfuscation method in these scenarios.
Unfortunately, this option calls for additional test cycles in the development life cycle. The goal is to verify that obfuscation doesn't break your application's code flow, so you will need to exercise all the logic in your application that may break as a result of entity renaming.
Now let’s look at a second obfuscation method called control flow obfuscation. It hides the control flow information of the program by transforming existing code flow patterns into semantically equivalent constructs that differ from the code originally written. The control flow obfuscation algorithm converts the original implementation into spaghetti code, making it much harder to infer the program logic.
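The reflection problem comes down to looking up a member by a string that the renaming pass did not update. The following sketch imitates that situation in Python, using `getattr` as a stand-in for a reflection lookup; the class and method names are hypothetical, and the renamed class is written by hand to simulate what an obfuscator might ship.

```python
class ReportService:
    def export_xml(self):
        return "<report/>"


def call_by_name(obj, method_name):
    """Look up a method by its string name at runtime, as reflection does."""
    return getattr(obj, method_name)()


# Before renaming, the string matches the method name and the call works.
print(call_by_name(ReportService(), "export_xml"))


# Simulated result of a renaming pass: same body, meaningless names.
class a:
    def b(self):
        return "<report/>"


try:
    # The string literal still holds the old name, so the lookup fails.
    call_by_name(a(), "export_xml")
except AttributeError as error:
    print("lookup failed:", error)
```

Code paths like this one compile and load without complaint, and only fail when they run, which is why they need to be exercised by tests after obfuscation.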
There are different levels of control flow obfuscation applied by different obfuscators. A naive example of an attempt to obfuscate control flow is demonstrated below:
Looking at the left side you can see the original method; on the right is the result after applying flow obfuscation. Pay attention to the first few instructions. The obfuscator simply adds an invalid sequence of instructions at the beginning of the method; the rest of the method is identical to the original. The added invalid sequence attempts to pop a value from the stack, but at that point the stack is empty, so the code would fail if it ran. In fact, it is never executed, because a branch instruction placed at the beginning of the method simply skips the invalid sequence. A disassembler, however, will often break when trying to interpret that code sequence.
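The trick described above can be mimicked on a toy stack machine. This is a simplified sketch, not MSIL: the instruction set, `execute`, and `linear_stack_check` are all hypothetical, and real disassemblers are considerably more sophisticated than the linear walk shown here. It only illustrates why code that runs correctly can still trip a tool that reads every instruction in order.

```python
def execute(program):
    """Run the program the way a runtime does: follow branches."""
    stack, pc = [], 0
    while pc < len(program):
        op, arg = program[pc]
        if op == "br":  # unconditional branch to an instruction index
            pc = arg
            continue
        if op == "push":
            stack.append(arg)
        elif op == "pop":
            stack.pop()
        elif op == "add":
            stack.append(stack.pop() + stack.pop())
        elif op == "ret":
            return stack.pop()
        pc += 1


def linear_stack_check(program):
    """Walk every instruction in order, ignoring branches, and track
    the stack depth, as a naive disassembler might."""
    effect = {"push": 1, "pop": -1, "add": -1, "ret": -1, "br": 0}
    depth = 0
    for index, (op, _) in enumerate(program):
        depth += effect[op]
        if depth < 0:
            raise ValueError(f"stack underflow at instruction {index}")


original = [("push", 2), ("push", 3), ("add", None), ("ret", None)]
# Obfuscated: a branch at index 0 skips an invalid pop at index 1.
obfuscated = [("br", 2), ("pop", None)] + original

print(execute(original), execute(obfuscated))  # 5 5
try:
    linear_stack_check(obfuscated)
except ValueError as error:
    print("static walk failed:", error)
```

Both versions return the same result when executed, but the in-order walk stops at the pop that the runtime never reaches.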
More advanced control flow obfuscation is shown below:
The simple "for loop" on the left is completely mangled and transformed into the switch statement shown on the right. These are two different types of logic, but they behave the same at runtime. The obfuscator has done a thorough job of destroying the original code pattern: Reflector hasn’t been able to reverse the MSIL code into the original for loop and instead reads the code as a switch statement.
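The same kind of transformation can be sketched by hand. The example below is a minimal illustration in Python, assuming a hypothetical `sum_to` function; it is not the output of any obfuscator. The loop is replaced by a state variable and a dispatcher, with an if/elif chain standing in for the switch statement.

```python
def sum_to(n):
    """Original: a plain loop."""
    total = 0
    for i in range(n):
        total += i
    return total


def sum_to_flattened(n):
    """Same computation, driven by a state variable and a dispatcher."""
    state, total, i = 0, 0, 0
    while True:
        if state == 0:    # loop initialisation
            total, i = 0, 0
            state = 2
        elif state == 1:  # loop body and increment
            total += i
            i += 1
            state = 2
        elif state == 2:  # loop condition
            state = 1 if i < n else 3
        elif state == 3:  # exit
            return total


print(sum_to(5), sum_to_flattened(5))  # 10 10
```

The two functions return the same values, yet nothing in the flattened version looks like a loop over a range, and a real obfuscator could also shuffle the state numbers to obscure the order of the blocks.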
Judging control flow obfuscation by the criteria we set earlier, it obviously makes the code harder to read, not in the sense that the names of the identifiers have changed, but in the sense that it makes the logic harder to follow and comprehend.
Reversibility depends on the quality of the solution. The first example can be reversed easily by omitting the invalid code sequence from each method. The more advanced example can’t be reversed to the original form, because a one-way transformation was applied to the original code pattern. There is simply not enough information for Reflector to detect that a "for loop" was originally used in that particular code sequence.
Performance may vary, depending on the type of obfuscation algorithm used; you’ll have to test it for yourself to determine how it affects your code.
As opposed to entity renaming, very little impact is expected on maintenance and support. Stack trace information remains intact, and the ability to debug the code isn’t hindered. The code isn’t broken by use of the reflection API or the other practices mentioned in conjunction with entity renaming; therefore no additional test cycles are needed.
The last obfuscation method presented in this article is string encryption. String encryption transforms the strings located in source code so that in the compiled binaries they appear as encrypted strings. Using Reflector or any other static analysis tool won’t reveal the original strings. Additional code is injected into the assembly to support decryption of the strings at runtime. An example demonstrating this method is shown below:
The trial message shown on top is converted to a long Unicode-encoded string that is complete gibberish. If you have sensitive information stored as a string, such as connection strings, license codes, or a trial expiration date, you would apply string encryption to hide this information, preventing it from being exposed by static analysis tools.
Let’s look at the characteristics of string encryption. Humans can’t read the strings in their encrypted form. Reversibility is a matter of finding the decryption method, provided that it’s written as managed code, and writing a decryption utility to break the encryption algorithm used. There is a slight performance impact, since the strings have to be decrypted at runtime. String encryption has very little impact on maintenance and support, as the method affects only the MSIL code rather than the metadata entities themselves.
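A minimal sketch of the idea follows, again in Python for illustration only. It uses a simple XOR with an embedded key plus Base64 encoding, which is obfuscation rather than strong encryption and is not the algorithm of any particular product; `KEY`, `protect`, and `reveal` are hypothetical names.

```python
import base64
from itertools import cycle

# Hypothetical key; it ships inside the binary alongside the decoder.
KEY = b"hypothetical-key"


def protect(text):
    """Build-time step: turn a string literal into an unreadable blob."""
    data = bytes(b ^ k for b, k in zip(text.encode("utf-8"), cycle(KEY)))
    return base64.b64encode(data).decode("ascii")


def reveal(blob):
    """Runtime step injected into the program: recover the literal."""
    data = base64.b64decode(blob)
    return bytes(b ^ k for b, k in zip(data, cycle(KEY))).decode("utf-8")


message = "Your trial period has expired."
stored = protect(message)
print(stored)          # what a static analysis tool would see
print(reveal(stored))  # what the running application uses
```

The sketch also shows the reversibility trade-off noted above: because the key and the `reveal` routine travel with the application, anyone who locates them can decode every protected string.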
We’ve just reviewed three common obfuscation methods that are aimed towards protecting one’s IP. It is evident that one should take special consideration when selecting the right methods to protect one's code, as some have greater impact on the development process than others. Entity renaming, for example, should be integrated early in the development cycle, as it may have major impact on the application code flow; integrating it at a later stage may cause unnecessary delays.
Control flow obfuscation and string encryption come in many shapes and sizes, so it is advisable to test different options, compare them, and then choose what is right for your project.
About the Author
Eran Dror is the founder of SecureTeam, a company that develops .NET code protection and licensing products for the .NET market. Eran has over 15 years of experience and an extensive background in software engineering and product development. Today, Eran is in charge of managing SecureTeam's product strategy and marketing efforts.