HTML Sanitization in Anti-XSS Library

08/31/2009

RV here...

For a while now, I have been talking about various types of encoding and how they protect web applications from cross-site scripting attacks. In most cases, input is simply passed through AntiXss.HtmlEncode or a similar method to transform it into safely displayable HTML entities. In some cases, however, you as a developer want to accept input in the form of HTML and store or display it back as HTML. Here AntiXss.HtmlEncode would not work: it would encode the markup as literal text, so the browser would display the tags instead of rendering them, breaking the intended functionality.

The solution is to sanitize the input, ensuring that it does not contain any harmful scripts and is safe to display in the browser. Going forward, the Anti-XSS Library will support sanitization of HTML pages and fragments, and these new sanitization methods will be part of the upcoming 3.1 version of the library. These sanitization routines have been part of the internal version of the Anti-XSS Library for some time, and we have now cleared all legal issues surrounding their external distribution. The following screenshot shows the entire list of these methods.

AntiXss.GetSafeHtml and AntiXss.GetSafeHtmlFragment sanitize input by parsing the HTML page or fragment and building safe HTML from a white list of safe elements and attributes. Both methods use the HtmlToHtml class, which does most of the heavy lifting. Anything outside the white list, including all malicious scripts, is removed, making the input safe to display in the browser. Apart from sanitizing the input, these methods also normalize the output HTML, for example by automatically adding any missing tags.
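The white-list idea can be illustrated with a small C# sketch. This is not how the Anti-XSS Library or HtmlToHtml is implemented; the class WhitelistSanitizerSketch, its tiny white list (a, b, i and p, with href and title allowed on links) and its rule that href values must start with http://, https:// or mailto: are all hypothetical choices made for illustration. The sketch also assumes the input is well-formed XHTML so that it can be parsed with System.Xml.Linq, so unlike the library methods it cannot repair malformed markup.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;
using System.Xml.Linq;

// Hypothetical white-list sanitizer: an illustration only, NOT the Anti-XSS
// implementation. Assumes the fragment is well-formed XHTML.
public static class WhitelistSanitizerSketch
{
    // Hypothetical white list: lower-case element name -> attributes allowed on it.
    private static readonly Dictionary<string, string[]> AllowedElements =
        new Dictionary<string, string[]>
        {
            { "a", new[] { "href", "title" } },
            { "b", new string[0] },
            { "i", new string[0] },
            { "p", new string[0] },
        };

    // Hypothetical URL rule: href values must start with one of these schemes.
    private static readonly string[] AllowedSchemes = { "http://", "https://", "mailto:" };

    public static string Sanitize(string fragment)
    {
        // Wrap the fragment in a synthetic root so it parses as one XML element.
        XElement parsed = XElement.Parse("<root>" + fragment + "</root>", LoadOptions.PreserveWhitespace);

        // Build a brand-new tree that contains only white-listed nodes.
        var rebuilt = new XElement("root", parsed.Nodes().SelectMany(n => Clean(n)));

        // Serialize the children of the synthetic root, not the root itself.
        return string.Concat(rebuilt.Nodes().Select(n => n.ToString(SaveOptions.DisableFormatting)));
    }

    private static IEnumerable<XNode> Clean(XNode node)
    {
        if (node is XText text)
        {
            // Copy text as a new node; serialization re-escapes characters such as '<'.
            yield return new XText(text.Value);
            yield break;
        }

        var element = node as XElement;
        string name = element?.Name.LocalName.ToLowerInvariant();
        string[] allowedAttributes;
        if (element == null || !AllowedElements.TryGetValue(name, out allowedAttributes))
        {
            // Comments, processing instructions and any element not on the white list
            // (for example <script>) are dropped together with everything inside them.
            yield break;
        }

        var copy = new XElement(name);
        foreach (XAttribute attribute in element.Attributes())
        {
            string attributeName = attribute.Name.LocalName.ToLowerInvariant();
            bool isAllowed = allowedAttributes.Contains(attributeName)
                && (attributeName != "href" || HasAllowedScheme(attribute.Value));
            if (isAllowed)
            {
                copy.SetAttributeValue(attributeName, attribute.Value);
            }
        }

        // Apply the same rules recursively to the element's children.
        copy.Add(element.Nodes().SelectMany(n => Clean(n)));
        yield return copy;
    }

    private static bool HasAllowedScheme(string url)
    {
        string trimmed = url.Trim();
        return AllowedSchemes.Any(s => trimmed.StartsWith(s, StringComparison.OrdinalIgnoreCase));
    }
}
```

The central design choice is default-deny: anything the sketch does not explicitly recognize, including a script element and its contents, event-handler attributes such as onclick, and unexpected URL schemes such as javascript:, is dropped rather than compared against a list of known-bad patterns.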

How to use AntiXss.GetSafeHtmlFragment

Imagine that the following input is received from the user:

    <a href="https://www.microsoft.com">Microsoft Corporation</a>
    <script language="javascript">var a = document.cookie;</script>

Call any of the AntiXss.GetSafeHtmlFragment overloads with the above input. For example, the following code uses the String overload:

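    using Microsoft.Security.Application;                  // namespace of the AntiXss class

    string input = TextBox1.Text;                          // raw HTML submitted by the user
    string output = AntiXss.GetSafeHtmlFragment(input);    // white-listed HTML with scripts removed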

The script in the above input is removed and the following string is returned.

    <a href="https://www.microsoft.com">Microsoft Corporation</a>

If you are expecting an entire page rather than a fragment, use the AntiXss.GetSafeHtml method instead. Each method has multiple overloads that let you work with either Stream or TextWriter objects. HTML sanitization provides much-needed protection against cross-site scripting attacks, especially when handling rich content. With sanitization added alongside encoding, the Anti-XSS Library provides comprehensive protection against cross-site scripting attacks. Keep checking our blog for release information and download links.
