What Does Anti-XSS Offer for HTML Sanitization?

Hi, Vineet here.

My name is Vineet Batta and, in keeping with the other introductions, here are a few words about myself. I have an engineering degree in Electronics & Communication and have spent a lot of time doing security reviews in the application space. Before joining Microsoft as an FTE, I worked as a consultant to different teams, including TWC and MSN operations. As an FTE, I have worked extensively on the Threat Modeling and Analysis Enterprise tool since 2007. I have always enjoyed breaking applications to expose security vulnerabilities and then designing creative solutions to fix them.

My favourite phrase of the moment is:

"Social engineering bypasses all technologies, including firewalls"

To support rich user experiences, web applications are increasingly required to accept input in rich text format, that is, with basic formatting such as bold text, colors, and embedded hyperlinks. However, if a malicious payload is embedded in this rich text, it can become an XSS vulnerability.

Content filtering is one of the most important steps we can take to protect our customers, and this filtering must apply to all user content that will be displayed in the software client. Items stored in a user’s data store can inadvertently contain malicious attack vectors; when such content is later rendered, the attack is referred to as persistent cross-site scripting. It is the client’s responsibility to protect the user and the user’s system from these attacks.

The Anti-XSS library also sanitizes tainted (unsafe) HTML and emits "safe HTML". To produce safe HTML it uses a whitelist-based approach: only known-safe elements and attributes are kept, and everything else is removed. As part of this processing it also corrects HTML that is not well formed, for example by adding missing closing tags or balancing unbalanced ones. The Anti-XSS library exposes this functionality through the GetSafeHtml and GetSafeHtmlFragment methods.
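To make the whitelist idea concrete, here is a minimal sketch in Python built on the standard library's `html.parser`. It is not how the Anti-XSS library is implemented: the whitelists (`ALLOWED_TAGS`, `ALLOWED_ATTRS`), the class `WhitelistSanitizer`, and the function `sanitize` are hypothetical names chosen for illustration. The sketch drops `<script>` and `<style>` elements together with their contents, discards any tag or attribute that is not whitelisted, rejects `javascript:`-style URLs with a deliberately simple prefix check, and closes tags left open by unbalanced input. HTML comments are dropped as well, because the parser's default comment handler ignores them.

```python
from html import escape
from html.parser import HTMLParser

# Hypothetical whitelists for illustration; NOT the Anti-XSS library's lists.
ALLOWED_TAGS = {"a", "b", "i", "u", "strong", "em", "p", "br", "div", "span",
                "table", "tr", "td"}
ALLOWED_ATTRS = {"a": {"href"}}
# Elements removed together with everything inside them.
DROP_CONTENT = {"script", "style"}
VOID_TAGS = {"br"}
# Simplified check; real sanitizers normalize URLs far more aggressively.
BAD_URL_PREFIXES = ("javascript:", "vbscript:", "data:")


class WhitelistSanitizer(HTMLParser):
    """Keeps only whitelisted tags/attributes and closes unbalanced tags."""

    def __init__(self):
        super().__init__(convert_charrefs=True)
        self.out = []
        self.open_tags = []   # stack of whitelisted tags that are still open
        self.skip_depth = 0   # > 0 while inside <script> or <style>

    def handle_starttag(self, tag, attrs):
        if tag in DROP_CONTENT:
            self.skip_depth += 1
            return
        if self.skip_depth or tag not in ALLOWED_TAGS:
            return  # non-whitelisted tag: drop the tag, keep its text
        kept = []
        for name, value in attrs:
            if name not in ALLOWED_ATTRS.get(tag, set()) or value is None:
                continue  # e.g. onclick, style, onerror are all dropped
            if value.strip().lower().startswith(BAD_URL_PREFIXES):
                continue
            kept.append(f' {name}="{escape(value, quote=True)}"')
        self.out.append(f"<{tag}{''.join(kept)}>")
        if tag not in VOID_TAGS:
            self.open_tags.append(tag)

    def handle_endtag(self, tag):
        if tag in DROP_CONTENT:
            self.skip_depth = max(0, self.skip_depth - 1)
            return
        if self.skip_depth or tag not in self.open_tags:
            return  # stray or non-whitelisted closing tag: ignore it
        # Close any tags left open inside this one (e.g. a missing </td>).
        while self.open_tags:
            top = self.open_tags.pop()
            self.out.append(f"</{top}>")
            if top == tag:
                break

    def handle_data(self, data):
        if not self.skip_depth:
            self.out.append(escape(data, quote=False))

    def result(self):
        self.close()  # flush any buffered text first
        # Close whatever is still open at the end of the input.
        while self.open_tags:
            self.out.append(f"</{self.open_tags.pop()}>")
        return "".join(self.out)


def sanitize(markup: str) -> str:
    parser = WhitelistSanitizer()
    parser.feed(markup)
    return parser.result()
```

For example, `sanitize("<table><tr><td>XSS TEST</tr></table>")` returns `<table><tr><td>XSS TEST</td></tr></table>`: when `</tr>` arrives, the `<td>` still open inside it is closed first. A whitelist fails safe, since a tag nobody anticipated is removed rather than allowed, which is why it is preferred over a blacklist of known-bad constructs.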

Example 1: Using the GetSafeHtml method.

Suppose the input HTML stream is:
    <html>
    <head>
        <title>CISG test page</title>
    </head>
    <body>
        <table>
            <tr>
                <td>
                    XSS TEST <a> My mail box <script type="text/javascript"> alert("BAD CODE"); </script> </a>
                    <!-- There is a script injection as above. -->
                    <!-- The closing <td> element is missing. -->
            </tr>
        </table>
    </body>
    </html>
Note the following:
1. The closing </td> element is missing.
2. A script has already been injected into the input stream: <script type="text/javascript"> alert("BAD CODE"); </script>

Call one of the GetSafeHtml overloads, for example:

AntiXss.GetSafeHtml(stringReader, stringWriter);
//stringWriter will hold the output.

The output is well-formed, (X)HTML-compliant markup:

    <html>
    <head>
        <title>CISG test page</title>
    </head>
    <body>
        <table>
            <tr>
                <td>
                    XSS TEST <a> My mail box </a>
                    <!-- There was a script here; it has been purged from the output. -->
                    <!-- The closing <td> element is NOT missing. -->
                </td>
            </tr>
        </table>
    </body>
    </html>

That's it: simple to use and useful. :-) Note that even if the <html> and <body> elements were missing from the input stream, calling this method would have added them to produce a well-formed HTML document.
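The wrapping behaviour described above can be sketched as a separate step. The function below, `ensure_document`, is a hypothetical helper and not part of the Anti-XSS API; it only adds `<html>` and `<body>` when they are absent and does no sanitizing of its own. As a simplification, markup that already contains `<html>` is returned unchanged.

```python
from html.parser import HTMLParser


class StartTagCollector(HTMLParser):
    """Records the name of every start tag in the input."""

    def __init__(self):
        super().__init__()
        self.seen = set()

    def handle_starttag(self, tag, attrs):
        self.seen.add(tag)


def ensure_document(markup: str) -> str:
    """Hypothetical helper: wraps markup in <html>/<body> when they are absent.

    Simplification: markup that already has <html> is returned unchanged,
    even if <body> is missing inside it.
    """
    collector = StartTagCollector()
    collector.feed(markup)
    collector.close()
    if "html" in collector.seen:
        return markup
    if "body" not in collector.seen:
        markup = f"<body>{markup}</body>"
    return f"<html>{markup}</html>"
```

For example, `ensure_document("<p>hi</p>")` returns `<html><body><p>hi</p></body></html>`.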

Example 2: Using the GetSafeHtmlFragment method.

As the name suggests, GetSafeHtmlFragment is used when you need to output only a fragment of HTML body content rather than an entire HTML document. The method does not add <html> or <body> elements if they are missing.

Consider the following unsafe input:

<a href="https://www.contoso.com"> You won the lottery <script language="javascript" > var a = document.cookie;
</script> </a>

Call one of the GetSafeHtmlFragment overloads, for example:

AntiXss.GetSafeHtmlFragment(stringReader, stringWriter);
//stringWriter will hold the output.

The output will be:

<div> <a href="https://www.contoso.com"> You won the lottery </a> </div>

As you can see, the output is harmless and valid: the script has been removed, the link is kept, and the fragment is wrapped in a <div> element.
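The difference in output shape between a full document and a fragment can be illustrated with another hypothetical sketch, `to_fragment`, which assumes its input has already been sanitized. It never adds `<html>` or `<body>` and wraps the result in a `<div>`, as in the output above. It also removes `<html>`, `<head>` (with its contents), and `<body>` if they are present; that part is an assumption of the sketch, since this post does not say how GetSafeHtmlFragment treats such input.

```python
from html import escape
from html.parser import HTMLParser

# Document-level wrappers that a fragment should not contain (assumption).
WRAPPERS = {"html", "body"}
# Elements that never take a closing tag.
VOID_TAGS = {"area", "br", "hr", "img", "input", "link", "meta"}


class FragmentBuilder(HTMLParser):
    """Re-serializes markup, skipping <html>/<body> tags and the whole <head>."""

    def __init__(self):
        super().__init__(convert_charrefs=True)
        self.parts = []
        self.in_head = False

    def handle_starttag(self, tag, attrs):
        if tag == "head":
            self.in_head = True
        if self.in_head or tag in WRAPPERS:
            return
        attr_text = "".join(
            f' {name}="{escape(value, quote=True)}"' if value is not None else f" {name}"
            for name, value in attrs
        )
        self.parts.append(f"<{tag}{attr_text}>")

    def handle_endtag(self, tag):
        if tag == "head":
            self.in_head = False
            return
        if self.in_head or tag in WRAPPERS or tag in VOID_TAGS:
            return
        self.parts.append(f"</{tag}>")

    def handle_data(self, data):
        if not self.in_head:
            self.parts.append(escape(data, quote=False))


def to_fragment(sanitized_markup: str) -> str:
    """Returns body content only, wrapped in a <div>; never adds <html>/<body>."""
    builder = FragmentBuilder()
    builder.feed(sanitized_markup)
    builder.close()
    return "<div>" + "".join(builder.parts).strip() + "</div>"
```

Wrapping the fragment in a single `<div>` gives the caller one element to insert into an existing page, which is the situation a fragment method is meant for.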

It is worth noting that this approach is different from HTML encoding. With encoding, unsafe characters are converted into character entities so that they are rendered as harmless text in the user's browser. With GetSafeHtmlFragment, the dangerous script is actually purged from the output and replaced with whitespace.
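To see the contrast concretely, here is what HTML encoding does to the Example 2 payload, using Python's standard `html.escape` as a stand-in for an HTML encoder (the Anti-XSS library's own encoding methods may differ in exactly which characters they encode). The script can no longer run, but the browser would display it as literal text, and the hyperlink is lost too, because the `<a>` tag is encoded along with everything else.

```python
import html

# The unsafe fragment from Example 2.
payload = ('<a href="https://www.contoso.com"> You won the lottery '
           '<script language="javascript" > var a = document.cookie;\n</script> </a>')

# html.escape converts &, <, > and (with the default quote=True) " and '
# into character references, so every tag becomes visible, inert text.
encoded = html.escape(payload)
print(encoded)
```

Sanitizing keeps the whitelisted `<a href="https://www.contoso.com">` as working markup and removes the script, which is what rich text input needs; encoding is the right tool when user input should be shown as plain text.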

More from me next week, when we start to explore the next generation of Anti-XSS technology we are working on.