Lockdown Pt 2 – Sanitization and Data Security

Tim Kulp | July 13, 2011

Sanitization and Data Security

In the first article, we examined Risk Management and started addressing the low hanging fruit of securing an application, Input Validation. We will continue to address simple solutions for improving our application’s security in this article as we examine Data Sanitization and Data Security. At the end of this article you will be able to clean data coming in to your system and secure data being stored using encryption.

Data Sanitization

Data Sanitization is an extension of Input Validation in that we are going to make sure that only approved values are being processed by our system. HTML presents a challenge in that data and code are intermingled. This intermixing gives rise to many attacks, the most popular of which is Cross Site Scripting (XSS). Cross Site Scripting attacks occur when data is interpreted as code and processed by the browser (read more about XSS attacks at http://msdn.microsoft.com/en-us/scriptjunkie/hh243615.aspx). Data Sanitization aims to mitigate XSS attacks by cleaning the code out of your data.

I use server-side checks for this, I’m okay…

PHP and ASP.NET have excellent tools that you can use to check for XSS attacks among other malicious input. Tools like strip_tags in PHP and the AntiXSS toolkit in ASP.NET can make the process of cleaning malicious script tags out of your data simple on the server side. If you can affect the server side code then absolutely have server side checks, but remember that not all attacks happen through the server. DOM Based XSS attacks occur on the client side without server interaction and if your site uses Ajax to call data from a third party, that data could go directly to the client side. As a best practice, do not delegate data sanitization to the server side and forget about checking the data on the client side. Sanitize your data on the client and server side to be thorough.


Attackers have many different methods to deliver their malicious payloads to your application. I once worked with a development team that checked all incoming data fields for “<script>” to validate that the incoming data did not include JavaScript. Unfortunately, this did not catch %3Cscript%3E. While their heart was in the right place, they failed to follow the first rule of data sanitization: convert your data to a canonical form.

In web development, this is fairly difficult because we have numerous ways of representing characters, such as URL encoding, hex and HTML encoding. An attacker could encode their commands to avoid detection or use multiple encodings to throw off checks for a single encoding. Canonicalization issues start when you only check a single representation of the data and make security decisions based on what you find. For example, if you check incoming data for tags that look like “<script>”, do not find any and then declare the data safe, you would miss %3Cscript%3E or &lt;script&gt;, both of which can become script tags once the data is decoded.
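To make the idea concrete, here is a minimal sketch of canonicalizing before checking. The helper names (decodeOnce, canonicalize, containsScriptTag) are hypothetical, and the sketch only understands URL encoding plus a handful of HTML entities, so treat it as an illustration of the principle and not as a replacement for a vetted library.

```javascript
// Hypothetical helpers: a simplified sketch, not a complete canonicalizer.
const HTML_ENTITIES = { "&lt;": "<", "&gt;": ">", "&amp;": "&", "&quot;": '"', "&#39;": "'" };

// Peel off one layer of URL encoding and one layer of HTML entities.
function decodeOnce(value) {
  let decoded = value;
  try {
    decoded = decodeURIComponent(decoded); // e.g. %3C becomes <
  } catch (e) {
    // Malformed percent sequence: leave the URL layer untouched.
  }
  return decoded.replace(/&(lt|gt|amp|quot|#39);/g, (match) => HTML_ENTITIES[match]);
}

// Decode repeatedly until the value stops changing (its canonical form).
function canonicalize(value, maxPasses = 5) {
  let current = value;
  for (let i = 0; i < maxPasses; i++) {
    const next = decodeOnce(current);
    if (next === current) return current;
    current = next;
  }
  throw new Error("Input is encoded too many times to trust");
}

// Make the security decision on the canonical form, not the raw input.
function containsScriptTag(raw) {
  return /<\s*script/i.test(canonicalize(raw));
}

console.log(containsScriptTag("%3Cscript%3E"));      // URL encoded
console.log(containsScriptTag("&lt;script&gt;"));    // HTML encoded
console.log(containsScriptTag("hello world"));       // harmless text
```

The check runs against the decoded value, so the same rule covers every encoding the sketch knows how to undo.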

Exercise #3: Canonicalize and Scrub

In this exercise, we are going to sanitize data coming in from an Ajax call on the Time Manager project we started with in the first article (you can grab the project here). If you remember back to our Risk Assessment, we identified the hour logging process as something we need to secure. An XSS attack in this process could compromise our contractor’s data. To secure this section we need to sanitize all data coming in through the various Ajax calls on the page. Notice that we followed our Risk Assessment to illustrate the threat (XSS), the impact on the business asset (compromise of data) and then our recommended solution (sanitization of data).

On the timeManager.php page our legacy application is displaying an alert from the Accounting department to the user. This feature is used by Accounting to alert Contractors about issues with their hours, vacation schedules, etc… Here is the Ajax code:

function checkForAlerts() {
    $.ajax({
        url: "alert_data.php",
        data: "username=<?php echo $un ?>",
        success: function (data) {
            var failed = false;
            if (data != null) {
                var jdata = $.parseJSON(data);
                $(jdata).each(function (i, item) {
                    // item.Message is concatenated straight into markup without any sanitization
                    var strOut = "" + item.EntryDate + "<br/>From: " + item.From + "<br/>To: " + item.To + "<br/>" + item.Message + "";
                    // ... (the line that writes strOut to the page is missing from this listing)
                });
            }
        },
        error: function (result) {
            alert("An error occurred");
        }
    });
}

While regular expressions can be used to permit only specific data and formats, we will take a different approach for the message field.

There are numerous data sanitization libraries available for JavaScript, such as ESAPI4JS from OWASP. This library has a robust set of objects and functions to clean your data but can be a bit difficult to set up and navigate. For this article we are going to use Chris Schmidt’s jquery-encoder. This library is much more lightweight than ESAPI4JS and uses the jQuery plugin structure that many web developers are familiar with. Using jquery-encoder we can take our data to a canonical form and encode it for various contexts including HTML, HTML attributes, CSS, JavaScript and URLs.

We will set up our Ajax call to first canonicalize the item.Message variable and then encode the output for HTML. Here is the resulting JavaScript code:

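$(jdata).each(function (i, item) {
    try {
        // keep the raw value clearly labeled as untrusted
        var tainted_message = item.Message;
        // reduce the message to its canonical form; throws on multiple encodings
        var can_message = $.encoder.canonicalize(tainted_message);
        var strOut = "" + item.EntryDate + "<br/>From: " + item.From + "<br/>To: " + item.To + "<br/>" + $.encoder.encodeForHTML(can_message) + "";
        // ... (the line that writes strOut to the page is missing from this listing)
    }
    catch (ex) {
        alert("An error occurred while attempting to display the message");
    }
});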

First, we wrap the code that builds the data string in a try/catch block. This will catch any errors that occur while making your data canonical or while encoding it. Next we assign the item.Message value to our tainted_message variable so that we do not confuse sanitized data with raw, possibly tainted data. We then run the $.encoder.canonicalize method on tainted_message, which checks the data against a series of codecs to discover any encodings and whether multiple encodings were used. In the case of multiple encodings, the plugin throws an exception. As with tainted_, for clarity we store the canonical form in a variable prefixed by can_ (can_message) to denote that the message data has been canonicalized. Finally, we encode the canonicalized data using $.encoder.encodeForHTML(can_message). This method encodes can_message so that the browser displays it as content rather than interpreting or executing it as code.
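If you are curious what encoding for HTML means in practice, the sketch below shows the basic idea with a hypothetical escapeHtml function. It is not the jquery-encoder source, and a real encoder typically covers more characters and contexts; the point is only that markup characters are replaced with entities so the browser renders them as text.

```javascript
// Hypothetical stand-in for an HTML encoder; not the jquery-encoder implementation.
function escapeHtml(text) {
  const replacements = { "&": "&amp;", "<": "&lt;", ">": "&gt;", '"': "&quot;", "'": "&#39;" };
  // Swap each markup-significant character for its entity.
  return String(text).replace(/[&<>"']/g, (ch) => replacements[ch]);
}

const message = '<script>alert("xss")</script>';
const safe = escapeHtml(message);
console.log(safe); // &lt;script&gt;alert(&quot;xss&quot;)&lt;/script&gt;
```

Once encoded, the message can be concatenated into markup and will show up on the page as the literal characters the sender typed.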

Collecting all the Low Hanging Fruit

By using Input Validation and Data Sanitization techniques, you can easily and quickly start building a security wrapper around your legacy application. Before you begin to secure your legacy applications, map how data flows through your system. Use this map as a checklist of where you need to secure data throughout your system. Validate that only known good data comes in to the system, and when you are not sure what “good data” is, make sure you sanitize the data to keep data and code separate.

Topic 3: Secure Storage

As with a house, you can lock all the doors and windows but sometimes you still need to put sensitive documents in a safe. Data is the same: as we lock down how data flows in and out of our application, we need to make sure that our data stores are secured as well. In web development that used to mean securing the Relational Database Management System (an RDBMS such as MySQL, SQL Server or Oracle), but now we have more data stores to watch, such as HTML5 Local Storage. We will examine some techniques to secure our database on the server side as well as some JavaScript encryption techniques (hold your disbelief).

SQL Injection

A major threat to any database system is SQL Injection. This attack uses malformed SQL statements to break out of calls to your database and perform some malicious action. The standard demonstration of a SQL Injection is the following:

$strSQL = "SELECT id, username, password, department FROM tblUsers WHERE username = '" . $_REQUEST['txtUn'] . "' AND password = '" . $_REQUEST['txtPw'] . "';";

Imagine if txtUn = "' OR 1=1;--". This would result in the following SQL statement:

SELECT id, username, password, department FROM tblUsers WHERE username = '' OR 1=1;

This will return all entries from tblUsers back to the application, because the condition OR 1=1 is always true and the trailing -- comments out the rest of the statement, including the password check. Fortunately SQL Injection is easily fixed with a basic security concept and a bit of reworking to the queries.
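To see the problem without touching a database, the following sketch mirrors the concatenation above in JavaScript and prints the statement that would be sent. The function name buildLoginQuery and the sample credentials are made up for illustration.

```javascript
// Illustration only: mirrors the string concatenation above. Never build queries this way.
function buildLoginQuery(username, password) {
  return "SELECT id, username, password, department FROM tblUsers " +
    "WHERE username = '" + username + "' AND password = '" + password + "';";
}

// Hypothetical inputs: one ordinary login, one injection attempt.
const normal = buildLoginQuery("tkulp", "secret");
const injected = buildLoginQuery("' OR 1=1;--", "anything");

console.log(normal);
console.log(injected);
```

In the second statement the attacker's quote closes the username literal early, so the text that follows is read as SQL instead of data.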

Least Privilege

I have seen many instances of connecting to a database as “sa” or “root” to save time in allocating the necessary rights to various users. Having administrator privileges on the database opens your application up to statements like DROP, CREATE or BACKUP (to a remote location). When connecting to your database, the user should have the minimal rights necessary to interact with the application. If this sounds like common sense to you, great, keep doing what you do. If you are one who indulges in the shortcut of making everyone an admin, stop. Do not set up any more applications to run as Admin, and take the time to create the specific rights for your users. While least privilege will not prevent SQL Injection attacks, it will drastically limit the amount of damage an attacker can do to your data.

Parameterized Queries

SQL Injection works by escaping one SQL statement or manipulating that statement to perform actions that the developers did not intend. To protect against this threat we need to build our SQL statements in a manner that they cannot be so easily escaped. Parameterized Queries use placeholders to fill variables into a SQL string and use the database to create the query statement instead of string concatenation.

Depending on your server side development platform, how you create a Parameterized Query will be different. For PHP you can use prepared statements (the prepare method) to build your parameters. For ASP.NET, developers can use DbParameter (or SqlParameter). Both techniques allow you to create a placeholder in the SQL statement and then assign a value and data type to that placeholder. Some techniques even allow you to specify the length of the input as well. This allows you to specify that txtUn (from our previous example) will be a varchar with an 8 character length. This specificity reduces the attack surface in your application.
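The sketch below models the separation that parameterized queries rely on: the statement text is fixed, and the values travel alongside it. It does not talk to a database and is not any driver's API; prepareLogin, LOGIN_SQL and the ? placeholder style are assumptions chosen for illustration, and the 8 character limit simply echoes the varchar example above.

```javascript
// Hypothetical sketch: models a parameterized statement without a real database driver.
const LOGIN_SQL =
  "SELECT id, username, department FROM tblUsers WHERE username = ? AND password = ?";

function prepareLogin(username, password) {
  // Enforce the declared type and length before the value goes anywhere near the query.
  if (typeof username !== "string" || username.length > 8) {
    throw new RangeError("username must be a string of at most 8 characters");
  }
  if (typeof password !== "string") {
    throw new TypeError("password must be a string");
  }
  // The SQL text never changes; the values are handed over separately as data.
  return { sql: LOGIN_SQL, params: [username, password] };
}

const statement = prepareLogin("a'--", "secret");
console.log(statement.sql);    // identical text no matter what the user typed
console.log(statement.params); // the quote stays inside the value
```

Because the user's input never becomes part of the statement text, a stray quote has nothing to break out of, and the length rule rejects the longer payload from the earlier example outright.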


Encryption is the process of reorganizing and replacing data so that it is not stored in a manner that is accessible to anyone who does not possess the encryption key. Encryption involves three elements:

  • Plaintext: the data that is to be encrypted
  • Algorithm: the mathematical process that will alter the plaintext into the ciphertext
  • Key: the secret that is used to encrypt the plaintext and decrypt the ciphertext

Encryption is used to preserve the confidentiality of the data. If you need to keep something secret, chances are it is a candidate for encryption. If you are not sure what needs to be kept secret, refer to your risk assessment. The business will identify assets that it believes are important to keep secret. In the case of our sample project, the business has identified Username and Password information to be sensitive data points making them prime candidates for encryption.

Dismissing a common encryption myth

Sometimes developers cook up their own encryption schemes and believe that these encryption methods are secure because no one knows how they work. Unfortunately, security through obscurity is not a viable defense against attack. If there are weaknesses in the encryption scheme, they will be discovered under examination and attack. Algorithms like AES and a variety of other encryption algorithms have been publicly examined by mathematical geniuses and the Information Security community to ensure that the encrypted data can only be extracted via the key. When selecting an algorithm, stick with one that has been publicly scrutinized such as AES or RSA.
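As a small demonstration that the key is what protects the data, here is a sketch using AES through the crypto module that ships with Node.js. The choice of Node.js, the AES-256-GCM mode and the helper names encrypt and decrypt are assumptions made for this example; the article itself does not prescribe them.

```javascript
const crypto = require("crypto");

// Encrypt with a publicly known algorithm (AES-256-GCM); only the key is secret.
function encrypt(plaintext, key) {
  const iv = crypto.randomBytes(12); // a fresh value for every message
  const cipher = crypto.createCipheriv("aes-256-gcm", key, iv);
  const ciphertext = Buffer.concat([cipher.update(plaintext, "utf8"), cipher.final()]);
  return { iv, ciphertext, tag: cipher.getAuthTag() };
}

// Decrypt; final() throws if the key (or the data) is not the right one.
function decrypt(box, key) {
  const decipher = crypto.createDecipheriv("aes-256-gcm", key, box.iv);
  decipher.setAuthTag(box.tag);
  return Buffer.concat([decipher.update(box.ciphertext), decipher.final()]).toString("utf8");
}

const key = crypto.randomBytes(32); // the secret
const box = encrypt("contractor password", key);

console.log(decrypt(box, key)); // the original plaintext

try {
  decrypt(box, crypto.randomBytes(32)); // a different key
} catch (e) {
  console.log("Decryption failed without the correct key");
}
```

Anyone can read this code and know exactly which algorithm is in use, yet the ciphertext stays unreadable without the key.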

Encrypt or Hash?

In general, encryption is a two way street. You can encrypt plain-text to cipher-text using an algorithm and secret key, and decrypt the cipher-text back to plain-text using the same algorithm and key. Sometimes, though, you do not need to decrypt back to plain-text and only need to compare values to see if they match. This is the job of Cryptographic Hashing: a one way transformation that results in a fixed length output. Using the same settings (algorithm and seed, often called a salt), a plain-text statement will always result in the same hash value. Cryptographic Hashing is common for password storage, where the developer does not need to return the actual password text but can simply compare the hash of the entered password against the stored hash. If the hashes match, then the password entered is (probably) equal to the password stored and it can be assumed the user entered the correct password.
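Here is a minimal sketch of that compare-the-hashes flow, assuming Node.js and its built-in crypto module. The helper names hashPassword and verifyPassword are hypothetical. Plain SHA-256 is used to keep the example short; for real password storage, deliberately slow functions such as PBKDF2 or scrypt (both available in the same module) are generally preferred.

```javascript
const crypto = require("crypto");

// Hypothetical helper: hash a password together with a per-user salt using SHA-256.
function hashPassword(password, saltHex) {
  return crypto.createHash("sha256")
    .update(Buffer.from(saltHex, "hex"))
    .update(password, "utf8")
    .digest("hex");
}

// Compare the hash of the entered password with the stored hash.
function verifyPassword(entered, saltHex, storedHashHex) {
  const candidate = Buffer.from(hashPassword(entered, saltHex), "hex");
  const stored = Buffer.from(storedHashHex, "hex");
  // timingSafeEqual avoids leaking how many leading bytes matched.
  return candidate.length === stored.length && crypto.timingSafeEqual(candidate, stored);
}

// At registration: store the salt and the hash, never the password itself.
const salt = crypto.randomBytes(16).toString("hex");
const storedHash = hashPassword("correct horse", salt);

console.log(verifyPassword("correct horse", salt, storedHash));
console.log(verifyPassword("wrong guess", salt, storedHash));
```

The same password and salt always produce the same hash, which is what makes the comparison possible without ever decrypting anything.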


When you have a fixed length output and inputs of any length, different inputs will inevitably produce the same value. This is called a Hash Collision: two plain-text inputs that are different in value result in the same hash value. How hard it is to find a collision depends on the algorithm that you use to calculate the hash value. Make sure that you use a collision resistant algorithm such as SHA-2 and avoid algorithms such as MD5. Finally, remember that whenever more bits can go in to the hash than come out, collisions must exist. This might sound like common sense but it is often overlooked.
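You can watch a collision happen by shrinking the output on purpose. The sketch below (again assuming Node.js; tinyHash and findCollision are hypothetical names) keeps only the first byte of a SHA-256 digest, leaving 256 possible values, and then hashes 257 different inputs.

```javascript
const crypto = require("crypto");

// Keep only the first byte of SHA-256, so there are just 256 possible outputs.
function tinyHash(text) {
  return crypto.createHash("sha256").update(text, "utf8").digest()[0];
}

// With 257 distinct inputs and 256 possible outputs, two inputs must share a value.
function findCollision() {
  const seen = new Map();
  for (let i = 0; i <= 256; i++) {
    const input = "message-" + i;
    const hash = tinyHash(input);
    if (seen.has(hash)) {
      return { first: seen.get(hash), second: input, hash: hash };
    }
    seen.set(hash, input);
  }
  return null; // cannot happen: more inputs than possible outputs
}

const collision = findCollision();
console.log(collision.first + " and " + collision.second + " share the hash value " + collision.hash);
```

A full-length SHA-2 digest has the same property in principle, but the output space is so large that finding two colliding inputs is not practical.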

JavaScript based encryption…no, seriously

A few years ago, when I first heard of JavaScript based encryption, I laughed it off. As I have talked to other developers about the idea of encryption on the client side, most respond in the same way that I did. Everything is happening in the browser, so an attacker can easily reverse engineer whatever happens, right? A few cryptography courses later, the following statement was drilled into my head: the key is the secret, not the algorithm. As long as the key is secret and you are using a solid algorithm, encryption can work in JavaScript. The main concern with JavaScript encryption is transmission of the key: how do you get the secret that will encrypt the data to the browser?

Many developers have tackled both symmetric (one key for encryption/decryption) and asymmetric (public/private key encryption/decryption) encryption with JavaScript, but the one that I have found most useful and easiest to implement is jCryption by Daniel Griesser. This library is a jQuery plugin, a format many web developers are familiar with, and it works with PHP server side code to deliver the keys for the encryption process. Using jCryption, form values can be encrypted before leaving the browser, which can prevent exposure during communication and from processing on the server.
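The public/private key idea can be sketched in a few lines. This example assumes Node.js and its built-in crypto module and plays both roles in one script; it is not jCryption's code or API, and the function names encryptForServer and decryptOnServer are hypothetical.

```javascript
const crypto = require("crypto");

// "Server" side: generate a key pair and hand out only the public half.
const { publicKey, privateKey } = crypto.generateKeyPairSync("rsa", { modulusLength: 2048 });

// "Browser" side: encrypt the form value with the public key before sending it.
function encryptForServer(value) {
  return crypto.publicEncrypt(publicKey, Buffer.from(value, "utf8")).toString("base64");
}

// "Server" side: only the holder of the private key can recover the value.
function decryptOnServer(encoded) {
  return crypto.privateDecrypt(privateKey, Buffer.from(encoded, "base64")).toString("utf8");
}

const sent = encryptForServer("P@ssw0rd!");
console.log(sent);                  // unreadable in transit
console.log(decryptOnServer(sent)); // the original value
```

Publishing the public key is safe because it can only encrypt; this is how an asymmetric scheme sidesteps the question of getting a secret to the browser.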

Remember, not all users will have JavaScript enabled when coming to your application. For those users, server side encryption will be a must. As developers we need to leverage client side security when we can but realize that attackers can bypass our controls. Using the concepts of Defense in Depth we can catch malicious users trying to bypass our client side checks with server side code.

Exercise #4: Encrypting passwords

From our Risk Assessment we know that we need to secure password information. There are a variety of ways that this can be done. We could encrypt the password using PHP or MySQL but for this article we will use jCryption on the client side to encrypt the password before it even leaves the browser. One thing to keep in mind, jCryption requires PHP for key delivery. If you are not using PHP on the server side there are many other JavaScript encryption libraries online.

The first step in setting up jCryption is downloading the library from http://www.jcryption.org/. There are many samples with the library to help you get started. The key code we will be using can also be found in the string example (jCryption-1.2.zip\jCryption-1.2\examples\string). With the library loaded, add a new PHP file to your site called “encrypt.php” with the following code:




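<?php
    // The opening tag, session_start and the two require_once lines were lost from
    // this listing; they are restored here from the description that follows it.
    session_start();
    require_once("jcryption.php");

    $keyLength = 1024;
    $jCryption = new jCryption();

    if(isset($_GET["generateKeypair"])) {
        // key request: pick a key pair and keep it in the session
        require_once("100_1024_keys.inc.php");
        $keys = $arrKeys[mt_rand(0,100)];
        $_SESSION["e"] = array("int" => $keys["e"], "hex" => $jCryption->dec2string($keys["e"],16));
        $_SESSION["d"] = array("int" => $keys["d"], "hex" => $jCryption->dec2string($keys["d"],16));
        $_SESSION["n"] = array("int" => $keys["n"], "hex" => $jCryption->dec2string($keys["n"],16));
        // only the public parts (e and n) are sent to the browser
        echo '{"e":"'.$_SESSION["e"]["hex"].'","n":"'.$_SESSION["n"]["hex"].'","maxdigits":"'.intval($keyLength*2/16+3).'"}';
    } else {
        // decrypt request: use the private exponent kept in the session
        $var = $jCryption->decrypt($_POST['jCryption'], $_SESSION["d"]["int"], $_SESSION["n"]["int"]);
        echo urldecode($var);
    }
?>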

This PHP code will be called by the jCryption JavaScript library to get the keys for encryption. As you look through the code, notice the reference to two PHP files: jcryption.php and 100_1024_keys.inc.php. jcryption.php is the jCryption object that PHP will use to work with the encrypted data. The 100_1024_keys.inc.php file is a helper file for building keys. Make sure that encrypt.php, jcryption.php and 100_1024_keys.inc.php are all in the same directory. For this project I placed all of them in the includes folder.

With encrypt.php created, now open default.php and add a reference to the jCryption JavaScript library:

<script language="javascript" type="text/javascript" src="scripts/jquery.jcryption.js"></script>

Add this reference after your call to the jQuery library as the jCryption library is a jQuery plugin. Now we are ready to generate the keys. Add a global variable to your SCRIPT block called “keys”:

var keys;

This will be the variable that will hold the keys from the encrypt.php page. To retrieve the keys from the server, add the following code block:


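    $(document).ready(function () {
        //get the keys
        var keyGenPath = "includes/encrypt.php?generateKeypair=true";
        $.jCryption.getKeys(keyGenPath, function (receivedKeys) {
            // store the keys for later use when the form is submitted
            keys = receivedKeys;
        });
    });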

Using jQuery's document ready function we set keyGenPath to a string pointing to the encrypt.php file. Notice at the end of the string we use a querystring parameter to signify that we are retrieving Keys from the PHP file. The next line of code uses the path set in keyGenPath to make an Ajax call and collect the keys. $.jCryption.getKeys(url, callback) uses the URL provided to make a call to the server using the jQuery $.getJSON method. The callback is executed with the keys passed through the receivedKeys object. Finally we set the global variable keys to the value of receivedKeys.

Now that we have the keys, we can encrypt the password value in the form. In the first article we created validation code to make sure the password conformed to our password rules. Now we are going to extend that validation code:

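if(un != null && pw != null) {
    //encrypt the password
    var cipherText;
    $.jCryption.encrypt(pw.toString(), keys, function(encrypted){
        cipherText = encrypted;
        // ... (a line is missing from this listing; per the description below,
        //      it sets the txtPW field to the encrypted value)
        return true;
    });
}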

After validating that the un and pw variables are not null, we call the $.jCryption.encrypt method to encrypt the password value using the keys and a callback that runs when encryption succeeds. The callback sets the value of the txtPW field to be the encrypted version of the string.

The last step, Verification

In the next article we will wrap up the Lockdown series by examining how we can validate that our security controls are working. We will use a few Firefox plugins to check our client side security and discuss how to do a security code review with your fellow developers.

Learn more about encryption

A complete examination of Symmetric vs. Asymmetric encryption techniques and algorithms is far beyond the scope of this article. For more information about the details of encryption techniques please check out:


About the Author

Tim Kulp leads the development team at FrontierMEDEX in Baltimore, Maryland. You can find Tim on his blog or the Twitter feed @seccode, where he talks code, security and the Baltimore foodie scene.
