Data Validation: Deny-list or Approve-list Approach?
I think by now we all know that all data input from a Web UI should be considered evil until validated. We also know that validation performed only on the client is not really there for security; it is there to give end users faster feedback while they are entering data. If you rely on client-side validation alone, bypassing it is as simple as writing a Perl script that calls the web server directly and passes the data in a query string parameter. Or it could be as simple as turning JavaScript off in the browser.
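To make the bypass concrete, here is a minimal sketch in Python rather than Perl. The endpoint and field names are hypothetical, and the script only builds the URL; it does not send anything. The point is that whoever writes such a script decides what goes into the query string, so no browser-side check ever runs.

```python
from urllib.parse import urlencode

# Hypothetical endpoint, used only for illustration.
BASE_URL = "http://example.com/register"


def build_direct_request(fields):
    """Return the URL a script could request directly, with no browser involved."""
    return BASE_URL + "?" + urlencode(fields)


# Values that the page's JavaScript would have rejected are simply
# placed in the query string by hand.
url = build_direct_request({"age": "-1", "name": "<script>"})
print(url)  # http://example.com/register?age=-1&name=%3Cscript%3E
```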
Therefore, all data validation must also be done on the server side. Depending on the platform you are using, regular expressions seem to be the most common mechanism for server-side validation. The next question becomes: which approach do I use to validate the data, a deny-list or an approve-list?
Neither approach is going to be perfect, but the approve-list approach has fewer downsides than the deny-list approach, and it usually fails on the side of caution.
In the deny-list approach you create an expression that checks the incoming data against all known characters that are considered dangerous. The first problem with this approach: do we really know all the dangerous characters that are out there, and all the different ways these characters can be represented (canonical representation), such as hex, double encoding, Unicode and other written languages? As you can see, the list would become extremely long and cumbersome to support, not to mention the performance cost of checking all the data against it. The second major problem with this approach is the word “known”. You may know all the dangerous characters today, but the list will eventually be incomplete tomorrow, next week or next year as attackers find new ways of representing characters to get past it. If you do not keep the deny-list up to date, you may find out the hard way that your list is incomplete.
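The representation problem can be sketched in a few lines of Python. The deny-list pattern below is deliberately naive and entirely hypothetical; it only illustrates how the same payload slips past once it is percent-encoded, assuming some later layer decodes it.

```python
import re
from urllib.parse import unquote

# A deliberately naive, hypothetical deny-list: reject these characters.
DENY_PATTERN = re.compile(r"[<>'\"]")


def passes_deny_list(value):
    """Return True when none of the listed characters appear in the value."""
    return DENY_PATTERN.search(value) is None


raw = "<script>"
encoded = "%3Cscript%3E"              # percent-encoded once
double_encoded = "%253Cscript%253E"   # percent-encoded twice

print(passes_deny_list(raw))             # False: the raw form is caught
print(passes_deny_list(encoded))         # True: the encoded form gets through
print(passes_deny_list(double_encoded))  # True: so does the double-encoded form

# If anything downstream decodes the value, the dangerous form is back.
print(unquote(encoded))                  # <script>
print(unquote(unquote(double_encoded)))  # <script>
```

Adding "%" to the list does not settle the matter either, since that is only one of the representations mentioned above.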
That is why it is better to go with the approve-list approach. With this approach you look for valid data and reject everything else. This list is shorter and easier to maintain. It also reduces the chances of getting hit with canonical representation attacks, since input in an unexpected encoding generally fails to match the list. The downside is that if data is in fact valid but was not adequately represented in your list, your application will reject it, annoying users. However, I would rather fail on the side of caution and have an annoyed user than find out my deny-list was incomplete and I was hacked with invalid data. So with the approve-list approach, make sure you spend the necessary time early in the design stage to create a list of valid data.
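As a rough illustration of the approve-list idea, the Python sketch below keeps one pattern per field and rejects anything that does not match in full. The field names and rules are assumptions made up for this example; real rules have to come out of the design work described above.

```python
import re

# Hypothetical fields and rules; each pattern describes what IS valid.
ALLOWED = {
    "username": re.compile(r"[A-Za-z0-9_]{3,20}"),
    "zip_code": re.compile(r"[0-9]{5}"),
}


def is_valid(field, value):
    """Accept a value only if its field is known and the whole value matches."""
    pattern = ALLOWED.get(field)
    if pattern is None:
        return False  # unknown fields are rejected, not waved through
    return pattern.fullmatch(value) is not None


print(is_valid("username", "alice_01"))      # True
print(is_valid("username", "<script>"))      # False
print(is_valid("username", "%3Cscript%3E"))  # False: the encoded form fails too
print(is_valid("zip_code", "12345"))         # True
print(is_valid("zip_code", "12345-6789"))    # False: valid ZIP+4, but not on the list
```

The last call shows the trade-off: a legitimate ZIP+4 value is turned away because nobody listed that format. Using `fullmatch` rather than `search` matters here, because the whole value has to fit the pattern, not just part of it.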
Therefore, in .NET or whatever other platform you may be using, do not go searching for a regular expression that contains a list of all known dangerous characters. You will not find one, and for good reason.