Input validation in web applications

Input validation, or data validation, is the practice of testing and verifying all incoming data to make sure it is in the expected format. All applications should treat user input as untrusted by default and make sure that only the right kind of information is accepted.

In the case of web applications, input validation should always happen on the server side, because anything running on the client's computer (for example JavaScript) can be altered by the user, so data validated only on the client side can't be trusted. Validating on the client side as well improves the user experience, but it does not improve security.

Blacklisting vs. whitelisting

Blacklisting is a method where the program looks for known bad input and rejects data that matches a list of possible problems. Whitelisting, on the other hand, only allows data that is known to be good and rejects everything else. Whitelisting is the much better approach: no developer can anticipate every possible vulnerability or malicious input, so it's safer to list and allow only what is known to be good.

For example, when validating a number, whitelisting would mean rejecting (or removing) anything that is not a digit, while blacklisting would mean removing all letters, punctuation, etc. With blacklisting, however, if an unexpected character is entered that isn't on the blacklist, it passes through as part of the number.
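The number example can be sketched in Python. The function names and the exact blacklist below are hypothetical choices for illustration; the point is that the blacklist only knows about letters and punctuation, so anything it did not anticipate slips through.

```python
import re
import string

# Blacklist approach (hypothetical list): reject input containing any character we know is "bad".
BLACKLIST = set(string.ascii_letters + string.punctuation)


def is_number_blacklist(value: str) -> bool:
    return all(ch not in BLACKLIST for ch in value)


# Whitelist approach: accept one or more ASCII digits and nothing else.
# [0-9] is used instead of \d because in Python 3 \d also matches non-ASCII Unicode digits.
NUMBER_RE = re.compile(r"[0-9]+")


def is_number_whitelist(value: str) -> bool:
    return NUMBER_RE.fullmatch(value) is not None


for candidate in ["123", "12a", "12 3", "1\n2", ""]:
    print(repr(candidate), is_number_blacklist(candidate), is_number_whitelist(candidate))
```

Running the loop shows that "12 3", "1\n2" and even the empty string pass the blacklist check, because spaces, newlines and "nothing at all" were never on the list, while the whitelist rejects all three.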

Improperly validated input opens the web application to all sorts of vulnerabilities – let’s look at different classes of them to understand how they work and how to protect against them.

Output validation

In addition to input validation, it's good practice to always validate and properly escape any information that gets sent to the client side (e.g. displayed on a webpage). This adds a second layer of security: even if corrupted data somehow makes it past input validation, the damage it can cause is limited.

An example of this would be storing only validated usernames but still escaping any special characters when displaying them. Different contexts (for example e-mail messages, HTTP headers or webpages) may have different escaping requirements.
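As a minimal sketch of context-dependent escaping, the same username needs different treatment in an HTML page and in a URL query parameter. The helper names are hypothetical thin wrappers around Python's standard html.escape and urllib.parse.quote.

```python
import html
import urllib.parse


def escape_for_html(value: str) -> str:
    # &, <, > and both quote characters become HTML entities
    return html.escape(value, quote=True)


def escape_for_url_param(value: str) -> str:
    # safe="" percent-encodes everything except letters, digits and _.-~
    return urllib.parse.quote(value, safe="")


username = 'bob"<b>&'  # hypothetical value containing special characters
print(escape_for_html(username))       # bob&quot;&lt;b&gt;&amp;
print(escape_for_url_param(username))  # bob%22%3Cb%3E%26
```

Neither result is safe in the other context, which is why the escaping has to match where the data ends up.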

SQL Injections

SQL injection is a type of attack that exploits database queries containing user data: the attacker sends specially crafted input that alters the query, either to read more information than intended or to modify or overwrite parts of the database.

In addition to input validation, SQL injection is best prevented by never embedding user input directly in SQL queries and using placeholders (bind parameters) instead.

For example, let's assume that we have two input variables, $username and $password, containing login credentials that are checked by running an SQL select:

select * from users where username='$username' and password='$password';

If the username is “test” and the password is “testing' or 1; --”, the query becomes:

select * from users where username='test' and password='testing' or 1; --';

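Because AND binds more tightly than OR, the where clause is now always true, and the -- turns the leftover quote into a comment, so the query returns every user. There are multiple ways to fix this, for example: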

validate user input and disallow special characters (blacklisting)
validate user input and only allow letters / numbers (whitelisting)
escape quotes in $password properly
use placeholders instead of embedding user input inline, e.g. “select * from users where username=? and password=?;”
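The difference between inline input and placeholders can be demonstrated with Python's built-in sqlite3 module and an in-memory database. The table layout and the function names are assumptions made for this sketch; other database drivers may use a different placeholder syntax than ?.

```python
import sqlite3


def make_demo_db() -> sqlite3.Connection:
    conn = sqlite3.connect(":memory:")
    conn.execute("create table users (username text, password text)")
    # Plain-text passwords only to mirror the example above; never store real ones like this.
    conn.executemany(
        "insert into users values (?, ?)", [("test", "secret"), ("admin", "hunter2")]
    )
    return conn


def login_unsafe(conn, username, password):
    # VULNERABLE: user input is pasted straight into the SQL text.
    query = f"select * from users where username='{username}' and password='{password}';"
    return conn.execute(query).fetchall()


def login_safe(conn, username, password):
    # Placeholders: values are passed separately, so quotes in them are just data.
    query = "select * from users where username=? and password=?;"
    return conn.execute(query, (username, password)).fetchall()


conn = make_demo_db()
payload = "testing' or 1; --"
print(login_unsafe(conn, "test", payload))  # every row in the table
print(login_safe(conn, "test", payload))    # []
```

With placeholders the payload is compared as a literal password string, so the injection simply fails to match.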

+1 – Passwords should NEVER be stored as clear text in a database, so that a stolen database doesn't expose users' passwords (which could then be reused in credential stuffing attacks). They should be hashed with a random salt, preferably using a deliberately slow algorithm (such as bcrypt, scrypt, Argon2 or PBKDF2) to deter brute-force attacks.
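A sketch of salted, slow hashing using only Python's standard library (PBKDF2-HMAC-SHA256 via hashlib.pbkdf2_hmac). The storage format and the iteration count are assumptions; a dedicated library for bcrypt or Argon2 is a common alternative.

```python
import hashlib
import hmac
import secrets

# Assumed work factor; tune it so one hash takes a noticeable fraction of a second.
ITERATIONS = 600_000


def hash_password(password: str) -> str:
    """Return 'iterations$salt$hash' for storage, using a fresh random salt."""
    salt = secrets.token_bytes(16)
    digest = hashlib.pbkdf2_hmac("sha256", password.encode("utf-8"), salt, ITERATIONS)
    return f"{ITERATIONS}${salt.hex()}${digest.hex()}"


def verify_password(password: str, stored: str) -> bool:
    """Recompute the hash with the stored salt and iteration count, then compare."""
    iterations, salt_hex, digest_hex = stored.split("$")
    digest = hashlib.pbkdf2_hmac(
        "sha256", password.encode("utf-8"), bytes.fromhex(salt_hex), int(iterations)
    )
    # compare_digest avoids leaking how many leading bytes matched through timing
    return hmac.compare_digest(digest, bytes.fromhex(digest_hex))
```

Storing the iteration count next to the salt lets the work factor be raised later without breaking existing hashes.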

XSS attacks

XSS or cross-site scripting attacks exploit web applications that display user data on a webpage. For example, if a webpage asks for our name and then displays “Hello $name” without validating or escaping it, the name can contain special characters that make the browser run malicious code (HTML or JavaScript) on behalf of the application.

For example, let's assume that badwebsite123.com has a page that displays the user name passed in an HTTP GET parameter, so badwebsite123.com/?user=techtipbits would display “hello techtipbits!”. The same website lets users log in and stores their sessions in a cookie.

Now if we create a link badwebsite123.com/?user=<script>dosomethingevil()</script> and a visitor opens it, the following things happen:

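the page displays “hello <script>dosomethingevil()</script>”
the browser executes the <script> part, so dosomethingevil() is run
bad things happen: for example, the script could read the visitor's session cookie and send it to the attacker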

This is called a reflected XSS attack, because the website immediately echoes back whatever the user entered, and that is what triggers the attack.

If the username is stored, say in a database, it can also attack whoever views it later, which makes it a stored XSS attack. Imagine that an administrator is logged in to this website and this badly validated username gets displayed: the code will run on behalf of the administrator, with potentially damaging consequences.

There are multiple ways to protect against this:

validate the username input, disallow special characters (blacklisting)
validate the username, allow letters/numbers only (whitelisting, better)
always properly escape any user input when displayed
set an X-XSS-Protection header (see below)
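The whitelisting and escaping defenses can be combined, as in this sketch of the badwebsite123.com greeting page. The function names and the username pattern (letters, digits and underscore, up to 32 characters) are hypothetical; the query string is parsed with the standard urllib.parse.parse_qs.

```python
import html
import re
from urllib.parse import parse_qs

# Hypothetical whitelist for usernames.
USERNAME_RE = re.compile(r"[A-Za-z0-9_]{1,32}")


def greeting_unsafe(query_string: str) -> str:
    user = parse_qs(query_string).get("user", [""])[0]
    # VULNERABLE: the raw value is placed into the HTML as-is.
    return f"<p>hello {user}!</p>"


def greeting_safe(query_string: str) -> str:
    user = parse_qs(query_string).get("user", [""])[0]
    # 1) whitelist validation: anything unexpected is replaced with a neutral default
    if not USERNAME_RE.fullmatch(user):
        user = "guest"
    # 2) escape on output anyway, as a second layer of defense
    return f"<p>hello {html.escape(user)}!</p>"


attack = "user=<script>dosomethingevil()</script>"
print(greeting_unsafe(attack))  # the <script> tag survives intact
print(greeting_safe(attack))    # <p>hello guest!</p>
```

Falling back to a default name is one design choice; rejecting the request with an error would work just as well.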

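To protect against unknown XSS exploits, older browsers supported a special header called “X-XSS-Protection”. It was never bulletproof, and it is now deprecated: modern browsers ignore it, and a Content-Security-Policy header is its recommended replacement. For older browsers that still honor it, adding the following header to all pages (using server-side code or even from .htaccess) tells the browser to block the page when it detects an XSS attempt: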

X-XSS-Protection: 1; mode=block
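Adding headers to all pages from server-side code could look like this minimal WSGI wrapper. It is a sketch under assumptions: the wrapper and demo app names are hypothetical, and the Content-Security-Policy value is an example policy that only allows scripts loaded from the site's own origin (which also blocks inline <script> tags like the one in the attack link above), not a recommendation for every site.

```python
# Example security headers; these values are assumptions to adapt per site.
SECURITY_HEADERS = [
    ("X-XSS-Protection", "1; mode=block"),
    ("Content-Security-Policy", "script-src 'self'"),
]


def add_security_headers(app):
    """Wrap a WSGI app so every response it sends carries SECURITY_HEADERS."""
    def wrapped(environ, start_response):
        def start_response_with_headers(status, headers, exc_info=None):
            # Don't duplicate a header the application already set itself.
            present = {name.lower() for name, _ in headers}
            extra = [(n, v) for n, v in SECURITY_HEADERS if n.lower() not in present]
            return start_response(status, list(headers) + extra, exc_info)

        return app(environ, start_response_with_headers)

    return wrapped


def hello_app(environ, start_response):
    # Hypothetical page used only to demonstrate the wrapper.
    start_response("200 OK", [("Content-Type", "text/plain; charset=utf-8")])
    return [b"hello"]


app = add_security_headers(hello_app)
```

To try it locally, the wrapped app can be served with wsgiref.simple_server.make_server("", 8000, app).serve_forever().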

HTTP Header injection attacks

Similar to XSS, user input can also end up in HTTP headers. If the attacker can inject newline characters (CR/LF) into the data used to generate a header, they can make the web application send additional headers of the attacker's choosing.

For example, if the input is “username=hello\r\nLocation: https://evilwebsite123.com” and the website generates the header

X-Auth-User: $username

it suddenly becomes:

X-Auth-User: hello
Location: https://evilwebsite123.com

As a result, the injected Location header can redirect the visitor to evilwebsite123.com instead of the page that was meant to be displayed. It works similarly to XSS, but instead of injecting code into the displayed webpage, the attacker injects it into the HTTP response headers.

The protection is, once again, to validate the username (in particular, rejecting carriage return and line feed characters) and to properly escape data before it is sent to the client side.
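A minimal sketch of the line-break check for header values. make_header and make_header_unsafe are hypothetical helpers that build a single header line as text; real frameworks usually set headers through their own API, but the validation idea is the same.

```python
def make_header_unsafe(name: str, value: str) -> str:
    # VULNERABLE: a CR/LF in value starts a brand-new header line.
    return f"{name}: {value}"


def make_header(name: str, value: str) -> str:
    # Refuse any value that could terminate the current header line.
    if "\r" in value or "\n" in value:
        raise ValueError(f"refusing header value with a line break: {value!r}")
    return f"{name}: {value}"


evil = "hello\r\nLocation: https://evilwebsite123.com"
print(make_header_unsafe("X-Auth-User", evil))  # prints two header lines
try:
    make_header("X-Auth-User", evil)
except ValueError as exc:
    print("rejected:", exc)
```

Rejecting the value outright is simpler and safer than trying to strip the line breaks, which could silently change what the user sent.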

Web Application Firewalls

As an extra layer of defense, many platforms provide a web application firewall (WAF) that detects many known SQL injection and XSS exploits and tries to block them before they reach the application.

For the Apache web server there is an open-source module called ModSecurity that provides WAF capabilities as a loadable module. You can find more information about it here: https://modsecurity.org

There is a well-known WordPress plugin called Wordfence that functions as a web application firewall – the basic edition is free, and there is a paid version with extended features. https://www.wordfence.com

Plenty of other software solutions exist for the same purpose; here are a few of them for reference:

Sucuri: https://sucuri.ne
Cloudflare has a WAF module: https://developers.cloudflare.com/waf/about
AWS WAF: https://aws.amazon.com/waf/
Barracuda WAF: https://www.barracuda.com/products/webapplicationfirewall