Good Practices: how to sanitize, validate and escape in PHP [3 methods]
Introduction
I recently realized that even in places considered top-notch, where there is a lot of competition among tech agencies and businesses, such as London, plenty of companies still use PHP 5 or earlier versions and, even worse, do not follow good practices at all.
Conventions that should be applied daily, like sanitizing input, validating data and escaping output, are overlooked for reasons like: “there is no time”, “I need to deploy today” or “the boss is pushing me”.
From skipping checks on external data, to mismanaging passwords and dates, to deciding not to handle errors and exceptions, all of these lead to bad code and insecure applications.
But today, you will see how to work with data and why it is important.
In fact, this post contains good practices and bits of advice that you can use every day in all the projects you are currently working on.
If you follow good practices, your applications will run faster, be more stable and be much harder to break because they run in a more secure manner.
PHP has been around for more than 20 years and has gone through several rewrites and evolutions. Tools have been created, updated and deprecated along the way, which means that plenty of outdated tools from the past are still around and it is very easy to fall into traps and deploy bad code.
A piece of good advice I received several years ago was to learn which tools to use and which ones to ignore.
The PHP way to write good practices
Fox William Mulder, the fictional character played by David Duchovny in the science fiction television series The X-Files, used to say “trust no one”.
As a PHP developer, you need to keep this philosophy in mind all the time.
You do not know who the visitors of your web pages are and, even worse, you do not know what their intentions are.
A way to retain control is to add layers of protection to our application by checking and limiting what comes in from external sources.
Typical external sources are:
- $_POST
- $_GET
- $_REQUEST
- $_COOKIE
- file_get_contents()
- Databases
- APIs
- Input data from the clients
- $argv
- php://stdin
- php://input
All of the items above are channels that ill-intentioned users can exploit to inject malicious data into your script.
If you are using any of them in your application, it is worth scheduling some time this week to sanitize your input, validate your data and escape your output.
In the following paragraphs, I will teach you how to do that.
Sanitize Input
From Cambridge Official Dictionary
/ˈsanɪtʌɪz/ - to make something completely clean and free from bacteria
In web development, to sanitize means to remove unsafe characters from the input.
This is your first line of defence.
It is fundamental that you sanitize the input your application receives before it reaches the storage layer (whether you are using a MySQL or NoSQL database, or a cache such as Redis).
As we said previously, we need to assume that the world is a dangerous place, which means that if a malefactor lands on our page, he or she should not be able to do anything dangerous or inject bad data.
There are several types of input you need to consider when sanitizing. The most common are HTML, input that ends up in SQL queries and user profile information.
Let’s have a look at all three cases and what we can do to solve these problems.
Input via HTML
Imagine you have a blog.
Your blog allows comments and some random guy from the web, after reading your blog post, decides to type a comment that includes a very basic one-line JavaScript snippet,
something like a window.location.href assignment.
The outcome is that, from the moment our friend leaves his present, anyone who opens that post on our site is going to be redirected to whatever page the attacker chose.
A way to solve it is by using the PHP function htmlentities().
This function converts every character that has an HTML entity equivalent into that entity, so the browser displays the string instead of interpreting it as markup.
The problem with htmlentities() is that it is not very powerful out of the box: before PHP 8.1 it does not escape single quotes by default, it cannot detect the character set and it does not validate HTML.
To solve this problem we need to learn how to best use its arguments.
The first argument is the string we need to sanitize (duh?).
The second is a flag; in our case we want the ENT_QUOTES constant,
which tells the function to encode single quotes as well as double quotes.
Finally, the third argument lets you specify the character set you are using in your application.
A basic example would look like this:
echo htmlentities($string, ENT_QUOTES, 'UTF-8');
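To see what the three arguments do together, here is a minimal sketch that encodes a made-up malicious comment. The $comment value and the example.com URL are assumptions for the demonstration, not something taken from a real attack.

```php
<?php
// Hypothetical comment submitted by a visitor (assumed for illustration).
$comment = "<script>window.location.href='http://example.com'</script>";

// ENT_QUOTES encodes both single and double quotes; UTF-8 is the assumed charset.
$safeComment = htmlentities($comment, ENT_QUOTES, 'UTF-8');

echo $safeComment, PHP_EOL;
// &lt;script&gt;window.location.href=&#039;http://example.com&#039;&lt;/script&gt;
```

The browser now shows the comment as plain text instead of running it.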
A tool that you can use to sanitize your HTML more thoroughly is HTML Purifier.
It is a library that accepts a series of configuration parameters you set beforehand, and it is considered to be very reliable.
The last thing I want to say about this topic: do not use regular expression functions for sanitizing. They are very complicated and the risk of error is very high.
Avoid preg_replace(), preg_replace_callback() and similar functions for this purpose.
SQL queries
Sometimes we, as developers, need to build SQL queries based on input a user gives us.
This input can come from a query string (e.g. ?user=1) or a URI segment (e.g. user/1).
If you are not careful and allow this input to be inserted directly into a query, it can be very dangerous for your application.
Let’s look at a basic example:
// Unsafe: request data goes straight into the SQL string
$changePassword = sprintf(
    'UPDATE users SET password = "%s" WHERE id = "%s"',
    $_POST['password'],
    $_GET['id']
);
What is wrong with this code?
Do not forget rule no. 1: “trust no one”.
The level of protection in this code is not just weak, it is non-existent.
What happens if someone sends a crafted HTTP request to your PHP script?
Something like:
POST /user?id=1
password=abc";-- 
Many SQL databases treat -- as the beginning of a comment (MySQL requires a space after it, hence the trailing space in the payload), so the text that follows is ignored.
The result?
The WHERE clause is commented out and you set every user’s password to abc.
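If you want to see the damage for yourself, the sketch below rebuilds the query string without touching a database. The $password and $id variables are stand-ins I made up for $_POST['password'] and $_GET['id'], and the payload is assumed to be abc";-- with a trailing space.

```php
<?php
// Simulated request data (assumption: these would normally come from $_POST and $_GET).
$password = 'abc";-- ';
$id = '1';

// Same unsafe string building as in the example above: input is pasted into the SQL.
$changePassword = sprintf(
    'UPDATE users SET password = "%s" WHERE id = "%s"',
    $password,
    $id
);

echo $changePassword, PHP_EOL;
// UPDATE users SET password = "abc";-- " WHERE id = "1"
```

Everything after the -- is a comment as far as the database is concerned, so the UPDATE no longer has a WHERE clause.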
What can you do to solve the problem?
Use PDO prepared statements.
PDO is a database abstraction layer.
It is built into PHP and provides a single interface for working with several databases.
With prepared statements, PDO keeps external data separate from the SQL text and binds it to the query safely, which avoids the type of problem I have shown above.
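Here is a minimal sketch of what that can look like. To keep it self-contained I am assuming an in-memory SQLite database (which needs the pdo_sqlite extension) and a made-up users table with two rows; in your project the connection string and schema will be different.

```php
<?php
// Assumption: an in-memory SQLite database stands in for the real one.
$pdo = new PDO('sqlite::memory:');
$pdo->setAttribute(PDO::ATTR_ERRMODE, PDO::ERRMODE_EXCEPTION);

// Hypothetical users table with two rows.
$pdo->exec('CREATE TABLE users (id INTEGER PRIMARY KEY, password TEXT)');
$pdo->exec("INSERT INTO users (id, password) VALUES (1, 'old-1'), (2, 'old-2')");

// Simulated request data, including the malicious payload.
$password = 'abc";-- ';
$id = '1';

// Named placeholders keep the data separate from the SQL text.
$statement = $pdo->prepare('UPDATE users SET password = :password WHERE id = :id');
$statement->execute([':password' => $password, ':id' => (int) $id]);

// Only user 1 changes; the payload is stored as a harmless string.
$passwords = $pdo->query('SELECT password FROM users ORDER BY id')
    ->fetchAll(PDO::FETCH_COLUMN);
print_r($passwords);
```

The payload ends up stored as plain data for user 1 and user 2 is left alone. A real application would also hash the password, for example with password_hash(), before storing it.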
User information
The majority of web applications available right now use an account system of some type. Think about that for a moment: your social network of choice, an online newspaper that filters the news most suitable for you, or your Netflix account that suggests the best series to watch according to your previous preferences.
It is very likely that all these accounts have email addresses, telephone numbers, your location and so on saved in them.
The developers of core PHP have done a wonderful job of foreseeing these situations and provided us with two functions.
The first one is filter_var(), the second is filter_input().
These two functions sanitize input using an assortment of filter flags.
An example that is very easy to understand is sanitizing an email address:
$email = 'myname@gmail.com';
$emailSanitized = filter_var($email, FILTER_SANITIZE_EMAIL);
With the flag used in the example, this function removes all characters except letters, digits and the following characters: !#$%&'*+-=?^_`{|}~@.[]
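The address above is already clean, so nothing is removed from it. A quick sketch with a deliberately dirty value (a made-up input with a leading space, parentheses and a trailing newline) makes the effect visible:

```php
<?php
// Hypothetical raw input containing characters that are not allowed in an email.
$rawEmail = " my(name)@gmail.com\n";

$emailSanitized = filter_var($rawEmail, FILTER_SANITIZE_EMAIL);

var_dump($emailSanitized); // string(16) "myname@gmail.com"
```

With filter_input() the same filter can be applied directly to a request variable, for instance filter_input(INPUT_POST, 'email', FILTER_SANITIZE_EMAIL), where 'email' is a hypothetical form field name.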
Validate Data
From Cambridge Official Dictionary
/ˈvæl.ɪ.deɪt/ - to make something officially acceptable or approved, especially after examining it
You have sanitized the input, removed all the information that you did not want and discarded all the data that you considered dangerous.
Now it is time to validate the data.
Validation is not sanitization: this step does not remove any bad data. Validation confirms that the information coming into your application meets the criteria you expect.
If you are expecting a text value, and let’s stick with the email example, you need to be sure that your application receives a string containing a valid email address. The same goes for dates and numbers.
If this step is overlooked, it can lead to errors in later steps and to more problems further along your user’s journey.
There are different ways to validate input, but the main function used is, once again, filter_var().
We have seen how a flag such as FILTER_SANITIZE_EMAIL removes characters that are not allowed in an email address.
Now we can use the same function with a similar flag, FILTER_VALIDATE_EMAIL.
The function returns one of two types depending on the case.
If validation succeeds it returns the value itself; otherwise it returns false.
For this reason, the outcome must be checked with a strict comparison against false.
$email = 'myname@gmail.com';
$emailValidation = filter_var($email, FILTER_VALIDATE_EMAIL);
// Strict comparison: a valid value could otherwise be mistaken for false
if ($emailValidation !== false) {
    echo "email validated successfully";
}
As with the sanitize filters, the validation flags start with FILTER_VALIDATE_ and end with the type of validation we need.
We can validate integers, floats, IP addresses, domains, URLs and so on.
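A short sketch of a few of these flags follows. The sample values and the 1 to 120 range are assumptions I picked for the demonstration. Notice the second case: 0 is a perfectly valid integer, which is exactly why the result has to be compared with !== false rather than tested loosely.

```php
<?php
// Integer within an assumed range of 1 to 120 (e.g. an age field).
$age = filter_var('42', FILTER_VALIDATE_INT, [
    'options' => ['min_range' => 1, 'max_range' => 120],
]);

// A valid value can be falsy: 0 is a valid integer, so compare with !== false.
$zero = filter_var('0', FILTER_VALIDATE_INT);

$ip  = filter_var('192.168.0.1', FILTER_VALIDATE_IP);
$url = filter_var('not a url', FILTER_VALIDATE_URL);

var_dump($age);  // int(42)
var_dump($zero); // int(0)
var_dump($ip);   // string(11) "192.168.0.1"
var_dump($url);  // bool(false)
```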
See the validate filters page of the PHP manual for more.
Note that, even though filter_var() provides several validation flags, there are dedicated validation components that are much more powerful.
Escape Output
From Cambridge Official Dictionary
/ɪˈskeɪp/ - to get free from something, or to avoid something
We have received some data and validated it using the techniques you have just learned. Now it is time to think about how to display this data securely.
We can add another layer of protection to our application by escaping the information we want to show, which prevents injected code from being rendered and, even worse, executed on our pages.
To escape the output we use the PHP function htmlentities().
We need to be sure that the second parameter of this function includes the ENT_QUOTES flag so that it escapes both single and double quotes, and then define the character encoding (usually UTF-8) as the third parameter.
Be careful not to escape the data more than once: escape it either when you receive it or when you output it, not both.
Here is an example:
$script = '<script>alert("This is a message");</script>';
echo htmlentities($script, ENT_QUOTES, 'UTF-8');
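The warning about escaping more than once is easier to appreciate with a small sketch. The $text value is made up. The fourth argument of htmlentities(), double_encode, is set to false in the last call so that existing entities are left alone.

```php
<?php
$text = 'Tom & "Jerry"';

// Escaping once gives the output you want to send to the browser.
$once = htmlentities($text, ENT_QUOTES, 'UTF-8');

// Escaping the already escaped string mangles the entities.
$twice = htmlentities($once, ENT_QUOTES, 'UTF-8');

// With double_encode set to false, existing entities are not encoded again.
$noDouble = htmlentities($once, ENT_QUOTES, 'UTF-8', false);

echo $once, PHP_EOL;     // Tom &amp; &quot;Jerry&quot;
echo $twice, PHP_EOL;    // Tom &amp;amp; &amp;quot;Jerry&amp;quot;
echo $noDouble, PHP_EOL; // Tom &amp; &quot;Jerry&quot;
```

A visitor would see the literal text &amp; on the page in the second case, which is why it pays to decide on a single place where escaping happens.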
As with validation, there are several dedicated components that can also be used during the escaping phase.
Conclusion
Being able to manage data properly, validate it, display it in a secure and reliable way and make your web application trustworthy must be one of your main goals from the beginning of your career as a web developer.
There are several other good practices, like learning to handle passwords properly, working with dates and times, or even managing the PDO extension
(we'll review those as well, so be sure to subscribe to the newsletter below).
All these conventions may seem boring and a waste of time, but in the long run the benefit of using them will become clear.
Plus, as a side note, there are currently dozens of PHP frameworks that make this process very easy and smooth.
I have reviewed 24 PHP frameworks that can help you make this job easy.
Review the code on your project right now and be sure to always use good practices.