Validating, sanitizing, and escaping

When writing theme and plugin code, it is important to be mindful of how data coming into WordPress is handled and how it is presented to the end user. This is commonly needed when building a settings page for a theme, creating and manipulating shortcodes, or saving and rendering extra data associated with a post. There is a distinction between how input and output are managed.

For more comprehensive guidance and examples for validation, sanitizing, and escaping methods, review the WordPress Plugin Handbook’s Data Validation page.

For more information on why VIP takes these practices so seriously, read The Importance of Escaping All The Things, which discusses why escaping (sanitizing input and escaping output) is a critical aspect of web application security.

Guiding principles

  1. Never trust user input.
  2. Escape as late as possible.
  3. Escape everything from untrusted sources (e.g., databases and users), third parties (e.g., Twitter), etc.
  4. Never assume anything.
  5. Sanitization is okay, but validation/rejection is better.

Validating: Checking user input

To validate is to ensure that the data a user has submitted matches what was requested from them. There are several core methods useful for input validation. The best usage depends on the type of fields to be validated.

This example form includes a field for a zip code:

<input id="my-zipcode" type="text" maxlength="5" name="my-zipcode" />

The form limits the user to five characters of input, but places no limitation on the type of characters entered. Both “11221” and “eval(” could be entered. If the input values will be saved to the database, the user should not have unrestricted write access.

This is where validation plays a role. To further limit the user’s input, code can check each field for its proper data type. If the input value is not the proper data type, it will be discarded.

This code example demonstrates a method for validating the my-zipcode field:

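// Cast the submitted value to an integer; missing or non-numeric input becomes 0.
$safe_zipcode = isset( $_POST['my-zipcode'] ) ? intval( $_POST['my-zipcode'] ) : 0;
if ( ! $safe_zipcode ) {
	// Save an empty value instead of 0.
	$safe_zipcode = '';
}
update_post_meta( $post->ID, 'my_zipcode', $safe_zipcode );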

The intval() function casts user input as an integer, and returns zero if the input is a non-numeric value. The code then checks whether the value ended up as zero. If it did, an empty value is saved to the database. Otherwise, the validated zip code is saved.

Additional steps can be taken to ensure the value is actually a valid zip code based on expected ranges and lengths. For example, the code example above would allow an input value “111111111” to be saved, even though it is not a valid zip code.
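As a minimal sketch of such an additional step, the example below checks the input against a five-digit pattern and keeps it as a string. Keeping it as a string matters because intval() turns “02134” into 2134, dropping the leading zero. The function name my_validate_zipcode() is hypothetical, and the five-digit rule is an assumption about the expected format.

```php
<?php
/**
 * Hypothetical helper: returns the zip code as a string if it is exactly
 * five digits, or an empty string otherwise.
 */
function my_validate_zipcode( $raw ) {
	// Reject arrays and other non-scalar input outright.
	if ( ! is_scalar( $raw ) ) {
		return '';
	}

	// Ignore surrounding whitespace before checking the pattern.
	$candidate = trim( (string) $raw );

	// \A and \z anchor the whole string, so any extra characters cause a mismatch.
	if ( 1 === preg_match( '/\A[0-9]{5}\z/', $candidate ) ) {
		return $candidate;
	}

	return '';
}
```

The result could then be saved in place of the intval() value, for example with my_validate_zipcode( $_POST['my-zipcode'] ?? '' ).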

This style of validation most closely follows the WordPress safelist philosophy: Only allow the user to input what you’re expecting.
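When a field has a fixed set of acceptable values, the safelist idea can be applied directly. The following is a sketch only; the helper name my_validate_choice() and the layout values are hypothetical and are not part of WordPress.

```php
<?php
/**
 * Hypothetical helper: returns $raw only if it is one of the allowed values,
 * otherwise returns $default.
 */
function my_validate_choice( $raw, array $allowed, $default ) {
	// The third argument enables strict comparison, so 0 or true cannot
	// loosely match a string in the list.
	return in_array( $raw, $allowed, true ) ? $raw : $default;
}

// Illustrative values for a layout setting.
$allowed_layouts = array( 'grid', 'list', 'masonry' );
$layout          = my_validate_choice( 'list', $allowed_layouts, 'grid' );
```

Anything outside the list is rejected in favor of a known default rather than cleaned up and stored.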

Sanitizing: Cleaning user input

Sanitization is a more liberal approach to accepting user data and is the best approach when there is a range of acceptable input.

Use the built-in sanitize_*() series of WordPress helper functions whenever possible.

For example, in a form field like this:

<input id="title" type="text" name="title" />

The data could be sanitized with the sanitize_text_field() function:

$title = sanitize_text_field( $_POST['title'] );
update_post_meta( $post->ID, 'title', $title );

The sanitize_text_field() function does the following:

  • Checks for invalid UTF-8
  • Converts single less-than characters (<) to entities
  • Strips all tags
  • Removes line breaks, tabs, and extra whitespace
  • Strips percent-encoded octets

When the input is expected to contain HTML, wp_kses() and its related functions can strip disallowed markup while keeping the elements and attributes that the requirements call for.

Escaping: Securing output

Escaping handles security on the other end of the spectrum. To escape is to take the data you may already have and help secure it prior to rendering it for the end user. Use the built-in esc_*() series of WordPress helper functions whenever possible for securing output.

esc_html() should be used any time an HTML element encloses a section of data that is being output.

<h4><?php echo esc_html( $title ); ?></h4>

esc_url() should be used on all URLs, including those in the src and href attributes of an HTML element.

<img alt="" src="<?php echo esc_url( $media_url ); ?>" />

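esc_js() is intended for inline JavaScript. It escapes a value for use inside a single-quoted JavaScript string.

<a href="#" onclick="alert( '<?php echo esc_js( $value ); ?>' );">Show value</a>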

esc_attr() can be used on everything else that is printed into an HTML element’s attribute.

<ul class="<?php echo esc_attr( $stored_class ); ?>">

wp_kses() can be used on everything that is expected to contain HTML. There are several variants of the main function, each featuring a different list of built-in defaults. A popular example is wp_kses_post(), which allows all markup normally permitted in posts. Custom filters can be created using wp_kses() directly.

<?php
echo wp_kses_post( $partial_html );
echo wp_kses(
	$another_partial_html,
	array(
		'a'      => array(
			'href'  => array(),
			'title' => array(),
		),
		'br'     => array(),
		'em'     => array(),
		'strong' => array(),
	)
); ?>

In this example, the array passed to wp_kses() contains the member

'a' => array( 'href' => array(), 'title' => array() )

which means that only those two HTML attributes will be allowed for a tags; all others will be stripped. Mapping any key to an empty array means that no attributes are allowed for that element, and any that are present will be stripped.

There has historically been a perception that wp_kses() is slow. While it is a bit slower than other escaping functions, the difference is minimal and does not have as much of an impact as most slow queries or uncached functions would. (For more information, read Zack Tollman’s wp_kses investigation.)

It is important to note that most WordPress functions properly prepare the data for output, and additional escaping is not needed.

Encode URL parameters

When constructing URLs with parameters, it is important to properly encode any dynamic parameter values.

rawurlencode() should be used over urlencode() to ensure URLs are correctly encoded. Only legacy systems should use urlencode(). This also applies when using add_query_arg() to construct URLs.

<?php echo esc_url( 'http://example.com/a/safe/url?parameter=' . rawurlencode( $stored_class ) ); ?>
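The difference between the two functions is easiest to see with a value that contains a space. This short comparison uses only PHP built-ins; the sample value is illustrative.

```php
<?php
$value = 'red shoes & socks';

// rawurlencode() follows RFC 3986: a space becomes %20.
$raw_encoded = rawurlencode( $value );  // 'red%20shoes%20%26%20socks'

// urlencode() uses form-style encoding: a space becomes +.
$form_encoded = urlencode( $value );    // 'red+shoes+%26+socks'
```

When building the URL with add_query_arg(), the same encoding can be applied to the value before it is passed in, for example add_query_arg( 'parameter', rawurlencode( $value ), $url ).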

Always escape late

It is best to do the output escaping as late as possible, ideally as data is being output.

// Okay, but not great.
$url = esc_url( $url );
$text = esc_html( $text );
echo '<a href="'. $url . '">' . $text . '</a>';

// Much better!
echo '<a href="'. esc_url( $url ) . '">' . esc_html( $text ) . '</a>';

It is better to escape late for a few reasons:

  • Code reviews and deploys can happen faster because code can be deemed safe for output at a glance, rather than by hunting through many lines of code.
  • Something could inadvertently change the variable between when it was first cast and when it is output, introducing a potential vulnerability.
  • Late escaping makes it easier to do automatic code scanning, saving time and cutting down on review and deploy times.
  • Late escaping whenever possible makes the code more robust and future-proof.
  • Escaping/casting on output removes any ambiguity and adds clarity (always develop for the maintainer).

Escape on string creation

It is sometimes not practical to escape late. In a few rare circumstances, output cannot be passed to wp_kses(), since by definition it would strip the scripts that are being generated.

In situations like this, always escape while creating the string and store the value in a variable that is suffixed with _escaped, _safe, or _clean (e.g., $variable becomes $variable_escaped or $variable_safe).

If a function cannot output internally and escape late, then it must always return “safe” HTML. This allows echo my_custom_script_code(); to be done without needing the script tag to be passed through a version of wp_kses that would allow such tags.
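A function of this kind might look like the sketch below. The $settings parameter, the mySettings JavaScript variable, and the variable names are hypothetical. The sketch uses PHP's json_encode() so that it runs on its own; in WordPress code, wp_json_encode() accepts the same flags.

```php
<?php
/**
 * Hypothetical function that returns "safe" script markup.
 * $settings stands in for any data from an untrusted source.
 */
function my_custom_script_code( array $settings ) {
	// The JSON_HEX_* flags encode <, >, &, ', and " inside string values as
	// \uXXXX escapes, so the data cannot close the script tag early.
	$flags            = JSON_HEX_TAG | JSON_HEX_AMP | JSON_HEX_APOS | JSON_HEX_QUOT;
	$settings_escaped = json_encode( $settings, $flags );

	// json_encode() returns false on failure; fall back to an empty object.
	if ( false === $settings_escaped ) {
		$settings_escaped = '{}';
	}

	// Escaped at creation, and named accordingly.
	$script_escaped = '<script>var mySettings = ' . $settings_escaped . ';</script>';

	return $script_escaped;
}
```

The _escaped suffix signals to a reviewer that the value was made safe at the point of creation, which is what allows the return value to be echoed directly.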

Last updated: September 20, 2022