Validating, sanitizing, and escaping

When writing theme and plugin code, it is important to be mindful of how data coming into WordPress is handled and how it is presented to the end user. This is commonly needed when building a settings page for a theme, creating and manipulating shortcodes, or saving and rendering extra data associated with a post. There is a distinction between how input and output are managed.

$_GET, $_POST, $_REQUEST, $_SERVER, and other data from untrusted sources (including values from the database, such as post meta and options) need to be validated and sanitized as early as possible (e.g., when assigning a $_POST value to a local variable) and escaped as late as possible on output.

Guiding principles

  1. Never trust user input.
  2. Never assume anything.
  3. Sanitization is okay, but validation/rejection is better.
  4. Validation on the client-side is for the user’s benefit; validation/sanitization on the server-side is always needed.
  5. Nonces should be used to validate all form submissions.
  6. Capability checks need to validate that users can take the requested actions.
  7. Escape everything from untrusted sources (e.g., databases, users), third-parties (e.g., Slack, Salesforce), etc.
  8. Escape as late as possible.

Validating: Checking user input

To validate is to ensure that the data a user submits matches what was requested. There are several core methods useful for input validation; the best choice depends on the type of field being validated.

This example form includes a field for a ZIP Code:

<input id="my-zip-code" type="text" maxlength="5" name="my-zip-code" />

The form limits the user to five characters of input but does not restrict which characters are entered: either 11221 or eval( could be submitted. Because maxlength is enforced only by the browser, even the length limit can be bypassed. If the input values will be saved to the database, the user should not have unrestricted write access.

This is where validation plays a role. To further limit the user’s input, code can check each field for its proper data type. If the input value is not the proper data type, it will be discarded.

This code example demonstrates a method for validating the my-zip-code field:

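// absint() casts the raw value to a non-negative integer (0 if non-numeric).
$safe_zip_code = absint( $_POST['my-zip-code'] ?? 0 );
// Discard any result that is not exactly five characters long.
if ( strlen( (string) $safe_zip_code ) !== 5 ) {
	$safe_zip_code = '';
}
update_post_meta( $post->ID, 'my_zip_code', $safe_zip_code );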

The absint() function converts a value to a non-negative integer and returns zero if the input is non-numeric. The code then checks whether the result is exactly five characters long. If it is not, an empty value is saved to the database; otherwise, the validated ZIP Code is saved.

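Additional steps can be taken to ensure the value is actually a valid ZIP Code based on expected ranges and lengths. For example, the code above would allow the input 3.1e4 to be saved as 31000, even though the user did not enter a ZIP Code. It would also reject valid ZIP Codes that begin with zero, such as 02134, because the integer conversion drops the leading zero.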

This style of validation most closely follows the WordPress safelist philosophy: Only allow the user to input the types of values that the field is intended for.
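A stricter variant of the ZIP Code check compares the raw string against an exact five-digit pattern instead of converting it to an integer, which preserves leading zeros and rejects input such as 3.1e4. The following is a minimal sketch in plain PHP; the function name my_validate_zip_code() is hypothetical, and in a real form handler the raw value would come from $_POST (unslashed with wp_unslash()) before being checked.

```php
<?php
/**
 * Hypothetical helper: returns the ZIP Code if it is exactly five ASCII
 * digits, or an empty string otherwise. The value is never cast to an
 * integer, so leading zeros (e.g. 02134) are preserved.
 */
function my_validate_zip_code( $raw ): string {
	// Reject non-string input outright (arrays, null, etc.).
	if ( ! is_string( $raw ) ) {
		return '';
	}
	// \A and \z anchor the whole string; unlike ^ and $, \z does not
	// let a trailing newline slip through.
	return preg_match( '/\A[0-9]{5}\z/', $raw ) === 1 ? $raw : '';
}

$safe_zip_code = my_validate_zip_code( '02134' ); // '02134'
```

This is still a safelist check: only the exact shape the field expects is accepted. Note that this pattern does not accept ZIP+4 codes such as 12345-6789.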

Sanitizing: Cleaning user input

Sanitization is a more liberal approach to accepting user data and is the best approach when there is a range of acceptable input.

Use the built-in sanitize_*() series of WordPress helper functions whenever possible.

For example, in a form field like this:

<input id="title" type="text" name="title" />

The data could be sanitized with the sanitize_text_field() function:

$title = sanitize_text_field( wp_unslash( $_POST['title'] ?? '' ) );
update_post_meta( $post->ID, 'title', $title );

The sanitize_text_field() function does the following:

  • Checks for invalid UTF-8
  • Converts single less-than characters (<) to entities
  • Strips all tags
  • Removes line breaks, tabs, and extra whitespace
  • Strips percent-encoded characters
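To make the listed steps concrete, here is a simplified plain-PHP approximation. It is a hypothetical illustration, not WordPress's implementation: it uses PHP's strip_tags(), whereas WordPress also removes the contents of script and style elements, and it skips the lone less-than conversion. In theme and plugin code, call sanitize_text_field() itself.

```php
<?php
/**
 * Hypothetical, simplified approximation of the steps listed above.
 * NOT WordPress's sanitize_text_field(); for illustration only.
 */
function my_approximate_text_field( string $str ): string {
	// Reject invalid UTF-8: preg_match() with the /u modifier returns
	// false when the subject is not valid UTF-8.
	if ( preg_match( '//u', $str ) !== 1 ) {
		return '';
	}
	// Strip all tags.
	$str = strip_tags( $str );
	// Collapse line breaks, tabs, and runs of spaces into a single space.
	$str = preg_replace( '/[\r\n\t ]+/', ' ', $str );
	// Remove percent-encoded octets such as %20 or %3C. Repeat until none
	// remain, because removing one octet can create another (e.g. "%2%2020").
	do {
		$previous = $str;
		$str      = preg_replace( '/%[a-f0-9]{2}/i', '', $str );
	} while ( $str !== $previous );
	// Collapse any doubled spaces left behind and trim the ends.
	return trim( preg_replace( '/ +/', ' ', $str ) );
}
```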

When a field needs to accept some HTML, wp_kses() and its related functions can strip disallowed markup while keeping the tags and attributes the field requires.

Escaping: Securing output

Escaping handles security on the other end of the spectrum. To escape is to take the data you may already have and help contextually secure it before rendering it for the end user. Use the escaping functions, such as the esc_*() helper functions, wp_kses(), and wp_kses_post(), to secure output.

esc_html()

esc_html() should be used any time a value is output as text inside an HTML element.

<span><?php echo esc_html( $description ); ?></span>

esc_url()

esc_url() should be used on all URLs, including those in the src and href attributes of an HTML element.

<img alt="" src="<?php echo esc_url( $media_url ); ?>" />
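Attribute escaping alone does not make a URL safe. The plain-PHP sketch below uses htmlspecialchars() as a stand-in for generic attribute escaping (it is not a WordPress function) to show that a javascript: URL passes through unchanged. esc_url() instead checks the URL's protocol against an allowed list and returns an empty string for a disallowed protocol such as javascript:. The $media_url value is hypothetical attacker input.

```php
<?php
// Hypothetical attacker-controlled value.
$media_url = 'javascript:alert(document.cookie)';

// Generic attribute escaping only encodes &, <, >, and quotes,
// so the dangerous scheme survives intact.
$attr_escaped = htmlspecialchars( $media_url, ENT_QUOTES, 'UTF-8' );

echo '<a href="' . $attr_escaped . '">link</a>', PHP_EOL;
```

This is why URLs should go through esc_url() rather than esc_attr().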

esc_js()

esc_js() is intended for inline JavaScript, such as a string value inside an event-handler attribute.

<div onclick="alert( '<?php echo esc_js( $value ); ?>' );"></div>

esc_attr()

esc_attr() can be used on everything else that is printed into an HTML element’s attribute.

<ul class="<?php echo esc_attr( $stored_class ); ?>"></ul>
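The reason attribute escaping matters is easiest to see with a value that contains a quote. This plain-PHP sketch uses htmlspecialchars() with ENT_QUOTES as a stand-in for attribute escaping (esc_attr() is the function to use in themes and plugins); the stored value is a hypothetical malicious input.

```php
<?php
// Hypothetical stored value that tries to break out of the class attribute.
$stored_class = 'menu" onmouseover="alert(1)';

// Unescaped: the double quote closes the attribute early and injects a handler.
$unsafe = '<ul class="' . $stored_class . '">';

// Escaped: quotes become entities, so the whole value stays inside the attribute.
$safe = '<ul class="' . htmlspecialchars( $stored_class, ENT_QUOTES, 'UTF-8' ) . '">';

echo $unsafe, PHP_EOL, $safe, PHP_EOL;
```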

wp_kses()

wp_kses() can be used on everything that is expected to contain HTML. There are several variants of the main function, each featuring a different list of built-in defaults. A popular example is wp_kses_post(), which allows all markup normally permitted in posts. A custom list of allowed elements and attributes can be defined by calling wp_kses() directly.

<?php
echo wp_kses_post( $partial_html );
echo wp_kses(
	$another_partial_html,
	array(
		'a'      => array(
			'href'  => array(),
			'title' => array(),
		),
		'br'     => array(),
		'em'     => array(),
		'strong' => array(),
	)
); ?>

In this example, the array passed to wp_kses() contains the member

'a' => array( 'href' => array(), 'title' => array() )

which means that only those two HTML attributes are allowed on a tags; all others are stripped. Mapping an element to an empty array, as with 'br', 'em', and 'strong' above, means that no attributes are allowed for that element and all of them are stripped.
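The way the allowed-HTML array is read can be expressed as a small lookup. The helper below, my_kses_allows(), is hypothetical and only models the array's meaning (a listed element is kept, a listed attribute on it is kept, anything else is stripped); it does not parse or filter HTML the way wp_kses() does.

```php
<?php
// The allowed-HTML array from the wp_kses() example: element => allowed attributes.
$allowed_html = array(
	'a'      => array(
		'href'  => array(),
		'title' => array(),
	),
	'br'     => array(),
	'em'     => array(),
	'strong' => array(),
);

/**
 * Hypothetical helper (not part of WordPress): reports whether an element,
 * or an attribute on that element, is permitted by an allowed-HTML array.
 */
function my_kses_allows( array $allowed_html, string $tag, ?string $attr = null ): bool {
	$tag = strtolower( $tag );
	if ( ! isset( $allowed_html[ $tag ] ) ) {
		return false; // Element not listed: it would be stripped.
	}
	if ( null === $attr ) {
		return true; // Element listed: it is kept.
	}
	// An attribute is kept only if it is a key in the element's inner array.
	return is_array( $allowed_html[ $tag ] )
		&& array_key_exists( strtolower( $attr ), $allowed_html[ $tag ] );
}
```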

There has historically been a perception that wp_kses() is slow. While it is a bit slower than other escaping functions, the difference is minimal and does not have as much of an impact as most slow queries or uncached functions would. (For more information, read Zack Tollman’s wp_kses investigation.)

It is important to note that some WordPress functions properly prepare the data for output, and additional escaping is not needed.

Encode URL parameters

When constructing URLs with parameters, it is important to properly encode any dynamic parameter values.

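rawurlencode() should be used instead of urlencode(): it encodes according to RFC 3986, so spaces become %20, whereas urlencode() follows the application/x-www-form-urlencoded convention, in which spaces become +. Only legacy systems should use urlencode(). This also applies when using add_query_arg() to construct URLs, because add_query_arg() expects the values passed to it to be encoded already.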

<?php echo esc_url( 'http://example.com/a/safe/url?parameter=' . rawurlencode( $stored_class ) ); ?>
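The difference between the two functions shows up with spaces and a few punctuation characters. This is plain PHP, independent of WordPress; the sample value is arbitrary.

```php
<?php
$value = 'a b+c~d';

// RFC 3986: space becomes %20, "+" is encoded, "~" is left as-is.
$raw = rawurlencode( $value );

// application/x-www-form-urlencoded: space becomes "+", "~" is encoded.
$form = urlencode( $value );

echo 'http://example.com/a/safe/url?parameter=' . $raw, PHP_EOL;
```

In WordPress, the resulting URL would still be passed through esc_url() on output.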

Always escape late

It is best to do the output escaping as late as possible, ideally as the data is being output.

It is better to escape late for a few reasons:

  • Code reviews and deploys can happen faster because it can be deemed safe for output at a glance, rather than hunting through many lines of code.
  • Something could inadvertently change the variable between when it was first cast and when it is output, introducing a potential vulnerability.
  • Late escaping makes it easier to do automatic code scanning, saving time and cutting down on review and deploy times.
  • Late escaping whenever possible makes the code more robust and future proof.
  • Escaping/casting on output removes any ambiguity and adds clarity (always develop for the maintainer).

In this example, the values of $url and $text are escaped earlier in the code, requiring effort to confirm that the escaping took place:

$url = esc_url( $url );
$text = esc_html( $text );

// Potentially many lines of code.

echo '<a href="'. $url . '">' . $text . '</a>';

In this revised code example, it is easier to determine that $url and $text are escaped:

echo '<a href="'. esc_url( $url ) . '">' . esc_html( $text ) . '</a>';

Escape on string creation

It is sometimes not practical to escape late. In a few rare circumstances, output cannot be passed to wp_kses(), since by definition it would strip the scripts that are being generated.

In situations like this, always escape while creating the string, and store the value in a variable whose name is suffixed with _escaped, _safe, or _clean (e.g., $variable becomes $variable_escaped or $variable_safe).

If a function cannot output internally and escape late, then it must always return “safe” HTML. This allows echo my_custom_script_code(); to be done without needing the script tag to be passed through a version of wp_kses that would allow such tags.
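As a sketch of this pattern, the function below builds an inline script that passes configuration data to JavaScript, escaping the data with json_encode() and its JSON_HEX_* flags at the moment the string is created. The $config parameter and the window.myPluginConfig global are hypothetical; in WordPress, wp_json_encode() accepts the same flags as its second argument.

```php
<?php
/**
 * Sketch: builds a <script> tag and always returns "safe" HTML.
 * The $config parameter and window.myPluginConfig are hypothetical.
 */
function my_custom_script_code( array $config = array() ): string {
	// The JSON_HEX_* flags encode <, >, &, ' and " inside values as \u00XX
	// sequences, so a value such as "</script>" cannot close the element early.
	$config_json_escaped = json_encode(
		$config,
		JSON_HEX_TAG | JSON_HEX_AMP | JSON_HEX_APOS | JSON_HEX_QUOT
	);

	if ( false === $config_json_escaped ) {
		$config_json_escaped = '{}'; // Fall back to an empty object on encoding failure.
	}

	return '<script>window.myPluginConfig = ' . $config_json_escaped . ';</script>';
}

echo my_custom_script_code( array( 'message' => '</script><script>alert(1)</script>' ) ), PHP_EOL;
```

The _escaped suffix signals to reviewers that the variable was made safe when it was built, so the final echo does not need another pass through wp_kses().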

Additional resources

For more comprehensive guidance and examples for validation, sanitizing, and escaping methods, review the WordPress Developer Resources guide for Validation Data.

For more information on why VIP takes these practices so seriously, read The Importance of Escaping All The Things, which discusses why escaping (sanitizing input and escaping output) is a critical aspect of web application security.

Last updated: May 20, 2024

Relevant to

  • WordPress