
robots.txt

WordPress environments that are accessible at a convenience domain serve a hard-coded /robots.txt file that returns:

User-agent: *
Disallow: /

Requests to any URL on the environment also return an x-robots-tag: noindex, nofollow header. These settings prevent search engines from indexing content hosted on non-production sites or on unlaunched production sites.
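
The header can be verified with any HTTP client. A quick check, using a placeholder convenience domain (example.go-vip.net):

curl -sI https://example.go-vip.net/ | grep -i x-robots-tag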

Limitations

To modify the output of /robots.txt, the environment (production or non-production) must be accessible at a custom domain that is set as the primary domain. Replace the convenience domain with a custom primary domain by completing the steps to launch a WordPress single site, or by launching the main site (ID 1) of a WordPress multisite.

Modify the robots.txt file

To modify the /robots.txt file, hook into the do_robotstxt action, or filter the generated output by hooking into the robots_txt filter. In this code example, a specific directory is disallowed for all user agents:

function my_disallow_directory() {
    // Runs within WordPress's robots.txt handler; anything echoed
    // here is added to the generated /robots.txt output.
    echo "User-agent: *" . PHP_EOL;
    echo "Disallow: /path/to/your/directory/" . PHP_EOL;
}
add_action( 'do_robotstxt', 'my_disallow_directory' );
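
Alternatively, the robots_txt filter receives the generated output and the site's "public" setting, and can be used to append to or replace the rules. A minimal sketch, reusing the hypothetical directory path from the example above:

function my_filter_robots( $output, $public ) {
    // Append a Disallow rule to the generated robots.txt output.
    $output .= "Disallow: /path/to/your/directory/" . PHP_EOL;
    return $output;
}
add_filter( 'robots_txt', 'my_filter_robots', 10, 2 );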

Caching

The /robots.txt file is cached for long periods of time. To force the cache to clear after the file has been changed, go to Settings > Reading within WP-Admin and toggle the Search engine visibility setting, saving the changes each time the setting is changed.

The page cache for the /robots.txt file can also be flushed using the wp vip cache purge-url WP-CLI command which is available on WordPress environments.
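
For example, assuming the site is served at example.com (a placeholder domain):

wp vip cache purge-url https://example.com/robots.txt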

Last updated: August 03, 2023

Relevant to

  • WordPress