Elasticsearch

Elasticsearch (ES) is an open-source search and analytics engine that powers WordPress VIP’s Enterprise Search and Jetpack Instant Search.

The Debug Bar and the Search API can be used to debug Elasticsearch issues. Customers with Enterprise Search enabled can also debug with Search Dev Tools.

When Elasticsearch is powering a site’s search, it continually indexes the site’s content. During publishing actions, action hooks capture the change events and identify the changed data to be indexed. Elasticsearch has its own environment and data store, and interactions with it are via REST API requests. As search requests are made on a site, API calls are made to tell ES what to search for and how to weight the results.

These communications occur asynchronously, so there may be a slight delay between when a change is made in WordPress and when the change appears in Elasticsearch. For this reason, the WordPress database should be treated as the source of truth for search results.

To integrate a WordPress site with Elasticsearch, code is needed to monitor for changes made to content and to send those changes to the Elasticsearch cluster for indexing. A “cluster” is a group of one or more Elasticsearch nodes working together.
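
As a rough illustration of the indexing side, the sketch below serializes changed posts into the newline-delimited JSON body accepted by Elasticsearch's `_bulk` endpoint. It is a minimal sketch in Python, not the code WordPress VIP runs: the function name `build_bulk_index_payload` and the indexed fields `post_title` and `post_status` are assumptions made for this example, and the index name is borrowed from the sample response in this article.

```python
import json


def build_bulk_index_payload(index_name, posts):
    """Serialize changed posts into an NDJSON body for the _bulk endpoint.

    Hypothetical helper: the set of indexed fields is an assumption.
    """
    lines = []
    for post in posts:
        # Action line: index (create or replace) the document under the post's ID.
        action = {"index": {"_index": index_name, "_id": str(post["post_id"])}}
        lines.append(json.dumps(action))
        # Source line: the fields to store for this post.
        lines.append(json.dumps(post))
    # The bulk API requires the body to end with a newline.
    return "\n".join(lines) + "\n"


# Illustrative data only; the title and status values are made up.
changed_posts = [
    {"post_id": 4536344, "post_title": "Hello world", "post_status": "publish"},
]
payload = build_bulk_index_payload("vip-2737-post-1", changed_posts)
print(payload)
```

The resulting string would be sent as the body of a `POST` request to the cluster's `_bulk` endpoint with the `Content-Type: application/x-ndjson` header.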

Code is also needed to intercept the search queries and, instead of making LIKE queries to the MySQL database, send an API request to the Elasticsearch endpoint.
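
One way such a request body might look is sketched below in Python, using Elasticsearch's `multi_match` query. The field names `post_title` and `post_content`, the boost value, and the helper name `build_search_body` are assumptions chosen for illustration; the actual fields and weights depend on how the index is mapped.

```python
import json


def build_search_body(search_terms, size=5):
    """Build a request body for POST /<index>/_search.

    Hypothetical helper: field names and boosts are assumptions.
    """
    return {
        "size": size,
        # Only the post ID is needed; the full post is loaded from the database.
        "_source": ["post_id"],
        "query": {
            "multi_match": {
                "query": search_terms,
                # "^3" weights title matches three times as heavily as body matches.
                "fields": ["post_title^3", "post_content"],
            }
        },
    }


body = build_search_body("elasticsearch indexing")
print(json.dumps(body, indent=2))
```

Requesting only `post_id` in `_source` keeps the response small, which fits a flow where the database supplies the post data.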

The ES endpoint returns a set of search results containing post IDs:

{
  "took": 9,
  "timed_out": false,
  "_shards": {
    "total": 1,
    "successful": 1,
    "skipped": 0,
    "failed": 0
  },
  "hits": {
    "total": {
      "value": 569,
      "relation": "eq"
    },
    "max_score": 540.97675,
    "hits": [
      {
        "_index": "vip-2737-post-1",
        "_type": "_doc",
        "_id": "4536344",
        "_score": 540.97675,
        "_source": {
          "post_id": 4536344
        }
      },
      {
        "_index": "vip-2737-post-1",
        "_type": "_doc",
        "_id": "105829",
        "_score": 516.1369,
        "_source": {
          "post_id": 105829
        }
      },
      {
        "_index": "vip-2737-post-1",
        "_type": "_doc",
        "_id": "306074",
        "_score": 516.1369,
        "_source": {
          "post_id": 306074
        }
      },
      {
        "_index": "vip-2737-post-1",
        "_type": "_doc",
        "_id": "3688167",
        "_score": 476.97778,
        "_source": {
          "post_id": 3688167
        }
      },
      {
        "_index": "vip-2737-post-1",
        "_type": "_doc",
        "_id": "4616046",
        "_score": 476.97778,
        "_source": {
          "post_id": 4616046
        }
      }
    ]
  }
}
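
Pulling the post IDs out of a response shaped like the one above can be sketched as follows. This is a minimal Python example; the helper name `extract_post_ids` is hypothetical, and the sample input is a shortened copy of the response shown in this article.

```python
import json


def extract_post_ids(response):
    """Return post IDs from a search response, in the order ES ranked them."""
    hits = response.get("hits", {}).get("hits", [])
    return [hit["_source"]["post_id"] for hit in hits]


# Shortened copy of the sample response (first two hits only).
raw = """
{
  "hits": {
    "total": {"value": 569, "relation": "eq"},
    "hits": [
      {"_id": "4536344", "_score": 540.97675, "_source": {"post_id": 4536344}},
      {"_id": "105829", "_score": 516.1369, "_source": {"post_id": 105829}}
    ]
  }
}
"""
post_ids = extract_post_ids(json.loads(raw))
print(post_ids)  # [4536344, 105829]
```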

Those post IDs can then be used to fetch the actual data from the database and display post summaries. For example, with a different set of IDs:

SELECT wp_posts.ID
FROM wp_posts
WHERE 1=1
AND wp_posts.ID IN (426,506,192)
AND wp_posts.post_type IN ('post', 'page')
AND wp_posts.post_status = 'publish'
ORDER BY wp_posts.post_date DESC
LIMIT 0, 3
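
A query of this shape could be assembled from a list of IDs as sketched below. This Python example assumes a database driver that uses `%s` placeholders (as common Python MySQL drivers do); the helper name `build_posts_query` is hypothetical, and the example only builds the statement and its parameters rather than executing it.

```python
def build_posts_query(post_ids, post_types=("post", "page")):
    """Build a parameterized SELECT for the given post IDs.

    Hypothetical helper: returns (sql, params) for a driver using %s placeholders.
    """
    if not post_ids:
        raise ValueError("post_ids must not be empty")
    # One placeholder per value, so no value is interpolated into the SQL text.
    id_marks = ", ".join(["%s"] * len(post_ids))
    type_marks = ", ".join(["%s"] * len(post_types))
    sql = (
        "SELECT wp_posts.ID FROM wp_posts "
        f"WHERE wp_posts.ID IN ({id_marks}) "
        f"AND wp_posts.post_type IN ({type_marks}) "
        "AND wp_posts.post_status = %s "
        "ORDER BY wp_posts.post_date DESC "
        "LIMIT %s"
    )
    params = [*post_ids, *post_types, "publish", len(post_ids)]
    return sql, params


sql, params = build_posts_query([426, 506, 192])
print(sql)
print(params)  # [426, 506, 192, 'post', 'page', 'publish', 3]
```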

In a typical search request:

  • The normal WPDB query is intercepted.
  • A request to the ES endpoint is made with the details from the query (such as the search terms).
  • A response is received, containing a list of matching post IDs (and often, other data such as rankings).
  • A new DB query is made to get the list of posts, or a series of get_post() calls is made for individual posts.
  • Results are returned for the matching posts and are rendered on the page.
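
The steps above can be sketched end to end as follows. This is a simplified Python illustration, not the actual integration: `search_posts`, `fake_es_search`, and `fake_fetch_posts` are hypothetical names, the `match` query on `post_content` is an assumed field, and the two callables stand in for the real Elasticsearch request and database lookup.

```python
def search_posts(search_terms, es_search, fetch_posts):
    """Run a search through ES, then load the matching posts from the database.

    es_search and fetch_posts are injected callables (hypothetical stand-ins).
    """
    # Send the search terms to ES instead of running a LIKE query.
    response = es_search({"query": {"match": {"post_content": search_terms}}})
    # Collect the matching post IDs in ranked order.
    ranked_ids = [hit["_source"]["post_id"] for hit in response["hits"]["hits"]]
    if not ranked_ids:
        return []
    # Load the posts themselves from the database.
    posts_by_id = {post["ID"]: post for post in fetch_posts(ranked_ids)}
    # Keep the ES ranking and skip IDs the database no longer has,
    # since the index can lag slightly behind the database.
    return [posts_by_id[pid] for pid in ranked_ids if pid in posts_by_id]


def fake_es_search(body):
    # Stand-in for the HTTP request to the ES endpoint.
    return {"hits": {"hits": [
        {"_source": {"post_id": 4536344}},
        {"_source": {"post_id": 105829}},
    ]}}


def fake_fetch_posts(ids):
    # Stand-in for the database query; post 105829 is missing here.
    return [{"ID": 4536344, "post_title": "Example post"}]


results = search_posts("example", fake_es_search, fake_fetch_posts)
print([post["ID"] for post in results])  # [4536344]
```

Dropping IDs that the database does not return is one way to treat the database as authoritative when the index is briefly out of date.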

Last updated: June 03, 2022