HTTP request log shipping

VIP’s Log Shipping feature automatically saves HTTP request logs to an Amazon Web Services S3 bucket at 5-minute intervals. The logs are then available to your team and contractors for storage, processing, or analysis. Logs are an important asset for understanding how your system is used, diagnosing connectivity issues, tuning performance, identifying usage patterns, and analyzing service interruptions.

Currently we only provide Log Shipping for HTTP (web) request logs, which show data about each request made to our edge network of servers.

Requirements

You will need:

  • An AWS S3 bucket. Make note of the bucket name and region for the setup in the next steps.
    • Please see the AWS bucket naming rules documentation, and use only lowercase letters, numbers, and hyphens in bucket names.
  • Access to create/update the AWS Bucket Policy configuration for the bucket.
    • Bucket must not use KMS encryption – if default encryption is needed, please use the SSE-S3 option.
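
The naming guidance above can be checked before a bucket is created. The following is a minimal sketch, not an official validator: the function name is hypothetical, and the pattern is deliberately stricter than AWS's own rules because it rejects dots and every other character outside lowercase letters, numbers, and hyphens. The 3 to 63 character length limit and the requirement to start and end with a letter or number come from the AWS bucket naming rules.

```python
import re

# First and last characters must be a lowercase letter or digit; the middle
# may also contain hyphens. Total length is 3 to 63 characters.
_SAFE_BUCKET_NAME = re.compile(r"[a-z0-9][a-z0-9-]{1,61}[a-z0-9]")


def is_safe_bucket_name(name: str) -> bool:
    """Return True if `name` uses only a-z, 0-9 and '-', per the guidance above."""
    return _SAFE_BUCKET_NAME.fullmatch(name) is not None
```

For example, `is_safe_bucket_name("my-log-bucket")` passes, while names containing uppercase letters, underscores, or dots are rejected.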

Configuration

  1. Get the name of your AWS bucket and its region.
  2. Enter both into the dashboard under Settings > Log Shipping.
  3. The dashboard will generate a configuration in JSON format that you need to paste into your AWS Bucket Policy configuration. For the desired bucket, navigate to “Permissions,” then select “Bucket Policy,” and save the JSON there.
  4. Once the configuration information is entered into the dashboard, a test file named vip-go-test-file.txt is uploaded to the bucket as part of the verification process. This file will always be present in a site’s configured bucket and path, alongside the date folders that contain the logs themselves.

The path used to write to the bucket is [bucket]/[app_name]/[app_environment] (e.g. my-log-bucket/my-app/production). This means you can use the same bucket for more than one app or environment, should you choose to do so.

Objects are written to the specified S3 bucket with the bucket-owner-full-control canned ACL.

Restricting access by IP range

If you want to restrict access to your AWS S3 bucket via IP range, ensure your bucket access policy accounts for the dynamic IP range accessible at https://go-vip.net/ip-ranges.json. You will need to implement a system to auto-update the access policy, as the IP ranges are subject to change.
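
One building block of such an auto-update system is turning a list of CIDR ranges into a policy condition. The sketch below assumes you have already extracted the CIDR strings from the published JSON; the structure of that file is not described here, so the extraction step is left out. The function name is hypothetical. `IpAddress` and `aws:SourceIp` are standard AWS policy condition elements, but where the resulting fragment belongs in your bucket policy depends on how your policy is written.

```python
import ipaddress
from typing import Iterable


def source_ip_condition(cidrs: Iterable[str]) -> dict:
    """Build a policy Condition fragment that matches the given CIDR ranges.

    Each entry is validated with the ipaddress module, so malformed input
    raises ValueError instead of ending up in the policy. Duplicates are
    dropped and the result is sorted so repeated runs produce stable output.
    """
    normalized = sorted(
        {str(ipaddress.ip_network(cidr.strip(), strict=False)) for cidr in cidrs}
    )
    return {"IpAddress": {"aws:SourceIp": normalized}}
```

Sorting and de-duplicating makes it easy to compare the newly generated fragment with the one currently in place and to update the policy only when the ranges have actually changed.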

Log contents

The log files are written as a series of gzipped JSON files. Here is a sample record:

{
  "client_site_id": "000",
  "remote_user": "",
  "request_url": "/",
  "wplogin": "-",
  "timestamp": "19/May/2020:17:03:58 +0000",
  "request_type": "GET",
  "scheme": "https",
  "http_referer": "https://example/",
  "http_x_forwarded_for": "",
  "true_client_ip": "",
  "remote_addr": "REDACTED",
  "tls_version": "TLSv1.3",
  "content_type": "text/html; charset=UTF-8",
  "upstream_country_code": "GB",
  "sent_cache_control": "max-age=300, must-revalidate",
  "timestamp_iso8601": "2020-05-19T17:03:58+00:00",
  "sent_vary": "Accept-Encoding",
  "sent_x_cache": "hit",
  "request_time": "0.001",
  "http_host": "example.com",
  "http_accept_language": "en-US,en;q=0.9",
  "http_user_agent": "Mozilla/5.0 (Macintosh; Intel Mac OS X 10_15_4) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/81.0.4044.138 Safari/537.36",
  "http_version": "HTTP/2.0",
  "body_bytes_sent": "8981",
  "status": "200"
}
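
Reading one of these files locally can be sketched as follows. This assumes each gzipped file holds one JSON record per line, which is a common layout for shipped logs but is not confirmed above; if your files contain a single JSON array instead, replace the loop with one `json.load` call. The function name is hypothetical.

```python
import gzip
import json
from typing import Iterator


def read_log_records(path: str) -> Iterator[dict]:
    """Yield one dict per record from a gzipped log file.

    Assumption: records are newline-delimited JSON. Blank lines are skipped.
    """
    with gzip.open(path, "rt", encoding="utf-8") as handle:
        for line in handle:
            line = line.strip()
            if line:
                yield json.loads(line)
```

Because the function is a generator, large files are processed one record at a time rather than loaded into memory at once.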

Description of fields

body_bytes_sent — total number of bytes sent to the client

client_site_id — an internal ID unique to this environment

content_type — the media type of the resource, e.g. text/html; charset=UTF-8

http_host — the domain, e.g. example.com

http_accept_language — the contents of the Accept-Language request HTTP header

http_user_agent — the contents of the User-Agent request header

http_version — HTTP protocol version

http_referer — the Referer request header, if available, containing the purported address of the web page from which a link to the currently requested page was followed

http_x_forwarded_for — the contents of the X-Forwarded-For request header, which is used to record a client’s originating IP address

remote_user — the username if the request was authenticated with HTTP Basic Authentication (we don’t log the password)

request_url — the path of the resource that was fetched, not including elements that are recorded elsewhere, such as the protocol (e.g. http://, see scheme) and the domain (e.g. example.com, see http_host)

request_time — the time taken for the request

request_type — the HTTP method

sent_cache_control — the contents of the Cache-Control HTTP response header

sent_x_cache — a header from the VIP platform indicating whether the response was from a cache hit, miss, or pass

scheme — either http or https

sent_vary — the contents of the Vary HTTP response header, e.g. Accept-Encoding; note that we do not allow free use of the Vary header

status — the HTTP response status code, e.g. 200, 404, etc.

timestamp — UTC date and time of request

timestamp_iso8601 — UTC date and time of request in ISO format

true_client_ip — a request header commonly set by reverse proxies, including Cloudflare, to indicate the remote address of the client they are forwarding requests for (see also: http_x_forwarded_for)

remote_addr — IP address of the client making the request (see also: true_client_ip and http_x_forwarded_for)

tls_version — TLS version used by the client

upstream_country_code — all requests are geocoded by country at the edge of the VIP CDN using the incoming IP address, e.g. “US”, “GB”, etc.

wplogin — the login name (i.e. user_login) of the authenticated WordPress user, if any; for requests with no authenticated WordPress user, this field contains -
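
As the sample record shows, every value is delivered as a string, including numeric fields such as status and body_bytes_sent. A sketch of converting the most commonly used fields into native types is shown below. The function name is hypothetical, and the choice of which fields to convert is only an illustration; the unit of request_time is not stated above, so the sketch converts it to a float without interpreting it.

```python
from datetime import datetime


def typed_record(record: dict) -> dict:
    """Return a copy of a log record with selected fields converted.

    Only fields documented above are touched; all others stay as strings.
    """
    typed = dict(record)
    typed["status"] = int(record["status"])
    typed["body_bytes_sent"] = int(record["body_bytes_sent"])
    typed["request_time"] = float(record["request_time"])
    # timestamp_iso8601 carries its UTC offset, so the result is timezone-aware.
    typed["timestamp_iso8601"] = datetime.fromisoformat(record["timestamp_iso8601"])
    # "-" is the documented placeholder for "no authenticated WordPress user".
    typed["wplogin"] = None if record["wplogin"] == "-" else record["wplogin"]
    return typed
```

Parsing timestamp_iso8601 rather than timestamp avoids the locale-dependent month abbreviation in the latter.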

Using your log data

The JSON-formatted log files are individually human-readable, but to make full use of your logs you will need to ingest them into another service. We have a tutorial on how to analyze the access log data from our Log Shipping tool using GoAccess.

Here are some other platforms that can help you make the most of your data, depending on your use cases:

  • ELK (Elasticsearch, Logstash, Kibana) will help you filter and view your logs
  • Splunk will help you search, monitor, and analyze the data from your logs
  • Datadog will help you understand development issues within your logs
  • Botify will help you understand SEO issues revealed by your log data
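
For a quick look before setting up any of these platforms, a few lines of code can already answer simple questions. The sketch below counts responses by status code and computes the share of requests served from cache, using the status and sent_x_cache fields described earlier. The function name and the shape of the returned summary are hypothetical, and the records are assumed to be already parsed into dictionaries.

```python
from collections import Counter
from typing import Iterable


def summarize(records: Iterable[dict]) -> dict:
    """Count requests by status code and cache result, and compute the hit ratio."""
    status_counts: Counter = Counter()
    cache_counts: Counter = Counter()
    for record in records:
        status_counts[record.get("status", "")] += 1
        cache_counts[record.get("sent_x_cache", "")] += 1
    total = sum(status_counts.values())
    # Guard against an empty input so the ratio is always defined.
    hit_ratio = cache_counts["hit"] / total if total else 0.0
    return {
        "total": total,
        "status": dict(status_counts),
        "cache": dict(cache_counts),
        "hit_ratio": hit_ratio,
    }
```

The same pattern extends to other fields, for example counting by upstream_country_code or http_host.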

Last updated: April 09, 2021