Disaster recovery for VIP: Site outage protocol
The status of the WordPress VIP Platform is available at all times on the WPVIP Status page.
VIP has 24×7 automated monitoring and will be aware of an outage immediately. Even so, if an application is not loading, customers can contact VIP Support by creating a Support ticket with the Urgent priority level.
Support tickets filed with the Urgent priority level immediately alert VIP’s entire team, and a member of the Support team will respond within the customer’s defined Urgent SLA.
Urgent tickets should only be used for outages, security issues, or workflow-blocking concerns. Remember to include as many details as possible to help VIP Support solve the issue more quickly.
VIP offers different code review levels, based on the contract type, to help ensure that a site is secure, performant, and adheres to best practices. VIP also runs high-performance checks ahead of site launches to ensure the application is optimized before going into production.
In addition, VIP maintains a blend of documented and automated procedures for dealing with various equipment or other failures.
All VIP production environment databases, including custom tables, are backed up every hour. Backups are encrypted at rest before being replicated to Amazon S3 using encryption in transit, and are retained for 30 days.
The VIP File System (uploads/media library) is replicated in near real-time to all origin data centers.
VIP maintains several origin data centers, each with additional capacity that can be used in the event of a failure. The primary data centers are located in or around Dallas-Fort Worth, Los Angeles, and Washington, D.C.
The resources assigned to run a given VIP production environment are spread across separate networking (e.g. switch) and power (e.g. rack power) infrastructure, so that a single networking or power equipment failure does not affect all of them at once.
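As a rough illustration of what an hourly schedule with 30-day retention implies, the arithmetic can be sketched in Python. This is not VIP's backup tooling; the function names below are hypothetical, and only the one-hour interval and 30-day retention figures come from the text above.

```python
from datetime import datetime, timedelta, timezone

# Figures taken from the text: hourly backups, retained for 30 days.
BACKUP_INTERVAL = timedelta(hours=1)
RETENTION = timedelta(days=30)


def max_restore_points() -> int:
    """Upper bound on the number of hourly backups held at any one time."""
    return RETENTION // BACKUP_INTERVAL


def is_retained(backup_time: datetime, now: datetime) -> bool:
    """True if a backup taken at backup_time is still inside the retention window."""
    age = now - backup_time
    return timedelta(0) <= age <= RETENTION


if __name__ == "__main__":
    now = datetime(2023, 4, 4, 12, 0, tzinfo=timezone.utc)
    print(max_restore_points())                            # 24 * 30 restore points
    print(is_retained(now - timedelta(days=29), now))      # inside the window
    print(is_retained(now - timedelta(days=31), now))      # outside the window
```

Under these assumptions, the most recent database backup is at most about one hour old at the moment of a failure, and up to 720 hourly restore points are available.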
VIP provides customers with access to New Relic, which includes the availability and monitoring service Synthetics Lite. Specific thresholds are set to trigger warnings for performance issues, often flagging items that could contribute to, or indicate, an application environment being unavailable or having service issues.
In outage scenarios, VIP uses an established Outage Mode protocol. Different leads are assigned to different roles, allowing the engineering team to focus on resolving the issue. This also ensures that someone is providing updates to the WPVIP Status page and the VIP Status Twitter feed. Once the outage has been resolved, VIP follows up with an After Action Report in the VIP Lobby with details about what led to the issue and any preventative measures put in place to keep a similar scenario from occurring again.
In addition to the communication outlined above, Premier customers should expect to hear from their designated Technical Account Manager (TAM), who will deliver a detailed incident report on the customer’s shared P2 and send an email to any key stakeholders who may not be involved in Slack or P2. A customer’s Premier account team is also available to discuss the incident and follow-up actions by email, a Zendesk ticket, or a quick phone call if desired.
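The idea of threshold-based warnings can be sketched generically as follows. This is a minimal illustration, not New Relic's API or VIP's actual configuration: the `Threshold` class, the `evaluate` function, the metric names, and the limit values are all hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Threshold:
    metric: str        # hypothetical metric name, e.g. "response_time_s"
    warn_above: float  # hypothetical limit; a warning fires above this value


def evaluate(samples: dict[str, float], thresholds: list[Threshold]) -> list[str]:
    """Return one warning message per metric whose sample exceeds its threshold."""
    warnings = []
    for t in thresholds:
        value = samples.get(t.metric)
        # Metrics with no sample are skipped rather than treated as failures.
        if value is not None and value > t.warn_above:
            warnings.append(f"{t.metric}={value} exceeds {t.warn_above}")
    return warnings


if __name__ == "__main__":
    # Illustrative values only; not real VIP or New Relic settings.
    thresholds = [
        Threshold("response_time_s", 2.0),
        Threshold("error_rate_pct", 1.0),
    ]
    samples = {"response_time_s": 2.5, "error_rate_pct": 0.2}
    for message in evaluate(samples, thresholds):
        print(message)
```

In this sketch only the response time crosses its limit, so a single warning is produced; a slow but still reachable application is the kind of early signal the paragraph above describes.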
Last updated: April 04, 2023