Site outage protocol
Although our 24×7 automated monitoring would alert us to an outage immediately, you should still let us know if your application is not loading. To notify our entire team at once, send in a ticket with the word “urgent” in the subject line. Someone will respond to you within minutes.
Urgent tickets should only be used for outages, security issues, or workflow-blocking concerns. Please include as many details as you can so that we can resolve the issue faster.
Before we touch on how we would respond, let’s go over what we do to prevent outages in the first place.
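As a rough illustration of the guidance above, the sketch below assembles a ticket whose subject line contains the word "urgent" and whose body lists the details you have gathered. The `Ticket` class, the `build_urgent_ticket` function, and the detail fields are hypothetical names used only for illustration; they are not part of any VIP or Zendesk API.

```python
from dataclasses import dataclass


@dataclass
class Ticket:
    """Hypothetical ticket shape, not an actual VIP or Zendesk object."""
    subject: str
    body: str


def build_urgent_ticket(summary: str, details: dict[str, str]) -> Ticket:
    """Return a ticket whose subject line contains the word "urgent".

    `details` holds whatever facts are available (affected URL, start
    time, error message, and so on), one per line in the body.
    """
    # Only add the prefix if the summary does not already say "urgent".
    if "urgent" in summary.lower():
        subject = summary
    else:
        subject = f"Urgent: {summary}"
    body = "\n".join(f"{key}: {value}" for key, value in details.items())
    return Ticket(subject=subject, body=body)
```

For example, `build_urgent_ticket("Site not loading", {"URL": "https://example.com"})` yields the subject `Urgent: Site not loading`.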
VIP offers different code review levels based on the contract type to make sure your site is secure, performs well, and follows best practices. We also run high-performance checks ahead of site launches to ensure the application is optimized before going into production.
VIP also maintains a blend of documented and automated procedures for dealing with equipment and other failures. All VIP production environments are backed up every hour, including custom tables, and 30 days of backups are held in the origin data center. A recent backup for each production environment is also stored in a separate data center.
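To make the backup schedule concrete: hourly backups held for 30 days work out to roughly 720 backups per production environment. The sketch below shows that arithmetic and one possible way of splitting backups by the retention window. It is only an illustration under those two stated figures; the function names are hypothetical and do not describe VIP's actual backup tooling.

```python
from datetime import datetime, timedelta

RETENTION = timedelta(days=30)  # from the text: 30 days of backups are held
INTERVAL = timedelta(hours=1)   # from the text: backups run every hour


def expected_backup_count() -> int:
    """Number of hourly backups that fit inside the retention window."""
    return RETENTION // INTERVAL


def split_by_retention(backup_times: list[datetime], now: datetime):
    """Split backup timestamps into (kept, expired) relative to `now`."""
    cutoff = now - RETENTION
    kept = [t for t in backup_times if t >= cutoff]
    expired = [t for t in backup_times if t < cutoff]
    return kept, expired
```

Here `expected_backup_count()` returns 720 (30 days × 24 backups per day).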
VIP maintains several origin data centers, each with additional capacity that can be used in the event of a failure. Our primary data centers are located in or around Dallas-Fort Worth, Los Angeles, and Washington, D.C.
All VIP application environments are spread across networking (e.g. switch) and power (e.g. rack power) infrastructure, to reduce the risk that a networking or power equipment issue affects all the resources assigned to running a given VIP production environment.
We provide our clients with access to New Relic, which includes the availability and monitoring service Synthetics Lite. We set specific thresholds that trigger warnings for performance issues. These warnings often flag items that could contribute to, or indicate, an application environment being unavailable or having service issues.
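The idea of threshold-based warnings can be sketched as follows. This is a minimal illustration only: the metric names and limits are invented placeholders, since the actual thresholds are not stated here, and the code does not use or represent the New Relic API.

```python
# Hypothetical metrics and limits, chosen only for illustration.
THRESHOLDS = {
    "response_time_ms": 2000.0,
    "error_rate_percent": 5.0,
}


def warnings_for(metrics: dict[str, float]) -> list[str]:
    """Return one warning message per metric that exceeds its threshold."""
    warnings = []
    for name, limit in THRESHOLDS.items():
        value = metrics.get(name)
        # Metrics that were not reported are skipped rather than flagged.
        if value is not None and value > limit:
            warnings.append(f"{name} is {value}, above threshold {limit}")
    return warnings
```

A reading that sits exactly at a limit does not trigger a warning in this sketch; whether the boundary should count is a design choice.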
In outage scenarios, VIP uses an established Outage Mode protocol. Different leads are assigned to different roles, which allows our engineering team to focus on resolving the issue. This also ensures someone is updating the VIP Status Twitter feed and VIP Lobby. Once the outage has been resolved, we follow up with an After Action Report that details what led to the issue and any preventative measures put in place to keep a similar scenario from occurring again.
In addition to the communication outlined above, Enterprise clients should expect to hear from their designated Technical Account Manager (TAM). The TAM will alert the client's primary contact about the situation with the details gathered so far, and then provide status updates throughout. Once the issue has been resolved, the TAM will follow up to confirm the resolution. This could be via an email, a Zendesk ticket, or a quick phone call if desired.