OpenDNS System


Palo Alto – Performance Degradation – resolved

posted on May 18, 2010 6:20 am UTC

Starting at 10:15 PM PDT tonight, all of our global locations suffered a significant denial of service attack. All sites withstood the attack except Palo Alto, which had sporadic reachability issues lasting almost 30 minutes. This interruption took our engineers longer than usual to diagnose because of difficulty removing some routing advertisements between our routers and one of our ISPs. We’re still investigating the cause of this routing anomaly. By 10:45 PM PDT, all DNS traffic was routed to alternate locations, including Los Angeles and Seattle, which were online and serving traffic the entire time.

For the next 30 minutes, OpenDNS operations personnel investigated the router issue and, while performing emergency maintenance, were forced to take down our website to prevent changes to the OpenDNS Dashboard. We regret this inconvenience, though no DNS service was impacted during this time. By 11:10 PM PDT, all website services had returned to normal and all services were online.

As with any interruption of service, we will be evaluating our procedures and capacity planning models, and will ultimately take whatever steps are necessary to ensure it does not happen again.

Update: To clarify some misunderstandings, DNS was not significantly impacted at any site besides Palo Alto (even though all sites were attacked). At Palo Alto, we have numerous connections to the Internet and to peering partners. For reasons we are still investigating, one of our connections to the Internet had a prolonged service interruption and did not behave as designed.

—David Ulevitch


