The GoDaddy DNS event (which I wrote about here) has been the subject of many a post-mortem and water-cooler conversation in the web performance world for the last week. In addition to the many well-publicized issues that have been discussed, there was one more, hidden effect that most folks may not have noticed – unless you use Firefox.
Firefox uses OCSP lookups to validate the certificate of SSL certificates. If you go to a new site and connect using SSL, Firefox has a process to check the validity of SSL cert. The results are of the lookup cached and stored for some time (I have heard 3 days, this could be incorrect) before checking again.
Before the security wonks in the audience get upset, realize I’m not an OCSP or SSL expert, and would love some comments and feedback that help the rest of us understand exactly how this works. What I do know is that anyone who came to a site the relied on an SSL cert provided and/or signed by GoDaddy at some point in its cert validation path discovered a nasty side-effect of this really great idea when the GoDaddy DNS outage occurred: If you can’t reach the cert signer, the performance of your site will be significantly delayed.
Remember this: It was GoDaddy this time; next time, it could be your cert signing authority.
How did this happen? Performing an OCSP lookup requires a opening a new TCP connection so that an HTTP request can be made to the OCSP provider. A new TCP connection requires a DNS lookup. If you can’t perform a successful DNS lookup to find the IP address of the OCSP host…well, I think you can guess the rest.
Unlike other third-party outages, these are not ones that can be shrugged off. These are ones that will affect page rendering by blocking the downloading the mobile or web application content you present to customers.
I am not someone who can comment on the effectiveness of OCSP lookups in increasing web and mobile security. OCSP lookup for Firefox are simply one more indication of how complex the design and management of modern online applications is.
Learning from the near-disaster state and preventing it from happening again is more important that a disaster post-mortem. The signs of potential complexity collapse exist throughout your applications, if you take the time to look. And while something like OCSP may like like a minor inconvenience, when it affects a discernible portion of your Firefox users, it becomes a very large mouse scaring a very jumpy elephant.