We’re hoping to clear the long holiday weekend without incident. We’ve had a good run of luck with a growing client list, awards and industry recognition. Good momentum for our branded browsers. And our blog posts, while we hope they are substantive and useful, tend to be rosy.
Well, last week we hit our first real hiccup. In the midst of a bunch of great news, and at a time when we hoped to be seeing a good influx of new traffic, our site went down. The potential impact isn’t limited to our site and company. Given that our product represents other brands, the problem could have been more widespread.
Our recent successes revealed a bug in the open-source code behind our site. An influx of traffic and events was triggering an infinite loop that ultimately overloaded the server (hope that made sense). We diverted server resources to the more critical components of our company, and let our corporate website take the brunt of the downtime. Not necessarily the face you want to show interested prospects, but the right choice nonetheless.
We feel fortunate our server guy is a staunch believer in open-source code. While we run a lean organization, the ability to rely on the shared intelligence of the open-source community greatly expands his knowledge. In a way, it’s like a virtual team, and that team offered the information he needed to resolve the issue. Naturally, we’ll share back as we continue to tweak and improve.
Now that we’re heading into a long weekend, we’ll be a little more attuned to the business than we otherwise might have been. And a stable server will certainly make us thankful.
Lessons Learned/Good Habits:
• Track your site status. We’re using Monit on the server and an external monitoring site to send alerts.
• Have a support escalation plan in place – who to call, where to reach them, both internal and external.
• Avoid a single point of failure – whether it’s the password to get support from your web host, or an on-call contractor to backfill key knowledge positions.
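For readers who want something concrete, here is a minimal sketch of the first two habits: an external status check paired with a simple escalation list. It is not our actual setup (we rely on Monit and an external monitoring site); the function names, the contact list and the one-step-per-failure escalation rule are all assumptions made for illustration.

```python
import urllib.error
import urllib.request
from typing import Callable, List, Optional, Tuple


def fetch_status(url: str, timeout: float = 5.0) -> Optional[int]:
    """Return the HTTP status code, or None if the site is unreachable."""
    try:
        with urllib.request.urlopen(url, timeout=timeout) as resp:
            return resp.status
    except urllib.error.HTTPError as err:
        # The server answered, but with an error status such as 500 or 503.
        return err.code
    except (urllib.error.URLError, OSError):
        # DNS failure, refused connection, timeout, and similar.
        return None


def check_and_escalate(
    url: str,
    contacts: List[str],
    failures_so_far: int,
    fetch: Callable[[str], Optional[int]] = fetch_status,
) -> Tuple[int, Optional[str]]:
    """Check the site once and decide who, if anyone, should be alerted.

    Returns (consecutive_failure_count, contact_to_alert_or_None).
    Hypothetical rule: each consecutive failure moves one step down the
    contact list, stopping at the last contact.
    """
    status = fetch(url)
    if status is not None and 200 <= status < 400:
        return 0, None  # Healthy: reset the failure count.

    failures = failures_so_far + 1
    if not contacts:
        return failures, None  # Nobody to alert.

    index = min(failures - 1, len(contacts) - 1)
    return failures, contacts[index]
```

Run from a machine outside your own hosting, a scheduler could call `check_and_escalate` every few minutes, carry the failure count forward between runs, and hand the returned contact to whatever alerting channel you use. Listing more than one contact is the point: it keeps the escalation plan itself from becoming a single point of failure.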