What the Netflix Holiday Outage Means, And What We Can Learn

This holiday, Santa delivered us a big stocking full of internet stability issues.

Christmas was the time to sit down with family, turn on the television, and wonder why the hell Netflix wasn’t working.  You can see a timeline at GigaOm, and the culprit was clearly Amazon Web Services, whose “Elastic Load Balancing Service” failed to live up to any of the four words in its title.  Heroku and others suffered outages, but except for a sporadic report or two, Amazon’s own video service apparently ran fine.

I can’t begin to imagine the discussions between Netflix and Amazon over this one.  At least, I can’t begin to imagine one that doesn’t start with the words “What the hell . . .” and go downhill from there.

(Steam had its own outage, but that was kind of overshadowed by this doozy.)

It’s not the first time an Amazon outage has had a huge impact.  As a person up to his armpits in IT, living in Silicon Valley, I get to hear about Amazon outages more than most.  Even when I don’t want to, which is frankly a lot, considering I’m the Accidental Therapist so often.

So it’s not the first time, but it’s a big outage, on a big day, with a big client, for a service that’s involved in a lot of major websites.  Time for us pro geeks to put on our Big Geek shoes and sort out what this means for us.

1) Beyond any impacts to Amazon, Netflix, etc., this is a serious wake-up call for stability over the holidays (and that includes Steam).  The fact that this could even happen in this day and age is a sign that some people don’t take holiday stability seriously enough, and they bloody well should.  This was the time when people would be watching movies, and when everyone is on the internet or is about to get on with their latest gizmo gift.  An outage should be unthinkable.

Of course, we had to think of it because it happened.  So if you work in anything remotely related to IT, your takeaway is to use this incident to promote good stability and holiday policies.  And to scare the crap out of anyone not taking them seriously.

Some people have to draw the short straws and keep monitoring systems in case they have to apply the well-tested emergency plans that you have doubtless put together carefully.

2) Amazon has taken a black eye for this in the IT crowd, and there’s often grumbling when AWS issues happen, as so many people use AWS.  Their competitors can (and probably will) step up to the plate and try to wrest service and clients away from them – and they have a pretty wide range of competitors.

If Netflix, or any other big name, publicly moves away from Amazon, it’d be a big win for whoever gets the contract (and a place to send your resume).  It would also be tough on Amazon.

If you work in IT, this may well come up: “Do we use Amazon?” and you may well need to answer.

3) It’s painfully clear that as we move to a more connected world, we’re back to the old “mainframe is down” problem of terminal computing.  With more things in the cloud, we’re discovering that doesn’t mean jack unless we can get to the bloody thing.  If you work in software, please remember this; it may save your customers stress – and you your job.

4) Netflix is not going to be held entirely blameless here, because not everyone angry over the outage is a geek like us.  It’s going to be common users who aren’t happy.  Netflix has to make some tough decisions about what to do – and they cannot have any repeats of this.

5) Lost in all of this is how the Netflix competitors are handling this.  See, I don’t know what, if anything, Redbox, etc. are doing.  Makes you wonder, and makes you wonder if you should look into them and see if there are any opportunities.  I also see some potential marketing opportunities if people want to take advantage of Netflix’s stumble (and further anger their competitor).

6) This is a teachable moment when we can remind people not as buried in tech as we are how complex, and at times unreliable, technology is.  It may be a good idea for your company, clients, or family to get a quick lesson in how things don’t work.

We can’t have outages like this anymore, not when we rely on these systems, and not at critical times.  If we’re going to live in this connected world, then it’s got to work like the disconnected world – my DVD doesn’t vanish because of an East Coast server outage.  People have expectations.

– Steven Savage

Steven Savage is a Geek 2.0 writer, speaker, blogger, and job coach.  He blogs on careers at http://www.fantopro.com/, nerd and geek culture at http://www.nerdcaliber.com/, and does a site of creative tools at http://www.seventhsanctum.com/. He can be reached at https://www.stevensavage.com/.