Over the years we have had customers report outages on their websites. Most clients would never even notice, as these usually happen in the middle of the night. However, lots of IT departments use 'pinging' software to monitor their sites continually, and they take great pleasure in reporting an outage to anyone who cares (usually the MD or the Marketing Department/Director).
So the question of uptime is one we get asked a lot. And
the assumption is that if we work hard enough, test the hosting
companies thoroughly enough, talk to enough people and generally get smarter, we
can find and recommend the right hosting partner who will deliver 100%
uptime.
This is not true. And it has nothing to do with our effort, time, research, or overall intelligence.
It’s related to the nature of the internet.
And no one can promise you 100% uptime (at least not for a price you’re willing to pay). Want to test what I’m talking about?
The test is simple. Here are the steps:
- Look up any hosting provider you like (or want to use).
- See if they have a 100% uptime guarantee.
- Then see if there are any statements after the 100% uptime guarantee.
If there are, they don’t really guarantee 100% uptime. Right? Because
why would you need to say anything more after a 100% uptime guarantee?
If I tell you that I won’t stab you, there are no further statements,
right? Not like, “If I do stab you, I will be sure to put a bandaid on
the wound,” or “If I end up stabbing you for a reason that isn’t your
fault, I promise to drive you to the hospital.” Nope. If the guarantee is 100%, there are no “if” statements afterwards.
An uptime guarantee, no matter which host you look
at, is simply a promise of what refund the host offers customers if
there’s a network outage. The reality is that many companies won’t offer you a 100% uptime guarantee at all. And if they do, they’ll likely list
exceptions such as:
- Failure of systems, internet, infrastructure, network, power, facilities or connections delivered by third parties
- Application, software, or operating system failures caused by denial-of-service attacks, hacker activity, or other malicious events
- Acts of God (weather, etc.)
Oh, and maintenance is also an exception. So trust me when I tell you that the test is clear:
you won’t get 100% anywhere.
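If you want to see what “not quite 100%” actually means, here’s a small Python sketch (our own illustration, not any host’s formula) that turns an uptime percentage into the downtime it still allows. It assumes a 365-day year; real guarantees are often measured per month, and each host defines “downtime” in its own small print.

```python
HOURS_PER_YEAR = 365 * 24  # 8,760 hours; assumes a non-leap year


def allowed_downtime_hours(uptime_percent: float,
                           period_hours: float = HOURS_PER_YEAR) -> float:
    """Hours of downtime a given uptime percentage still permits over a period."""
    if not 0.0 <= uptime_percent <= 100.0:
        raise ValueError("uptime_percent must be between 0 and 100")
    return period_hours * (1 - uptime_percent / 100)


for pct in (99.0, 99.9, 99.99, 100.0):
    hours = allowed_downtime_hours(pct)
    print(f"{pct}% uptime -> {hours:.2f} hours ({hours * 60:.0f} minutes) of downtime a year")
```

Even a 99.9% guarantee still leaves room for roughly 8.76 hours of downtime a year.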
You could get 100% uptime…but it will cost you
Let’s talk, for just a second, about how you might go about getting 100% uptime if you really wanted it. To do that, we need to understand what’s happening behind the simple things we do with a browser and the web. When you request a website, imagine that you’re actually
asking someone to send you a book, in chapters, via snail mail. (I know,
who would do that?)
So let’s say I ask for "Harry Potter and the Goblet of Stone". Here’s what happens in a normal network.
- I send you a note asking you for the book.
- You collect all the chapters (117 of them).
- You send me each chapter in a separate manila envelope addressed to me.
- You take them all to the mailbox and drop them off.
- The postman sends them through the system and they arrive at my post office.
- My mailman delivers them to me.
- Only, I don’t get chapters 10, 43, 86, and 92.
- I send you a note asking for those again.
- You grab copies of those and you resend them to me.
- The same thing happens, but I’m still missing 92.
- I ask again and this time your delivery reaches me.
- Then I open all the envelopes and arrange them in order and I start reading.
This is what happens every time we ask for a web page. All those
chapters are data packets. Those post offices (and mailmen) are like
routers. And all that chatter back and forth about getting everything I
want or need? That’s the job of the internet protocols (TCP in particular), which pass messages back
and forth between my browser and your server.
It’s kind of crazy!
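For the technically curious, here’s a toy Python simulation of the book-by-post story. It’s a rough model of what TCP does for you, not how TCP actually works, and the function names, the 5% loss rate and the fixed random seed are all our own assumptions. Chapters get lost at random, the missing ones are re-requested, and whatever arrives (in any order) is put back in sequence before reading.

```python
import random


def deliver(packets: dict[int, str], loss_rate: float,
            rng: random.Random) -> list[tuple[int, str]]:
    """One trip through the 'postal system'.

    Each packet is independently lost with probability loss_rate, and the
    survivors are shuffled because they may have taken different routes.
    """
    arrived = [(n, text) for n, text in packets.items() if rng.random() >= loss_rate]
    rng.shuffle(arrived)
    return arrived


def fetch_book(book: list[str], loss_rate: float = 0.05,
               seed: int = 1) -> tuple[list[str], int]:
    """Fetch every chapter, re-request missing ones, then reassemble in order.

    Returns the reassembled book and the number of sending rounds it took.
    """
    if not 0.0 <= loss_rate < 1.0:
        raise ValueError("loss_rate must be in [0, 1)")
    rng = random.Random(seed)
    received: dict[int, str] = {}
    missing = dict(enumerate(book, start=1))  # chapter number -> chapter text
    rounds = 0
    while missing:
        rounds += 1
        for number, text in deliver(missing, loss_rate, rng):
            received[number] = text
        # "I send you a note asking for those again": resend only what's missing.
        missing = {n: t for n, t in missing.items() if n not in received}
    # "Arrange them in order": sort by chapter (sequence) number before reading.
    return [received[n] for n in sorted(received)], rounds


book = [f"Chapter {n}" for n in range(1, 118)]  # 117 chapters, as in the story
pages, rounds = fetch_book(book)
print(f"Reassembled correctly: {pages == book}, rounds needed: {rounds}")
```

Sorting by the chapter number (rather than by arrival order) is what lets the reader cope with envelopes that turn up out of order.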
Where it gets even more complicated is when you imagine that those
packages aren’t all delivered by the same route. Ever ordered 10
things from Amazon and received them in different shipments? If you track
them via UPS or FedEx, they don’t necessarily go through the same locations or hubs
across the country.
The same is true of routers and packet traffic: packets from a single request can take different routes and arrive out of order.
Now, why is all this important? Because in a high-availability setup, you have to mitigate issues on several fronts.
This isn’t a case of “use Cloudflare and everything will be OK.” That’s just not true.
One way to mitigate this is to use a different kind of addressing and routing:
anycast instead of unicast. This means that instead of me asking you for
the book, I make a request for the book, and whichever of you and your
friends, spread out everywhere, is closest to me (in network terms) responds.
Technically, this is what confuses people, because we told everyone
that every domain has a unique IP address (like the address of your
house). The reality of running DNS on an anycast network is that the
same IP address can be announced from several servers in several
locations. Crazy, I know.
That helps with the speed and performance of requests (including
re-requests for missing chapters), because they can be answered from many
different locations. Since I’m in Derby, I could hit a server in London.
If you’re in New Jersey, you might reach a server in New
York.
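Here’s a rough Python sketch of the anycast idea. The sites, coordinates and IP address (taken from the 192.0.2.0/24 documentation range) are hypothetical, and it uses straight-line distance as a stand-in for “closest”. On the real internet, routers pick the best network route (via BGP), which isn’t always the nearest city.

```python
import math

# Hypothetical anycast address from the 192.0.2.0/24 documentation range.
ANYCAST_IP = "192.0.2.1"

# Hypothetical sites that all announce ANYCAST_IP: name -> (latitude, longitude).
SITES = {
    "London": (51.5074, -0.1278),
    "New York": (40.7128, -74.0060),
    "Singapore": (1.3521, 103.8198),
}


def distance_km(a: tuple[float, float], b: tuple[float, float]) -> float:
    """Great-circle (haversine) distance between two (lat, lon) points."""
    lat1, lon1, lat2, lon2 = map(math.radians, (*a, *b))
    h = (math.sin((lat2 - lat1) / 2) ** 2
         + math.cos(lat1) * math.cos(lat2) * math.sin((lon2 - lon1) / 2) ** 2)
    return 2 * 6371.0 * math.asin(math.sqrt(h))


def route(client: tuple[float, float],
          sites: dict[str, tuple[float, float]] = SITES) -> str:
    """Which site answers a request to ANYCAST_IP? Here: the geographically closest."""
    return min(sites, key=lambda name: distance_km(client, sites[name]))


derby = (52.9225, -1.4746)
new_jersey = (40.7357, -74.1724)  # Newark
print(ANYCAST_IP, "from Derby ->", route(derby))            # London
print(ANYCAST_IP, "from New Jersey ->", route(new_jersey))  # New York
```

Take London out of `SITES` (as if that site went down) and the request from Derby simply lands in New York instead. Same address, different server: that’s the redundancy part.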
But that’s not the only place where you need redundancy.
You would also need to mitigate issues with the servers themselves.
That means the place that stores the books (a bookstore, if you will)
needs its own backup in case something happens there.
So you’re going to need more than DNS redundancy; you’ll need server
clusters. And while you might want to pay for cheaper cold or warm
failover, true high availability (100% uptime) will likely require hot
failover with a heartbeat monitor.
Think of that heartbeat monitor as one of those young interns at the
bookstore who has to keep running to the back to see if a book is
there. Only this time they need to run all over town, because your
cluster might not all be in the same place.
And the moment they come back to tell you that the book isn’t in
warehouse A, you need to update your infrastructure to route all
requests away from that warehouse and to another that has it. But you
also need them to put in an order to get that book back in stock.
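The intern’s job can be sketched in a few lines of Python. This is a minimal illustration under our own assumptions: the node names, the 3-second timeout and the simple “first healthy node wins” rule are all made up. Real hot-failover setups use dedicated clustering tools and also have to keep the data on every node in sync, which this ignores.

```python
from dataclasses import dataclass, field

# Assumed value: seconds without a heartbeat before a node is treated as down.
HEARTBEAT_TIMEOUT = 3.0


@dataclass
class Node:
    name: str
    last_heartbeat: float  # time of the node's most recent "I'm alive" signal


@dataclass
class Cluster:
    nodes: list[Node]
    repairs: list[str] = field(default_factory=list)  # nodes flagged for repair

    def heartbeat(self, name: str, now: float) -> None:
        """A node reports in: the intern confirms the book is still there."""
        for node in self.nodes:
            if node.name == name:
                node.last_heartbeat = now

    def healthy(self, now: float) -> list[Node]:
        """Nodes whose last heartbeat is recent enough to trust."""
        return [n for n in self.nodes if now - n.last_heartbeat <= HEARTBEAT_TIMEOUT]

    def route(self, now: float) -> str:
        """Send a request to the first healthy node and flag silent ones for repair."""
        alive = {n.name for n in self.healthy(now)}
        for node in self.nodes:
            if node.name not in alive and node.name not in self.repairs:
                # "Put in an order to get that book back in stock."
                self.repairs.append(node.name)
        for node in self.nodes:
            if node.name in alive:
                return node.name
        raise RuntimeError("no healthy nodes left: this is the outage")


cluster = Cluster([Node("warehouse-a", 0.0), Node("warehouse-b", 0.0)])
print(cluster.route(now=1.0))              # warehouse-a: both nodes are healthy
cluster.heartbeat("warehouse-b", now=4.0)  # only warehouse-b keeps reporting in
print(cluster.route(now=5.0))              # warehouse-b: warehouse-a has gone quiet
print(cluster.repairs)                     # ['warehouse-a']
```

The two jobs from the story are kept separate on purpose: rerouting happens immediately, while the repair list is something a person (or another system) still has to act on.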
Are you starting to see why this is expensive and likely more than you want to pay?
You can’t get this for £10-100/month
I love that all sorts of hosting companies offer tremendous deals. Many are doing great things.
But none of them will reserve twice the servers you need,
in different locations, with a heartbeat monitor and data
synchronization, along with anycast DNS, all for £10 a month.
Some hosts will help you with this, but you won’t be paying a few pounds for it.
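Why twice the servers? Here’s the arithmetic as a quick Python sketch, using a made-up availability of 99% per server. It assumes servers fail independently and that failover is instant, which is the best case, not a promise.

```python
def combined_availability(single: float, replicas: int) -> float:
    """Chance that at least one of `replicas` servers is up.

    Assumes each server is up with probability `single`, that servers fail
    independently, and that failover between them is instant.
    """
    if not 0.0 <= single <= 1.0 or replicas < 1:
        raise ValueError("single must be in [0, 1] and replicas must be >= 1")
    return 1 - (1 - single) ** replicas


for n in (1, 2, 3):
    print(f"{n} server(s) at 99% each -> {combined_availability(0.99, n):.4%} available")
```

On paper, two 99% servers give you 99.99%, and each extra server adds more nines, but the number never reaches 100%. And the independence assumption is exactly what a storm or flood hitting both locations at once breaks.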
But there is good news.
If you really want this, or need this, you can create it yourself on
Amazon Web Services. They have everything you would need, assuming you
want to get into that configuration game.
In this way, all hosts are equal
What I’m telling you is this: storms, earthquakes,
flooding, DDoS attacks, hackers and more don’t distinguish
between hosts. They don’t care about you and your specific site.
In the end, these things happen.
And unlike SimCity (the original), there is no setting to turn on
that protects you from it all. And just like being in the slow
lane on the motorway, changing lanes will probably just put you in
the new slowest lane. It’s just the rule.
Swapping hosts won’t change much if your host is good to begin with.
So today, you either pay a lot of money for high availability systems, or you recognize that
there’s no such thing as 100% uptime...
(With thanks to Chris Lima for most of this information.)