Picking A Disaster Recovery Site
2011 was a year to make anyone believe in disaster recovery.
In addition to an East Coast earthquake, a hurricane in New England, and a Halloween blizzard that left some places without power for two weeks, the Internet buzzed with threats of a giant meteor strike, a solar flare that would fry the planet, and the (ever-popular) end of the world on Dec. 21, 2012, according to the ancient Mayan calendar. Or Rev. Camping. Or someone.
We missed the meteor and solar flare, and the end of the world is still up for grabs, but the lesson is clear. Companies need business continuity plans that keep the business operating for at least several days when the primary data center isn’t available.
Disaster recovery (DR) can cover a multitude of sins and be approached in a lot of different ways. Although there are general solutions, you need to think carefully about what your business means by “disaster recovery.”
Many companies, especially smaller ones, can get by with something as simple as a spare laptop and a bunch of backup disks, assuming that someone is actually making sure the backups get done and that the media is accessible and readable. For many more businesses, the answer is some sort of remote site. However, that still leaves the problem of picking a remote site that will support you while you’re digging out from the next fire, flood, earthquake, or plague of giant carrot-eating Asian locusts.
Made-for-Halloween TV movie scenarios aside, this is serious business and it deserves careful thought. Start by understanding your needs.
For example, what functions do you need to keep up and running? How long will those functions have to run from another site? Your requirements are usually expressed as recovery time objectives (RTO) and recovery point objectives (RPO). In other words, how quickly do you need to get a function back up and running, and how far back do you have to go to restart from a stable configuration? Rough out these parameters before you go looking for a DR site.
In today’s IT world, the obvious answer is to move the data center into the cloud and operate from there in the event of an emergency.
That may be a good answer, but remember that “the cloud” sits on top of physical reality. In other words, your data is still sitting on servers and storage somewhere, and a cloud solution is only as reliable as the underlying physical systems and software.
This is one case where saying something is “in the cloud” isn’t a complete answer. You need to know at least generally where in the cloud your operations are going to be and how well supported your virtual data center is with communications bandwidth and redundancy. Obviously, having your DR center down the block from your primary location is not a good solution; it is likely to be affected by the same flood that endangers your own business.
So, moving beyond “the cloud,” how do you pick the right solution? Two things complicate that choice: the enormous array of options, and the need to decide exactly which features are critical, which are nice-to-haves, and which you don’t care about. In other words, consider what you need before you go looking at what’s available.
The place to start looking, oddly enough, is not with your IT managers or your telecommunications folks. It’s with the legal department.
In our highly regulated society, compliance must still prevail though the heavens fall. Just because you’re trying to run your company on a creaky old laptop from the Wi-Fi hotspot at your local Starbucks does not mean you get a pass on security, record keeping, and other such concerns. Find out what you need to stay on the right side of the law, no matter how many alligators are currently living in the swamp that used to be your data center.
With a little research, your lawyers can show you what you’ll need to stay in compliance.
“If you have compliance requirements, you need to start by looking there,” says Michael Lee, principal consultant at SWC Technology Partners, a consultancy in Oak Brook, IL. “There may be [a] minimum distance between centers, or [a requirement to] put it in a different region. You need to understand the requirements.”
Only after you learn your compliance requirements should you look at what it takes to provide the necessary level of functionality. This isn’t the same as what your regular data center provides. Often, in an emergency, you can get by with a minimal set of functions that don’t run as quickly. This assumes you have disaster recovery playbooks, so that when an event happens, you’re ready to go. The goal is not (necessarily) to duplicate your full business location, but to bring critical systems back online.
This brings us back to those two buzzwords, RTO and RPO. “RTO (Recovery Time Objective) is how quickly you need to be up and running,” says Lee. “If your RTO is several days, you might be okay with just keeping a backup offsite. But in many cases your RTO might be in a few hours.”
RTO, broadly taken, is probably the most expensive metric your disaster recovery plan must meet. It includes not just having the software installed, but also having adequate communications in place and tested. Generally, the shorter your RTO, the more expensive it is to meet.
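To make that concrete, here is a minimal Python sketch that treats a recovery playbook as a list of steps with estimated durations and checks the total against an RTO. The step names and times are hypothetical, and the sketch assumes the steps run one after another; a real plan would use figures measured in your own DR tests.

```python
from datetime import timedelta

# Hypothetical playbook: (step, estimated duration from your own DR tests).
PLAYBOOK = [
    ("declare a disaster and notify the DR site", timedelta(minutes=30)),
    ("bring up network links to the DR site", timedelta(hours=1)),
    ("restore or start critical systems", timedelta(hours=3)),
    ("verify data and reopen to users", timedelta(hours=1)),
]

def estimated_recovery_time(steps) -> timedelta:
    # Assumes steps run sequentially; overlapping steps would shorten this.
    return sum((duration for _, duration in steps), timedelta())

def meets_rto(steps, rto: timedelta) -> bool:
    return estimated_recovery_time(steps) <= rto

print(estimated_recovery_time(PLAYBOOK))        # 5:30:00
print(meets_rto(PLAYBOOK, timedelta(hours=4)))  # False
print(meets_rto(PLAYBOOK, timedelta(hours=8)))  # True
```

Note that network links get their own line item: if the communications piece is slow, the whole estimate slips.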
RPO (Recovery Point Objective) is the point from which you can pick up after the tornado snatches the roof off your data center; put another way, it’s how much recent data you can afford to lose. In general, expect some loss between the disaster and the recovery. How important it is to minimize that loss determines the strategy you will have to employ. A hotel chain that needs near-instant access to reservation data is in a different situation than a batch-oriented manufacturing company that could stand to lose three days of work (with anguish, but without threat of the business closing).
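One way to reason about RPO is to ask how much work you could lose in the worst case under a given copy schedule. The sketch below assumes each backup or replication cycle captures a point-in-time state when it starts and needs some extra time (`copy_lag`, a hypothetical parameter) to complete and get offsite. If disaster strikes just before the next copy lands, you lose one full interval plus that lag. The schedules and targets are illustrative, not drawn from any real deployment.

```python
from datetime import timedelta

def worst_case_loss(copy_interval: timedelta, copy_lag: timedelta = timedelta(0)) -> timedelta:
    """Worst-case data loss for periodic copies.

    Assumes each copy captures state at the moment it starts and takes
    copy_lag to finish and reach the DR site. Disaster just before the next
    copy completes loses everything since the previous copy started.
    """
    return copy_interval + copy_lag

def meets_rpo(copy_interval: timedelta, copy_lag: timedelta, rpo: timedelta) -> bool:
    return worst_case_loss(copy_interval, copy_lag) <= rpo

# Hypothetical targets for the two businesses described above.
hotel_rpo = timedelta(minutes=15)
factory_rpo = timedelta(days=3)

nightly_backup = (timedelta(days=1), timedelta(hours=12))    # shipped offsite by noon
frequent_replication = (timedelta(minutes=5), timedelta(minutes=1))

print(meets_rpo(*nightly_backup, hotel_rpo))        # False
print(meets_rpo(*nightly_backup, factory_rpo))      # True
print(meets_rpo(*frequent_replication, hotel_rpo))  # True
```

The nightly backup leaves up to 36 hours exposed, which the manufacturer can live with and the hotel chain cannot.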
Be clear about what you’re shopping for. Disaster recovery means very different things to different people. In choosing a DR center it’s important to know what you’re getting and to make sure that the vendor is on the same wavelength you are.
To some vendors, “disaster recovery” means providing a set of backups or snapshots that you can use to reconstruct your data center somewhere else. At the other end of the scale (and expense) is “Disaster Recovery as a Service” (DRaaS), which provides all the functionality of the regular data center plus auxiliary services such as call center operations. DRaaS may be offered as a cloud service or as a physical data center that is ready to go live within the specified RTO.
“Broadly, there are three flavors of disaster recovery,” says Micheal Fell, director of cloud solutions at Logicalis, a cloud-based disaster recovery center based in Farmington Hills, MI. (Fell was speaking of cloud-based DR, but the categories apply to DR services in general.) “Some need offsite backup. The data’s there or it will be shipped there. The RPO may be 24 hours or it may take days.”
The second flavor is host-based replication, which can copy a data center into the cloud whether it runs on physical servers or on virtual machines.
“There’s SAN-based replication in a fully virtual environment. It’s not quite doing seamless failover,” says Fell. “Customers who want seamless failover use their own backup data center. Those customers want to be up in 4 to 6 hours with as little data loss as possible, and they’re willing to make the communications and hardware investments to do that.”
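The three flavors trade cost against recovery speed. As a rough sketch, you could screen options by keeping only those whose RTO and RPO meet your targets, then taking the least expensive. Apart from the 24-hour RPO for offsite backup and the 6-hour RTO (the top of Fell’s 4-to-6-hour range) for a customer’s own backup data center, the figures and cost rankings below are hypothetical placeholders, not vendor numbers; substitute what your contracts actually promise.

```python
from dataclasses import dataclass
from typing import Optional

@dataclass
class DrOption:
    name: str
    rto_hours: float    # how long until you are running again
    rpo_hours: float    # worst-case window of lost data
    relative_cost: int  # hypothetical ranking: 1 = cheapest

# Placeholder figures for illustration only.
OPTIONS = [
    DrOption("offsite backup", rto_hours=72, rpo_hours=24, relative_cost=1),
    DrOption("host-based replication to the cloud", rto_hours=12, rpo_hours=1, relative_cost=2),
    DrOption("own backup data center", rto_hours=6, rpo_hours=0.25, relative_cost=3),
]

def cheapest_fit(rto_target: float, rpo_target: float,
                 options: list[DrOption] = OPTIONS) -> Optional[DrOption]:
    """Return the least expensive option meeting both targets, or None."""
    fits = [o for o in options
            if o.rto_hours <= rto_target and o.rpo_hours <= rpo_target]
    return min(fits, key=lambda o: o.relative_cost, default=None)

# A manufacturer that can wait four days and lose three days of work:
print(cheapest_fit(96, 72).name)  # offsite backup
# A business that must be back within a working day with little loss:
print(cheapest_fit(8, 4).name)    # own backup data center
print(cheapest_fit(2, 1))         # None: nothing listed is fast enough
```

Returning `None` rather than the closest match is deliberate: if nothing meets your targets, that is a conversation to have with vendors, not a gap to paper over.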
Distance in DR
There are actually two kinds of distance you need to be concerned about in disaster recovery.
The first is physical distance: You want your DR facility far enough from your data center so it isn’t affected by the disaster. “If you are in an earthquake zone, tornado area, or a hurricane zone, don’t put your DR center in the same location,” says Joel Smith, cofounder and CTO of AppRiver, an e-mail and web security firm based in Gulf Breeze, FL. Following that advice, AppRiver has data centers in Hong Kong, Texas, Virginia, and London.
The center itself needs to be well located and set up. “People put disaster recovery centers in the darnedest places,” Smith says. “Near a fault line, near an airport, near a flood zone.” Or, he says, they’ll do things like putting the emergency generators in the basement, where they are the first things to flood out.
The second kind of distance is topological: the “distance” between your data center and the DR center in network terms, such as response time and bandwidth. Unlike physical distance, topological distance is something you want to minimize, for the sake of network performance.
It doesn’t do any good to put your disaster recovery center in the most disaster-free site in the nation, East Neversweat, Maine, if the only connection to the outside world for the whole town is a creaky T1 line.
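A back-of-the-envelope calculation shows why. A T1 line carries 1.544 Mbit/s; the Python below estimates how long it would take to push a hypothetical 100 GB of data over one, compared with a 1 Gbit/s link. It assumes the full raw line rate with no protocol overhead or competing traffic, so real transfers would take longer.

```python
T1_BITS_PER_SEC = 1.544e6   # standard T1 line rate
GIGABIT_BITS_PER_SEC = 1e9

def transfer_seconds(data_bytes: float, link_bits_per_sec: float) -> float:
    """Raw transfer time at full line rate (no protocol overhead or contention)."""
    return data_bytes * 8 / link_bits_per_sec

data = 100e9  # hypothetical 100 GB to move to or from the DR site

print(round(transfer_seconds(data, T1_BITS_PER_SEC) / 86_400, 1))   # 6.0 (days)
print(round(transfer_seconds(data, GIGABIT_BITS_PER_SEC) / 60, 1))  # 13.3 (minutes)
```

At T1 speeds, even a modest restore takes most of a week, which would blow through almost any RTO.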
You need to make sure that your remote site has the bandwidth and other resources needed to support your business for the duration of the disaster. AppRiver’s Smith recommends a medium-sized facility colocated with telecommunications gear such as long-distance hubs. “Find a facility small in size but large in reputation,” he advises.
“You can wipe out a lot of (communications risk) by going with an IBX peering point where providers come together to trade traffic,” Smith says. “A lot of times those peering points will have server space off to the side. They have desk space for IT guys that need to deploy there, and they tend to have more shipping and receiving points for your gear.”
Of course, reputation is critical. Check carefully on a potential vendor’s reputation by talking to customers and nailing them down on the details. Does the vendor keep its promises? Does it meet the RTO and RPO specified in your contract? What is it like to work with? Find out as much as you can about a vendor before committing.
You should also test your DR setup regularly, including your DR center, to make sure it will work in an emergency.
Finally, study the contract carefully. If a requirement isn’t in the contract, you’re not likely to get it, so make sure the contract covers all the bases, from RPO and RTO to the quality and amount of power the center provides and the kind of security it offers.
Perhaps the key thing in choosing a disaster recovery site is to choose slowly and carefully, so you’re not rushed by the oncoming hurricane season. Or by the next giant meteor NASA announces is on the way.
This article, “Picking A Disaster Recovery Site,” originally appeared in HP’s Input Output Blog.