SQL Server : Protect Data at All Costs
Maintaining high availability to corporate data stores managed with SQL Server is an essential element of any data-management strategy.
Paul S. Randal
Without the uninterrupted ability to store and retrieve data, business would grind to a halt. Besides its people, data is increasingly the most important asset in any enterprise. And SQL Server 2008 or SQL Server 2008 R2 are frequently at the core of any data-management strategy. So, if you think about it that way, the developers and DBAs are the ones responsible for keeping the business running.
However, how much definitive guidance filters down from the business unit managers to those responsible for the data tier? Are the business requirements communicated clearly? Are they communicated in such a way that technology professionals can translate them into a productive strategy?
In certain market segments, there are stringent regulatory requirements around infrastructure aspects like security auditing, data encryption and data retention. Failure to meet those requirements can lead to the business being fined or censured, plus losing public credibility and future revenue—often the worst thing that can happen in a highly competitive market.
Align Your Data-Management Strategy
There are certain business requirements that seem easier or more straightforward for business leaders to communicate, like those around security, reporting, workload management and auditing. Fortunately, those are also easier for you to implement within the framework of SQL Server 2008:
- You can meet data-security requirements using features like Transparent Data Encryption that encrypts data at rest, and Extensible Key Management that lets you store encryption keys “off-box” and away from the encrypted data.
- You can fulfill reporting requirements with SQL Server Reporting Services.
- The Resource Governor can help you predict workload performance.
- You can use SQL Server Audit to meet comprehensive auditing requirements.
However, there are two major business requirements that are often poorly communicated: system downtime and acceptable data loss. These are known as Recovery Time Objective (RTO) and Recovery Point Objective (RPO), respectively. Unfortunately, it’s quite common for business managers to neglect considering RTO and RPO, only to find out that the data tier isn’t protected to the extent they’d like when a disaster occurs, which leads to downtime or data loss.
Whether you’re a business manager or a DBA, take a minute right now to consider if you know for sure that the data tier is protected to the extent the business requires. If you come to the realization that it isn’t, what’s your plan to resolve the situation?
Neither panic nor complacency is an appropriate reaction. Conducting a fire drill to put a strategy in place by next week is a recipe for disaster in itself. It takes care and diligence to design and implement an appropriate and comprehensive SQL Server high-availability and disaster-recovery (HA/DR) strategy. Ignoring the problem invites calamity and is tantamount to business negligence.
Work Through the Requirements
The key to designing a successful strategy is first working out the business requirements. Then you have to balance those against the business limitations. This is where IT and business unit leaders have to meet face to face and see things eye to eye. The strategic requirements have to encapsulate the following factors for each portion of data relevant to business operations:
- How important is this portion of data compared to the rest of the corporate data store? Business managers often state that everything is top priority and must be protected equally. That works with a small amount of data, but becomes increasingly impractical with multiple terabytes spread over multiple SQL Server instances.
- How much data can the business afford to lose? Business owners would understandably like to see zero data loss, but that’s not always practical.
- How long can the data be unavailable? Business owners would also like to see zero downtime, but, unfortunately, this isn’t achievable in reality. You can, however, get very close.
- Do the first or second factors change at various times of day or at the weekend? This can have a profound effect on your ability to meet requirements. Zero downtime and data loss are far more achievable for a limited period, say 9 a.m. to 5 p.m. on weekdays, compared to full access 24x365.
- Is it acceptable to compromise workload performance to preserve data availability and durability? The only technologies that can provide zero data loss require synchronous mirroring of transaction log records or I/O subsystem writes. Both of these can lead to a processing delay. It’s a trade-off.
A good way to think about these is to consider the impact on the business of each portion of data being inaccessible or lost. You may be surprised when you think through and quantify the potential ramifications for your customers, your business image and your regulatory controls.
Work Through Limitations
One of the more-common mistakes when designing and implementing an HA/DR strategy is moving ahead with the technical design without first considering the limiting factors. This means either returning to the drawing board—a waste of time and money—or implementing a sub-standard strategy that doesn’t meet the business requirements.
There are many limitations, both technical and non-technical. The overriding factor is usually the budget. More hardware means more power, which means more heat dissipation, which means more air conditioning, which means even more power, which all means more space required and more budget dollars allocated to that physical infrastructure. You must also consider the cost of the hardware, the extra SQL Server and Windows licenses, network bandwidth, and possibly even more personnel to manage the extra systems and the rest of the datacenter.
Compromise and the Corporate Jigsaw Puzzle
Once you’re familiar with all the technical limitations, the trick is to arrive at the most effective compromise. You need a prioritized list of the data that’s most important to the business. Given the limitations under which you’re working, you’ll be evaluating the technologies that help you meet the most important business requirements.
It’s essential that you don’t just try to adapt an incumbent technology to meet new business requirements. Don’t jump into or choose a technology without properly evaluating it against your business priorities. It’s always better to put in the effort early on and go through the process properly. You’ll end up with a better strategy that saves you time and money in the long run.
If you find you can’t meet the business requirements with the technologies you can afford, you’ll need to work with the business unit leaders to change those requirements to reflect the budgetary realities. As a technologist, for example, there’s no point agreeing to a business requirement of zero data loss if there are insufficient funds in the budget for synchronous technology. When a disaster occurs, the business managers’ expectations will not be met and fault for the situation will fall to the IT staff.
One of the hardest things to do when designing your HA/DR strategy is often ensuring that it forms a harmonious component of your organization’s overall IT strategy. For instance, if you’re the DBA in a large corporation, there are likely other teams responsible for the Windows servers, the network, the storage, the building infrastructure and so on. If the business requires a particular database to be available within four hours of any disaster, you may need to involve all these teams to make sure that can happen. This brings inter-team politics and relationships into the mix. The other teams must agree to meet the same service-level agreements as the data-tier team, and the expectations of and promises to the overall business.
If you feel your data tier is not adequately protected, it’s likely that the HA/DR strategy isn’t being properly tested in your business, either. It’s imperative when you go through the process of designing and implementing an HA/DR strategy that you actually test the system so it can respond in the event of a crisis.
This is easier said than done, though. Convincing the business managers to conduct a test that may result in downtime is a challenge. You could present the argument that it’s better to conduct a test when everyone is on-site, expecting a “disaster,” and ready to step in and quickly fix any problems. The alternative may be to discover a flawed design when a disaster occurs at 2 a.m. on a company holiday when there’s only a skeleton staff on hand.
Even though you may discover that your data tier isn’t protected to an acceptable degree from downtime and data loss, there are plenty of options for implementing an HA/DR strategy using SQL Server 2008 or SQL Server 2008 R2. Understand the various technologies and their trade-offs, and examine the architectures that other companies have successfully deployed. Check out the following white papers and blog posts for more information:
- White Paper: “High Availability with SQL Server 2008 R2”
- White Paper: “Proven SQL Server Architectures for High Availability and Disaster Recovery”
- SQLSkills.com blog post: “Importance of having a good disaster recovery plan”
- SQLSkill.com blog post: “Importance of testing your disaster recovery plan”
Make sure you work toward putting a valid strategy in place. It’s the only way to protect your business and avoid unexpected downtime.
Paul S. Randal* is the managing director of SQLskills.com, a Microsoft regional director and a SQL Server MVP. He worked on the SQL Server Storage Engine team at Microsoft from 1999 to 2007. He wrote DBCC CHECKDB/repair for SQL Server 2005 and was responsible for the Core Storage Engine during SQL Server 2008 development. Randal is an expert on disaster recovery, high availability and database maintenance, and is a regular presenter at conferences worldwide. He blogs at SQLskills.com/blogs/paul, and you can find him on Twitter at twitter.com/PaulRandal.*