SQL Server 2008 DR, Testing, Waits and Queues

Here are some notes on “SQL Server 2008 DR, Testing, Waits” I took while attending an advanced class on SQL Server taught by Ron Talmage (from http://www.solidq.com/na/MentorDetail.aspx?Id=38).

Please note that, although these notes were taken during the class, I might have added some of my own (mis)interpretation :-). Always check your facts on Books Online (I try to provide links when applicable). As with anything else you get from a blog, never use any of this in production before you thoroughly validate it a test environment. Please keep in mind that some of those will be hard to follow without some pre-requisite knowledge and the right context. Reading the post from top to bottom will help.

Disaster Recovery

  • RTO (Recovery Time Objective), RPO (Recovery Point Objective), RTA (Recovery Time Actual)
  • Backup Media (disk/tape) and Location (local, remote, offsite)
  • Recovery Technology (Backup/Restore, Mirroring, Log Shipping, Clustering, Replication)

DR and HA

  • What’s the difference between HA and DR?
  • How would you prepare for a disaster?
  • What happens if you lose the entire data center?
  • Primary data center, standby data center.
  • Consider risks, cost, time to invest.
  • Big question: What are you trying to protect against?

Class discussion


Benchmarking and baselining

  • Load-testing, benchmarking and baselining
  • Meet performance requirements
  • Verify correctness
  • Determine system limits
  • Determine impact of changes

Class Discussion

  • Baseline - Ground zero, known state, Point of comparison
  • Benchmark - Measurement under specified conditions, goal, used to estabilish baseline
  • TPC - http://www.tpc.org/,  TPC-E for OLTP, TPC-H for DW/OLAP. Reports trans/sec and response time
  • Load test - Application of a load, looking for a sufficient or expected load
  • Stress test - Increase load. Done with concrete in construction.
  • Saturation test - Add load until it fails.
  • Endurance test – With specified load for a long time

White Paper

Performance counters


Waits and Queues

From the white paper, regarding OLTP workloads

  • Database Design
    • High Frequency queries having a high number of table joins (>4)
    • Frequently updated tables having # indexes (>3)
    • Big IOs - Table Scans, Range Scans (>1)
    • Unused Indexes
  • CPU
    • Signal Waits (>25%)
    • Plan reuse (<90%)
    • Parallelism: Cxpacket waits (>5%)
  • Memory
    • Page life expectancy (<300 sec)
    • Page life expectancy (Drops by 50%)
    • Memory Grants Pending (>1)
    • SQL cache hit ratio (<90%)
  • Disk
    • Average Disk sec/read (>20 ms) – Comment: this should be lower (>5, >8)
    • Average Disk sec/write (>20 ms) -- Comment: this should be lower (>5, >8)
    • Big IOs - Table Scans, Range Scans (>1)
    • Low bytes per sec
  • Blocking
    • Block percentage (>2%)
    • Block process report (30 sec)
    • Average Row Lock Waits (>100ms)
    • LCK_M_% (in top 2)
    • 5, High number of deadlocks (>5 per hour)
  • Network
    • High network latency + many round trips to DB (Output queue length >2)
    • Network bandwidth used up (Packets Outbound Discarded, Packets Outbound Errors, Packets Received Discarded, Packets Received Errors)

Troubleshooting Performance Problems in SQL Server 2005

Diagnosing Transaction Log Performance Issues and Limits of the Log Manager