SQL Server 2008 Transactional Replication

Here are some notes on “SQL Server 2008 Transactional Replication” I took while attending an advanced class on SQL Server taught by Greg Low (from https://sqlblog.com/blogs/greg_low/ and https://www.sqldownunder.com/).

Please note that, although these notes were taken during the class, I might have added some of my own (mis)interpretation :-). Always check your facts on Books Online (I try to provide links when applicable). As with anything else you get from a blog, never use any of this in production before you thoroughly validate it a test environment. Please keep in mind that some of those will be hard to follow without some pre-requisite knowledge and the right context. Reading the post from top to bottom will help.


Transactional Replication

  • Agents: Snapshot, Log Reader (like CDC), Distribution
  • Incremental flow: Into distribution DB, reliable store/forward queue, minimal latency (seconds)
  • Can run continuously or scheduled intervals
  • Can have updatable subscriptions
  • Consider – Would asynchronous database mirroring fit the bill?
  • Components: Published, Distributor, Subscriber

Initial Dataset

  • Subscribers need to initialize: schema, tables
  • Created by snapshot agent, applied by distribution agent
  • Store in snapshot folder, can be done via backup/restore
  • Script it: look at the decisions the wizards are making on your behalf
  • Pay attention to the part about articles and options
  • Locking: places shared locks all tables published
  • Careful – Default location of snapshot folder is on C:SQLData. UNC path instead?

Log Reader

  • Runs on distributor
  • Runs continuously but can also be scheduled
  • Scans the transaction logs for changes (data, schema)
  • Only copies commited transactions
  • Calls sp_rpldone to mark where it’s up to
  • If it falls behind, your log will grow

Distribution

  • Runs on Distributor for push subscriptions, Subscriber for pull
  • Copies transactions from distribution database to subscriber

DDL replication

  • Introduced in SQL Server 2005
  • Supported: ALTER for TABLE, VIEW, PROCEDURE, FUNCTION, TRIGGER (no DDL)
  • Not supported: Adding/dropping/altering indexes, alter XML columns
  • Careful – Mixed versions of SQL Servers
  • Careful – Name your constraints to avoid differences between publisher and subscriber

Stored Procedure Execution Replication

  • You can publish a stored procedure
  • Can have a positive effect on performance
  • Scenario: Batch load
  • Scenario: Send specific changes as a stored procedure
  • Consider the intent. Maybe sue SQL Server Broker queues instead

Partitioning of Data

  • Transactional Replication works best where data is cleanly partitioned
  • Scenario: Point-of-sale data partitioned by store ID
  • Bi-directional options in particular benefit from this

Demo - Creating distributor

  • Use wizard, generate to script, look at script
  • Sp_adddistributor - @password
  • Sp_adddistributiondb - @max_distretention, @security_mode
  • Not a scary script at all

Demo - Creating new publication

  • Use wizard, transactional, select table, article properties
  • Article properties – What do you need to copy? Defaults may not work for you, like copy collation.
  • Table filtering, create immediately, schedule, agent security (credentials for snapshot and log reader agents)
  • Save script, look at the script
  • sp_addpublication has lots of options, some of them not in the wizard
  • sp_addpublication_snapshot
  • sp_addarticle, what does @schema_option mean?
  • See https://msdn.microsoft.com/en-us/library/ms147887.aspx
  • Careful – Identity columns – NOT FOR REPLICATION means “turn identity column into regular integer”
  • Careful – some data types are not supported
  • SQL Server 2000 – sp_repladdcolum, sp_repldropcolum – no longer needed in SQL Server 2005
  • Note – Where is the option to do peer-to-peer?

Demo – Publication Properties

  • Subscriptions, Articles
  • Snapshot – Run additional scripts
  • FTP Snapshot – configuration options – need credentials
  • Subscription options
  • - Allow peer-to-peer subscriptions?
  • - Conflict detection – Only in SQL Server 2008, define originator ID,
  • - Careful – Default for “peer-to-peer”, “continue after conflict” is false. Assumes partitioning will avoid conflict.
  • After that, “Configure Peer-to-Peer Topology” option appears
  • Configure Peer-to-Peer Topology – add additional nodes, how the agent connects
  • Careful – Use “refresh topology” on this window if it’s not updated

Updatable Subscriptions

  • Immediate is OK, queued is deprecated
  • Republishing is not supported
  • Adds msrepl_tran_version column to data
  • Not used so much

Considerations

  • Transaction log space – keeps stuff around for replication
  • Distribution db – consider retention period (3 days might not be enough)
  • Careful – Ensure regular snapshot creation, avoid having to retaining more data, longer initialize
  • Careful – Published table must have PK
  • Careful – Default trigger of XACT_ABORT can cause entire batch to be aborted (use skip errors?)
  • Careful – Avoid explicit transaction in triggers at subscribers
  • Partial updates on LOB columns are supported (fragment of column is replicated)
  • Careful – There are limitations around LOB column. Should you even do this?
  • Peer-to-peer – partition data
  • Conflict detection – in SQL Server 2008 – make sure it does not happen with partitioning
  • Thresholds and Alerts – Distributor alerts (expiration, re-initialization, failed data validation)

Backup

  • Needs special attention, cannot restore to alternate server
  • Careful – If you renamed the server, run sp_dropserver, sp_addserver
  • Backup publication, distribution, subscription (if can’t re-initalize), master, msdb
  • “Sync with backup” option – log not trunctated until backed up (distribution, publisher) – careful!
  • In restore, can use sp_replrestart to resynchronize with publisher and distribution metadata
  • If all else fail, tear down and reconfigure (specify that subscribers already have data)

Complex scenarios – In Books Online

  • Updating subscribers
  • Restore of distributor
  • Peer to Peer
  • Restore of Subscriber

Replication Scripting

  • It call all be scripted
  • Do it as part of DR planning, automating steps
  • More options than the GUI options, as mentioned
  • Use option to “generate scripts…” in the GUI

Replication Monitor

  • Helps with common questions:
  • Is it all healthy? Why is it slow? How far behind? Why agent not running?
  • Uses caching, there’s a SQL Agent job to refresh the data every few seconds (4-30 secs)

Performance

  • Baseline – latency, throughput, concurrency, sync duration, resources
  • Look at what it looks like when it’s running normally. Don’t wait until it’s slow.
  • Set expectations right. Know what you can get out of the system.

Validation

  • Row count only, row cont and binary checksum (SQL 7 only), row count and checksum
  • Must stop all update activity at subscriber, adds CPU load
  • Cannot validate with column filters, column offsets, LOBs
  • Careful – watch for false positives – timestamp turns into binary, for instance