Reading from Database Mirroring or AlwaysOn Secondary Replicas in SAP Configurations
Based on repeated customer questions, especially around the readable secondary of AlwaysOn, I would like to summarize in (quite) a few words the abilities to read from a secondary replica/mirror with the two main functionalities of SQL Server Database Mirroring (DBM) and AlwaysOn. Though SAP Netweaver applications are always requiring Read/Write access, a lot of customers still seem to have the need to do some read-only activity. These might be data extract from the SAP ERP database or other read-only applications customer use to fill gaps in analytics, reporting, etc. Using a HA/DR functionality like DBM or AlwaysOn, it is justified to ask whether the mirror/secondary replica can be used for such a purpose of read-only access. Therefore we will explain in this article how it can work with these two SQL Server HA/DR technologies.
SQL Server Database Mirroring and reading from the mirror
The only mirror of SQL Server DBM is in principle not accessible for read-only activity. However using another technology of SQL Server Database Snapshots, one can make this happen. See more information on how Database Snapshots work in this recent article related to SQL Server 2012: http://msdn.microsoft.com/en-us/library/ms175158.aspx Since no real changes were done to the way Database Snapshots work, the article does apply very well to earlier versions of SQL Server as well (introduction of Database Snapshots happened with SQL Server 2005)
Hence the steps would look like:
- Create a Database Snapshot on the Mirror according to article: http://msdn.microsoft.com/en-us/library/ms175876.aspx Again the syntax and comments in this article out of the SQL Server 2012 documentation does apply very well to earlier versions of SQL Server. Please note in this step, that the Database Snapshot will have a different name from your primary and secondary database
- Now you eventually need to adjust permissions, etc in the master database to enable the read-only connection to the snapshot
- Then the read-only application can connect against the snapshot and read the data at its state of the time the snapshot has been created
Advantages/Disadvantages of this solution?
- The scenario where this works exceptionally well are some extracts that happen a few times the day. After the extracts are done, you disconnect the read-only application and drop the snapshot. Before the next extract should take place, you create a new snapshot of the same name and repeat the routine.
- Another advantage is that under no circumstances the SAP Netweaver application in a reconnect attempt can end up connecting to the snapshot since the database name is different to the database name of the primary and mirror. As a result an incidental reconnect attempt against the database snapshot would fail.
- You don’t need to break DBM and resynchronize the databases later just to read some consistent state out of the mirror database
- You could perform other tasks like consistency checks through the database snapshot as well.
- For typical reporting purposes, the fact that the data becomes stale is an issue. Also replacing the snapshot with one of a more recent state is a problem since it would need disconnecting the reporting application and reconnecting it after the new snapshot has been created. This can be a major hurdle.
- You can’t keep the snapshot around for too long. The snapshot on the mirror will eat up disk space. Reading the first article of Books Online I referenced explains the copy-on-write functionality of the database snapshot. Means if we change a page on the mirrored database, we will copy its origin state over to the snapshot. Means instead reading it from the origin database, the snapshot now will read it out of its own file. Hence it is reasonable to assume that with the age of the snapshot, the files of the snapshot grow and with that one certainly could run into space problems.
- On could run into issues with autogrowth scenarios. SQL Server is growing its datafiles automatically when they run full. If this is happening on the primary which has enough space since there is no database snapshot consuming space, one could run into situations that the autogrowth which is replicated via DBM, can’t be executed on the mirror since the snapshot takes valuable space away. This would lead to a suspension of DBM.
Now we had customers who went another way and from time to time decided that they would like to use the DBM Mirror for some other purposes where they also would be able to eventually write against this mirror database. Well knowing that this meant to go through the initial steps of database synchronization again (backup on primary, restore to secondary, get up to synchronized point). The steps these customers performed looked like:
- Simply stopping DBM
- Dropping the DBM configuration
- Opening the Mirror by performing a TLog backup on the primary and then restoring the Tlog backup against the secondary w/o defining the norecovery clause
After this the mirror database would be fully usable or any activity against it, like acting as DBMS for a sandbox system or something like this. And that is where an evil chain of events could unfold under certain circumstances!!
In order to explain more, we need to look into what the SAP application side knows about the DBM configuration. Since DBM does not work with a virtual name, the SAP Netweaver side needs to know the name of the principal and the mirror. Usually the SQL Server Client side used by SAP is getting the name of the mirror during connecting and keeps it in memory. As long as the SAP process is up and running and failovers do happen, the SQL Server client will take the information it has about the mirror and will redirect connection attempts to the mirror side if the principal is not reachable. However in the SAP case we do have cases where SAP processes restart and with that the information in the SQL Client for the restarted process is lost. Hence we need to make sure that the information for the principal and the mirror can be read out the SAP Default profile. As a result we look at parameters as described in OSS Note 965908 – SQL Server Database Mirroring and SAP Applications .
Another problem we encountered in situations like this:
- There was a failover to the mirror (the server listed as failoverpartner in the dbs/mss/server parameter)
- Repairs on the principal (listed as first server in dbs/mss/server) took too long. The log on the active side (former mirror now principal) is running full
- In order to be able to cut the committed data out of the log, the DBM configuration got dropped
- As there is no DBM configuration active anymore, SAP processes which restart or newly added application servers would not be able to connect now anymore since the SQL Server instance (former mirror) which now is active does have no idea that it was a mirror. Trying to connect to it under the flag of ‘failoverpartner’ would fail. In order to enable this scenario, we introduced trace flag 1449 as described in this article: http://blogs.msdn.com/b/saponsqlserver/archive/2007/09/26/what-did-we-learn-using-database-mirroring-over-the-last-two-years-in-our-sap-erp-system-second-revision.aspx
- Using this trace flag 1449 on both SQL Server Instances will basically allow to connect under the flag of ‘failoverpartner’ to an instance that is no more in a DBM configuration.
- We also need to assume that the SAP configuration parameters of the default.pfl and the instance profile parameters are buffered. Means changes to the settings only become visible to the instances at restart.
So now let’s assume, the following scenario:
- The dbs/mss/server profile parameter looks like: node1;Failover_Partner=node2
- The actual principal role runs on the foloverpartner node (node2)
- We want to use the acting mirror (node1) as base of a sandbox. In order to do so, we drop the DBM configuration and open the mirror (node1 – defined as principal in the connection string)
- We do have the trace flag 1449 set
- The acting principal (node2) actually is defined as failoverpartner in the SAP instance profiles and not as the first server instance
- We don’t restart the SAP instances in order to change the connectivity parameters in the SAP profiles, but leave it just running like it is
- Means at the end we now got both databases of the former DBM configuration online with the same database name
In case of a SAP reconnect of a SAP process, the SAP process will submit the content as found in dbs/mss/server to the SQL Server Client to establish a connection. Before the acting mirror (node1) got opened up, the connection attempt against node1 got refused, hence the SQL Client got direct to the server that was named as failoverpartner which was the acting principal node2. There a successful connection could get established.
Now that the DBM configuration is dropped and the former acting mirror (node1) and node2 are accessible, the connection attempt in case of a SAP process (formerly connected to node2) reconnecting will look like:
- SAP submitting the cached string behind dbs/mss/server to the SQL Client
- The SQL Client trying to connect to the first server name (node1)
- The SQL Server instance of node1 indeed accepting the connections against the former mirror database since it has been opened and the traceflag 1449 is set
- That single SAP process working nicely against a different database instance than all its colleagues in the same instance/system.
Needless to say that this scenario would lead to disaster. Worst on it, it will take a while until this issue is recognized. Suddenly there might be ABAP short dumps talking about duprecs, etc. Number Range numbers suddenly might collide and other very strange things will happen.
How can we prevent such an issue in this scenario?
- First and easiest to do for the case where the productive system should not be taken down, but the mirror database should be opened is to rename the mirror database latest when the database is opened. A rename of the database would immediately prevent the SAP instances which are working productively to connect to the newly opened database.
- Another solution could be to have it detached from the former mirror instance and reconnected against a meanwhile installed named instance of SQL Server on the same server. Also would avoid such a reconnect incidentally ending up in the wrong database.
- Third possibility would be to rename the former mirror (node1) to a different name. Also should help.
- Or the fourth one would be to bring down the application instances of the running system and change the order of the nodes in our scenario in the connectivity parameters in the SAP profile or to take the failoverpartner out completely.
Obviously there are a quite a few measures one could take in order to prevent such a disaster scenario in case one really wants to open the former mirror database for other purposes. At least one of our customers learned it the hard way.
Read-Only possibilities with AlwaysOn
One of the goals of AlwaysOn was to enable customers using the secondary replicas for reading data as it actually changes. Means as some data gets changed, one should be able to read those changes in relatively short time on the secondary replicas. Sure there is a small delay which is determined by the time it takes to send and persist the Transaction Log data in the secondary replica plus how long it takes to redo those changes on the secondary replica. But under ideal conditions we assume that taking less than 1second.
Another thing which needed to be solved is avoiding collision of the redo operations on the secondary (changing data) and the reading requests. In normal read-committed isolation level, one could imagine readers blocking the redo thread from applying the changes. Or the redo thread would block readers from reading for short time. First case being the more intrusive to the system usually. In order to avoid these situations, the readers will read in snapshot isolation level on the read-only database. Means as the redo thread is performing its changes to the data, the secondary needs to keep at least the origin version of the data until the redone transaction got committed in order to provide an unblocked consistent view to the reading applications.
Snapshot isolation level is enabled at the moment the secondary is set to be readable as well. However in order for the secondary to become accessible all open transactions which got initiated on the primary and transferred on the secondary need to have opened with readability on the secondary enabled. That means if there is a transaction on the primary running for hours before committing, the ability to read from the secondary might be delayed for quite a while (see later).
What also got introduced are two different modes of read-only connectivity. In one mode all connections are accepted by the secondary replica, independent of the intent of the application. Means there is no further check or anything when establishing the connection. According to SAP OSS Note 1772688 – SQL Server AlwaysOn and SAP Applications we don’t support these kind of readable secondary replicas with SAP. Reason is that we under no circumstances would like to end up in a situation where parts of the connections end up somehow on a readable secondary.
The second mode of read-only connectivity is that the secondary replica only would accept connections which signal a read-intent. In those cases, the connection string needs to have a connection parameter like: ApplicationIntent = ReadOnly defined. More information around this mode in this article: http://msdn.microsoft.com/en-us/library/hh213417.aspx . If the application submits this connection parameter, one even can get special read-only-routing when one connects against the Listener name of the AwaysOn configuration. In case of connecting with Read-Only intent against the Listener, one first would connect to the primary in order to check whether there is some read-only-routing setup. If it is the case, the connection will be routed to the secondaries in the priority as those are defined (see later in this article). Thereby it doesn’t matter whether the secondary replicas are set to the Availability Mode Synchronous or Asynchronous.
When we look at the scenarios where SAP customers would like to have read-only access to a secondary replica, the advantages and disadvantages would look like:
- A reporting scenario against the most recent data would work excellently with the readable secondary of AlwaysOn. The data never gets stale. New changes are becoming visible almost instantaneously – No stale data.
- There is no additional disk space necessary compared to the snapshot solution with DBM
- One can protect from SAP Netweaver connecting to a readable secondary
- Scenarios where extracts need to work through several tables and consistency of content between tables is required, one still needs to work with Database Snapshots against a secondary replica
So let’s play through a few scenarios of using a readable secondary and how to avoid a disaster situation as described above with the DBM scenario.
First Step: Make the secondary readable
In order to make a specific secondary replica read-only use SSMS. In the Object explorer got to ‘AlwaysOn High-Availability’ à ‘Availability Groups’ and mark the Availability Group the Secondary Replica with the database you want to put read-only is part
Select Properties and you’ll see this screen:
In our case, we would like to have the synchronous replica becoming the target for our read-only application. We now change the settings like indicated in the next screen shot:
As mentioned, with SAP we would like to allow read-only applications using the read-intent parameters in the connections string only.
Second Step: Check whether the Secondary is readable with read-intent connectivity
In order to check whether the secondary is able to accept read-only connections, you now can connect with SSMS against the SQL Server instance running the read-only database replica and issue a read like:
select * from <sid>.SVERS
The failure message we would expect to get is:
Msg 978, Level 14, State 1, Line 1
The target database ('E64') is in an availability group and is currently accessible for connections when the application intent is set to read only. For more information about application intent, see SQL Server Books Online.
Since we didn’t instruct SSMS to submit the read-intent when connecting to the secondary database, it is now telling that we do not have the correct connection parameters set. This failure message would be expected when trying to connect with a SAP Netweaver application as well. If this failure message doesn’t come up it means that you did set the secondary unrestricted readable. Something we don’t support with SAP configurations.
Since you got the Query Window connected to the Secondary instance and you got the error above, click the right mouse key in the query Window and choose ‘Change Connection’ as seen here:
Choose Options as shown here:
In the appearing Dialog choose the tap ‘Additional Connection Parameters’ and add the string ‘ ApplicationIntent = ReadOnly’as seen below:
Now press ‘Connect’. This SSMS connection is now established with a read-intent. Hence the query issued against our read-intent readable secondary replica should succeed.
However if the little select, usually completing in a few milliseconds, is hanging. Investigate by issuing this statement:
select * from sys.dm_exec_requests
against the secondary replica instance. You will find at least one entry where the execution of the statement is suspended (column: status) and the column wait_type will show a value of:
This signals that there is at least one transaction open on the primary system which had been opened before you switched the secondary to become a readable secondary. As mentioned, when switching to a readable secondary, we need to provide committed versions before the Redo Thread is performing changes to the data. However with a transaction still open on the primary that has been open before switching to readable secondary, such a committed version of the data could not be provided on the secondary. Hence the select will wait until the transition is successful. The transition becomes successful, when all the transactions that were open before switching are committed or rolled back on the primary.
Another error message which came up could be:
Msg 976, Level 14, State 1, Line 1
The target database, 'E64', is participating in an availability group and is currently not accessible for queries. Either data movement is suspended or the availability replica is not enabled for read access. To allow read-only access to this and other databases in the availability group, enable read access to one or more secondary availability replicas in the group. For more information, see the ALTER AVAILABILITY GROUP statement in SQL Server Books Online.
Would indicate that the database is not switched to any readable mode.
Step 3: Think about what how to connect the read-only application.
You can connect the application using the node name of the particular node\instance name of running the secondary replica. This is expected to be different than the name the SAP Netweaver applications are connecting against. Those would use the Listener Name to connect to the AlwaysOn configuration and would not at all refer to a single server name
Disadvantage of connecting the read-only application to a specific instance name would be that in case of a failover, there would be no alternative server and the application would need to be stopped and adapted in case of the secondary replica coming down.
You also could connect the read-only application by using the listener name and leverage read-only-routing. However in this case you really need to makes sure that:
- You set up the read-only-routing settings
- The connection parameters do include the read-intent parameter, otherwise the connections will end up connecting to the primary.
Let’s assume we got the same 3 node configuration as before. Now we would set the Readable Secondary setting of all the replicas to ‘read-intent only’. But that is not enough. For every one of those instances we now need to setup the order the read-intent-only connection should be routed for the case the specific node is in the primary role. If we don’t do that, the read-intent-only connections would be connected straight to the primary when using the Listener name. We need you to perform these further steps now:
- Define the READ_ONLY_ROUTING_URL for each of the nodes.
- Setup the READ_ONLY Routing table for each of the nodes
Let’s go the READ_ONLY_ROUTING_URL. You can determine that URL by running the script which is documented in this blog by Matt Neerincx: http://blogs.msdn.com/b/mattn/archive/2012/04/25/calculating-read-only-routing-url-for-alwayson.aspx
Then you take the URL which likely looks like: 'tcp://<Full Qualified Domain Name of the server>:<Port>'
And execute the T-SQL Statement like this on the primary server:
ALTER AVAILABILITY GROUP [AG name]
MODIFY REPLICA ON
N'<server name>' WITH
(SECONDARY_ROLE (READ_ONLY_ROUTING_URL = N'tcp:// ://<Full Qualified Domain Name of the server>:<Port>));
This needs to be repeated for every instance that should participate in the read-only routing.
If this is finished you need to setup the Routing table for every instance separately
In our configuration we run the three nodes sapdenali5, sapdenali6, sapdenali7. The SQL Command to define the routing table would look like:
ALTER AVAILABILITY GROUP [AG name]
MODIFY REPLICA ON
N'<server name>' WITH
(PRIMARY_ROLE (READ_ONLY_ROUTING_LIST=('<first server>','<second server>','<third server>',’…’)));
The number of instances which can specified is aligned with the limits of AlwaysOn in general. In our case the statement could look like:
ALTER AVAILABILITY GROUP [ERPPRODE64]
MODIFY REPLICA ON
(PRIMARY_ROLE (READ_ONLY_ROUTING_LIST=('sapdenali5', 'sapdenali6','sapdenali7')));
The routing table can be checked with this SQL query on the primary instance:
select ar.replica_server_name, rl.routing_priority, (select ar2.replica_server_name from sys.availability_read_only_routing_lists rl2 join sys.availability_replicas AS ar2 ON rl2.read_only_replica_id = ar2.replica_id where rl.replica_id=rl2.replica_id and rl.routing_priority =rl2.routing_priority and rl.read_only_replica_id=rl2.read_only_replica_id) as 'read_only_replica_server_name' from sys.availability_read_only_routing_lists rl join sys.availability_replicas AS ar ON rl.replica_id = ar.replica_id
In our case this query should come back with this result:
With this routing table, a read-intent connection should be routed to sapdenali6 if the primary is currently sapdenali5.
Only one thing to note: In order to make read-only-routing work, the connection string also needs to have a database defined which is part of the AlwaysOn Availability Group.
This effect can be demonstrated best with using sqlcmd.exe of SQL Server 2012. As described in this BOL article: http://msdn.microsoft.com/en-us/library/ms162773(v=SQL.110).aspx we added the –K option to define the ReadOnly Intent.
Let’s assume our account which we want to use to connect has the master database defined as default database and we try to connect with sqlcmd.exe like this:
C:\Program Files\Microsoft SQL Server\110\Tools\Binn>sqlcmd -E -SERPPRODE64vn
Where ERPPRODE64vn is the Listener name, we are ending up on the primary since we didn’t specify a read-only intent. However even if we specify the read-only intent like this:
C:\Program Files\Microsoft SQL Server\110\Tools\Binn>sqlcmd -E -SERPPRODE64vn -KReadOnly
We are ending on the primary server. Reason is that the account we are trying to connect with has the master database defined as default database to connect to. Hence if we add a database out of the AG related to the Listener name, we finally will end up on the secondary we defined first in the routing list:
C:\Program Files\Microsoft SQL Server\110\Tools\Binn>sqlcmd -E -SERPPRODE64vn -KReadOnly -dE64
As failovers are taking place and the instance which was the read-only target becomes primary, the read-only intent connections get terminated. As with normal application trying to leverage AlwaysOn, the read-only connections should have a connection retry. If the read-only application now reconnects, it will connect to the instance defined in the routing list of the instance which just became primary.
More detailed documentation about the read-only routing can be found in these articles:
What about the scenario of getting a secondary database fully accessible? A scenario where the database should be used for read and write. Exactly the case which we identified as a possible problem scenario with DBM.
As in DBM, we can’t move a secondary replica database to read/write as long as the database is part of an Availability Group. Hence as in DBM, we need to get the database out of the AG. We usually do this with dropping the replica out of the AG. This is done in SSMS Object Explorer.
In opposite to DBM are we supporting the connection of SAP applications to AlwaysOn configurations through the AlwaysOn Listener name only and not through the DBM connection string. Removing the replica from the AG, will also move the former replica out of the umbrella of the listener name and with that will make it unavailable for the SAP Netweaver application which uses the Listener name to connect. Hence a manmade mishap like the potential scenario described with DBM can’t happen anymore. Nevertheless we still would recommend an eventual rename of the database in order to avoid confusion.
Happy Holidays to all of you