ACS Forwarders and High Availability - Part 3

In Part 1 of this series, I discussed the scenario where we have deployed only one ACS Collector along with one ACS Database.  No additional steps were taken to ensure that the ACS Forwarder could continue to forward Security events if the ACS Collector became unavailable.  I would highly recommend that you read at least the first part of Part 1 for some background information if you landed directly on this post.

In Part 2 of this series, I discussed the scenario where we have deployed multiple ACS Collector and ACS Database pairs and we have relied on one of these to take over in the case of an ACS Collector failure.

So now the question becomes: what if you need High Availability but you don’t want or need multiple ACS Databases?  That’s where this scenario comes in.

Scenario #3: One Database / Two Collectors in an Active/Passive Mode

Since the release of Operations Manager SP1, two ACS Collectors can point to one ACS Database as long as only one of them is active at a time.  That means we can keep a warm standby ACS Collector ready to go whenever the primary fails.

Setup

1. Install the primary ACS Collector as you normally would, but make sure to use SQL Authentication and not Windows Authentication.

If you use Windows Authentication you will be denied access when you attempt to bring up your standby ACS Collector.  You will notice that when you select Windows Authentication it doesn’t ask you what account you want to use.  That’s because it assumes you will use the computer account of the ACS Collector to connect.  This breaks once the standby ACS Collector comes online, because the standby connects with a different computer account.

See below for a screenshot of the AdtServer user that was created when I chose Windows Authentication. OMMS02$ is the computer account for my ACS Collector.

[Screenshot: the AdtServer user mapped to the OMMS02$ computer account]

2. Once the primary Collector server has been successfully installed you will need to stop the “Operations Manager Audit Collection Service” so we can install the secondary Collector.

3. Install the secondary Collector, specifying the existing database (the one created in Step 1) and again choosing SQL Authentication.

4. Stop the “Operations Manager Audit Collection Service” on the secondary Collector and start it again on the primary Collector.  You may even want to set the service's startup type to Manual on the secondary Collector so that it does not start again after a reboot.

5. This particular step is optional but highly recommended.  To minimize the number of duplicate events that occur once the ACS Forwarders fail over to the secondary Collector, we need a way to automate transferring ACSConfig.xml from the primary server to the secondary server.  Remember from Part 1 that ACSConfig.xml contains the most recent sequence number, which tells the Collector / Forwarder where things left off when inserting data into the ACS Database.

A couple of things to note about transferring this file:

     a. Try to transfer this file every 5 minutes as this is how often the file will be updated on the Primary Collector.

     b. Choose a method that does not lock the file as you do not want the file locked when the AdtServer service tries to overwrite the file.

6. Enable the ACS Forwarders by using the “Enable Audit Collection” task in the OpsMgr console.  This task requires an override configured that specifies the Collector for the Forwarder to communicate with.  Here you will enter a comma-separated list of Collector servers.  Notice that the secondary server appears first in the list and the primary server appears last.

[Screenshot: the “Enable Audit Collection” task override with the comma-separated list of Collector servers]
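The periodic transfer described in step 5 could be sketched roughly as follows.  This is only an illustration in Python, not a tested production script: the function name sync_acs_config and both paths are placeholders I made up, and you would point them at wherever ACSConfig.xml actually lives on your Collectors.  The idea is to read the file in a single short call so the handle is held as briefly as possible, and then swap the copy into place on the standby so it never sees a half-written file.  Whether that read is short enough to stay out of the AdtServer service's way is an assumption you should verify in your own environment.

```python
import os
import tempfile
from pathlib import Path


def sync_acs_config(source: Path, destination: Path) -> bool:
    """Copy source over destination; return True if the destination changed."""
    # Read the whole file in one short call so the handle is held only briefly.
    data = source.read_bytes()

    # Nothing to do if the standby already has the same content.
    if destination.exists() and destination.read_bytes() == data:
        return False

    # Write to a temporary file in the target folder, then swap it into place
    # so the standby never ends up with a partially written ACSConfig.xml.
    destination.parent.mkdir(parents=True, exist_ok=True)
    fd, tmp_name = tempfile.mkstemp(dir=destination.parent, suffix=".tmp")
    try:
        with os.fdopen(fd, "wb") as tmp:
            tmp.write(data)
        os.replace(tmp_name, destination)
    finally:
        # Clean up the temporary file if the swap did not happen.
        if os.path.exists(tmp_name):
            os.unlink(tmp_name)
    return True
```

You could run something like this from a scheduled task every 5 minutes on the primary Collector, passing the local ACSConfig.xml as the source and a share on the secondary Collector as the destination.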

Failing Over

If your Primary ACS Collector were to become unavailable you would now just need to start the “Operations Manager Audit Collection Service” on the secondary Collector server.  The AdtServer service will start up and read the ACSConfig.xml file we have been transferring over and take over the role of collecting Security Events from the ACS Forwarders.

Note: There will likely be a small amount of duplicate data, caused by events that were inserted into the database between the last time the ACSConfig.xml file was updated and the time the primary Collector crashed.

Failing Back

Once the primary Collector has been brought back online the process of failing back is quite easy.  First you’ll want to copy the ACSConfig.xml from the secondary Collector back to the primary Collector to minimize the duplication that will occur once we fail back.  Once that’s done you just need to stop the “Operations Manager Audit Collection Service” on the secondary Collector and the Forwarders will automatically attempt reconnection back to the Primary.
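Both failing over and failing back come down to starting or stopping the service on the secondary Collector, so this is the part that lends itself to scripting.  Here is a minimal sketch in Python, under several assumptions: the service's short name is AdtServer (the name used in this post, but check it on your servers), the Collector listens on TCP port 51909 (confirm this for your environment), the script runs on a Windows machine with rights to control services remotely through sc.exe, and OMMS03 is a hypothetical name for the secondary Collector.  The function names are my own.

```python
import socket
import subprocess

ACS_SERVICE = "AdtServer"  # assumed short service name; verify on your servers
ACS_PORT = 51909           # assumed Collector listening port; confirm locally


def collector_reachable(host: str, port: int = ACS_PORT, timeout: float = 5.0) -> bool:
    """Return True if a TCP connection to the Collector can be opened."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        return False


def service_command(host: str, action: str) -> list[str]:
    """Build an sc.exe command that starts or stops the service on a remote host."""
    if action not in ("start", "stop"):
        raise ValueError(f"unsupported action: {action}")
    return ["sc", rf"\\{host}", action, ACS_SERVICE]


def fail_over_if_needed(primary: str, secondary: str,
                        port: int = ACS_PORT, run=subprocess.run) -> bool:
    """Start the standby Collector if the primary no longer answers."""
    if collector_reachable(primary, port):
        return False
    run(service_command(secondary, "start"), check=True)
    return True
```

For example, fail_over_if_needed("OMMS02", "OMMS03") would start the service on OMMS03 only if OMMS02 is not answering, and running service_command("OMMS03", "stop") through subprocess.run would be the last step when failing back.  A single failed probe is a crude signal, so treat this with care: only one Collector may be active against the database at a time, and you do not want to start the standby while the primary is in fact still running.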

Pros

· Only one ACS Database is necessary

· Duplication of data can be minimized by synchronizing the ACSConfig.xml file between the Primary Collector and the Secondary Collector

Cons

· Some duplication of data may still occur

· Failover is not automatic and will require intervention (but could be scripted)

Additional Information

Check out Part 2 for information on how to check the configuration of an ACS Forwarder as well as what events to look for in the Event Log.