Configuration Manager Connector's DCM Rule can cause massive performance issues in Service Manager

IMPORTANT: Always perform a FULL Backup of the database before doing anything to it !!!

ALSO: It is advised that you open a case at Microsoft before doing this - directly editing the database is not supported and you may find yourself in an unsupported state if anything goes wrong.

 

There is a Rule which exists in the Configuration Manager Connector Management Pack which is called Incident_Desired_Configuration_Management_Custom_Rule.Update. This Rule can cause workflows (Subscription Rules) to lag behind a lot and cause the grooming jobs to fail, thus causing the EntityChangeLog table to get very large. In turn this causes in internal SQL Stored Procedure called p_EntityChangeLogSnapshot to take a lot of time to finish. This stored procedure is executed very often and while it is running, the performance of the Console is also impacted a lot.

This situation can occur if the Configuration Manager Connector will sync a *lot* of updates to the Microsoft.SystemCenter.ConfigurationManager.DCM_NonCompliance_CI instances. The Incident_Desired_Configuration_Management_Custom_Rule.Update Rule is configured to process these updates and this Rule is configured to run every 100 seconds and process *only* 30 update notifications (batch size is set to 30). So if there is a very large number of updates to process, this will cause this Rule to remain lagging behind because it will take too much time to catch up. This causes all other Rules to get stalled - but keep in mind - this Rule is processing, but it is processing very slow. As long as this Rule will not be able to finish processing the updates and lag behind, the grooming jobs (SQL Stored Procedure p_GroomSubscriptionSpecificECLRows) will not be able to clear the backlog (the update entries) from the EntityChangeLog table.

 

To generally check which rules are lagging behind and to particularly check if you are impacted by this issue, run this SQL Query on your ServiceManager database:

           NOTE: if you are using SM 2010, please delete this line from the following query:       AND W.IsPeriodicQueryEvent = 0

  DECLARE @MinState INT<br> SET @MinState = (<br>    SELECT MIN(State)<br>    FROM CmdbInstanceSubscriptionState AS W WITH(NOLOCK)<br>    JOIN Rules AS R<br>       ON W.RuleId = R.RuleId<br>    WHERE<br>       R.RuleEnabled <> 0 AND<br>       W.IsPeriodicQueryEvent = 0<br> )<br> SELECT<br>    R.RuleName,<br>    SM.TypeName AS SrcTypeName,<br>    RT.RelationshipTypeName AS RelTypeName,<br>    TM.TypeName AS TarTypeName,<br>    S.LastModified,<br>    O.OverrideName,<br>    R.RuleEnabled<br> FROM CmdbInstanceSubscriptionState AS S WITH(NOLOCK)<br> LEFT OUTER JOIN Rules AS R<br>    ON S.RuleId = R.RuleId<br> LEFT OUTER JOIN ManagedType AS SM<br>    ON S.TypeId = SM.ManagedTypeId<br> LEFT OUTER JOIN RelationshipType AS RT<br>    ON S.RelationshipTypeId = RT.RelationshipTypeId<br> LEFT OUTER JOIN ManagedType AS TM<br>    ON S.RelatedTypeId = TM.ManagedTypeId<br> LEFT OUTER JOIN ModuleOverride AS O<br>    ON O.ParentId = R.RuleId<br> WHERE<br>    S.State <= @MinState AND<br>    R.RuleEnabled <> 0 AND<br>    S.IsPeriodicQueryEvent = 0

 

Now if you see Incident_Desired_Configuration_Management_Custom_Rule.Update being present in the results, then you probably are impacted by this issue. To verify further lets check if we have a high number of updates for the Microsoft.SystemCenter.ConfigurationManager.DCM_NonCompliance_CI instances. To do this, run this SQL Query on the ServiceManager database:

  SELECT COUNT(*) AS 'Number of Updates to DCM Instances'<br> FROM EntityChangeLog AS ECL WITH(NOLOCK)<br> JOIN ManagedType AS MT<br>    ON ECL.EntityTypeId = MT.ManagedTypeId<br> WHERE MT.TypeName = 'Microsoft.SystemCenter.ConfigurationManager.DCM_NonCompliance_CI'  

 

Ok so we have a very high number of updates to these instances? ... Well maybe these do get processed fast enough ... lets also check how many of these *still* need to get processed because they were not processed yet. To check this, run this SQL Query on the ServiceManager database:

           NOTE: if you are using SM 2010, please delete this line from the following query:       AND W.IsPeriodicQueryEvent = 0

  DECLARE @MinState INT<br> SET @MinState = (<br>    SELECT MIN(State)<br>    FROM CmdbInstanceSubscriptionState AS W WITH(NOLOCK)<br>    JOIN Rules AS R<br>       ON W.RuleId = R.RuleId<br>    WHERE<br>       R.RuleEnabled <> 0 AND<br>       W.IsPeriodicQueryEvent = 0<br> )<br> SELECT COUNT(*) AS 'UNPROCESSED Number of Updates to DCM Instances'<br> FROM EntityChangeLog AS ECL WITH(NOLOCK)<br> JOIN ManagedType AS MT<br>    ON ECL.EntityTypeId = MT.ManagedTypeId<br> WHERE<br>    ECL.EntityTransactionLogId >= @MinState AND<br>    MT.TypeName = 'Microsoft.SystemCenter.ConfigurationManager.DCM_NonCompliance_CI'  

 

 Right so this is getting interesting, this is actually a *lot* of data (updates) coming in from the connector, so just out of curiosity, let us see how many updates to the DCM instances we actually get per day. To do this, run this SQL Query on the ServiceManager database:

           NOTE: if you are using SM 2010, please delete this line from the following query:       AND W.IsPeriodicQueryEvent = 0

  DECLARE @MinState INT<br> SET @MinState = (<br>    SELECT MIN(State)<br>    FROM CmdbInstanceSubscriptionState AS W WITH(NOLOCK)<br>    JOIN Rules AS R<br>       ON W.RuleId = R.RuleId<br>    WHERE<br>       R.RuleEnabled <> 0 AND<br>       W.IsPeriodicQueryEvent = 0<br> )<br> SELECT<br>    DATEPART(DAYOFYEAR, ECL.LastModified) AS 'Day of the Year',<br>    CASE ECL.ChangeType<br>       WHEN 0 THEN 'Add'<br>       WHEN 1 THEN 'Update'<br>       ELSE 'Other'<br>    END AS ChangeType,<br>    COUNT(*) AS InstanceCount<br> FROM EntityChangeLog AS ECL WITH(NOLOCK)<br> JOIN ManagedType AS MT<br>    ON ECL.EntityTypeId = MT.ManagedTypeId<br> WHERE<br>    ECL.EntityTransactionLogId >= @MinState AND<br>    MT.TypeName = 'Microsoft.SystemCenter.ConfigurationManager.DCM_NonCompliance_CI'<br> GROUP BY DATEPART(DAYOFYEAR, ECL.LastModified), ECL.ChangeType<br> ORDER BY DATEPART(DAYOFYEAR, ECL.LastModified), ECL.ChangeType 

 

 Well then, it seems that we have a *lot* of DCM Instances updates per day ... hmm interesting - so now do we solve this issue? Well we would need to disable the rule by creating an override, clear the Rule entry from the CmdbInstanceSubscriptionState table and then run grooming manually for a couple of times until we see that it finishes really quick.

Keep in mind that when doing this, it *may* take a very large amount of times (couple of hours even) until grooming will complete its clean-up. This all depends on how many entries it actually needs to groom. When you run the grooming, this should be ideally done over night or weekend and not during working hours because this will impact Console performance a lot and the *best* thing here would be that we should have no Console open until grooming finishes.

 

So, to solve this, follow these steps as described below:

1. import the attached Management Pack (DisableDCMRuleMPFix.xml) into Service Manager which will only contain an override which will disable the DCM Rule (if you are on SM 2010, then you may need to modify it or it will not work - or try to create one which is similar for 2010)

 

2. run this SQL Query on the ServiceManager database to delete the DCM Rule entry from the CmdbInstanceSubscriptionState table:

  DELETE CmdbInstanceSubscriptionState<br> WHERE RuleId = (<br>    SELECT RuleId<br>    FROM Rules<br>    WHERE RuleName = 'Incident_Desired_Configuration_Management_Custom_Rule.Update'<br> )  

 

 3. at this point, *NO* Console should be open, so please close all Consoles, now close all the System Center Service Manager services running on all the SM Servers and on the SM Data Warehouse Server and perform the next step only when you know that Service Manager is not being used by clients - so non-business hours would be best here 

 

 4. run this SQL Query on the ServiceManager database a couple of times until you see that it finished quickly - this will be the grooming stored procedure:

 EXEC p_GroomSubscriptionSpecificECLRows '55270A70-AC47-C853-C617-236B0CFF9B4C', 0, N'', 10000

NOTE: the last parameter for this stored procedure is the batch size - how many rows will be deleted in one iteration - you can increase or decrease this according to the number of entries which you notice from the other queries above - the higher the number of batch size, the faster will we get this finished - but don't set a too high number

 

 5. now once you are done, don't forget to start all the System Center Service Manager services running on all the SM Servers and on the SM Data Warehouse Server 

 

 

Well then, now things should get going again in Service Manager! :D

By the way - these actions/queries can be used for other rules as well that could end up in this bad situation. To check if maybe you are impacted by something like this, then go through this article: http://blogs.technet.com/b/mihai/archive/2012/07/13/service-manager-slow-perfomance.aspx

 

 

Happy hunting!

 

DisableDCMRuleMPFix.zip