How to make ILM "2" scream.

Article
11/03/2008

One of the big features of ILM "2" RC are the changes we have made to the portal and server to enable ILM to scale higher. In the portal and in the ILM service we've tuned, and jiggered with the way we interact with SQL and on the wire to make the ILM experience faster and snappier. Out of this, we've both gotten a better idea of what it takes to deploy ILM in such a way to maximize performance as well as key pieces of knowledge to help administrators keep their deployment running zipper quick. I will be writing a series of posts outlining much of this knowledge over the coming days.

Hardware

If you are asking yourself which hardware to buy to deploy ILM, recognize that ILM is extensively CPU bound. Having a fast disk is essential, however the key gating factor is the CPU horsepower on your box. This is because of ILM's policy and rights calculations require SQL to execute multiple-statement operations when performing what are seemingly simple operations from the portal. Typically we use 4xQuad-Core boxes in our performance deployments.
In an organization of 50,000 users, 20,000 groups, you should expect your base database size to be about 7 GB or so. The growth of this database will be dependent on the frequency of changes executed on the ILM system. Any rig set up for ILM should have enough RAM to load this database entirely into memory.

SQL

SQL is where performance starts and ends in the ILM Service. It is critical to having ILM perform at enterprise scales that SQL is setup such that it can best serve the ILM application. You may wonder why we took the hard dependency on SQL Server 2008 in RC, and the answer is more than it's snappy new logo. SQL Server 2008 introduces a new feature called Filtered Indices which specifically aims to limit index pollution as well improve queries across sparse columns by selectively including values within specified indices. ILM weakly-typed, single-table based storage mechanism begs for the usage of this new capability and we did it. If you only have 1000 objects in your ILM store, well then this isnt much of a deal, but when you start scaling into the 50k+ arena, filtered indices come into their own.

Beyond this, deploying ILM requires a steady eye towards maintaining and monitoring ILM performance. Here are couple of tips off hand which should help you keep ILM screaming:

Turn off Automatic Statistics Updating. SQL uses a sampling technique by default to generate the statistics it uses for creating query plans. We've found that sometimes SQL grabs a bad sample and as such you will see queries all of a sudden fall off of a cliff. To prevent this turn off the automatic statistics update and manually run a fullscan statistics update on the ObjectsInternal table. Frequency of executing this will depend on the frequency of updates to the ILM database.
Pre-grow your ILM DB, TempDB and Transaction logs. Ensure that you max out these files from the get-go so you do not take the cost of incrementally increasing the files during normal operation.
Ensure you have seperated out the ILM DB, TempDB, and Transaction logs onto seperate drives.
Ensure that SQL has been set to have a fixed upper limit on the memory it can consume during operation. If left uncheck you will see SQL gobble away memory until it starts to adversely affecting the performance of both the OS and as a result ILM.

ILM Service:

The ILM Service itself is actually quite lightweight and very much dependent on the performance of SQL. To help streamline the service further you can try:

Ensure tracing is set to error level or is off completely. Running at an excessive tracing level is guaranteed to bring ILM to it's knees.
If you find certain queries are taking so long that the underlying SQL connections are timing out, you can use the dataReadTimeout and dataWriteTimeout attributes within the resource management service node in the resource management service configuration file to set the number of seconds for the underlying SQL timeouts.

That's all for now. Look for some further posts talking about other ways to monitor and manage ILM performance.

In the meantime, we are currently at Tech Ed Europe in Barcelona. You can come find me, or Nima, at the ILM booth located in the main exhibition area for the next couple of days. I will be running an Instructor Led Lab on ILM on Thursday at 1pm.

If you get a chance, I definitely recommend attending one of the many ILM sessions being done over the next couple of days:

Thursday: Identity Lifecycle Manager 2 (Part 3): Extensibility and provisioning with ILM 2 (Nima GanjeH)

2:40-3:55pm

Wednesday: Identity Lifecycle Manager 2 (Part 2): Expressing and enforcing business policy (Alex Weinert)

1:30-2:45pm

How to make ILM "2" scream.

Additional resources