Increasing MigXML efficiency

Increasing the efficiency of MigXML is all about making each individual include or exclude pattern as specific as possible and ensuring that the component operates in the proper context.  For example, the following component is extremely expensive to include in a migration:

<component type="Documents" context="UserAndSystem">
  <displayName>All Word Documents</displayName>
  <role role="Data">
    <rules>
      <include>
        <objectSet>
          <script>MigXmlHelper.GenerateDrivePatterns ("* [*.doc*]", "Fixed")</script>
        </objectSet>
      </include>
    </rules>
  </role>
</component>

This component searches the entire directory structure of every fixed drive on the system for Word documents to include in the migration, and it does so in both user and system context. As covered in the post on context, that means the search is executed n+1 times, where n is the number of users on the machine selected for migration. How expensive is that? Let's run ScanState a few times to find out.

I added this component to the file expensive.xml and built the following command line:

ScanState.exe /i:expensive.xml /all c:\perfStore /c /o /i:miguser.xml /v:13

I then executed the above three times with the following variations:

  1. As above, with no changes
  2. With the context of the All Word Documents component changed to System only
  3. Without expensive.xml on the command line
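For reference, here is a sketch of what expensive.xml might look like for variation (2). The post doesn't show the full file, so the XML declaration, the `<migration>` root element, and especially its `urlid` value are assumptions (the `urlid` here is a hypothetical placeholder); the meaningful difference from the component above is the `context` attribute.

```xml
<?xml version="1.0" encoding="UTF-8"?>
<!-- Hypothetical urlid; any unique value for a custom file would do -->
<migration urlid="http://www.microsoft.com/migration/1.0/migxmlext/expensive">
  <!-- Changed from UserAndSystem to System so the drive search runs once instead of n+1 times -->
  <component type="Documents" context="System">
    <displayName>All Word Documents</displayName>
    <role role="Data">
      <rules>
        <include>
          <objectSet>
            <!-- Still searches every fixed drive for .doc* files -->
            <script>MigXmlHelper.GenerateDrivePatterns ("* [*.doc*]", "Fixed")</script>
          </objectSet>
        </include>
      </rules>
    </role>
  </component>
</migration>
```

Variation (3) needs no XML change at all; it simply drops `/i:expensive.xml` from the command line.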

I ran all three on the same Vista SP1 machine, with five user accounts, that I used for the examples in the post on context migration. The results were:

Test                                        ScanState runtime (sec)
(1) expensive.xml, UserAndSystem context    264.5
(2) expensive.xml, System context           217.6
(3) Without expensive.xml                   186.1

Going from (1) to (3) reduces total runtime by about 30%! Although I have not run this test on another machine, I would expect the differences between the tests to be even greater on a machine with more user data than this one. Also, on the machine I ran these tests on, exactly the same set of files and settings migrated in all three cases, despite the differences in XML and overall runtime. This is because this particular machine only contains Word documents within each user's profile (e.g., c:\users\tdolan\*). Since MigUser.xml migrates all data under user profiles, adding the All Word Documents component doesn't add any files to the migration. All it does is make ScanState do more searching before it determines that it has a complete set of files to migrate.
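The percentages follow directly from the measured runtimes. With five user accounts, test (1) runs the drive search n+1 = 6 times, test (2) runs it once, and test (3) not at all:

```latex
\[
\frac{T_1 - T_3}{T_1} = \frac{264.5 - 186.1}{264.5} = \frac{78.4}{264.5} \approx 0.296 \approx 30\%
\]
\[
\frac{T_1 - T_2}{T_1} = \frac{264.5 - 217.6}{264.5} = \frac{46.9}{264.5} \approx 0.177 \approx 18\%
\]
```

So switching the component to System context alone recovers a bit more than half of the 30%, and removing the redundant component recovers the rest.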

So, what should we take away from this? Writing efficient MigXML rules can improve migration performance. In practice, that means:

  • Selecting the proper context for each component
    • For the most part, only components that use user-specific environment variables should be in User context (e.g., %CSIDL_MYMUSIC% and the others seen in MigUser.xml)
    • For the most part, only components that need to search the whole system should be in System context
    • UserAndSystem context should only be used when you are sure that User or System context alone isn't appropriate
  • Reducing overlap among components. Overlap (the same files migrating from more than one component) costs performance without changing what migrates. This happened in the example above in two ways:
    • Placing the All Word Documents component in anything other than System context is a waste since doing so only results in the same search being executed multiple times
    • The All Word Documents component was unnecessary on my test system since the only Word documents it found were already migrating per MigUser.xml
  • Making patterns as specific as possible.  For example:
    • <pattern type="File">C:\* [*.doc]</pattern> : This is about as bad as it gets (short of using the GenerateDrivePatterns helper as above). ScanState must search the entire C: drive and discard every file that isn't a .doc file.
    • <pattern type="File">C:\Data\* [*.doc]</pattern> : This is much better, since the search is now restricted to C:\Data and its subdirectories.
    • <pattern type="File">C:\Data\* [*]</pattern> : From a processing perspective, this is the best. There isn't any searching required; ScanState just knows that the entire C:\Data directory must migrate.
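Putting the guidelines together, a custom file might look something like the sketch below. Everything in it is illustrative: the `urlid`, the C:\Data folder, and the Contoso\Reports folder are hypothetical, and you would only want these components if the same files are not already covered by another component such as one in MigUser.xml, since that would reintroduce overlap.

```xml
<?xml version="1.0" encoding="UTF-8"?>
<!-- Hypothetical urlid for a custom file -->
<migration urlid="http://www.microsoft.com/migration/1.0/migxmlext/efficient">
  <!-- System context: no per-user variables, so one pass is enough -->
  <component type="Documents" context="System">
    <displayName>Data folder</displayName>
    <role role="Data">
      <rules>
        <include>
          <objectSet>
            <!-- Whole-directory pattern: everything under C:\Data migrates, no file filtering -->
            <pattern type="File">C:\Data\* [*]</pattern>
          </objectSet>
        </include>
      </rules>
    </role>
  </component>
  <!-- User context: justified only because the path uses a per-user CSIDL variable -->
  <component type="Documents" context="User">
    <displayName>Contoso reports</displayName>
    <role role="Data">
      <rules>
        <include>
          <objectSet>
            <!-- Hypothetical application folder under each user's roaming AppData -->
            <pattern type="File">%CSIDL_APPDATA%\Contoso\Reports\* [*]</pattern>
          </objectSet>
        </include>
      </rules>
    </role>
  </component>
</migration>
```

Neither component uses UserAndSystem context, and neither pattern asks ScanState to filter file names across a whole drive.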

Following these guidelines should help you write more efficient MigXML rules that yield better performance.