Data Warehouse Import Best Practices
The following best practices apply to the Commerce Server Data Warehouse.
Always run the DTS tasks in sequence. Never run two DTS tasks (or multiple instances of one task) at the same time against the same Data Warehouse.
If two DTS tasks run at the same time, the data in the Data Warehouse will be corrupted.
You must run the Configuration synchronization DTS task and the Report preparation DTS task, regardless of which other import DTS tasks you choose to run.
For example, if you plan to import only profiling data so you can run reports about users, then you must run the Configuration synchronization task, the Profile data import task, and the Report preparation task. You do not need to run the other DTS tasks.
Run the DTS import tasks in sequence so fact data is imported first, and then dimension data.
Always run the DTS import tasks in the following order:
- Web server log import (fact data)
- Transaction data import (fact data)
- Product catalog data import (dimension data)
- Profile data import (dimension data)
- Campaign data import (dimension data)
For example, if you are importing only new data for campaign reports, you would run the Web server log import DTS task first, and then run the Campaign data import task.
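The required ordering above can be sketched as a simple sort against the canonical fact-first sequence. This is an illustrative helper, not a Commerce Server API; the task names are taken from the list above.

```python
# Documented import order: fact data first, then dimension data.
IMPORT_ORDER = [
    "Web server log import",        # fact data
    "Transaction data import",      # fact data
    "Product catalog data import",  # dimension data
    "Profile data import",          # dimension data
    "Campaign data import",         # dimension data
]

def ordered(tasks):
    """Return the selected import tasks in the documented run order."""
    rank = {name: i for i, name in enumerate(IMPORT_ORDER)}
    return sorted(tasks, key=rank.__getitem__)

# Importing only new data for campaign reports:
print(ordered(["Campaign data import", "Web server log import"]))
# ['Web server log import', 'Campaign data import']
```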
If an import DTS task fails, or if you cancel an import DTS task that is in progress, run the Data deletion DTS task.
You must run the Data deletion DTS task to delete the data that was successfully imported, and to return the aggregation class tables to their pre-import state. If you do not perform this step, the aggregations may be incorrect.
Do not run the Web server log import DTS task on a Web server that is in production. Run this DTS task on a separate computer dedicated for this purpose.
The Web server log import DTS task generates heavy disk, SQL Server, CPU, and possibly network activity. To minimize the impact on other resources, run the Web server log import DTS task on a separate computer, which may also have Analysis Services installed. Do not run it on your SQL Server computer.
To optimize the Web server log import process, set the StoreLogUser environment variable to False.
Set the StoreLogUser variable to False if you do not need to run the IP resolution DTS task and you do not create any custom reports that use data in the LogUser table.
When StoreLogUser is set to False, the LogUser table in the Data Warehouse is not populated during the import process, which can improve processing speed by as much as 50 percent.
To set the StoreLogUser environment variable
- On the Data Warehouse server, right-click My Computer, and then click Properties.
- In the System Properties dialog box, on the Advanced tab, click Environment Variables.
- In the Environment Variables dialog box, in the System variable section, click New.
- In the New System Variable dialog box, in the Variable Name box, type StoreLogUser, and then in the Variable Value box type F.
- Click OK three times to close all dialog boxes.
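The effect of the variable can be sketched in a few lines. This is an illustrative check only, assuming the StoreLogUser convention described above; a value of "F" tells the log import to skip populating the LogUser table.

```python
import os

# Equivalent to the System variable set in the procedure above.
os.environ["StoreLogUser"] = "F"

def store_log_user_enabled():
    """True unless StoreLogUser is explicitly set to F/False."""
    return os.environ.get("StoreLogUser", "T").upper() not in ("F", "FALSE")

print(store_log_user_enabled())  # False once the variable is set to F
```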
If you are running the Web server log import DTS task using a script, you must specify the 3 GB switch in the Windows 2000 Boot.ini file on the computer hosting the Data Warehouse (SQL Server) and the computer hosting Analysis Server.
Enabling the 3 GB switch requires two steps:
- Modify the Boot.ini file. For more information about the 3 GB switch, see Windows 2000 Advanced Server Help.
- Modify the executable running the log import DLL. Depending on how you import data into the Data Warehouse, you must modify the appropriate executable file.
For instructions, see Running the Web Server Log Import DTS Task for a Large Site.
Batch one day of your log files before you import them.
If you batch your log files on a daily basis and then import them, the import process will take less time than if you import your log files every hour.
Perform a Web log import just before running the Report preparation DTS task.
For example, if you import your Web log files hourly, you import only a few log files at a time. As the size of your Web log files increases, import throughput (in gigabytes per hour) decreases.
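Daily batching can be sketched by grouping log files by the date encoded in their names. This is a hypothetical helper: IIS W3C log names such as ex021019.log encode the date as yymmdd, but the exact naming in your deployment may differ.

```python
from collections import defaultdict

def batch_by_day(filenames):
    """Group IIS-style log file names (exYYMMDD*.log) into daily batches."""
    batches = defaultdict(list)
    for name in filenames:
        day = name[2:8]  # yymmdd portion of the file name (assumed convention)
        batches[day].append(name)
    return dict(batches)

# One daily log plus one hourly log for the same day, and one for the next day:
logs = ["ex021019.log", "ex02101901.log", "ex021020.log"]
print(batch_by_day(logs))
```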
After you import or delete data from the Data Warehouse, run the Report preparation DTS task.
You use the Report preparation DTS task to populate the online analytical processing (OLAP) cubes with the most recent site and user data stored in the Data Warehouse. Commerce Server 2002 uses OLAP cubes to organize summarized data stored in the Data Warehouse, allowing reports run from the Analysis modules to complete more quickly than they would without this additional step.
If multiple sites are sharing a Data Warehouse, run the Report preparation DTS task once to process data for all the sites.
When executed, the Report preparation task processes all the sites in a Data Warehouse. Do not run the DTS task separately for each site; doing so degrades the performance of the Data Warehouse.
To prevent error messages when you run the Report preparation DTS task, disable the processing of objects for which there will be no data.
The Report preparation task may report errors when processing OLAP dimensions and cubes if the information necessary to populate those objects has not been imported into the Data Warehouse. The error message is displayed in the following format:
Processing error [Source data contains no rows] <object name>
If you are importing a subset of data into the Data Warehouse (for example, you are not running the Product Catalog data import task), you may want to disable the processing of objects for which there will be no data. This will prevent the "[Source data contains no rows]" error messages.
To disable the processing of Analysis Services objects, set the ProcessingEnabled flag to 0 in the CubeProcInfo and DimProcInfo tables for the cubes and dimensions you do not want to process.
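A sketch of the statements involved follows. The table and column ProcessingEnabled come from the text above; the Name lookup column is an assumption for illustration, so verify the actual schema before running anything against the Data Warehouse database.

```python
def disable_processing(cubes=(), dimensions=()):
    """Generate UPDATE statements that clear ProcessingEnabled for the
    cubes and dimensions you do not want the Report preparation task
    to process. The 'Name' column is assumed, not confirmed."""
    stmts = []
    for table, names in (("CubeProcInfo", cubes), ("DimProcInfo", dimensions)):
        for name in names:
            stmts.append(
                f"UPDATE {table} SET ProcessingEnabled = 0 "
                f"WHERE Name = '{name}'"
            )
    return stmts

for s in disable_processing(cubes=["Sales"], dimensions=["Product"]):
    print(s)
```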
Run the Report caching DTS task to optimize the performance of dynamic OLAP reports run from Commerce Server Business Desk.
After you run the Report preparation task, run the Report caching task to pre-execute reports against your Analysis server. When this is done, subsequent execution of the reports can be up to two orders of magnitude faster than the original report execution speed.
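The speedup pattern can be illustrated with plain memoization. This is not the Report caching task itself, only a sketch of the principle: the first execution pays the full query cost, and pre-executed reports are served from cache on subsequent runs.

```python
import functools
import time

@functools.lru_cache(maxsize=None)
def run_report(name):
    time.sleep(0.05)  # stand-in for an expensive dynamic OLAP query
    return f"results for {name}"

t0 = time.perf_counter(); run_report("Sales by day"); cold = time.perf_counter() - t0
t0 = time.perf_counter(); run_report("Sales by day"); warm = time.perf_counter() - t0
print(warm < cold)  # the cached re-run is far faster than the first execution
```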
If you have a Web farm, collect Web server log files on a local computer before importing them into the Data Warehouse.
By collecting the files on the computer that performs the Web server log import DTS task, you reduce the amount of traffic on your network. It is recommended that you implement the following procedure:
- Set up each Web server so it zips its own Web log files.
- Write a Microsoft Visual Basic Scripting Edition (VBScript) script to map drives, copy the zip files to a central location, and then unzip the files.
- Write a second VBScript that cleans the Web log files.
- Create a DTS "execute process" task that runs the first VBScript to copy and unzip the log files. The task should fail if any of the log files are unavailable or zipped incorrectly.
- Create a second DTS task that executes the VBScript to clean the unzipped log files.
- Run the Configuration synchronization DTS task and then the Web server log import DTS task.
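The copy-and-unzip stage of the procedure above can be sketched as follows. This is a hypothetical Python stand-in for the VBScript and DTS "execute process" task: any missing or corrupt archive fails the whole batch, mirroring the fail-fast rule above. Paths and server names are examples only.

```python
import os
import shutil
import zipfile

def collect_logs(servers, staging):
    """Copy each server's zipped log archive into a central staging folder
    and unzip it. Fail immediately if any archive is missing or invalid."""
    os.makedirs(staging, exist_ok=True)
    for server, zip_path in servers.items():
        if not os.path.exists(zip_path):
            raise FileNotFoundError(f"{server}: {zip_path} is unavailable")
        local = shutil.copy(zip_path, staging)
        if not zipfile.is_zipfile(local):
            raise ValueError(f"{server}: {local} is not a valid zip file")
        with zipfile.ZipFile(local) as zf:
            zf.extractall(staging)
```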
If your deployment includes multiple Web servers, you can improve performance of the import process by tuning the environment variable MaxHitPerBuffer.
The MaxHitPerBuffer variable controls the buffer size used by the multilog data source parser. The default for MaxHitPerBuffer is 6. You can adjust this setting to improve performance, depending on the number of servers in your deployment and the amount of traffic on your site.
For example, if you have 120 Web servers and you get a lot of unique visitors to your site, it may be optimal to set MaxHitPerBuffer to 5; any higher would consume too much memory. If you have 5 Web servers, you may be able to set MaxHitPerBuffer as high as 10.
Setting MaxHitPerBuffer higher than 10 is not recommended because it may use more than 1 GB of physical memory. If you do set MaxHitPerBuffer higher than 10, you should also enable the 3 GB switch in the Boot.ini file, and use the 3 GB-enabled executables to perform the log import. For more information, see Running the Web Server Log Import DTS Task for a Large Site.
To determine the optimal value for MaxHitPerBuffer in your scenario, monitor the memory usage (private bytes and virtual bytes).
To perform a Web server log import from a Network Load Balancing cluster, set up the server topology that describes your cluster configuration to Commerce Server, and then run only one Web server log import DTS task.
In this scenario, you want log files from http://<site> to appear in only one report in Commerce Server. However, the Web server log files are in two or more locations, for example, \\server1\d$\iislogfiles\w3svc4 and \\server2\d$\iislogfiles\w3svc4. When you set up the server topology, the log files can have the same names in each location, as long as each set is in the folder corresponding to its Web server.
For instructions about specifying the server topology, see Adding a Non-Commerce Application to a Site and Adding a Non-Commerce Web Server to an Application.
After you set up your non-Commerce application and Web servers, configure the properties in the Web server log import DTS task so that each of the newly created Web servers points to its log files. The log files should be in separate folders, one for each server. (You can also do this when you create the servers.) For instructions, see Configuring the Web Server Log Import DTS Task.
If you deploy a new or updated catalog, campaign, or profile definition to your Web site, and then delete it the same day, the operation may affect the completeness of data imported into the Data Warehouse.
For example, you add the product "Green Book" to your catalog. This product gets the product ID of "X" within the catalog runtime database. You then publish the updated catalog to your Web site. A user shopping on your Web site adds one "Green Book" to the basket. As a result, the Commerce event "Add to basket" is logged to the IIS Web log file. One of the values in the Commerce Event is a product ID ("X"). The product ID references a row in a table in the catalog runtime database.
Normally the Data Warehouse import process resolves this product ID to a full description: the Commerce Event that had product ID "X" resolves as a foreign key to a catalog table. However, if you remove the item from the catalog database before the Data Warehouse imports it, the Web server log import task cannot resolve product ID "X" in the Web log file.
This type of "child data" in the Data Warehouse database that is separated from the "parent data" required to give it meaning is known as "orphaned data." For information about deleting orphan data, see Deleting Orphan Data from the Data Warehouse.
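The Green Book scenario can be reduced to a lookup sketch: a logged event whose product ID no longer resolves to a catalog row once the item has been deleted becomes orphaned data. The data below is illustrative only.

```python
# Dimension (parent) data: the catalog runtime rows at import time.
catalog = {"X": "Green Book"}

# Fact (child) data: Commerce events logged to the IIS Web log file.
events = [{"event": "Add to basket", "product_id": "X"}]

# The item is removed from the catalog before the Data Warehouse imports it.
del catalog["X"]

# The import can no longer resolve product ID "X" to a full description.
orphans = [e for e in events if e["product_id"] not in catalog]
print(orphans)  # the "Add to basket" event is now orphaned data
```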
To improve profile data import speed, configure the start date and query interval values.
The Data Warehouse wizard uses the default start date of 01/01/2000 and the default query interval of fifteen minutes. If you import profile data using these default values, the Data Warehouse will try to read the data from 01/01/2000 00:00 – 00:15 and then 01/01/2000 00:15 – 00:30 and so on. Change the start date and query interval to values appropriate to the data you are importing.
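The windows the import walks through can be sketched directly from the defaults above: with a 01/01/2000 start date and a 15-minute interval, every empty window is still queried, which is why moving the start date up to your data's actual first day saves so much time. This is an illustrative model, not the wizard's implementation.

```python
from datetime import datetime, timedelta

def query_windows(start, end, interval=timedelta(minutes=15)):
    """Yield the (start, end) query windows between two times."""
    t = start
    while t < end:
        yield (t, min(t + interval, end))
        t += interval

# One hour of the default schedule produces four 15-minute windows.
wins = list(query_windows(datetime(2000, 1, 1), datetime(2000, 1, 1, 1)))
print(len(wins))  # 4
```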
Verify the Application Event Log after importing data.
Check the Application Event Log after importing data into the Data Warehouse. You use the Application Event log to confirm that an import completed successfully, or to view errors that occurred during the import process.