Troubleshooting the Data Warehouse: An overview
I know that troubleshooting the data warehouse can be a bit intimidating. Sometimes even figuring out where to begin to look can be difficult. In this series of blog posts, I am going to help you figure out how to troubleshoot the data warehouse. I’ll use this post as an index with links to all of the subsequent topics.
Where to begin?
Depending on what kinds of issues you see, your start point might be different. Here I am going to talk about the general approach.
If you observe a failure or something not too right, generally, there are a few places we can start our debugging.
1. Start with the Operations Manager event log on the Data Warehouse server
This is the absolutely most important place to begin! Almost all of the errors from the Data Warehouse are output to this event log. Events in this log will have two different sources: Deployment and Data Warehouse.
Events with a source of Deployment are usually from management pack deployment which includes report deployment or building out the data warehouse (like creating outriggers, dimensions, fact tables, etc). If you see an error in the event log, it usually has instructions on how to recover from it. For example, it may indicate you need to restart the services. In the Data Warehouse Management Server, there are three services just like on the Service Manager Management Group server:
· System Center Data Access Service,
· System Center Management,
· System Center management Configuration
It’s usually best to recycle all of them at the same time.
Once your data warehouse is deployed, events are more likely to have a source of Data Warehouse. These events are written by jobs within the normal course of operations like the ETL jobs, the MPSync job and the DWMaintenance job. However, these all still write their events to the Operations Manager event log.
2. Data Warehouse Workspace in the Service Manager console
When you click on the Data Warehouse Jobs in theData Warehouse workspace, you should see the ETL jobs and MPSync Job status. If your deployment was successful and your data warehouse is correctly registered to at least one Service Manager management group, you’ll see at least 5 jobs. Every job should either be in a status of “Running” or “Not Started”.
If you see a job is in Failed status, you can click on this job and then click on Modules from the Tasks pane to find out which job module has failed. However to find out why it failed, the best place to get information from is the Operations Manager event log on the Data Warehouse server. You can also get more details from Powershell cmdlets which I will talk about in the next step.
In the Data Warehouse workspace, you can also click on the Management Packs linkinthe left pane. Here’s where you can see all the management packs in the Data Warehouse and the status of their deployment. When you import a management pack to Service Manager, the MPSync job will synchronize it to the Data Warehouse (hence the name…MPSync means “management pack synchronization”). When you get the list of management packs in the data warehouse, you can find out if your management pack has been deployed successfully or not.
If your management pack has defined data warehouse specific elements such as outriggers, dimensions, fact tables or reports, this Management Pack must be successfully deployed before the new tables and reports will be ready to use.
The Powershell cmdlets provide much more detailed information about the Data Warehouse jobs than the console does. This will likely change in a future release, but for now I really recommend you learn how to use the following useful cmdlets:
This command tells us the sources which are registered with DW currently. We expect to see at least two different DataSourceName.
This command tells us all DW jobs status in the current batch.
In the above screenshot, you can check whether the jobs are enabled or not (see the highlight), which jobs are running and when they started.
When MPSync Job or DWMaintenance job start, they both disable all of the ETL jobs so you will see the IsEnabled column set to false for each of the ETL jobs. This means that even if the ETL job status shows it is running, it actually isn’t. When MPSync Job or DWMaintenance job complete the ETL jobs will automatically get enabled and resume processing.
Jobs are normally in the “Not Started” status and if it is you can assume the previous batch has completed. If you’d like, you can use the following command to view the latest few batches of a specific job.
Get-SCDWJob –JobName <Specific job name> -NumberOfBatches <number>
In the above screenshot, you can see the latest MPSyncJob completed, when it started and when it ended. If you’d like you can calculate how long it ran, and what the next batch id and status is. The job batch id is always incremental.
This command provides detailed information about the specific modules within the job. This is very useful when you see job failed and want to find out what caused the failure.
List of common issues and how to troubleshoot them
This list is not exhaustive, but it does cover most of the common issues customers have run into. This list will get added to over time so feel free to check back if you run into a new issue: