Lesson 2: Identifying Package Status and Handling Task Errors
Estimated lesson time: 35 minutes
A package run in the BIDS SSIS designer will show the progress of execution, including the status of tasks and containers. In the data flow, when a package is executing in the BIDS debug environment, you can monitor package progress through row counts as data flows through the data flow components and can see color changes that indicate the status. Overall, this progress functionality lets you easily test the packages and identify errors as you are developing.
In addition, the control flow allows precedence constraints that let you control the workflow steps in a package, specifying the tasks that should be run based on simple completion status (success, failure, or complete) and those that should use advanced evaluation criteria. This lesson looks at ways to monitor package execution status during development and ways to configure control flow constraints to handle errors and enable advanced precedence handling.
Viewing Package Status
While a package is running in the debug environment (that is, when you execute a package in BIDS), you are able to see the status of both control flow and data flow components. BIDS highlights tasks in different colors to help you see what is happening in the package:
Objects that are not highlighted when a package is running have not yet started.
Objects highlighted in yellow are in progress. In the data flow, you will also see the number of rows that have gone through source, transformation, or destination.
A task or data flow component highlighted in red has failed. Even after an error has occurred, other components in the package might still be executing.
IMPORTANT Allowing multiple errors in a package during execution
The MaximumErrorCount control flow property lets a package continue to run to completion even after errors have occurred. The property specifies the number of errors that can occur before the package stops executing and reports failure.
When components are highlighted in green, either the control flow task or container is complete or all the available rows have gone through the components successfully.
If you are using either a For Loop Container or a ForEach Loop Container, the embedded tasks might turn the various status colors at different times as the loops are performed. Even when all the tasks in a package have finished, the package will remain in the debug/execution state until the package has been stopped. This behavior lets you easily see the final status of tasks and the row counts in the data flow.
In addition to the visual aids that help you see what a package is doing, while you are running a package in the debug environment, you are also able to read the execution details on a new tab in the designer called Progress. Figure 5-4 shows the Progress tab results of a package during execution.
Figure 5-4 You can view package execution details on the Progress tab.
When a package is executed outside of the debug environment, this tab's name changes to Execution Results. The listed Results include error and warning descriptions, execution times, final destination row counts, and other execution information (such as validation steps and configuration usage). Chapter 15, Securing and Executing SSIS Packages, reviews SSIS's logging features, which can capture this same information to a logging provider such as a file or database table.
Connecting Control Flow Objects with Precedence
Precedence constraints are the control flow connectors between tasks and containers. In fact, in the practices that you completed in Chapter 4 and in the first lesson of this chapter, you created simple precedence constraints by dragging and dropping the green arrows from one task to another. Most of the time, these connectors are green, red, or blue, indicating execution success, failure, or completion, respectively. For example, if a task fails, the red (failure) constraints leading from it evaluate to True; and because a failed task is still considered complete, the blue (completion) constraints are also followed to the next task or container.
BEST PRACTICES Comparing control flow precedence constraints and data flow paths
Precedence constraints are only used in the control flow of a package and not the data flow. When looking at objects configured in the control flow and comparing these to components configured in the data flow, the objects look very similar, but they are very different. The connectors between objects in the control flow are precedence constraints, whereas the connectors in the data flow are data paths. Paths are discussed in Lesson 3, Handling Data Flow Errors and Debugging.
SSIS also provides advanced constraint capabilities, which allow conditional and/or expression evaluation criteria. Figure 5-5 shows a complicated control flow that has been configured with several precedence constraints.
Figure 5-5 A control flow configured with several precedence constraints
Notice that some constraint lines are solid, some are dashed, and some have function symbols (Fx) associated with them. Each of these indicates the implementation of a different precedence constraint feature:
- Solid lines represent logical AND precedence, meaning that a task will run only when all the constraint conditions are met.
- Dashed lines are logical OR statements between tasks. For example, if a task has dashed constraints pointing to it, the task will run when any one of the precedence conditions is met.
- Function symbols show that an SSIS expression has been embedded in the constraint for evaluation. Embedding expressions allows constraints to determine task execution that is unrelated to preceding task results.
Within the control flow, you can edit the precedence constraint by double-clicking the constraint or by right-clicking the constraint and then choosing Edit from the drop-down list. Figure 5-6 shows the Precedence Constraint Editor.
Figure 5-6 The Precedence Constraint Editor
As you look at the editor, notice that there are two general groupings of properties: the Evaluation operation properties and the Logical condition properties.
Evaluation Operation Properties
The Evaluation operation properties let you define the conditions that determine the successful evaluation of the constraint. There are two primary evaluation indicators:
- Constraints are simply the execution status of the upstream task (completion, failure, or success), which you can select from the Value drop-down list. For example, if an upstream task fails and the constraint is set to Failure, then the constraint requirement is met.
- Expressions in the precedence constraints are advanced Boolean evaluators (meaning they must evaluate to True or False) that you can write in the SSIS expression language. Expressions can include mathematical and string functions and can reference package variables. When the evaluation operation allows an expression, you can type it in the Expression text box.
You can choose different combinations of these two evaluation indicators that will work together to determine whether the precedence constraint allows a downstream task to execute. The Evaluation operation drop-down list includes four choices:
- Constraint By choosing Constraint, you indicate that only the execution completion status will be used to evaluate the precedence constraint.
- Expression By choosing Expression, you indicate that only an expression will be used to evaluate the precedence constraint.
- Expression and Constraint If you choose Expression and Constraint, the precedence constraint will evaluate to True only if the expression evaluates to True and the execution status matches the constraint setting.
- Expression or Constraint If you choose Expression or Constraint, the precedence constraint will evaluate to True if either the expression evaluates to True or the execution status matches the constraint setting.
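The four choices reduce to straightforward Boolean logic. The following sketch (plain Python, not SSIS code; the function name and operation labels are invented for illustration) models how each choice combines the two evaluation indicators:

```python
def evaluate_precedence(operation, constraint_met, expression_true=True):
    """Return True when the downstream task is allowed to run, given the
    Evaluation operation choice, whether the upstream execution status
    matched the constraint setting, and the result of the optional
    SSIS expression."""
    if operation == "Constraint":
        return constraint_met
    if operation == "Expression":
        return expression_true
    if operation == "Expression and Constraint":
        return expression_true and constraint_met
    if operation == "Expression or Constraint":
        return expression_true or constraint_met
    raise ValueError("Unknown evaluation operation: " + operation)

# With Expression and Constraint, a matching execution status alone is
# not enough; the expression must also evaluate to True:
print(evaluate_precedence("Expression and Constraint",
                          constraint_met=True, expression_true=False))  # False
```

At run time, an SSIS expression typed in the Expression text box, such as @[User::RowCount] > 0, would supply the expression result that this model takes as its expression_true input.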
All the properties mentioned above relate directly to the specific precedence constraint that you are editing. These choices give you granular control over the way a precedence constraint should be evaluated. Whenever you use an expression as the evaluation operation or as part of the evaluation operation, then a small function symbol will be displayed next to the precedence constraint line.
Logical Condition Properties
When you have multiple precedence constraints connected to the same task or container, then you also have the option to apply comparisons across the constraints. Your two logical condition choices are:
- Logical AND, which is the default and means that all the precedence constraints that are connected to the task or container need to meet their conditions in order for the downstream task or container to execute.
- Logical OR, which lets you indicate that only one of the precedence constraints needs to meet its requirements in order for the task or container to execute.
When you change the logical condition to either Logical AND or Logical OR, the change will propagate to all the precedence constraints that are connected to the same downstream task or container. When you set the logical condition to Logical OR, the precedence constraint lines become dashed for all the related precedence constraints.
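The two logical conditions map directly onto all-versus-any logic. As a rough model (illustrative Python, not SSIS; the function name is invented):

```python
def downstream_may_run(constraint_results, logical_condition="AND"):
    """Apply the logical condition across every precedence constraint
    that points at the same downstream task or container.
    constraint_results holds one True/False outcome per inbound constraint."""
    if logical_condition == "AND":
        return all(constraint_results)  # every constraint must be satisfied
    if logical_condition == "OR":
        return any(constraint_results)  # one satisfied constraint is enough
    raise ValueError("Unknown logical condition: " + logical_condition)

# Two inbound constraints, only one of which was satisfied:
print(downstream_may_run([True, False], "AND"))  # False
print(downstream_may_run([True, False], "OR"))   # True
```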
Practice: Creating and Configuring Precedence Constraints
In the following practices, you will create precedence constraints to determine when your control flow tasks should run, and then you will observe the implications of defining precedence constraints during execution.
Exercise 1: Setting Up Constraints
Open the package you modified in Lesson 1.
On the Companion Disc
The installed practice files contain the completed SSIS projects for Lessons 1 through 3 under the Source\Ch 05 directory.
Drag and drop a Script Task onto the Control Flow workspace, but not inside the Sequence Container.
Drag the green precedence arrow from the Sequence Container onto the Script Task.
Right-click the precedence constraint you just created, and then select Edit from the drop-down list.
In the Precedence Constraint Editor, change the Value drop-down list to Failure and select OK to save the changes.
Edit the Script Task by double-clicking it and navigate to the Script property page on the left selection pane.
Click the Design Script button, which opens the Visual Studio code editor.
Add the following code where the editor says Add Your Code Here:
MsgBox("Script Task ran!")
Be sure to delete the single quotation mark at the beginning of the line.
From the File menu, select Close and Return, and then select OK in the Script Task Editor window.
Save the package.
Exercise 2: Observing Package Constraints
- Execute the package that you modified in Exercise 1.
- Select OK when the Script Task message box window pops up, and then stop the package.
- Delete the Force Failure Execute SQL Task that is embedded in the Sequence Container.
- Rerun the package in BIDS; you will see that the Script Task did not run because the Sequence Container was successful.
- Right-click the precedence constraint that connects the Sequence Container and the Script Task, and then select Completion.
- Rerun the package, and when the Script Task message box pops up, select OK. You will see that the Script Task ran because the precedence constraint was configured to execute whether the Sequence Container succeeded or failed.
- Stop the package, and then save it.
Quick Check
- When a package fails while you are developing it, where should you look to identify what happened?
- You have a package that includes a step that occasionally fails because of network-connectivity problems. When a network-connectivity error occurs, you need to perform an alternative step to run the same operation in a slower but more reliable way. At the completion of the alternative step, you would like to run the next step in the original workflow. How can you accomplish this?
Quick Check Answers
- The Progress or Execution Results designer tab shows package execution details, including warnings that displayed or any errors that occurred during execution. Often, you will need to scroll through the results and look for the errors and their descriptions. A single error might produce multiple error messages.
- From the first task, create a red failure precedence constraint to the alternative task. Then you will need to create Success constraints from both the alternative task and the original task to the third task. You need to set the Success constraints to Logical OR so that when either the first task or the second task is successful, the final task will run.
Lesson 3: Handling Data Flow Errors and Debugging
Estimated lesson time: 45 minutes
In the real world, no data source is perfect. This means that you will need to handle anomalies and bad data as you process data in your SSIS data flow. Rather than having to fail a data flow, SSIS includes the ability to route the bad rows away from the data flow and handle the data problem without affecting the good rows. Furthermore, within the data flow, you can pause the execution and monitor the actual data as it is flowing through the SSIS engine.
In this lesson, we will look at using error paths to route failed rows to a different subset of components, and we'll review the way to use data viewers to debug data flows. We will also see how to use event handlers to take care of package errors and how to use breakpoints to debug the control flow.
Using Error Paths to Handle Data Flow Errors
Paths in the data flow are similar to precedence constraints in the control flow, except that data flow paths handle rows of data rather than the execution status of tasks. There are two primary paths in the data flow:
- Data paths are the green connectors that go from one component to another. For these paths, the rows that have successfully gone through a component are output, as are error rows when the error output is set to ignore failures.
- Error paths are the red connectors between data flow components, and they contain data rows that fail in a component, when the error rows are set to be redirected.
Not all components in the data flow use error paths. For example, the Multicast component only copies data; it doesn't perform any operation on the data itself, so there is no possible point of failure, and there are no error paths. Components that use error paths include all source adapters, destination adapters, Lookup transformations, Conditional Split transformations, Derived Column transformations, and so on.
Figure 5-7 shows a completed data flow containing a Lookup transformation with two outputs. The red error path handles rows that fail in the Lookup transformation operation.
In this example, when a matching row does not exist in the Lookup reference table, the row fails the Lookup operation and is sent out the error path. The error path is connected to an OLE DB Destination, in which the data is sent to a staging table for manual review.
Figure 5-7 A red error path indicates rows that fail in the Lookup transformation.
To use error paths, you will need to configure the error output. Data flow components offer three options for handling errors:
- Setting the error output to Fail Transformation will cause the data flow to fail if an error is encountered.
- Using the Ignore Failure option will allow the row to continue out the normal green data path, but the value that resulted in the error is changed to a NULL in the output.
- Setting the error output to Redirect Row will send the error row out the red error path; this is the only way to handle errors with separate components.
These error-handling options can be set for the entire row as well as per column within the row. This doesn't mean that a single column gets redirected, but rather that some columns can be set to ignore failures while errors in other columns cause redirects. Figure 5-8 shows the Configure Error Output window, in which these properties are set. To navigate to this window, you can either double-click to edit the component and choose Configure Error Output, or you can simply drag and drop the red error path output onto the next component, which will open the same window.
Figure 5-8 Setting error-handling options on the Configure Error Output window
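To make the three dispositions concrete, here is a small routing sketch (plain Python standing in for the SSIS engine; process_rows and its arguments are invented for the example):

```python
def process_rows(rows, transform, error_disposition):
    """Apply transform to each row, routing failures according to the
    error output setting. Returns (output_rows, error_rows)."""
    output, error_rows = [], []
    for row in rows:
        try:
            output.append(transform(row))
        except ValueError:
            if error_disposition == "Fail Component":
                raise                    # the whole data flow fails
            elif error_disposition == "Ignore Failure":
                output.append(None)      # the failed value becomes NULL
            elif error_disposition == "Redirect Row":
                error_rows.append(row)   # the row exits the red error path
    return output, error_rows

# "x" cannot be converted to an integer, so it is redirected:
print(process_rows(["1", "x", "3"], int, "Redirect Row"))  # ([1, 3], ['x'])
```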
Let's look at an example of data being imported from a flat file. When a value that you are importing does not meet the data type specified in the flat file adapter, you need to handle the mismatch. To implement an error path for this common problem, follow these steps:
- Within your data flow, set up a new Flat File Source adapter connecting to a delimited file of some format.
- Drag a Row Count transformation onto the data flow workspace.
- Select the red arrow attached to the Flat File Source, and then drag and drop it onto the Row Count transformation.
- At this point, the Configure Error Output window will appear, which will allow you to determine what to do with the errors. Set the Error Output property to Redirect Row, and then select OK to close the window.
- Create a variable that can be assigned to the Row Count transformation. Now, when the data flow is executed, you will be able to count the errors that are not successfully imported into the pipeline from the flat file.
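The steps above can be summarized in a rough model (plain Python, not SSIS; the variables dictionary and flat_file_source generator are invented to stand in for the package variable and the Flat File Source adapter):

```python
variables = {"ErrorCount": 0}  # stands in for the package variable

def flat_file_source():
    """Emit raw field values, as the Flat File Source adapter would."""
    yield from ["10", "20", "not-a-number", "30"]

good_rows = []
for raw in flat_file_source():
    try:
        good_rows.append(int(raw))    # data type declared in the adapter
    except ValueError:
        variables["ErrorCount"] += 1  # redirected row reaches the Row Count

print(good_rows, variables["ErrorCount"])  # [10, 20, 30] 1
```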
Leveraging Data Viewers to Identify Data Flow Issues
Troubleshooting data issues can be a frustrating experience, especially when you are not able to easily identify the problem row or issue. Therefore, SSIS also includes the ability to watch rows in the data flow as they are passing through the pipeline. SSIS implements this capability through data viewers. For any path in the data flow, you can add a data viewer that will pause the execution of the data flow and display the data in the data viewer in one of four formats.
You add a data viewer by right-clicking the path and then choosing Data Viewers from the right-click menu. Figure 5-9 shows the Data Viewer configuration window.
Figure 5-9 The Data Viewer configuration window
Once you are in the Data Viewer configuration screen, you need to choose the type of data viewer you want to use. Four types are available:
- Grids show the raw data in columns and rows; the grid is the viewer that is most commonly used to identify row-level data.
- Histograms are used for numeric columns and show the distribution of data across a range.
- Scatter plots reveal data relationships between two numeric columns, highlighting outliers.
- Column charts display the summary values of a column selected in the data flow.
After you choose the data viewer type, select Add, and then click OK to accept the defaults.
BEST PRACTICES Data viewers don't work when a package is run through the command line
You can use a data viewer only when running a package in the debug environment. If you have a data viewer configured in a package, it will be ignored when you run the package programmatically or through the command line.
The most commonly used data viewer is the grid, because it shows the actual rows of data. When you choose the grid and then execute the package in BIDS, the Data Viewer displays a set of rows (approximately 10,000) at a time. Figure 5-10 shows the Data Viewer window as the package is executing.
Figure 5-10 Error rows in the Data Viewer window as a package is executing
After reviewing the rows, you can choose to allow the data to move on until completion by selecting Detach in the Data Viewer designer, or you can retrieve the next batch of rows to look at by selecting the green go arrow. If you use a data viewer on an error path, you can include the ErrorCode and ErrorColumn columns in the output to identify the column in the row that caused the row to fail a component.
Handling Package Errors with Event Handlers
In the data flow, using data viewers provides the ability to easily debug problems while processing data. The control flow, however, is different because the focus is on workflow and execution rather than on data and transformations. Leveraging the capabilities in Visual Studio, the control flow supports visual debugging and breakpoint features. Let's first look at the event handlers that SSIS provides; then we will explore the debugging capabilities in the control flow.
SSIS provides the ability to listen for certain execution events and perform other operations when an event happens (depending on the execution event). For example, if an error happens, the error event handler can send an alert or potentially fix a data problem. Event handlers use the control flow paradigm for workflow processing, which includes all the same control flow tasks and containers that are found in the toolbox of the control flow.
You can define zero, one, or more than one event handler for a package. To add an event handler to a package, you need to select the Event Handler tab in the package designer. Creating a new package event handler requires that you select the executable and the event handler event, as Figure 5-11 shows.
Figure 5-11 Selecting the executable and event handler event for a package
The executable is the task or container scope at which the event will fire. You can also choose the package itself (the highest-level container) as the executable for an event. The event handler event is the actual event that causes the event workflow to execute. The package event handler types include the following:
- OnError Fires when an error occurs in the executable.
- OnWarning Fires when a warning is raised.
- OnInformation Fires when an informational event is raised during validation or execution.
- OnPreExecute Fires just before the executable runs.
- OnPostExecute Fires immediately after the executable finishes.
- OnPreValidate Fires when validation of the executable begins.
- OnPostValidate Fires when validation of the executable completes.
- OnProgress Fires when the executable reports measurable progress.
- OnQueryCancel Fires when the executable checks whether it should stop running.
- OnTaskFailed Fires when a task fails.
- OnVariableValueChanged Fires when the value of a variable changes, if the variable's RaiseChangedEvent property is set to True.
- OnExecStatusChanged Fires when the execution status of the executable changes.
In addition, event handlers assigned to an executable scope will propagate down to child events when the event fires. If an event is assigned to a container, the child executables include the tasks and containers that are embedded within the parent container. This means that if you assign an OnError event to the package and an OnError event occurs at a task, the event handler would fire for both the task and the package (and for any containers in between). You would use an event handler for tracking error details, for sending failure messages in emails, and for implementing manual rollback logic.
BEST PRACTICES Capturing error information with the OnError event
Each package contains a set of system variables that are updated for the various levels in the package during the package execution. With event handlers, you can capture these variables and values, which provide contextual information, such as the ErrorCode, ErrorDescription, and SourceName (the task), when the event fires.
Using event handlers is a great way to track package execution; they can be used to audit the execution, capturing the errors that occur in a task. In addition, the event handler Send Mail Task can be used for notification; for example, it can notify an administrator of a certain predefined condition that requires a special response.
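As a sketch of what such an auditing handler might capture (plain Python, not the SSIS object model; the audit_log list and handler function are invented, though the System:: variable names are the real ones SSIS exposes):

```python
audit_log = []  # stands in for a database audit table

def on_error_handler(system_variables):
    """Record the contextual error details that SSIS exposes to an
    OnError event handler through system variables."""
    audit_log.append({
        "ErrorCode": system_variables["System::ErrorCode"],
        "ErrorDescription": system_variables["System::ErrorDescription"],
        "SourceName": system_variables["System::SourceName"],
    })

# Example values an OnError event might carry:
on_error_handler({
    "System::ErrorCode": -1071607685,
    "System::ErrorDescription": "The data value violates integrity constraints.",
    "System::SourceName": "Load Sales_Summary",
})
print(audit_log[0]["SourceName"])  # Load Sales_Summary
```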
Event handlers can be turned off for any task or container by setting the DisableEventHandlers property of the task or container to True. In other words, if you have an event handler defined but you specifically do not want it to be invoked for a specific task, you can turn off event handlers for that task only.
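The propagation and DisableEventHandlers behavior can be modeled as a walk up the executable hierarchy (illustrative Python only; the Executable class is invented and is not the SSIS object model):

```python
class Executable:
    """Minimal stand-in for a task or container in the package hierarchy."""
    def __init__(self, name, parent=None, on_error=None,
                 disable_event_handlers=False):
        self.name = name
        self.parent = parent
        self.on_error = on_error
        self.disable_event_handlers = disable_event_handlers

    def raise_error(self, message):
        node = self
        while node is not None:  # the event bubbles up toward the package
            if node.on_error and not node.disable_event_handlers:
                node.on_error(node.name, message)
            node = node.parent

fired = []
package = Executable("Package",
                     on_error=lambda scope, msg: fired.append(scope))
container = Executable("Sequence Container", parent=package)
task = Executable("Execute SQL Task", parent=container)

task.raise_error("constraint violation")
print(fired)  # ['Package'] -- the package-level handler fires for a task error
```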
Debugging the Control Flow with Breakpoints
Package debugging lets you know what is going on during the execution of a package in the designer so that you can troubleshoot or validate processing logic. Control flow debugging involves setting breakpoints in the package, which will pause the control flow execution so that you can observe the execution state. SSIS takes advantage of the breakpoint functionality that comes with Visual Studio, which means you have the capabilities to view execution information about the package when you execute a package in the designer.
BEST PRACTICES Breakpoints work in control flow only
Breakpoints function in the control flow but not in the data flow. For scripting, this means that you can set breakpoints only in a control flow Script Task and not in a data flow Script Component.
To set a breakpoint, highlight the task or container, and either press F9 or navigate to the Debug/Toggle Breakpoint menu. You can set multiple breakpoints in a package, and you can embed a breakpoint within a Script Task at a line of code. Figure 5-12 shows a package that is running but is paused at execution.
Figure 5-12 A package that is running but paused
In this screen, the arrow next to the breakpoint icon indicates which task the package is currently waiting to run. When you are paused in the debug environment, you can do the following things to help troubleshoot your package:
- Open the Locals window to see all the variable values and the package status. You can find this window in the Debug toolbar, next to the package execution selections. If you have several variables in a package that you actively use to control logic and precedence, you can use a breakpoint to pause the execution, allowing you to troubleshoot how variable values are handled before the package execution completes.
- When you are paused inside the code of a Script Task, you can continue the execution of the script and package to the next breakpoint either by clicking the Continue button in the Debug toolbar or by pressing F5. Alternatively, you can stop the package from continuing by clicking the Stop button on the toolbar. Breakpoints in a Script Task are very useful because they help you validate the code logic and branching that you are performing with the script.
- When the package is paused on a task (as opposed to within the code of a Script Task as described in the prior bullet) in the control flow, you can also continue running the package to completion (or to the next breakpoint) by selecting the Continue button on the Debug toolbar or by pressing F5.
- Stopping the package during execution, whether the package is paused at a breakpoint or executing, is handled by clicking the Stop button in the toolbar or by pressing the Shift and F5 keys at the same time.
In all, breakpoints are powerful tools for resolving errors and validating functionality. Combined with data viewers in the data flow, they provide comprehensive debugging support during your package development.
Practice: Identifying Data Flow Errors
In this practice, you will add an error flow path and then identify the error by observing the rows in the output through a data viewer.
Exercise: Adding Error Flow Paths
In SSMS, create a new database query connected to the AdventureWorks sample database and run the following code:
USE [AdventureWorks]
GO
TRUNCATE TABLE dbo.Sales_Summary
GO
ALTER TABLE dbo.Sales_Summary WITH CHECK
ADD CONSTRAINT [CK_Summary_ProductLine]
CHECK ((upper([ProductLine])='R' OR upper([ProductLine])='M' OR upper([ProductLine])='T'))
GO
The above code uses the Sales_Summary table that you created in Chapter 4.
Open the package that you modified in the previous practices in this chapter.
Navigate to the Data Flow designer and open the OLE DB Destination adapter.
In the Data Access Mode drop-down list, change the selection to Table Or View, and then verify that the Sales_Summary table is still selected in the Name Of The Table Or The View drop-down list.
Click OK in the OLE DB Destination Editor to return to the package designer.
Right-click in the Data Flow workspace, and then choose Execute Task from the drop-down list to run the Data Flow Task. Before stopping the package, navigate to the Progress tab and observe that the OLE DB Destination failed because of a constraint violation.
Stop the package to return to design mode.
From the SSIS menu in the menu bar, select Variables; this will open the Variables window.
Select the leftmost icon in the Variables window toolbar to create a new variable for the package, and name the variable ErrorCount.
Open the Toolbox, and then drag a Row Count transformation onto the Data Flow workspace.
Highlight the OLE DB Destination adapter, and then drag the red error output path and drop it on the Row Count transformation.
When the Configure Error Output window appears, change the value in the Error column drop-down list to Redirect Row, and then click OK to return to the Data Flow designer.
Open the Row Count transformation, and then change the VariableName property to the ErrorCount variable that you created earlier.
Click OK in the Row Count Editor to return to the Data Flow.
Right-click the red error path, and then select Data Viewers from the drop-down list.
In the Data Flow Path Editor, choose Add, highlight Grid, and then click OK in the Configure Data Viewer window and OK in the Data Flow Path Editor.
Right-click in the Data Flow designer workspace, and select Execute Task from the drop-down list.
A new Data Viewer will appear, which will reveal the OLE DB Destination Error Output.
Observe that the ProductLine column for all the error rows is S, which violates the defined constraint that you created in step 1.
Select Detach in the Data Viewer window, and then stop the package execution.
In SSMS, run the following database query, which adds S as a valid value of the ProductLine column in the Sales_Summary table:
USE [AdventureWorks]
GO
ALTER TABLE dbo.Sales_Summary DROP CONSTRAINT [CK_Summary_ProductLine]
GO
ALTER TABLE dbo.Sales_Summary WITH CHECK
ADD CONSTRAINT [CK_Summary_ProductLine]
CHECK ((upper([ProductLine])='R' OR upper([ProductLine])='M' OR upper([ProductLine])='T' OR upper([ProductLine])='S'))
GO
Return to BIDS and rerun the Data Flow, observing that the OLE DB Destination is now successful, with no rows being routed to the error path output or data viewer.
Quick Check
- A Data Conversion transformation is failing in the middle of the data flow execution, and you need to determine what is causing the error. How should you proceed?
- Your package contains a string variable that you are updating, using a Script Task, to be a file path and file name. Your package is failing at a File System Task that is configured to use the variable to move the file to a different folder on the server. How do you troubleshoot the package?
- You would like to log all the package errors to a custom database table you have created for auditing purposes. How can you accomplish this task?
Quick Check Answers
- To determine what is causing the error, configure the error output of the Data Conversion transformation to redirect failing rows, and connect the error path to a Flat File Destination so that any rows that fail conversion are sent out to a file. Then, create a data viewer on the error path, and run the package in BIDS. This technique will capture the errors in a file and display the rows in the designer for troubleshooting.
- Because the Script Task can contain embedded breakpoints in the code, set a breakpoint in the script so that you will be able to execute the package and step through the lines of code, observing the value of the variable to check the code and accuracy.
- Create an OnError event handler at the package level. In the event handler, use an Execute SQL Task that calls a stored procedure, passing in the SourceName and ErrorDescription system variable values. The procedure can then write these details to your custom auditing table.
© Microsoft. All Rights Reserved.