Tips for Handling Tricky TSQL Problems in Report Dataset Queries
Hey there, it's been a while since I woke up my blog. I've seen a pattern of questions in the forums and from some of the report writers I work with, so I thought I'd post some suggestions for dealing with tricky TSQL problems in report datasets.
Types of problems I see people encounter
Most report writers start off with relatively straightforward SELECT statements, format the results in a table and, voila!, a killer report. Then, inevitably, a report consumer responds with something like "Well, it looks pretty good, but I'd like to have it........", and the next thing you know, the reports have long lists of parameters, hundreds of lines of TSQL in report queries, numerous levels of linked/dependent reports, parameter-customized TSQL, and so on; it can get out of control quickly. To be more specific:
Multi Value Parameters
A multi-value parameter must be built into a list that is used as "IN (@parmname)" in a dataset query. When SQL Server executes the statement, it needs the list to look very neat and clean, similar to (1,2,3,4...) or ('a','b','c'...). This works fine when you have a nice clean list of multi-value parameter values like "1,2,3" or "a,b,c", but when you start populating the multi-value parameter with a query, and the query returns non-int datatypes, NULLs, and strings that contain spaces, quotes, or commas, it can become a hair-pulling, frustrating experience.
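To make "neat and clean" concrete for string values, here is a minimal, runnable sketch of the kind of list SQL Server ultimately has to see. The @People table variable and its values are hypothetical sample data: each string must be quoted, an embedded single quote must be doubled, a comma must stay inside its quotes so it isn't read as a separator, and a NULL in the list never matches anything.

```sql
-- Hypothetical sample data with awkward string values.
DECLARE @People TABLE (PersonName NVARCHAR(50) NULL);
INSERT INTO @People (PersonName) VALUES
    (N'Ann Lee'),     -- contains a space
    (N'O''Brien'),    -- contains a single quote, escaped by doubling it
    (N'Smith, Jr.'),  -- contains a comma
    (NULL);           -- a NULL value

-- The IN list SQL Server needs: every string quoted, quotes doubled,
-- commas kept inside the quotes.
SELECT PersonName
FROM @People
WHERE PersonName IN (N'Ann Lee', N'O''Brien', N'Smith, Jr.', NULL);
-- Returns 3 rows. The NULL row is not returned, because NULL = NULL
-- evaluates to UNKNOWN rather than TRUE (with the default ANSI_NULLS ON).
```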
Large TSQL Batches
Several of my reports use queries that join 10+ tables, with derived tables, temp tables, cursors, and so on. Given the size of these queries, maintenance becomes a nightmare whenever I need to tweak one. The report designers have pretty primitive TSQL query editors, and I always seem to forget some of the places the query needs to be changed across numerous datasets and reports.
I often have a main report that shows aggregated information for a Group, with drillthroughs to a report aggregated by Team, which in turn drills through to the details for an Individual. These three reports all use the same basic "Large TSQL Batch" described above and differ only in the aggregation level of the query and the grouping design of the report, so when the query logic needs even a minor tweak, the maintenance nightmare is multiplied by the number of related reports.
Static TSQL Limitations
As a report grows to include many parameters and sorting/filtering requirements, static TSQL queries can become limiting and can start performing very poorly. For example, parameters can be used to simulate ad-hoc query logic similar to "show me the data for the last 7 days or for the last 24 hours", "I want to sort by columns a,b,c or by c,b,a", or "I would like all the data between x and y, or > x, or <= y". Supporting these types of ad-hoc logic in a static query either requires lots of "OR" logic (which makes it hard for the SQL Server query optimizer to find a good query plan) or is simply impossible.
Tips for dealing with them
Code Your Queries in SQL Management Studio, not in the Report Designers
I've spent a great deal of my database career living with various TSQL interfaces and editors: first the DOS-prompt OSQL/ISQL command-line tools ("They're all I'll EVER need", yeah right), then the revolutionary Windows-based ISQLw editor, which was basically Notepad with the ability to run the query, then SQL Enterprise Manager, which let you write queries and manage the database through a GUI, and finally today's SQL Server Management Studio (SSMS here), which does all that with live, context-sensitive syntax checking, auto-suggested object names, and efficient handling of large result sets.
The reason I mention all this is that, while my peers in the next building have built some very powerful query-writing tools and my peers down the hall have built powerful report-creation tools, the report designers are not meant to be powerful query-writing tools. If you find yourself coding a lot of large queries in the report designers, please get into the habit of copying and pasting the report queries between the designer and SSMS. Once you get used to declaring and setting parameter values when running in SSMS, and then commenting them out in the designers, life will become much more pleasant for you.
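A minimal sketch of that habit, assuming a date parameter named @maxdateparm (the same name used in the next section): the test values sit at the top while you work in SSMS and get commented out when the query goes into the designer. sys.objects is only a stand-in table so the sketch runs in any database; substitute your own tables.

```sql
-- SSMS only: test values for the report parameters.
-- Comment out (or delete) these DECLARE lines when pasting the query into
-- the report designer, because the report supplies the parameter values.
DECLARE @maxdateparm DATETIME = '20090131';

-- The dataset query itself, identical in SSMS and in the designer.
-- sys.objects is a stand-in so the sketch runs in any database.
SELECT name, type_desc, create_date
FROM sys.objects
WHERE create_date < @maxdateparm;
```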
Pre-Calculating Parameter Values in TSQL Variables
Report parameters are most commonly used in queries similar to:
SELECT ... FROM ... WHERE datecolumn < @maxdateparm
In the "show me the data for the last 7 days or for the last 24 hours" example above, with @datenameparm set to 'Days' or 'Hours' and @datevalueparm holding a positive count such as 7 or 24, this could be coded in the dataset query as:
SELECT ... FROM ... WHERE
(@datenameparm = 'Days' AND datecolumn > DATEADD(dd, -@datevalueparm, GETDATE()))
OR (@datenameparm = 'Hours' AND datecolumn > DATEADD(hh, -@datevalueparm, GETDATE()))
However, once you start OR'ing conditions like this, the query becomes much less efficient, because the optimizer has a much harder time finding a good plan for it. An alternative is to declare and set a TSQL variable in the dataset query, outside the SELECT statement, and then use it in the SELECT statement, for example:
DECLARE @mindate DATETIME
SELECT @mindate =
CASE WHEN @datenameparm = 'Days' THEN DATEADD(dd, -@datevalueparm, GETDATE())
ELSE DATEADD(hh, -@datevalueparm, GETDATE()) END
SELECT ... FROM ... WHERE datecolumn > @mindate
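A runnable version of the pre-calculated variable, as a sketch. It assumes @datevalueparm is a positive count that gets negated in DATEADD; the @Events table variable and its rows are hypothetical sample data standing in for the real table, and the DECLAREs for the two report parameters would be commented out in the designer.

```sql
-- Hypothetical test values; in the report these come from report parameters.
DECLARE @datenameparm VARCHAR(10) = 'Hours';
DECLARE @datevalueparm INT = 24;

-- Hypothetical sample data standing in for the real table.
DECLARE @Events TABLE (EventId INT, datecolumn DATETIME);
INSERT INTO @Events (EventId, datecolumn) VALUES
    (1, DATEADD(hh, -2, GETDATE())),   -- 2 hours ago
    (2, DATEADD(dd, -3, GETDATE())),   -- 3 days ago
    (3, DATEADD(dd, -10, GETDATE()));  -- 10 days ago

-- Compute the cutoff once, outside the SELECT.
DECLARE @mindate DATETIME =
    CASE WHEN @datenameparm = 'Days'
         THEN DATEADD(dd, -@datevalueparm, GETDATE())
         ELSE DATEADD(hh, -@datevalueparm, GETDATE())
    END;

-- The WHERE clause is now a single comparison with no OR branches.
SELECT EventId, datecolumn
FROM @Events
WHERE datecolumn > @mindate;
```

With 'Hours' and 24, only the row from 2 hours ago comes back; switching @datenameparm to 'Days' and @datevalueparm to 7 would also return the 3-day-old row.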
Moving Queries into Views and Stored Procedures
Bear in mind that a dataset query is just a batch of TSQL statements that the report processing engine sends blindly to SQL Server. As long as SQL Server returns a result set, the report processing engine doesn't know or care what is in the batch; it can be anything from a simple SELECT statement to a complex script with expressions, temp tables, control-of-flow, inserts/updates/deletes, database modifications, and so on. Reports are generally intended for read-only data retrieval, but a valid batch is a valid batch, and you can get really creative with them.
Once your dataset queries become larger, more complex, interdependent, and reused in more places, it becomes increasingly important to apply basic computer-science principles such as abstraction and reuse, and to move the complexity out of the dataset queries.
SQL Server supports "views", which are basically a packaged SELECT statement that can be queried with SELECT as if it were a table, and "stored procedures", which can contain almost any TSQL statements but must be called with the EXEC(UTE) statement. SQL Server also supports user-defined functions (scalar and table-valued) and other programmatic constructs, but views and stored procedures usually fit my reporting needs.
In my hierarchical reports example above, I create a set of hierarchical views that are used in hierarchical reports. I start with the lowest-level detail query, which contains the bulk of the join and filtering logic along with the rest of the columns that will be used for higher-level groupings:
CREATE VIEW PersonBugsView AS
SELECT g.GroupName, t.TeamName, p.PersonName, b.Priority, BugCount = COUNT(DISTINCT b.BugId)
FROM Bugs b
JOIN Persons p ON p.PersonId = b.PersonId
JOIN Teams t ON t.TeamId = p.TeamId
JOIN Groups g ON g.GroupId = t.GroupId
WHERE b.Status = 'Active' AND b.Type = 'Code Defect' AND b.Release = 'Current Release'
GROUP BY g.GroupName, t.TeamName, p.PersonName, b.Priority
GO
Then I create the other hierarchical views on top of this detail view:
CREATE VIEW TeamBugsView AS
SELECT GroupName, TeamName, Priority, BugCount = SUM(BugCount)
FROM PersonBugsView
GROUP BY GroupName, TeamName, Priority
GO
CREATE VIEW GroupBugsView AS
SELECT GroupName, Priority, BugCount = SUM(BugCount)
FROM PersonBugsView
GROUP BY GroupName, Priority
GO
Once the views are created and tested to be correct (in SSMS, of course :)), I create the set of matching reports. The PersonBugsDetail report would have parameters for GroupName and TeamName, and the dataset query would be:
SELECT PersonName, Priority, BugCount
FROM PersonBugsView
WHERE GroupName = @GroupName AND TeamName = @TeamName
The TeamBugsSummary report would have a parameter for GroupName and the dataset query would be:
SELECT TeamName, Priority, BugCount
FROM TeamBugsView
WHERE GroupName = @GroupName
And the report table would have a drillthrough action on the BugCount textbox that calls the PersonBugsDetail report with the GroupName and TeamName parameter values.
And finally, the GroupBugsSummary report would have a dataset that queries the GroupBugsView:
SELECT GroupName, Priority, BugCount
FROM GroupBugsView
And again here, the report table would have a drillthrough action on the BugCount textbox that calls the TeamBugsSummary report with the GroupName parameter value.
The value of views here is that you can reuse them across several different reports, and when (not IF) the query logic needs to change, you only have to change the detail view; the change automatically flows through to all of the dependent views and reports without editing each one.
There are a few pitfalls and counter-arguments to this approach. Abstracting the logic away from the reports makes them harder to follow; many of my report plagiarizers like to steal my queries or look at the logic used in the report queries, and they don't know where the view is stored or how to look at it. This can be a good or a bad thing :). Another argument is that you could communicate the same information in a single report based on the detail query, letting the report table's grouping expressions conditionally display the per-Group/Team information, and that works fine for relatively small datasets. But if you've read my blog posting here http://blogs.msdn.com/deanka/archive/2009/01/13/pet-peeve-slow-reports.aspx, you know that large datasets combined with complex grouping can result in slow-running reports, and I hate that.
Another way to deal with complex dataset queries is to use stored procedures, which are useful in many situations. For example, as I mentioned in the blog posting above, report queries can run for a long time if they join many tables and need read locks on several tables that other processes are heavily reading and updating. In this case, you can split the operations into stages, store the results of each stage in temp tables or table variables, and return the final data from those tables. Also, many reporting needs simply cannot be met by SELECT statements against views or tables alone.
I used this method for a set of daily reports my team uses to communicate labrun verification needs (and to help make sure Reporting Services is bug free :)) for our group. Our labrun automation database is used continuously by hundreds of testers and lab machines and holds hundreds of thousands of tests. In addition, my manager requested that the report data be sorted by a 1-n priority of the labruns, which is not an easy task with SELECT or report expression logic alone.
In this case, I had to split the query operations into chunks, store the data in temp tables, and write a cursor to loop through the data and assign a priority to each labrun. The final TSQL batch was over 300 lines and had numerous temp tables, control-of-flow, date conversions, calculations, and so on. I shudder at the thought of embedding it in each of the reports that need the same basic data.
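A minimal sketch of the cursor step, assuming the earlier stages have already left the candidate rows in a table. The @Labruns table variable, its columns, and the "most failures first" ordering rule are all hypothetical stand-ins; the real prioritization rules would go inside the loop body.

```sql
-- Hypothetical staged data; in a real report this would come from earlier stages.
DECLARE @Labruns TABLE (LabrunId INT PRIMARY KEY, FailCount INT, Priority INT NULL);
INSERT INTO @Labruns (LabrunId, FailCount) VALUES (101, 5), (102, 12), (103, 1);

DECLARE @LabrunId INT, @NextPriority INT = 1;

-- Walk the rows in the desired order and number them 1..n.
-- STATIC makes the cursor read from a snapshot, so the UPDATEs below
-- cannot disturb the iteration.
DECLARE labrun_cursor CURSOR LOCAL STATIC READ_ONLY FOR
    SELECT LabrunId FROM @Labruns ORDER BY FailCount DESC;

OPEN labrun_cursor;
FETCH NEXT FROM labrun_cursor INTO @LabrunId;
WHILE @@FETCH_STATUS = 0
BEGIN
    -- Real per-row prioritization rules would go here.
    UPDATE @Labruns SET Priority = @NextPriority WHERE LabrunId = @LabrunId;
    SET @NextPriority += 1;
    FETCH NEXT FROM labrun_cursor INTO @LabrunId;
END;
CLOSE labrun_cursor;
DEALLOCATE labrun_cursor;

-- Final result set returned to the report.
SELECT LabrunId, FailCount, Priority FROM @Labruns ORDER BY Priority;
```

When the rule really is as simple as one ORDER BY, ROW_NUMBER() OVER (ORDER BY FailCount DESC) produces the same numbering without a cursor; the cursor earns its keep when each row's priority depends on logic that is awkward to express in a single SELECT.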
Note that if you do use #temp tables in stored procedures used in reports, the query designers may give you an error similar to:
Msg 208, Level 16, State 0, Procedure TestProcedure, Line 3
Invalid object name '#temp'.
I’ve discussed the issues and solutions in a forum posting here:
but the net recommendation is to use table variables instead.
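Here is a minimal sketch of a report procedure that stages intermediate results in a table variable rather than a #temp table. The procedure name dbo.GetObjectCountsByType is hypothetical, and sys.objects stands in for your real tables so the sketch runs in any database; the staging INSERT is where the expensive joins from earlier stages would go.

```sql
-- Hypothetical report procedure: stage into a table variable, then return it.
CREATE PROCEDURE dbo.GetObjectCountsByType
AS
BEGIN
    -- Suppress "rows affected" messages from the staging INSERT.
    SET NOCOUNT ON;

    -- Table variable instead of #temp for the intermediate stage.
    DECLARE @Stage TABLE (TypeDesc NVARCHAR(60), ObjectCount INT);

    -- Stage 1: gather and aggregate (sys.objects is a stand-in source).
    INSERT INTO @Stage (TypeDesc, ObjectCount)
    SELECT type_desc, COUNT(*)
    FROM sys.objects
    GROUP BY type_desc;

    -- Final result set returned to the report dataset.
    SELECT TypeDesc, ObjectCount
    FROM @Stage
    ORDER BY ObjectCount DESC;
END
GO
```

The report dataset would then call it with EXEC dbo.GetObjectCountsByType. GO is the SSMS/sqlcmd batch separator, needed because CREATE PROCEDURE must be the only statement in its batch.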
And finally, remember to GRANT the necessary SELECT or EXECUTE permissions on the views and procedures created for your reports. I've spent a lot of time testing views, procedures, and reports locally, only to have them fail miserably on the server because the report's data source used a different login from mine, and that login didn't have permissions.
Dynamic SQL Statements - EXEC()
Many reporting situations require ad-hoc query logic that can be difficult or inefficient with static dataset queries. If the logic only affects the filtering of the data in the WHERE clause, you may be able to get away with complex OR constructs, but the query will be inefficient. If you need sorting specified dynamically at runtime, you may be able to work with interactive or expression-based sorting in the report definition, but these have their own limitations, complexities, and inefficiencies. In these cases, dynamic SQL statements can be used.
The basic trick is to build up the SELECT statement as a text string based on parameter values and execute it using the TSQL "EXEC()" statement. The one reporting-specific requirement is that the final result set must have the same column names and datatypes for every permutation of your dynamic query. The number of rows, the values in them, and their order can be dynamic, but the names and types must be consistent.
For example, consider a situation where you want the data to be sorted by any combination of three columns, and you want the data to be filtered by any column name and value, all specified by the user at report run time. For a report based on the PersonBugsView mentioned above, you could define 3 report parameters for sorting and a set for filtering:
ParmName | Available values
--- | ---
SortCol1 | GroupName, TeamName, PersonName, Priority, BugCount
SortCol2 | GroupName, TeamName, PersonName, Priority, BugCount
SortCol3 | GroupName, TeamName, PersonName, Priority, BugCount
FilterColName | PersonName, Priority, BugCount
FilterOperator | =, <, <=, >, >=
FilterValue | (dynamic, based on the FilterColName specified)
DECLARE @sql VARCHAR(1000) -- be sure it is big enough to hold the whole query!
SELECT @sql =
'SELECT GroupName, TeamName, PersonName, Priority, BugCount
FROM PersonBugsView
WHERE ' + @FilterColName + ' ' + @FilterOperator + ' ' + @FilterValue + '
ORDER BY ' + @SortCol1 + ', ' + @SortCol2 + ', ' + @SortCol3
PRINT @sql -- useful for debugging in SSMS; comment this out in the report
EXEC(@sql) -- runs the query, so it had better be syntactically correct!
There are a few very important considerations when using this technique. First and foremost is SQL injection: any time you dynamically build a query string from user-provided strings, you run the risk of an attacker finding a way to make the query do something far different (and worse) than you intended. There are a few mitigations, e.g. only allowing available values that you specify rather than free-text strings, and ensuring that the login used by the report's data source has only the limited SELECT permissions it needs in the database. But in any event, study up on this attack and understand the implications completely before using this technique.
Another consideration is that the dynamic statement must be syntactically correct for every combination of values the users can specify. In my code example above, @FilterValue needs single quotes for strings and no quotes for numbers, and the statement would be invalid if any of the parameters sent a blank value into the expression.
And again, be sure to code and test this in SSMS using the PRINT statement to display the built query so you can see what is being executed, and be sure to comment out the PRINT statement when you’re pasting it into the query designer.
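One way to address both concerns is a variation on the EXEC() approach above that uses sp_executesql, sketched here under a few assumptions: the #PersonBugs temp table and its rows are hypothetical sample data standing in for PersonBugsView (a #temp table rather than a table variable, because a table variable would be out of scope inside the dynamic statement), and the parameter values are hypothetical. Column names and operators are checked against fixed lists, identifiers are bracketed with QUOTENAME, and the filter value travels as a real parameter, so it never needs hand-built quotes.

```sql
-- Hypothetical sample data standing in for PersonBugsView.
CREATE TABLE #PersonBugs (
    GroupName NVARCHAR(50), TeamName NVARCHAR(50), PersonName NVARCHAR(50),
    Priority INT, BugCount INT);
INSERT INTO #PersonBugs (GroupName, TeamName, PersonName, Priority, BugCount) VALUES
    (N'G1', N'T1', N'Ann', 1, 4),
    (N'G1', N'T1', N'O''Brien', 2, 7),
    (N'G1', N'T2', N'Lee', 1, 2);

-- Hypothetical values standing in for the report parameters.
DECLARE @FilterColName SYSNAME = N'BugCount',
        @FilterOperator NVARCHAR(2) = N'>=',
        @FilterValue NVARCHAR(50) = N'3',
        @SortCol1 SYSNAME = N'Priority',
        @SortCol2 SYSNAME = N'PersonName',
        @SortCol3 SYSNAME = N'TeamName';

-- Reject anything outside the known column names and operators.
IF @FilterColName IS NULL
   OR @FilterColName NOT IN (N'PersonName', N'Priority', N'BugCount')
   OR @FilterOperator IS NULL
   OR @FilterOperator NOT IN (N'=', N'<', N'<=', N'>', N'>=')
   OR EXISTS (SELECT 1
              FROM (VALUES (@SortCol1), (@SortCol2), (@SortCol3)) AS s (ColName)
              WHERE ColName IS NULL
                 OR ColName NOT IN (N'GroupName', N'TeamName', N'PersonName',
                                    N'Priority', N'BugCount'))
    THROW 50000, 'Invalid filter or sort parameter.', 1;

-- Identifiers are bracketed with QUOTENAME; the filter value is passed
-- as the parameter @val instead of being concatenated into the string.
DECLARE @sql NVARCHAR(MAX) =
    N'SELECT GroupName, TeamName, PersonName, Priority, BugCount
FROM #PersonBugs
WHERE ' + QUOTENAME(@FilterColName) + N' ' + @FilterOperator + N' @val
ORDER BY ' + QUOTENAME(@SortCol1) + N', ' + QUOTENAME(@SortCol2) + N', '
         + QUOTENAME(@SortCol3) + N';';

PRINT @sql;  -- debugging aid in SSMS; comment out in the report
EXEC sp_executesql @sql, N'@val NVARCHAR(50)', @val = @FilterValue;
```

Because @val is typed NVARCHAR(50), SQL Server converts it implicitly to the column's type for the comparison, so the same statement works for PersonName and BugCount without any quoting logic; a non-numeric value against BugCount raises a conversion error instead of changing the query.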
Using SQL Profiler to see what query is being sent to SQL Server
SQL Profiler is an extremely valuable tool for reverse engineering what queries an application is sending to SQL Server, and this applies to the report designer tools and the Reporting Services server applications as well.
SQL Profiler is somewhat of a "man in the middle" between the application and the SQL Server executing its queries. It works by registering a set of events you want to trace in SQL Server; when those events happen, SQL Server returns information about each one. The key events in this context are "Stored Procedures: RPC:Completed" and "TSQL: SQL:BatchCompleted", and the TextData column returned for these events contains the SQL statement sent by the application. In addition to events, you can customize the columns of information returned and add filters to restrict the rows being returned. Fortunately, when you start SQL Profiler (via the Tools menu in SSMS), the default events, columns, and filters are usually sufficient for your needs. If the report queries run on a server with a lot of other query traffic, you may need to filter on the LoginName column for the login used by the report's data source.
There are a few key reporting situations where this is most useful. At the top of the list is working with multi-value parameters. As I mentioned above, SQL Server needs to see a nice clean list of values that it can execute, like "…WHERE colname IN (1,2,3)", but depending on the strings in your multi-value parameter list, it can be difficult to build a syntactically correct list that SQL Server understands. In this case, build a report with a multi-value parameter and a dataset query that uses it in a "…WHERE colname IN (@parmname)" fashion. If it works, GREAT! But if you get errors, it can be really tough to figure out what the report designer is doing with your parameters and expressions without SQL Profiler. You can also use it for any dataset query that returns errors for reasons you don't understand.
I hope this helps, thanks! - Dean
As usual, "This blog is provided 'AS IS' with no warranties, and confers no rights."