Data Structure Must Support a Tree

Article
04/05/2007

This article may contain URLs that were valid when originally published, but now link to sites or pages that no longer exist. To maintain the flow of the article, we've left these URLs in the text, but disabled the links.

Features
Using T-SQL to Generate a Resultset in Tree Form
Recursive stored procedures populate a tree from a SQL Server table

Rohit Nadhani, Vikash K. Agarwal

Storing and representing data in a hierarchical, or tree, form is a common requirement in database development and an elegant solution for many real-world programming problems. T-SQL provides a powerful way of rendering the rows of a resultset in tree form (i.e., indented to resemble a tree). Relational database management systems (RDBMSs) such as SQL Server inherently produce resultsets as a list of rows, but you can produce the same list as a tree—if you set up the correct schema and relationships. Here, we develop three stored procedures to build a tree from a SQL Server table. The client application calls one of the stored procedures, which uses the other two as needed. Then, an Active Server Pages (ASP) or HTML page uses the output from the stored procedures to produce an HTML list box without requiring any client processing.

Our T-SQL tree solution uses stored procedures, cursors, and global temporary tables (the names of which begin with two pound symbols—##). One of our examples also uses ASP or HTML to interface with the SQL Server output, although a Visual Basic (VB) or Visual C++ (VC++) client can use the same output. We used SQL Server Enterprise Manager to design the database schema and Microsoft Visual InterDev 6.0 to develop the stored procedures and the ASP page. (SQL Server Query Analyzer and Notepad are possible alternatives to Enterprise Manager and Visual InterDev, respectively.)

SQL Server features let you provide robust solutions to many complex problems. Tree generation is only one example of what you can do. We use the following SQL Server features in this article:

storing hierarchical data in a table structure or schema
using recursion in T-SQL stored procedures
using a global temporary table
making a solution data-tier scalable by using SQL Server's system process ID (SPID)
employing CURSOR OUTPUT VARYING to use the output resultset from one stored procedure in the parent stored procedure
using the sp_executesql system stored procedure to execute queries constructed on the fly

To present a resultset as a tree, your schema or table structure must let you process the data in hierarchical, or tree, form. The key to creating a tree is to define the parent-child relationship with no limit on the number of child, or nested, levels possible. For example, let's look at a simple project plan: A project contains n main activities, main activities have subactivities, and subactivities have further subactivities (or sub-subactivities). Consider the schema diagram in Figure 1. The following code creates the project activity table:

  CREATE TABLE project_activity_schedule (
	activity_code VARCHAR(10) NOT NULL PRIMARY KEY,
	activity_description VARCHAR(255) NOT NULL,
	planned_start_date datetime NULL,
	parent_activity_code VARCHAR(10) NULL
)

A unique code in the activity_code column identifies each activity in the project. The activity_description and planned_start_date columns provide more information about each activity. But the column that converts the table's structure from a simple list-type structure to a hierarchical structure is parent_activity_code. This column's type and width are the same as those of activity_code.

The parent_activity_code column uses a foreign key to reference the activity_code column in the same table; thus, activity_code and parent_activity_code have a recursive relationship, which the pipe in the upper left corner of Figure 1 depicts. Therefore, each activity that has a parent activity has that parent activity's code in the activity's parent_activity_code column. Activities without a parent have NULL in the parent_activity_code column. You can use Enterprise Manager's Database Designer to create this recursive relationship, or you can use the following statements after you've created the table:

  ALTER TABLE project_activity_schedule
ADD CONSTRAINT fk_parent_child_activity
FOREIGN KEY (parent_activity_code) REFERENCES
project_activity_schedule(activity_code)

The activity_code's data type depends on the project requirements. For simplicity, we stored only two attributes for each activity—the description and the start date—but you can use more attributes if you want.

Let's determine whether the structure of the data set in Table 1 satisfies the requirement for a tree with unlimited levels. (Although the schema provides unlimited nesting levels, SQL Server imposes a limit of 32 levels for nested stored procedures.) Of the three root-level activities, or nodes, A1 has no subactivities, A2 contains three subactivities, and A3 has two subactivities. These subactivities have A2 and A3, respectively, as their parent_activity_code. You can further extend this structure to 32 levels by specifying the appropriate parent code. Now, let's create the three stored procedures required to build the tree.

The Three Stored Procedures

The client application calls the topmost stored procedure, tsp_prj_activity_tree_list (Tree List), to create a unique, global temporary table. Web Listing 1 shows the Tree List procedure (for download instructions, see the More on the Web box, page 56). Then, the Tree List procedure calls the tsp_prj_activity_tree_build procedure (Build), which actually builds the tree by using recursion and feeds the tree into the temporary table. The Build procedure calls the tsp_prj_activity_child_list procedure (Child List), which issues a query to select rows from the table and returns the rows to the Build procedure. (Figure 2 shows the relationships among the three stored procedures.) Finally, the Tree List procedure extracts the tree from the temporary table and drops the table.

Why create the table by using sp_executesql rather than the CREATE TABLE statement? The answer lies in the word unique. Because you can invoke multiple instances of this procedure at one time, using only one table would lead to errors. All procedures would try to create the same table and feed data into it. Either the different procedures would fail at creation because the table already existed, or when one procedure terminated, it would drop the table, making all the other simultaneous procedures fail. Also, the data fed into the table wouldn't make any sense if the data came from various procedures because its context would be missing. Thus, each concurrent instance of the procedure needs its own unique table.

You can create unique tables if you add @@SPID to the static table name—in this case, ##prj_activity_tree_table—so that the final tables' names are similar to ##prj_activity_tree_table1 and ##prj_activity_tree_table2. SQL Server assigns a unique SPID to each logged-on session. You can use the @@SPID global variable to obtain this ID's value, which is an integer. Therefore, for each user who makes a connection and invokes the procedure, we add this SPID to the table name, making the name unique. Because you don't know this part of the table name beforehand, you must construct the query on the fly as a Unicode string, then use sp_executesql to execute the query.

Using a global temporary table and SPID lets the table persist across stored procedures and support simultaneous multiple instances in each stored procedure. (For information about the differences between local and global temporary tables, see the sidebar "Temporary Tables: Local vs. Global.") To get a list of all current connections and their associated SPIDs, you can run the sp_who system stored procedure. (For more information about SPIDs, see SQL Server Books Online—BOL.)

You use @gtemp\_table\_name to store the table name and @sql\_statement\_string to store the queries that you use to create the table, select records from it, and drop it. You enclose the entire operation between the BEGIN TRANSACTION and COMMIT TRANSACTION statements to ensure that the operation remains independent. Because the operation spans stored procedures and involves multiple WRITE operations, such as CREATE TABLE and DROP TABLE, you should ensure that WRITE operations either succeed or fail completely (i.e., that the procedure leaves no operation incomplete because of errors). Transaction control isn't mandatory here because SQL Server has an automatic mechanism for cleaning up global and local temporary tables. However, good programming practice dictates that you ensure completion in such contexts, and when you extend this example to build an expanding-collapsing tree, you must also maintain state, which involves a WRITE operation.

Let's follow a top-down approach to develop the tree list. First, you combine the static table name and the SPID to construct a unique table name. Then, prepare the CREATE TABLE statement using the unique table name and the necessary field structure. Second, to debug such constructions during the development stage, you issue a SELECT statement immediately after preparing the query or statement. This SELECT statement is for debugging only and selects and displays only the statement under construction. After you construct the statement, you execute it by using the sp_executesql procedure, which creates a global unique temporary table ready for use.

Third, you call the Build procedure and provide it with the correct parameters: the parent activity code (@parent_activity_code), the global temporary table name (@gtemp_table_name), and the indent string (@indent_string). The child procedure uses the parent_activity_code as the starting point. The child procedure begins by obtaining all the child activities of this parent activity. The first parameter contains a NULL value, which tells the child procedure to start with the root-level activities (i.e., activities that don't have a parent activity). The second parameter, which is the name of the table you just created, is where the called procedure feeds the tree recursively. The third parameter, which is the indent string, inserts spaces before the activity description to provide the treelike appearance. As the child procedure calls itself and moves deeper into the nesting levels, the number of spaces increases to properly depict the tree level.

The child procedure builds the tree and returns control back to the Tree List procedure. The activity_description field now contains the values of the columns to be displayed in the tree. Next, you build a simple SELECT statement in @sql\_statement\_string (you must build this statement because the table name is dynamic). The statement executes, and the client receives the SELECT statement's output as it would the output of any direct statement from a procedure.

Fourth, you prepare a DROP statement to delete the temporary table you created. The process is similar to other queries you have built and executed on the fly. And last, you commit the transaction.

Building the Tree

Let's look more deeply into how we collect and attach the tree nodes. The Build procedure builds the tree, node by node, and feeds it into the temporary table, as Listing 1 shows. This procedure takes the temporary table's name as a parameter. At any call, the procedure starts by having the activity code passed to it.

If you pass an activity code of A2 to the Build procedure, it fetches all of A2's child activities in a list. Then, the procedure parses the child activities, one by one. For each child activity, the procedure calls itself with that child activity as the parent_activity_code parameter; for example, you might have A2.1, A2.2, and A2.3 as child activities. Then, the Build procedure feeds A2.1 to the temporary table and calls itself with A2.1 as parent_activity_code; next, Build performs the same operations starting with A2.1. In other words, Build fetches all the child activities for A2.1, then feeds them, one by one, and calls itself again until it can't find any child activities for a given parent_activity_code. Build then starts returning up the tree. When Build reaches the level that was parsing the list of A2's child activities, Build moves to the next record (i.e., A2.2) and calls itself, again going deeper until it reaches the end of the node. This process is recursion. The procedure calls itself but preserves the state of the caller.

Let's look at examples of the actual parameters passed to this procedure, which Figure 3 shows. Note how the parent_activity_code changes and the size of the indent string increases. What does preserving the state mean? Calls 2, 3, and 7 occur within Call 1. When Call 2 returns control to Call 1, Call 1 can continue as if any normal instruction has finished executing. The call to itself didn't change any of Call 1's existing values or its state.

For example, Call 1 preserves the three-record list it had and continues with the next record, so A2 can execute Call 3. Similarly, Calls 4, 5, and 6 occur within Call 3. As these calls return control to the parent, the parent function continues with the next record. The parent function's state, in terms of execution sequence, remains unchanged. The only change that occurs is in the contents of the global temporary table. This change isn't automatic; to achieve it, you pass the same table name again and again.

The Build procedure starts by declaring variables for storing the activity attributes. Build uses these variables while it processes each activity in the child activity list. The @sql\_statement\_string variable is for storing Unicode SQL statements as strings. The @activity\_list\_cursor variable is crucial: Cursors let you store a set of rows, or a resultset, then process them individually. More specifically, cursors let you position at a row, retrieve a row or rows from a specific position, and update the rows. Cursors specifically work with one or more rows rather than sets, which is the typical way an RDBMS works. (For more information about cursors, see BOL.)

Next, you need to modify the length of the indent string in the @indent\_string parameter. This length determines the distance of the activity description from the left border of the list box that Figure 4 shows and, thus, creates the tree appearance. When the parent_activity_code is NULL, the Build procedure processes the root-level activities and initializes the indent string to a zero-length string; otherwise, the size increases by 10 space characters. Note that we used  , the HTML character code for a no-break space, instead of the normal space character because the output is in HTML. You can change the character to a normal space character if you need VB or VC++ output.

Then, you use the activity_code and cursor that you already declared to call the tsp_prj_activity_child_list (Child List) procedure, which Web Listing 2 shows. The procedure's job is straightforward; Child List returns a list of all child activities for the activity specified in the cursor passed to it. Note the use of the word OUTPUT in Build's execute statement, which signifies the direction of the parameter (i.e., whether the parameter is an input from the caller or an output from the called procedure). OUTPUT is a compulsory keyword that enables parameters to get values from stored procedures. When Child List returns control, the CURSOR parameter, @activity\_list\_cursor, contains the list of child activities.

You use a typical WHILE loop to parse each activity. The FETCH NEXT statement in the Build procedure fetches the row details, or activity attributes; this statement is the standard way of fetching rows from a cursor, one by one. The @@FETCH_STATUS system function is available after the FETCH statement executes. This system function indicates the success or failure of the FETCH. A value of zero indicates success. You break out of the loop using the BREAK statement if the FETCH has no more rows to process.

Next, a SET statement removes all the single quotes in activity_description. This action is a precaution because the next line constructs a query that inserts the activity details in the global temporary table. The presence of single quotes in the wrong places generates errors or syntactically incorrect statements-which problem depends on how the user wants to process the single quotes inside activity_description.

The next SET statement is a little more complex; it prepares an INSERT statement to supply the table's details. Remember that the temporary table has only two columns. The INSERT statement combines indent_string, activity_description, and planned_start_date into one string and feeds the string to the second column, which becomes the tree node (item values) that is visible to the user. The choice of content in each tree item is the user's option. For example, the user can ignore planned_start_date if the user doesn't need it, or the user can append only the date part of the column.

The next line contains a commented-out SELECT statement marked with the text For debugging only. This SELECT statement provides a fairly easy way to check query statements during the development stage. Remove the comment from the statement, and see what we mean. The statement now executes the same sp_executesql system stored procedure you've been using. The code inserts the activity into the temporary table.

The next step is crucial before you continue with the next activity on your list. You must get this activity's child activities by having the procedure call itself. This recursive process continues until it reaches the end of the node. When control returns to this point, execution continues on the next line. Then, the loop begins for the next activity in the cursor. Using the CLOSE and DEALLOCATE statements, respectively, you should close and deallocate the resources that the cursor used.

Select All Child Activities

The Child List procedure issues a SELECT query to obtain all child activities for the specified activity, as Web Listing 2 shows. If the specified activity is NULL, the procedure selects activities without a parent. The key point is the CURSOR VARYING OUTPUT statement following the second parameter. We explained the CURSOR and OUTPUT keywords above; SQL Server needs VARYING when the parameter combines CURSOR and OUTPUT types. You can feed the SELECT statement's output directly into the cursor using the SET statement, which SQL Server 7.0 introduced. When the OPEN command executes, it populates the cursor. Because the cursor's scope is the calling procedure, the resultset is retained and returned to the calling procedure.

The core procedures are ready. You can execute the Tree List procedure in Query Analyzer to get a tree-style resultset immediately. You must have the table and data ready before you proceed. (The script in Web Listing 3 will populate Table 1.)

ASP Client

You need an ASP or HTML page to use the stored procedures you developed. You can put the entire code inside the ASP page's BODY tag, as Listing 2, page 56, shows; the code requires Microsoft Internet Information Server (IIS) 4.0. First, the filename extension should be .asp to let IIS process the embedded VBScript code. Second, IIS 4.0 uses the folder inetpub\wwwroot as the default to look for the ASP or HTML pages. Therefore, this .asp file should reside in the inetpub\wwwroot folder, or you should configure IIS to use the folder in which you keep the file. Third, the <% and %> symbols signify that the code is in scripting language rather than plain HTML.

The ASP page contains two VBScript functions. The first function is a wrapper for the stored procedure. The user script calls the wrapper function without worrying about creating the connection or parameters. The wrapper function makes the necessary connection, uses the ADO command object, feeds any required parameters to the stored procedure, and returns the resultset. The wrapper function also automatically closes the connection and frees the resources. Using wrappers is good programming practice when your ASP project has hundreds of stored procedures with n parameters and calls from n pages. This approach ensures minimal redundancy, freed resources, and properly validated and fed parameters. The wrapper function uses two constants defined in the adovbs.inc file, which comes with IIS 4.0. The underlying values for adUseClient and adCmdStoredProc are '3' and '&H0004', respectively.

The second function, which Web Listing 4 shows, is an example of the building-blocks approach to programming. The function takes the resultset obtained from the wrapper function as input and builds an HTML <SELECT> statement as output. If you're not familiar with HTML, the <SELECT> statement outputs a list box in the browser. You use the <OPTION> tag to generate each item in the list box. The function loops through the resultset and constructs the <OPTION> tag, with activity_code as the underlying value and activity_description as the display text.

The best way to see the output is to view the page in a browser and use the browser's View Source option to see the underlying HTML source. If the resultset is empty, the function puts the text "(No data available)" in the list box. The block's function is generic and very useful. You can easily extend the function to include parameters that contain column names for both the display field and its underlying values. Now, you can display rows from any resultset as a list box in any HTML page with one function call.

Now that the blocks are ready, you issue two function calls, as Listing 3 shows, and your HTML page will show live SQL Server data in tree form. The first line that contains the Response.Write statement outputs a simple caption. The next line calls the VBScript wrapper function to obtain the resultset in the oRsPrjTree variable you have just declared. The second Response.Write calls the Build-ActivityTreeListHTML function with the resultset as input. This function returns the complete <SELECT> statement, which a typical Response.Write statement then sends to the browser. The process might seem lengthy, but after you understand the concepts and approach, the process will become more obvious.

Why Bother?

The key benefits of using recursive stored procedures to generate a resultset in tree form are speed, reliability, and scalability. We tried to implement a similar solution doing the processing and tree building in VBScript with raw data from SQL Server, and the solution was slower by more than 20 times. VBScript is, after all, an interpreted language.

We still favor the solution in T-SQL because T-SQL provides data caching, query optimization, cached execution plans, and temporary tables. Furthermore, using features such as SPID, which SQL Server provides and maintains internally, makes the solution scalable to the number of simultaneous users SQL Server and your hardware can support.

The solution has zero deployment. You don't have to make the client download or register a tree control to display the tree, an important factor in today's wired world. You can extend the solution to build a server-side expanding and collapsing tree, which is also deployment-free and also includes display improvements with the typical + and images of Windows Explorer or any other tree control.

Bugs, comments, suggestions | Legal | Privacy | Advertising