This article may contain URLs that were valid when originally published, but now link to sites or pages that no longer exist. To maintain the flow of the article, we've left these URLs in the text, but disabled the links.
Recursion in T–SQL
Recursion is one of the classic techniques all computer science majors learn, often by writing a "Towers of Hanoi" program. (For a trip down memory lane, see Q41154 for a QuickBasic 4.5 implementation of the Towers of Hanoi.) In this article, Alex Kozak explores recursion in T–SQL.
There was a period in my life when I taught college students. When I started the topic of subqueries, I'd ask them to find the youngest employee in Northwind's employees table. Most came up with a solution quite easily:
SELECT * FROM employees WHERE BirthDate = (SELECT MAX(BirthDate) FROM employees)
However, when I asked them to find the next youngest, many were stumped. A few offered a solution like this:
SELECT * FROM employees WHERE BirthDate= (SELECT MAX(BirthDate) FROM employees WHERE BirthDate < (SELECT MAX(BirthDate) FROM employees))
The recursive character of the query was obvious, and I can remember vowing to write a stored procedure that would return the nth member of any hierarchy (age, weight, points, and so forth). I eventually wrote the stored proc, but only two years later when I was working on an e–commerce project and someone was paying me to do it.
Recursion, which occurs when a procedure or function calls itself, is a well–known and powerful concept of mathematics and programming. However, it can be dangerous—leading to infinite loops, for example. (That's probably one reason that SQL Server 2000 limits the number of nesting calls or nesting levels to 32; you can use a global variable @@NESTLEVEL to check the nested level of the procedure at runtime.) DBMS programmers also need to remember that recursive calls can significantly add to transaction time, so recursion is usually avoided in OLTP systems.
[Bear in mind that terminology related to recursion in SQL Server can be confusing. Consider this from Q304364, PRB: Recursive Behavior of Triggers in SQL Server 6.5 is Incorrect: "The circuitous implementation of trigger recursion in SQL Server 6.5 should not be confused with 'indirect recursion' in SQL Server 7.0 or later. This circuitous implementation of a recursive trigger is called 'direct recursion,' whereas 'indirect recursion' in SQL Server 7.0 involves a trigger that updates another table whose trigger could update the original table, thus causing the original trigger to fire again. In SQL Server 7.0 and later, you can use the RECURSIVE TRIGGERS database option to enable or disable 'direct recursion.' You use the SQL Server NESTED TRIGGERS configuration to enable or disable 'indirect recursion.'"—Ed.]
Let's start with an example. Imagine that you have a table checkRank that stores the results of the students' tests, and that your task is to find the students that rank between 4th and 10th. To demonstrate the technique, I created a script fillCheckRank and a stored procedure spu_testRecursion. The script creates and loads table checkRank with relatively random data (see CreateCheckRank.sql in the Download file), but the stored procedure (see Listing 1) is more complex and uses a recursive algorithm, nested procedures calls, and nested subqueries to calculate the answer.
Listing 1. Stored procedure spu_testRecursion.
IF EXISTS (SELECT * FROM sysobjects WHERE id = object_id('spu_testRecursion') and OBJECTPROPERTY(id, 'IsProcedure') = 1) DROP PROCEDURE spu_testRecursion GO CREATE PROC spu_testRecursion @level int, @tblName varchar(30), @colName varchar(30), @answer varchar(8000) OUTPUT AS DECLARE @one_less int SET NOCOUNT ON –– Parameter @level greater than 31 is disallowed. IF (@level < 0 OR @level > 31) BEGIN PRINT 'Illegal Parameter Value. Must be 0 through 31' RETURN –1 END IF (@level = 0 OR @level = 1) BEGIN SELECT @answer= 'select max(' + @colName + ') from ' + @tblName END ELSE BEGIN SELECT @one_less = @level – 1 –– recursively call itself EXEC spu_testRecursion @one_less, @tblName, @colName, @answer output IF @@ERROR <> 0 RETURN(–1) SELECT @answer = 'select max(' + @colName + ') from ' + @tblName + ' where ' + @colName + ' < ' + Char(10) + '(' + @answer + ')' END IF @@NESTLEVEL = 1 BEGIN PRINT 'NESTED LEVEL ' + CAST(@@NESTLEVEL AS VARCHAR(20)) + CHAR(10) + @colName + ' rank ' + CAST(@level AS VARCHAR(10)) + CHAR(10) + @answer EXEC (@answer) END RETURN(0) GO /* How to run the procedure DECLARE @answer varchar(8000) exec spu_testRecursion 10, 'checkRank', 'testPoints', @answer output */
Note: When you drop and create the stored procedure, you may receive a message about not being able to add rows to sysdepends for the current stored procedure because it depends on the missing object 'spu_testRecursion'. But don't worry—the stored procedure will still be created. See Q24641 for more information.
The stored procedure receives these parameters:
- @level—Rank or position in the hierarchy.
- @colName—Column(domain) name.
- @tblName—Name of the table.
- @answer—Output parameter that returns the generated SELECT statement.
And it returns these two:
- Value, which corresponds to required level or position in hierarchy.
- A script that you can run to retrieve the same result.
To get the result for the 4th highest, you might do this:
DECLARE @answer varchar(4000) EXEC spu_TestRecursion 4, 'checkRank', 'testPoints', @answer output
Here are my results (yours may differ because data for table checkRank is generated pseudo–randomly):
NESTED LEVEL 1 testPoints rank 4 select max(testPoints) from checkRank where testPoints < (select max(testPoints) from checkRank where testPoints < (select max(testPoints) from checkRank where testPoints < (select max(testPoints) from checkRank))) ––––––––––– 93
So, a score of 93 corresponds to rank 4.
When I ran the same query for the 10th highest score, my answer was 87. The ultimate answer to the question of rankings for 4th through 10th can either be inferred by inspection, or determined by running a query (see 4ththru10th.sql in the Download file).
The next scenario is typical for many e–commerce marketplaces. Imagine that two sides—buyers and sellers—start trading, sending offers, or requests. Offers and requests can either be directed to a limited number of buyers (sellers), or can be posted and seen by all members of the exchange.
When the first bid or response to a request arrives, the negotiation starts. From here on, different scenarios are possible, and each scenario will create its own negotiation chain. Offers, bids, or any another part of the chain can expire, be canceled, be declined, or be accepted. Users can send a counter–offer, receive a counter–offer back, and so on. The cycle can start again as many times as defined by the rules of the market. Different deviations from the basic rules also might be allowed. For instance, you might want to allow parties to make limited changes to already closed deals, which, in turn, could be accepted or declined, and so on.
The physical implementations of your market might vary in the details, but usually each member of a negotiation chain is represented by a document—which might be saved as an XML document, a Java object, or broken up and stored in several tables. To find the sequence of the documents in a negotiation chain, you can use a document path. This is similar to a linked list where each member of such a list (path) has links to previous and next documents, except for the root and end documents in the list.
For example, assume that a table called Documents stores all the documents and has a column docPath. For a row with docID (primary key) = 12315 and a docPath = 12217/12267/12299/12315, there exists the next negotiation chain: 12217(the posted offer source that serves as the root document or template for the offer) à 12267(posted offer item—the actual offer) à 12299(bid) à 12315(counter—current document).
Now suppose I want to analyze the negotiation process, finding the differences between the prices, freight rates, volumes, and values in final and source documents. If I want to analyze the tendencies that cause the deals to fail, I have to consider the documents with statuses that are canceled, expired, or declined. To analyze the actual values and volumes, I need to take the agreements and purchase orders with a status accepted. In both cases, the final document will be the last document in docPath (in our example, 12315), but the source document won't be the first. In my example, the first document (12217) is a template, which has only basic parameters of the offer. (Only when I receive a bid am I able to calculate freight rate, total price, and some other parameters.) So in my example, the second document (12267) is a source. In more general terms, any document from the negotiation chain, except the last one, can be a source, because each subsequent document adds some new characteristics to the original document, and I might be interested in exactly those new parameters.
So my task is to extract the nth member of docPath according to some conditions—and this is a quite trivial task, if you write a script, stored procedure, or UDF, using T–SQL functions. But, if you want to get your result "on the fly" using a SELECT statement (think real–time e–commerce), the task becomes more complicated.
Substring helper routine
Imagine a sample–string '12/23/34/45/56/67/78/89/90/11/22/33/44/55/66/77/88/99/00/A/B/E/F/'. To find any member of this string, we can use a simple algorithm that extracts a substring between two subsequent positions of delimiter:
- Member (1) = substring (string, 1, pos(1) – 1);
- Member (2) = substring (string, pos(1) + 1, pos(2) – pos(1) – 1);
- . . .
- Member (n) = substring (string, pos(n–1) + 1, pos(n) – pos(n–1) – 1),
The T–SQL looks more like this (see Figure 1 for details):
- Member (1) = SUBSTRING (string,1,CHARINDEX('/', string,1)–1)
- Member (2) = SUBSTRING (string, CHARINDEX('/', string,1)+1,CHARINDEX('/', string, CHARINDEX('/', string,1)+1)–CHARINDEX('/', string,1)–1)
- And so on.
Stored procedure spu_findDocID (available in the Download file) generates the script that allows us to extract any member from the 1st to the 31st in the string. The procedure implements the algorithm I sketched out earlier and uses these parameters:
- @str—The name of the string, usually variable or column name.
- @level—This is actually a member's number or depth of recursion call.
- @prevPos and @pos—I use these output parameters to save positions of delimiters and use them in the next procedure call.
- @answer—One more output parameter to accumulate the result.
To see an example of the negotiation chain, run the script FindSource.sql. The first part of the script creates a "documents" table and loads it with sample data. These are the underlying rules of the scenario:
- If docTypeID of the first (left–most) document in docPath is 1, the document's source is the first document in docPath.
- If docTypeID of the first document is 2, the document's source is the second document in docPath.
- If docTypeID of the first document is 3, the document's source is the third document in docPath.
Then, using spu_findDocID, it generate scripts for the first, second, and third documents in docPath:
DECLARE @answer varchar(8000), @prevPos varchar(3000), @pos varchar(3000) EXEC spu_findDocID 'docPath', 1, @prevPos output, @pos output, @answer output EXEC spu_findDocID 'docPath', 2, @prevPos output, @pos output, @answer output EXEC spu_findDocID 'docPath', 3, @prevPos output, @pos output, @answer output
Finally, to see the source for all failed transactions (docTypeID 7 and 8), use this script, adding the generated scripts from before:
SELECT failed.docID [failedDoc], failed.docParh, frst.docID [firstDoc], frst.docTypeID [first docType], CASE WHEN frst.docTypeID = 1 THEN /*COPY GENERATED SCRIPT FOR FIRST MEMBER OF docPath HERE*/ WHEN frst.docTypeID = 2 THEN /*COPY GENERATED SCRIPT FOR SECOND MEMBER OF docPath HERE*/ WHEN frst.docTypeID = 3 THEN /*COPY GENERATED SCRIPT FOR THIRD MEMBER OF docPath*/ END sourceDoc FROM (SELECT docID, docParh FROM documents WHERE docTypeID IN(7, 8)) failed INNER JOIN (SELECT docID, docTypeID FROM documents) frst ON frst.docID = SUBSTRING(docParh,1, CHARINDEX('/', docParh,1)–1)
Here are the results of the query with my data:
failedDoc docParh firstDoc first docType sourceDoc 10 1/5/7/10/ 1 1 1 11 3/7/9/11/ 3 3 9 12 2/6/8/12/ 2 2 6
And there you have it!
To find out more about Microsoft SQL Server Professional and Pinnacle Publishing, visit their website at http://www.pinpub.com/html/main.isx?sub=57
Note: This is not a Microsoft Corporation website. Microsoft is not responsible for its content.
This article is reproduced from the September 2003 issue of Microsoft SQL Server Professional. Copyright 2003, by Pinnacle Publishing, Inc., unless otherwise noted. All rights are reserved. Microsoft SQL Server Professional is an independently produced publication of Pinnacle Publishing, Inc. No part of this article may be used or reproduced in any fashion (except in brief quotations used in critical articles and reviews) without prior consent of Pinnacle Publishing, Inc. To contact Pinnacle Publishing, Inc., please call 1-800-493-4867 x4209.