Access Subquery Techniques
This article may contain URLs that were valid when originally published, but now link to sites or pages that no longer exist. To maintain the flow of the article, we've left these URLs in the text, but disabled the links.
Even some experienced Access developers shy away from writing SQL directly. That's a shame, because, unless you're willing to write SQL, you can't use subqueries, which are a powerful tool for solving some especially thorny data retrieval problems. In this article, Mike Gunderloy introduces subqueries and shows how you can use them in Access.
So what's a subquery? That's easy: A subquery is an SQL SELECT statement that's nested inside of another SQL statement. You can use subqueries as part of a SELECT, SELECT INTO, INSERT INTO, DELETE, or UPDATE statement; in this article, I'll only cover subqueries in SELECT statements. But the big question is, "Why would you want to use a subquery?" I'll start out by looking at a practical example.
Using a subquery to calculate intervals
Suppose you'd like to know how frequently your customers place orders (this example, like all of the others in this article, will use the Access 2002 version of the Northwind sample database for its data). You can create a query to provide this information by following these steps:
- Create a new query based on the Orders table.
- Choose to show the CustomerID and OrderDate columns.
- In the Field row of the third column of the query, type this expression:
PreviousOrderDate: (SELECT MAX(OrderDate) FROM Orders AS Orders1 WHERE Orders1.OrderDate < Orders.OrderDate AND Orders1.CustomerID = Orders.CustomerID)
- In the Field row of the fourth column of the query, type this expression:
- Set the CustomerID field to sort ascending, and the OrderDate field to sort descending. Now run the query. Figure 1 shows the results. For each order you can see the date of the previous order as well as the number of days between the two orders. If you look at this query in SQL view, here's what you'll find:
SELECT Orders.CustomerID, Orders.OrderDate, (SELECT MAX(OrderDate) FROM Orders AS Orders1 WHERE Orders1.OrderDate < Orders.OrderDate AND Orders1.CustomerID = Orders.CustomerID) AS PreviousOrderDate, [OrderDate]-[PreviousOrderDate] AS OrderInterval FROM Orders ORDER BY Orders.CustomerID, Orders.OrderDate DESC;
The embedded SELECT statement in parentheses is a subquery. To be more precise, this particular example is a correlated subquery: one that uses a field from the main table as a part of the WHERE clause in the subquery.
A subquery can appear in the field list (as in the preceding example) or in a WHERE or HAVING clause, where it provides a set of one or more values to evaluate. In a field list, the subquery must return a single value, which is normally assured by using an aggregation such as MAX or SUM. In the WHERE or HAVING clause there are three basic forms for a subquery:
comparison [ANY|ALL|SOME] (subquery) expression [NOT] IN (subquery) [NOT] EXISTS (subquery)
The first form, using the ANY, SOME, or ALL keywords, allows you to filter a query based on the results of another query. For example, consider this query:
SELECT TOP 5 UnitPrice FROM Products ORDER BY UnitPrice DESC
That gives a result set with the five highest prices in the Products table:
Unit Price $263.50 $123.79 $97.00 $81.00 $62.50
Now, I'll use that as a subquery. First, here it is with the ANY keyword:
SELECT UnitPrice FROM PRODUCTS WHERE UnitPrice < ANY (SELECT TOP 5 UnitPrice FROM Products ORDER BY UnitPrice DESC) ORDER BY UnitPrice DESC
The result set for that query starts off:
Unit Price $123.79 $97.00 $81.00 $62.50 $55.00 $53.00 (more rows omitted)
Note that the $123.79 unit price is included, because it's less than any one of the rows returned by the subquery. The result might be more clear if you think of this using the equivalent SOME keyword (this returns exactly the same results as the previous example):
SELECT UnitPrice FROM PRODUCTS WHERE UnitPrice < SOME (SELECT TOP 5 UnitPrice FROM Products ORDER BY UnitPrice DESC) ORDER BY UnitPrice DESC
The alternative is to use ALL for the subquery comparison, which returns different results. The query is:
SELECT UnitPrice FROM PRODUCTS WHERE UnitPrice < ALL (SELECT TOP 5 UnitPrice FROM Products ORDER BY UnitPrice DESC) ORDER BY UnitPrice DESC
And the results start off:
Unit Price $55.00 $53.00 $49.30 $46.00 $45.60 $43.90 (more rows omitted)
Now, the $123.79 unit price is removed from the results (along with several others); the query returns only rows in the main table that are less than all of the rows returned by the subquery.
The second form of subquery syntax, using In or Not In, allows you to use a set of values (rather than a single value) in a WHERE clause. Suppose, for example, that you want to see the names of all employees who've sold anything to a particular customer. You can accomplish that with this subquery in the WHERE clause:
SELECT FirstName, LastName FROM Employees WHERE EmployeeID IN (SELECT EmployeeID FROM Orders WHERE CustomerID = 'ALFKI')
To see the employees who haven't sold anything to this customer, just replace In with Not In. In case you didn't know, In and Not In work perfectly well without subqueries as well. For example, this is a valid query:
SELECT FirstName, LastName FROM Employees WHERE EmployeeID IN (3,5,6)
Finally, the EXISTS and NOT EXISTS keywords let you make decisions based on whether there are any records at all in a subquery. For example, to find all products that have been ordered by a customer, you could use this query:
SELECT ProductName FROM Products WHERE EXISTS (SELECT Orders.OrderID FROM Orders INNER JOIN [Order Details] ON Orders.OrderID = [Order Details].OrderID WHERE Products.ProductID = [Order Details].ProductID AND CustomerID = 'ALFKI')
More subquery solutions
Let's look at some other querying problems that are easily solved with the use of subqueries. One of these is the problem of getting a "top per group" result set. For example, suppose you'd like to see the most recent three order dates for each customer in the database. You can accomplish this task with this query:
SELECT CompanyName, OrderDate FROM Customers INNER JOIN Orders ON Customers.CustomerID = Orders.CustomerID WHERE OrderDate IN (SELECT TOP 3 OrderDate FROM Orders WHERE Orders.CustomerID = Customers.CustomerID ORDER BY OrderDate DESC) ORDER BY CompanyName ASC, OrderDate DESC
Note that the WHERE clause in the subquery joins a field from the subquery with a field from the main query. This is what makes the subquery return different results for each customer in the main query.
Another use for subqueries is to add rankings to a totals query. For example, you might want to know the sales rank of each product. This is most easily done with two queries. The first is a totals query that collects the total sales for each product:
SELECT ProductName, SUM(Quantity) AS TotalSales FROM [Order Details] INNER JOIN Products ON [Order Details].ProductID = Products.ProductID GROUP BY ProductName ORDER BY SUM(Quantity) DESC
After that query has been saved as qryProductTotals, you can use this query with a subquery to generate the sales rankings:
SELECT ProductName, TotalSales, (SELECT COUNT(*) FROM qryProductTotals AS QPT WHERE qryProductTotals.TotalSales <= QPT.TotalSales) AS Rank FROM qryProductTotals ORDER BY TotalSales DESC
Figure 2 shows the results of running the second query. Note how duplicate sales figures are handled by assigning a tie rank. The query works by looking at a second copy of the source query and counting the number of rows that have a total equal to or greater than that of the current row. If you run this query, you'll discover that it's extremely slow, because it needs to run the nested totals query once for every row in the result set.
Finally, subqueries are very useful for answering "above average" questions. For example, which products cost more than the average product? Here's a query with a subquery that gives the answer:
SELECT ProductName, UnitPrice FROM Products WHERE UnitPrice > (SELECT AVG(UnitPrice) FROM Products) ORDER BY UnitPrice DESC
The path to SQL enlightenment
Subqueries are perhaps the simplest queries in Access that absolutely require you to write some SQL. Even if you use the QBE grid to construct your overall query (as I did for the first example in this article), you can't avoid writing an SQL statement for the subquery, either in a field definition or in a WHERE or HAVING clause. The requirement to write SQL makes many beginning Access developers shy away from using subqueries. That's a pity, because some problems (for example, the ranking query or the top per group query) are most easily solved by subqueries.
Rather than avoiding subqueries, I urge you to embrace them. Learning enough SQL to write subqueries will help you gain confidence in writing SQL statements, and ultimately you'll find that you can use this knowledge to write other types of queries directly in SQL. Access is practically unique as a product in letting you switch easily from a graphical view of a query to an SQL view. By learning how to make this switch on your own, you'll develop SQL skills that will serve you well in other less flexible products, as well as in writing VBA code that uses SQL statements.
To find out more about Smart Access and Pinnacle Publishing, visit their website at http://www.pinpub.com/html/main.isx?sub=57
Note: This is not a Microsoft Corporation website. Microsoft is not responsible for its content.
This article is reproduced from the October 2001 issue of Smart Access. Copyright 2001, by Pinnacle Publishing, Inc., unless otherwise noted. All rights are reserved. Smart Access is an independently produced publication of Pinnacle Publishing, Inc. No part of this article may be used or reproduced in any fashion (except in brief quotations used in critical articles and reviews) without prior consent of Pinnacle Publishing, Inc. To contact Pinnacle Publishing, Inc., please call 1-800-493-4867 x4209.