Data Points: Using XQuery, New Large DataTypes, and More

Article
10/07/2019

New information has been added to this article since publication.
Refer to the Editor's Update below.

Data Points

Using XQuery, New Large DataTypes, and More

John Papa

Code download available at:DataPoints0603.exe(116 KB)

SQL Server 2005 introduces a lot of new features, but it also enhances the popular and oft-used Transact-SQL (T-SQL) language. Changes include the introduction of new datatypes to store large values using the MAX indicator, the integration of enhanced XML querying and data modification with XQuery, and the new XML datatype.

In this month's column I will address these and other features based on some of my most frequently received questions regarding SQL Server™ 2005 and its T-SQL enhancements. All sample code, which runs on the AdventureWorks database that ships with SQL Server 2005, is included in the download file. I also included some extra examples in the downloadable code for your extended enjoyment.

Q I know I can store XML in the new XML datatype in SQL Server 2005, but how can I query parts of the XML without pulling all of it out of the database using an XmlDataReader? For example, how do I filter or pull a subset of XML nodes?

A The XML datatype is one of the brand new datatypes introduced to SQL Server 2005. It not only makes storing XML documents and fragments easier and cleaner than in previous versions of SQL Server, but it also provides mechanisms to interact with the XML data. SQL Server 2005 accomplishes the querying and modification of XML data by allowing the XML datatype's methods to use XQuery. XQuery is to XML what SQL is to relational data. (The XQuery language specification is located at www.w3.org/TR/xquery.)

The XML datatype has at its disposal a subset of XQuery already built in. The XQuery methods accept XPath expressions that can be used to navigate the XML. There are five built-in functions (shown in Figure 1) that interact with the XML datatype. In fact, the five functions are accessible right from an XML variable using the following sample format, where @myXml is an XML variable:

@myXml.Query(@myXPathExpression)

Figure 1 XML Datatype Functions

Function	Description
Query	Gets a set of nodes from an XML document/fragment (returns XML)
Value	Gets a single value from an element or attribute of an XML document/fragment (returns scalar value)
Exist	Returns a Boolean value indicating if the XQuery expression returns values
Modify	Changes values in an XML document/fragment
Nodes	Gets a context to a node based upon the XQuery expression

Some examples would help illuminate the use of these new features. In the next few examples I will use a few of the AdventureWorks database's tables, which contain XML columns, for that purpose. First, let's assume that you want to grab all of the résumés from a set of prospective job candidates who have experience with computers. You can retrieve values from an XML document (resumes) stored in the XML datatype using the query method, as shown here:

SELECT Resume.query('declare default element namespace "https://schemas.microsoft.com/sqlserver/2004/07/adventure-works/ Resume"; data(/Resume/Name/Name.First)') FirstName FROM HumanResources.JobCandidate WHERE Resume.exist('declare default element namespace "https://schemas.microsoft.com/sqlserver/2004/07/adventure-works/ Resume"; /Resume/Employment/Emp.OrgName[contains(.,"World")]') = 1

This example accesses the HumanResources.JobCandidate table's Resume column, which contains the candidate's résumé in XML format. Notice that the SELECT clause refers to the Resume column and issues the query method directly from it. The query method accepts an XPath expression that in this case evaluates to use a specific XML Schema Definition (XSD) as its default namespace. (This XSD happens to be included in the SQL Server 2005 AdventureWorks database.) By using this XSD as the default namespace there is no need to prefix all of the nodes in the XPath expression with the namespace prefix.

This XPath is referring to the first name node and selecting it to be returned. By using the data clause, I am telling the statement to return only the contents of the selected nodes. The WHERE clause in this example uses the exist method to evaluate if the company name has the word "World" in its title. The exist method returns either a 0 or a 1 to represent false or true, respectively. This statement will return a list of all of the first names where the person previously worked at a company that had the word "World" in its title:

FirstName ---------------- Stephen Peng

Removing the word "data" from this example would change the results to return the entire node and not just its contents:

FirstName ---------------- <Name.First xmlns="https://schemas.microsoft.com/sqlserver/2004/07/ adventure-works/Resume">Stephen</Name.First> <Name.First xmlns="https://schemas.microsoft.com/sqlserver/2004/07/ adventure-works/Resume">Peng</Name.First>

So far I have touched on ways to query XML using the query and exist methods. You also can use the value method if all you are looking to retrieve is a single node's contents. The following is an example where I retrieve the list of e-mail addresses from the résumé pool:

SELECT Resume.value( 'declare default element namespace "https://schemas.microsoft.com/sqlserver/2004/07/adventure-works/ Resume"; (/Resume/EMail)[1]', 'VARCHAR(100)') as email FROM HumanResources.JobCandidate WHERE LEN(Resume.value('declare default element namespace "https://schemas.microsoft.com/sqlserver/2004/07/adventure-works/ Resume"; (/Resume/EMail)[1]', 'VARCHAR(100)')) > 1

Notice the use of the value method in this example. I could have used the query method, but since I only want a single value and only its contents, the value function offers a bit cleaner and more efficient option in this case. For example, notice that the data keyword is not needed when using value since you are already referring to the contents of the nodes. The value method expects a singleton to be returned. In other words, it expects a single value to be returned so you need to specify which instance of the e-mail node you want to grab. If you have only a single row, it is most common to just code [1] to indicate the first instance. The value method also expects to be told to what datatype to convert the value. In this case I convert the e-mail address to a varchar(100). The query just shown returns the following results:

Shai@Example.com Max@Wingtiptoys.com Krishna@TreyResearch.net Stephen@example.com Tai@Example.com

You can also use the for, where, return, and order by statements to retrieve data. Figure 2 shows a command that embeds a for/return statement inside the query method's expression. In this case, the for statement iterates through the /Resume/Employment nodes. For each iteration, the current node is referenced by the variable $employer. Within the for statement you can add a where clause to filter the nodes you want to return data from as well. In this example I simply iterate through each Employment node and return the contents of its Emp.OrgName child node.

Figure 2 XQuery Using For and Return

SELECT JobCandidateID, Resume.query('declare default element namespace "https://schemas.microsoft.com/sqlserver/2004/07/adventure-works/Resume"; for $employer in /Resume/Employment return data($employer/Emp.OrgName) ') Company FROM HumanResources.JobCandidate WHERE Resume.exist('declare default element namespace "https://schemas.microsoft.com/sqlserver/2004/07/adventure-works/Resume"; /Resume/Employment/Emp.OrgName[contains(.,"World")]') = 1

Taking a closer look at the WHERE clause of the SQL statement in Figure 2, you will notice that I use the exist method to limit the rows. Here I use the contains statement to look for any /Resume/Employment/Emp.OrgName node whose content contains the word "World".

There are several variations of these commands that you can use in combination. The SQL Server Books Online has a complete list of the functions and statements available to the new XQuery capabilities of SQL Server 2005.

Q How can I modify parts of the XML in an XML datatype without overwriting it?

A SQL Server 2005 exposes a modify method on the XML datatype which provides the ability to insert, update, or delete XML data within an XML datatype. There are three different statements that can be used with the modify method: Insert, which is used to insert new nodes into the XML, Replace Value Of, which is used to change the value of one or more nodes in the XML, and Delete, which is used to delete nodes from the XML.

Keep in mind that all of these statements are issued in a SQL UPDATE statement. For example, when inserting data, you are actually inserting more XML into an existing row's XML datatype column in the database. All three of these methods manipulate data within an XML column. Their usage is quite similar to the querying methods (nodes, exist, value, and query) in that they accept an XPath expression to locate the nodes to manipulate. The insert syntax is as follows:

xmlColumn.modify( 'insert your-content {as first | as last} into | before | after your-xpath-expression')

The insert keyword is followed by the content or XML nodes that you want to insert. Then, optionally you can specify whether the content should be inserted first or last with respect to the XPath expression. Next, specify whether the content will be inserted into, before, or after the node referred to by the XPath expression. Finally, specify the XPath expression that locates the node that you want to refer to for the insert.

It's time for an example again. In this case let's examine some code that inserts a telephone extension node and value for the job candidate named Shai Bassli. Figure 3 shows the code that will insert a telephone extension node and value of 555 within the first Telephone node for Shai Bassli.

Figure 3 Insert a Node

DECLARE @ext int SET @ext = 555 UPDATE HumanResources.JobCandidate SET Resume.modify('declare default element namespace "https://schemas.microsoft.com/sqlserver/2004/07/adventure-works/ Resume"; insert <Tel.Extension>{sql:variable("@ext")}</Tel.Extension> into(/Resume/Address/Addr.Telephone/Telephone)[1]') WHERE Resume.exist('declare default element namespace "https://schemas.microsoft.com/sqlserver/2004/07/adventure-works Resume"; /Resume/Name/Name.Last[. = "Bassli"]') = 1 The syntax to modify a node is as follow: xmlColumn.modify( 'replace value of your-xpath-expression with your-new_value ')

If you look closely at the code in Figure 3 you will see that the extension's value is set using a variable like sql:variable("@ext"). This is a way to embed a local variable or parameter defined in the T-SQL within the XQuery statement. In this case I am passing the value of the extension. This feature makes XQuery much more flexible than if it didn't allow parameters.

The replace value of keywords are followed by an XPath expression that is evaluated and resolved to a node whose value you want to replace. For example, I could locate and replace the e-mail address of a person. Following the XPath expression is the with statement and then the new value. Figure 4 shows an example of how to locate Shai Bassli's telephone record and replace the extension of 555 with 777.

Figure 4 Replace a Node

DECLARE @ext int SET @ext = 555 DECLARE @newExt int SET @newExt = 777 UPDATE HumanResources.JobCandidate SET Resume.modify('declare default element namespace "https://schemas.microsoft.com/sqlserver/2004/07/adventure-works/ Resume"; replace value of (/Resume/Address/Addr.Telephone/Telephone/Tel.Extension[.=sql:variable("@ext")])[1] with xs:int(sql:variable("@newExt"))') WHERE Resume.exist('declare default element namespace "https://schemas.microsoft.com/sqlserver/2004/07/adventure-works/ Resume"; /Resume/Name/Name.Last[. = "Bassli"]') = 1

So far I have shown how to insert a telephone extension and modify its value. Now I will demonstrate how to delete the telephone extension from the XML without affecting the rest of the XML document, effectively returning the XML to its original state. The Delete syntax is as follows:

xmlColumn.modify('delete your-xpath-expression')

The syntax is pretty straightforward here; all you have to specify is the delete keyword and the XPath expression that resolves to the node that you want to remove. Be careful that you specify the correct node to remove because the delete command will remove the node you specify and all of its children. For example, if you specify /Resume as your XPath expression you will be deleting the entire résumé. OK, maybe that is obvious to you, but it is always good to be extra careful with deletes anyway.

The code in Figure 5 shows an example of how to delete (or remove) a node from an XML document (or fragment). Notice that I am deleting the node that has a telephone extension value of 777 and the last name of Bassli.

Figure 5 Delete a Node

DECLARE @newExt int SET @newExt = 777 UPDATE HumanResources.JobCandidate SET Resume.modify('declare default element namespace "https://schemas.microsoft.com/sqlserver/2004/07/adventure-works/ Resume"; delete (/Resume/Address/Addr.Telephone/Telephone/Tel.Extension[.=sql:variable("@newExt")])') WHERE Resume.exist('declare default element namespace "https://schemas.microsoft.com/sqlserver/2004/07/adventure-works/ Resume"; /Resume/Name/Name.Last[. = "Bassli"]') = 1

Q I've heard that in SQL Server 2005 I might not have to store my large string data fields in columns using the text datatype because of the new large datatypes. Where have these new datatypes been all my life?

A When SQL Server 7.0 was introduced, it raised the limit on VARCHAR fields to 8000 bytes. There were still issues, however, as many large string-based fields exceeded 8000 characters and had to either be broken into multiple rows, stored in a binary field (for example TEXT, NTEXT, or IMAGE) in the database, or even stored in a document in a file system. SQL Server 2005 blows the doors off the variable length fields' limits! The new MAX specifier can be used with the VARCHAR, NVARCHAR, and VARBINARY datatypes to declare variables that can store vast quantities of data with fewer limitations than the binary large object datatypes.

A variable declared as VARCHAR(MAX) or VARBINARY(MAX) can store 231 (or about 2 billion) characters. A variable declared as NVARCHAR(MAX) can store 230or about 1 billion) characters. Unlike the binary large object datatypes, these new MAX datatypes can be passed as parameters to stored procedures, just like the new XML datatype, and they can be indexed or even parsed by string functions. Now, of course, I won't go crazy and define all of my string columns as VARCHAR(MAX) in my tables. However, these new fields can be great for note-based columns. For example, assuming your application needs to store a diary of daily events where the entries are all text and can be a quick sentence or seven pages of text. In the past you could solve this problem several ways, such as by storing this data in a TEXT column, limiting the notes to 8000 characters in a VARCHAR(8000), or even splitting the text up into multiple rows behind the scenes—like the text nvarchar(4000) column in the syscomments table. None of these solutions were great. With SQL Server 2005 you can now define a column to be a VARCHAR(MAX) and it can store the entire text in it, and even be indexed.

Ah, but you might remember that in SQL Server 2000 the reason for the limitation was page size. This meant that table rows could not exceed 8060 bytes (in SQL Server 7.0 and SQL Server 2000). You could define multiple columns on a table that exceeded this limit, but then you would get a warning indicating that if the data later exceeded the limit, you would get an error. Often, this meant DBAs would try to split up tables so they would not hit the limit. In SQL Server 2005, rows can span multiple pages, so the 8060 limit is no longer applicable, and you can define several columns on a single table as MAX types, if needed.

[Editor's Update - 11/21/2006: Rows cannot span pages in SQL Server 2005, however portions of the row may be moved off the row's page so that the row can be very large. See msdn2.microsoft.com/ms190969.aspx for more information.]

Send your questions and comments for John to mmdata@microsoft.com.

John Papa is a Senior .NET Consultant with ASPSOFT and a baseball fanatic who spends most of his summer nights rooting for the Yankees with his family and his faithful dog, Kadi. John, a C# MVP, has authored books on ADO, XML, and SQL Server, and can often be found speaking at industry conferences such as VSLive.

Additional resources