Document and content

Document and content are two options you can use in SQL Server 2005 when typing an XML column or variable with an XML schema collection.
To put it simply document means only XML documents should be accepted (exactly one top level element) while content means fragments should be accepted (0 or more top level elements). By default, if none of the 2 options is specified it is content that applies. The table below shows the syntax and the consequences for all 3 different possibilities

 

XML(myCollection)

Fragments (CONTENT applies by default)

XML(DOCUMENT myCollection)

Documents only

XML(CONTENT myCollection)

Fragments

This feature is in itself pretty easy to understand, and I wouldn’t have featured it on my blog if it didn’t have some interesting effects on… you guessed it… static typing.

First, we’re going to create an xml schema collection that contains one very simple schema.

CREATE XML SCHEMA COLLECTION test1 AS '
<schema xmlns="https://www.w3.org/2001/XMLSchema">
<element name="e" type="string"/>
</schema>
'
go

Now let’s see what the static type of expression /node() is when we use CONTENT and DOCUMENT. (If you don’t know how to obtain the static type of an expression please refer to this post by Mike Rorke)

DECLARE @var XML(CONTENT test1)
SET @var = '<e>data</e>'
SELECT @var.query('/node()')
go

static type is (element(e,xs:string) ? & comment ? & processing-instruction ?) *

DECLARE @var XML(DOCUMENT test1)
SET @var = '<e>data</e>'
SELECT @var.query('/node()')
go

static type is (comment ? & processing-instruction ?) * , element(e,xs:string) , (comment ? & processing-instruction ?) *

As you can see the results are very different. The main thing you should note is that in the second case, when DOCUMENT is used, the server expects one and only one instance of element ‘e’. It can be preceded and followed by zero or more intertwined comments and processing instructions, but there has to be one instance of element ‘e’, no more, no less. In the case when CONTENT is used there can be zero or more instances of ‘e’ intertwined with zero or more comments and processing instructions.

Now let’s try to run a simple DML query.

DECLARE @var XML(CONTENT test1)
SET @var = '<e>data</e>'
SET @var.modify('replace value of /* with "new data"')
go

DECLARE @var XML(DOCUMENT test1)
SET @var = '<e>data</e>'
SET @var.modify('replace value of /* with "new data"')
go

The results are once again very different.
The first query (where CONTENT is used) fails with the following error message
XQuery [modify()]: The target of 'replace' must be at most one node, found 'element(e,xs:string) *'
The second query (where DOCUMENT is used) on the other hand succeeds.

If you’ve read my post about singleton issues you’ve probably already guessed what the problem is. Success or failure in this case depends on the static type of XPath expression /*. When CONTENT is used, that static type is element(e,xs:string)*, which means that the expression could possibly return more than one node. When DOCUMENT is used, the static type is element(e,xs:string), which guarantees that the expression will return only one node.

Now, let’s modify our schema collection by adding a second element.

ALTER XML SCHEMA COLLECTION test1 ADD '
<schema xmlns="https://www.w3.org/2001/XMLSchema">
<element name="d" type="string"/>
</schema>
'
go

Let’s try to run our DML query again (still using DOCUMENT).

DECLARE @var XML(DOCUMENT test1)
SET @var = '<e>data</e>'
SET @var.modify('replace value of /* with "new data"')
select @var
go

This time the query fails and the server returns the following error message

XQuery [modify()]: The target of 'replace value of' cannot be a union type, found 'element(e,xs:string) | element(d,xs:string)'.

There are now two different elements that are eligible to be at the top level in the document. Therefore the addition of global element ‘d’ to our schema caused the static type of expression /* to change to element(e,xs:string) | element(d,xs:string)

To conclude, let’s look at what the static type of /node() has become in both cases now that the schema contains two global element declarations.

DECLARE @var XML(CONTENT test1)
SET @var = '<e>data</e>'
SELECT @var.query('/node()')
go

static type is (element(e,xs:string) ? & element(d,xs:string) ? & comment ? & processing-instruction ?) *

DECLARE @var XML(DOCUMENT test1)
SET @var = '<e>data</e>'
SELECT @var.query('/node()')
go

static type is (comment ? & processing-instruction ?) * , (element(e,xs:string) | element(d,xs:string)) , (comment ? & processing-instruction ?) *

When CONTENT is used, you can have as many instances of the two elements as you want. When DOCUMENT is used, you have to have exactly one instance of ‘e’ or exactly one instance of ‘d’, but you cannot have both. The static typing code takes into account which of the two options DOCUMENT or CONTENT is applied to the typed XML column or variable you're using.

-
Disclaimer:
This posting is provided “AS IS” with no warranties, and confers no rights.
Use of included script samples are subject to the terms specified at https://www.microsoft.com/info/cpyright.htm.