Rules for Inferring Schema Node Types and Structure

This topic describes how the schema inference process translates the node types in an XML document to an XML Schema definition language (XSD) structure.

Element Inference Rules

This section describes the inference rules for element declarations. There are eight structures of element declarations that will be inferred:

  1. Element of simple type

  2. Empty element

  3. Empty element with attributes

  4. Element with attributes and simple content

  5. Element with a sequence of child elements

  6. Element with a sequence of child elements and attributes

  7. Element with a sequence and choices of child elements

  8. Element with a sequence and choices of child elements and attributes

Note

All complexType declarations are inferred as anonymous types. The only global element inferred is the root element; all other elements are local.

For more information about the schema inference process, see Inferring Schemas from XML Documents.
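The eight structures listed above can be told apart by three properties of an element instance: whether it has attributes, whether it has text, and how its child elements are arranged. The following Python sketch illustrates that classification for a single instance. It is not the XmlSchemaInference implementation. The classify function is hypothetical, and the test used for "choices" (a child name that reappears after a different sibling) is an assumption based on the examples in this topic.

```python
import xml.etree.ElementTree as ET


def classify(element):
    """Name which of the eight element structures one instance matches."""
    has_attrs = bool(element.attrib)
    children = [child.tag for child in element]
    has_text = bool((element.text or "").strip())

    if not children:
        if has_text:
            return ("element with attributes and simple content"
                    if has_attrs else "element of simple type")
        return "empty element with attributes" if has_attrs else "empty element"

    # Assumption: collapse runs of identical consecutive names. If a name
    # still occurs more than once, a plain sequence cannot describe the
    # order, so the instance is treated as a choice.
    runs = [name for i, name in enumerate(children)
            if i == 0 or children[i - 1] != name]
    kind = "sequence and choices" if len(runs) != len(set(runs)) else "sequence"
    suffix = " and attributes" if has_attrs else ""
    return f"element with a {kind} of child elements{suffix}"


print(classify(ET.fromstring("<root>text</root>")))
print(classify(ET.fromstring('<root attribute1="text"><a/><b/><a/></root>')))
```

The sketch ignores whitespace-only text, so an element whose children are separated only by line breaks is not reported as having text.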

Simple Typed Element

The following table shows the XML input to the InferSchema method and the XML schema generated from it. The xs:element declaration with type="xs:string" is the schema inferred for the simple type element.

XML

<?xml version="1.0"?>
<root>text</root>

Schema

<?xml version="1.0" encoding="utf-8"?>
<xs:schema attributeFormDefault="unqualified" elementFormDefault="qualified" xmlns:xs="http://www.w3.org/2001/XMLSchema">
  <xs:element name="root" type="xs:string" />
</xs:schema>

Empty Element

The following table shows the XML input to the InferSchema method and the XML schema generated from it. The xs:element declaration with no type attribute is the schema inferred for the empty element.

XML

<?xml version="1.0"?>
<empty/>

Schema

<?xml version="1.0" encoding="utf-8"?>
<xs:schema attributeFormDefault="unqualified" elementFormDefault="qualified" xmlns:xs="http://www.w3.org/2001/XMLSchema">
  <xs:element name="empty" />
</xs:schema>

Empty Element with Attributes

The following table shows the XML input to the InferSchema method and the XML schema generated from it. The xs:element declaration and its anonymous xs:complexType, which contains only the xs:attribute declaration, are the schema inferred for the empty element with attributes.

XML

<?xml version="1.0"?>
<empty attribute1="text"/>

Schema

<?xml version="1.0" encoding="utf-8"?>
<xs:schema attributeFormDefault="unqualified" elementFormDefault="qualified" xmlns:xs="http://www.w3.org/2001/XMLSchema">
  <xs:element name="empty">
    <xs:complexType>
      <xs:attribute name="attribute1" type="xs:string" use="required" />
    </xs:complexType>
  </xs:element>
</xs:schema>

Element with Attributes and Simple Content

The following table shows the XML input to the InferSchema method and the XML schema generated from it. The xs:complexType with xs:simpleContent, which extends xs:string with the attribute declaration, is the schema inferred for an element with attributes and simple content.

XML

<?xml version="1.0"?>
<root attribute1="text">value</root>

Schema

<?xml version="1.0" encoding="utf-8"?>
<xs:schema attributeFormDefault="unqualified" elementFormDefault="qualified" xmlns:xs="http://www.w3.org/2001/XMLSchema">
  <xs:element name="root">
    <xs:complexType>
      <xs:simpleContent>
        <xs:extension base="xs:string">
          <xs:attribute name="attribute1" type="xs:string" use="required" />
        </xs:extension>
      </xs:simpleContent>
    </xs:complexType>
  </xs:element>
</xs:schema>

Element with a Sequence of Child Elements

The following table shows the XML input to the InferSchema method and the XML schema generated from it. The xs:complexType containing an xs:sequence is the schema inferred for an element with a sequence of child elements.

Note

Even if an element has only one child element, it is still treated as a sequence.

XML

<?xml version="1.0"?>
<root>
  <subElement/>
</root>

Schema

<?xml version="1.0" encoding="utf-8"?>
<xs:schema attributeFormDefault="unqualified" elementFormDefault="qualified" xmlns:xs="http://www.w3.org/2001/XMLSchema">
  <xs:element name="root">
    <xs:complexType>
      <xs:sequence>
        <xs:element name="subElement" />
      </xs:sequence>
    </xs:complexType>
  </xs:element>
</xs:schema>

Element with a Sequence of Child Elements and Attributes

The following table shows the XML input to the InferSchema method and the XML schema generated from it. The xs:complexType containing an xs:sequence followed by the xs:attribute declaration is the schema inferred for an element with a sequence of child elements and attributes.

Note

Even if an element has only one child element, it is still treated as a sequence.

XML

<?xml version="1.0"?>
<root attribute1="text">
  <subElement1/>
  <subElement2/>
</root>

Schema

<?xml version="1.0" encoding="utf-8"?>
<xs:schema attributeFormDefault="unqualified" elementFormDefault="qualified" xmlns:xs="http://www.w3.org/2001/XMLSchema">
  <xs:element name="root">
    <xs:complexType>
      <xs:sequence>
        <xs:element name="subElement1" />
        <xs:element name="subElement2" />
      </xs:sequence>
      <xs:attribute name="attribute1" type="xs:string" use="required" />
    </xs:complexType>
  </xs:element>
</xs:schema>

Element with a Sequence and Choices of Child Elements

The following table shows the XML input to the InferSchema method and the XML schema generated from it. The xs:complexType containing an xs:sequence that wraps an xs:choice is the schema inferred for an element with a sequence and choices of child elements.

Note

The maxOccurs attribute of the xs:choice element is set to "unbounded" in the inferred schema.

XML

<?xml version="1.0"?>
<root>
  <subElement1/>
  <subElement2/>
  <subElement1/>
</root>

Schema

<?xml version="1.0" encoding="utf-8"?>
<xs:schema attributeFormDefault="unqualified" elementFormDefault="qualified" xmlns:xs="http://www.w3.org/2001/XMLSchema">
  <xs:element name="root">
    <xs:complexType>
      <xs:sequence>
        <xs:choice maxOccurs="unbounded">
          <xs:element name="subElement1" />
          <xs:element name="subElement2" />
        </xs:choice>
      </xs:sequence>
    </xs:complexType>
  </xs:element>
</xs:schema>

Element with a Sequence and Choices of Child Elements and Attributes

The following table shows the XML input to the InferSchema method and the XML schema generated from it. The xs:complexType containing an xs:sequence that wraps an xs:choice, followed by the xs:attribute declaration, is the schema inferred for an element with a sequence and choices of child elements and attributes.

Note

The maxOccurs attribute of the xs:choice element is set to "unbounded" in the inferred schema.

XML

<?xml version="1.0"?>
<root attribute1="text">
  <subElement1/>
  <subElement2/>
  <subElement1/>
</root>

Schema

<?xml version="1.0" encoding="utf-8"?>
<xs:schema attributeFormDefault="unqualified" elementFormDefault="qualified" xmlns:xs="http://www.w3.org/2001/XMLSchema">
  <xs:element name="root">
    <xs:complexType>
      <xs:sequence>
        <xs:choice maxOccurs="unbounded">
          <xs:element name="subElement1" />
          <xs:element name="subElement2" />
        </xs:choice>
      </xs:sequence>
      <xs:attribute name="attribute1" type="xs:string" use="required" />
    </xs:complexType>
  </xs:element>
</xs:schema>

Attribute Processing

Whenever a new attribute is encountered within a node, it is added to the inferred definition of the node with use="required". The next time the same node is found in the instance, the inference process will compare attributes of the current instance with the ones already inferred. If some of the already inferred ones are missing in the instance, use="optional" is added to the attribute definition. New attributes are added to existing declarations with use="optional".
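The attribute rule above can be read as a small merge procedure over successive occurrences of the same node. The following Python sketch is one way to express it, assuming each occurrence is reduced to the set of attribute names it carries. The infer_attribute_use function is hypothetical and is not part of the XmlSchemaInference class.

```python
def infer_attribute_use(occurrences):
    """Map each attribute name to "required" or "optional".

    occurrences: one collection of attribute names per occurrence of the
    node, in document order.
    """
    use = {}
    for index, names in enumerate(occurrences):
        names = set(names)
        if index == 0:
            # First time the node is seen: every attribute is required.
            for name in sorted(names):
                use[name] = "required"
            continue
        # Already inferred attributes missing from this occurrence
        # are downgraded to optional.
        for name in use:
            if name not in names:
                use[name] = "optional"
        # Attributes first seen on a later occurrence are optional.
        for name in sorted(names):
            if name not in use:
                use[name] = "optional"
    return use


print(infer_attribute_use([["id", "lang"], ["id"], ["id", "size"]]))
```

In this example only id appears on every occurrence, so it is the only attribute that remains required.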

Occurrence Constraints

During the schema inference process, the minOccurs and maxOccurs attributes are generated for inferred components of a schema. The minOccurs attribute takes the value "0" or "1", and the maxOccurs attribute takes the value "1" or "unbounded". The values "1" and "unbounded" are used only when the values "0" and "1" cannot validate the XML document (for example, if minOccurs="0" does not accurately describe an element, minOccurs="1" is used).
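As an illustration of how the two attributes relate to the instances seen, the following Python sketch derives both values from how many times a child element appeared under each occurrence of its parent. This is one reading of the rule, not the XmlSchemaInference implementation. The infer_occurs function and its counts input are hypothetical.

```python
def infer_occurs(counts):
    """Return (minOccurs, maxOccurs) as XSD attribute strings.

    counts: number of times the child element appeared under each
    occurrence of its parent element.
    """
    if not counts:
        raise ValueError("at least one parent occurrence is needed")
    # "0" when some parent occurrence lacked the child, otherwise "1".
    min_occurs = "0" if min(counts) == 0 else "1"
    # "unbounded" when some parent occurrence repeated the child.
    max_occurs = "unbounded" if max(counts) > 1 else "1"
    return min_occurs, max_occurs


print(infer_occurs([1, 1]))
print(infer_occurs([1, 0, 3]))
```

Because "1" is the XSD default for both attributes, a declaration with the pair ("1", "1") can omit them, as the element declarations in the earlier examples do.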

Mixed Content

If an element contains mixed content (for example, text interspersed with elements), the mixed="true" attribute is generated for the inferred complex type definition.
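A minimal sketch of a mixed-content check, written in Python with the standard xml.etree.ElementTree module rather than the .NET classes this topic describes. The has_mixed_content function is hypothetical, and treating whitespace-only text as insignificant is an assumption.

```python
import xml.etree.ElementTree as ET


def has_mixed_content(element):
    """True when an element has child elements and text around them."""
    children = list(element)
    if not children:
        # Text without child elements is simple content, not mixed.
        return False
    if (element.text or "").strip():
        # Text before the first child element.
        return True
    # Text following any child element is stored in that child's tail.
    return any((child.tail or "").strip() for child in children)


print(has_mixed_content(ET.fromstring("<p>Hello <b>world</b>!</p>")))
print(has_mixed_content(ET.fromstring("<root><a/><b/></root>")))
```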

Other Node Type Inference Rules

The following table describes the inference rules for processing instruction, comment, entity reference, CDATA, document type, and namespace nodes.

Node Type | Translation
Processing instruction | Ignored.
Comment | Ignored.
Entity reference | The XmlSchemaInference class does not handle entity references. If an XML document contains entity references, you need to use a reader that expands the entities. For example, you can pass an XmlTextReader with the EntityHandling property set to ExpandEntities as a parameter. If entity references are encountered and the reader does not expand entities, an exception is thrown.
CDATA | Any <![CDATA[ … ]]> sections in an XML document will be inferred as xs:string.
Document type | Ignored.
Namespaces | Ignored.

For more information about the schema inference process, see Inferring Schemas from XML Documents.
