Sometimes you have to read arbitrarily large XML files, and write your application so that the memory footprint of the application is predictable. If you attempt to populate an XML tree with a large XML file, your memory usage will be proportional to the size of the file—that is, excessive. Therefore, you should use a streaming technique instead.

One option is to write your application using XmlReader. However, you might want to use LINQ to query the XML tree. If this is the case, you can write your own custom axis method. For more information, see How to: Write a LINQ to XML Axis Method (Visual Basic).

To write your own axis method, you write a small method that uses the XmlReader to read nodes until it reaches one of the nodes in which you are interested. The method then calls ReadFrom, which reads from the XmlReader and instantiates an XML fragment. You can then write LINQ queries on your custom axis method.

Streaming techniques are best applied in situations where you need to process the source document only once, and you can process the elements in document order. Certain standard query operators, such as OrderBy, iterate their source, collect all of the data, sort it, and then finally yield the first item in the sequence. Note that if you use a query operator that materializes its source before yielding the first item, you will not retain a small memory footprint.

## Example

Sometimes the problem gets just a little more interesting. In the following XML document, the consumer of your custom axis method also has to know the name of the customer that each item belongs to.

<?xml version="1.0" encoding="utf-8" ?>
<Root>
<Customer>
<Name>A. Datum Corporation</Name>
<Item>
<Key>0001</Key>
</Item>
<Item>
<Key>0002</Key>
</Item>
<Item>
<Key>0003</Key>
</Item>
<Item>
<Key>0004</Key>
</Item>
</Customer>
<Customer>
<Name>Fabrikam, Inc.</Name>
<Item>
<Key>0005</Key>
</Item>
<Item>
<Key>0006</Key>
</Item>
<Item>
<Key>0007</Key>
</Item>
<Item>
<Key>0008</Key>
</Item>
</Customer>
<Customer>
<Name>Southridge Video</Name>
<Item>
<Key>0009</Key>
</Item>
<Item>
<Key>0010</Key>
</Item>
</Customer>
</Root>


The approach that this example takes is to also watch for this header information, save the header information, and then build a small XML tree that contains both the header information and the detail that you are enumerating. The axis method then yields this new, small XML tree. The query then has access to the header information as well as the detail information.

This approach has a small memory footprint. As each detail XML fragment is yielded, no references are kept to the previous fragment, and it is available for garbage collection. Note that this technique creates many short lived objects on the heap.

The following example shows how to implement and use a custom axis method that streams XML fragments from the file specified by the URI. This custom axis is specifically written such that it expects a document that has Customer, Name, and Item elements, and that those elements will be arranged as in the above Source.xml document. It is a simplistic implementation. A more robust implementation would be prepared to parse an invalid document.

Module Module1

Sub Main()
Dim xmlTree = <Root>
<%=
From el In New StreamCustomerItem("Source.xml")
Let itemKey = CInt(el.<Key>.Value)
Where itemKey >= 3 AndAlso itemKey <= 7
Select <Item>
<Customer><%= el.Parent.<Name>.Value %></Customer>
<%= el.<Key> %>
</Item>
%>
</Root>

Console.WriteLine(xmlTree)
End Sub

End Module

Public Class StreamCustomerItem
Implements IEnumerable(Of XElement)

Private _uri As String

Public Sub New(ByVal uri As String)
_uri = uri
End Sub

Public Function GetEnumerator() As IEnumerator(Of XElement) Implements IEnumerable(Of XElement).GetEnumerator
End Function

Public Function GetEnumerator1() As IEnumerator Implements IEnumerable.GetEnumerator
Return Me.GetEnumerator()
End Function
End Class

Implements IEnumerator(Of XElement)

Private _current As XElement
Private _customerName As String
Private _uri As String

Public Sub New(ByVal uri As String)
_uri = uri
End Sub

Public ReadOnly Property Current As XElement Implements IEnumerator(Of XElement).Current
Get
Return _current
End Get
End Property

Public ReadOnly Property Current1 As Object Implements IEnumerator.Current
Get
Return Me.Current
End Get
End Property

Public Function MoveNext() As Boolean Implements IEnumerator.MoveNext
Dim item As XElement
Dim name As XElement

' Parse the file, save header information when encountered, and return the
' current Item XElement.

' loop through Customer elements
Case "Customer"
' move to Name element

_customerName = If(name IsNot Nothing, name.Value, "")
Exit While
End If

End While
Case "Item"
Dim tempRoot = <Root>
<Name><%= _customerName %></Name>
<%= item %>
</Root>
_current = item
Return True
End Select
End If
End While

Return False
End Function

Public Sub Reset() Implements IEnumerator.Reset
End Sub

#Region "IDisposable Support"
Private disposedValue As Boolean ' To detect redundant calls

' IDisposable
Protected Overridable Sub Dispose(ByVal disposing As Boolean)
If Not Me.disposedValue Then
If disposing Then
End If
End If
Me.disposedValue = True
End Sub

Public Sub Dispose() Implements IDisposable.Dispose
Dispose(True)
GC.SuppressFinalize(Me)
End Sub
#End Region

End Class


This code produces the following output:

<Root>
<Item>
<Customer>A. Datum Corporation</Customer>
<Key>0003</Key>
</Item>
<Item>
<Customer>A. Datum Corporation</Customer>
<Key>0004</Key>
</Item>
<Item>
<Customer>Fabrikam, Inc.</Customer>
<Key>0005</Key>
</Item>
<Item>
<Customer>Fabrikam, Inc.</Customer>
<Key>0006</Key>
</Item>
<Item>
<Customer>Fabrikam, Inc.</Customer>
<Key>0007</Key>
</Item>
</Root>