Merging Web Service Results


Scott Seely
Microsoft Corporation

May 21, 2002

Download Mergingxml.exe.


Lately, Matt and I have been using this column to talk about design points and other issues that may come up when working with Web services. Hopefully, you have been keeping up, and are now familiar with our recommendations on how to build WSDL, how to use .NET to work with it, and how to then use the Internet infrastructure to put your Web service to work.

One of our bigger themes has focused on taking advantage of industry standardized WSDL. In the previous column, Matt discussed ways that the server-side implementation of the discovery interface could cache responses to clients. A catalog, for instance, doesn't change frequently, but can sure be time intensive in terms of building a response.

This week, we want to take a look at how a client application could use standardization among suppliers to create a custom catalog to present to their customers. Each supplier's product listing will contain information items common to all suppliers, as well as some items unique to that supplier. As a business that sells pencils to the final consumer, we want to take information from the catalogs of various suppliers and assemble our own catalog, to show what we have available.

The solution presented in this column can be adapted to allow aggregation of data from several endpoints that implement the same WSDL. This could be used, for instance, to collect data from a factory floor, or to examine stock trends from NYSE, NASDAQ, and other markets. In other words, you can adapt the information here to any other situation where the Web service client needs to access identical Web services that provide different return messages and then aggregate those results.

So, how do you merge the results of several Web service calls? I can think of several ways to merge the results of the getCatalog call in the Discovery WSDL. You can place the data in a database, merge the resulting object lists into a unified list, or merge the XML.

For the online catalog, a solution that really appeals to me is to merge the XML into one document, and then query the document to come up with different views of the data. To merge the XML, simply dump the two result sets into one document and massage the data. If you stop at this point, however, you'll be left with a rather large document. To condense the information set further, you can merge similar data. For example, once you have a pencil with a manufacturer's ID, you don't need to store the details each supplier provides on that pencil.
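As a sketch of the idea in another language (Python with ElementTree rather than the article's VB.NET, and with invented element names standing in for the real catalog schema), dumping two result sets together and condensing them by ID might look like this:

```python
import xml.etree.ElementTree as ET

# Two hypothetical supplier responses that share one schema. The element
# names here are invented stand-ins for the real PencilSellers types.
cat1 = ET.fromstring(
    "<Catalog>"
    "<Pencil><productID>p1</productID><price>0.10</price></Pencil>"
    "<Pencil><productID>p2</productID><price>0.15</price></Pencil>"
    "</Catalog>")
cat2 = ET.fromstring(
    "<Catalog>"
    "<Pencil><productID>p1</productID><price>0.12</price></Pencil>"
    "</Catalog>")

# Dump both result sets into one index, then condense: keep one entry
# per product ID, preferring the more expensive listing.
merged = {}
for catalog in (cat1, cat2):
    for pencil in catalog.findall("Pencil"):
        pid = pencil.findtext("productID")
        price = float(pencil.findtext("price"))
        if pid not in merged or price > float(merged[pid].findtext("price")):
            merged[pid] = pencil

combined = ET.Element("Catalog")
combined.extend(merged.values())
print(len(combined))   # two unique products survive the merge
```

The dictionary plays the same role the Hashtable plays later in the article: it makes duplicate detection a constant-time lookup rather than a scan.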

Getting the Catalogs

The first thing the application needs to do before merging the catalogs is to get them. To keep things simple, in this example, only two catalogs are in use. These two catalogs are implemented on two endpoints accessing different databases. Again, to keep things simple, the two endpoints are hosted on the same machine and virtual directory (and included in the download). I still implement two Web services, but deployment for this installation is simplified.

To get the catalogs, the code could either issue two sequential requests or issue all the requests at once and then just wait. Given that I don't need the results in any particular order, the client application issues the requests for the catalogs one after another and then waits for the responses. This is just one way to asynchronously execute the two calls. Under the covers, the blocking calls that call out to the Web service and wait for the response happen on a separate thread while the main client thread continues to execute. To the developer, these details are hidden. The developer knows an IAsyncResult is returned, but does not have to think about what is really happening.

Private Sub GetCatalogs(ByRef cat1 As Object, _
    ByRef cat2 As Object)

    Dim svc1 As New DiscoveryBinding()
    Dim svc2 As New DiscoveryBinding()

    ' "catalog1Url" and "catalog2Url" are placeholder key names; use the
    ' keys your configuration file defines for the two endpoints.
    svc1.Url = ConfigurationSettings.AppSettings("catalog1Url")
    svc2.Url = ConfigurationSettings.AppSettings("catalog2Url")

    ' Execute the two items async and then synchronize 
    ' at the end.
    Dim asr1 As IAsyncResult = _
        svc1.BeginGetCatalog(Nothing, Nothing)
    Dim asr2 As IAsyncResult = _
        svc2.BeginGetCatalog(Nothing, Nothing)
    Dim wh() As WaitHandle = { _
       asr1.AsyncWaitHandle, asr2.AsyncWaitHandle}
    WaitHandle.WaitAll(wh)
    cat1 = svc1.EndGetCatalog(asr1)
    cat2 = svc2.EndGetCatalog(asr2)
End Sub

What happens here is that two instances of the same Web service proxy class execute against two different endpoints. Because both requests are issued before either response arrives, the overall time to get the data should be reduced. Everything gets synchronized using the WaitHandle array: the WaitAll call stops execution on that one line until both responses have come in. Once the catalogs are in hand, the code extracts the data from the proxy classes and returns it to the caller.
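The same begin-both-then-wait pattern can be sketched in Python with concurrent.futures (the endpoint stub, its name, and the delay below are invented for illustration):

```python
from concurrent.futures import ThreadPoolExecutor
import time

# Stand-in for a catalog endpoint; a real client would issue the SOAP
# request here. The function name and latency are hypothetical.
def fetch_catalog(supplier):
    time.sleep(0.1)                      # simulate network latency
    return f"<Catalog supplier='{supplier}'/>"

start = time.monotonic()
with ThreadPoolExecutor(max_workers=2) as pool:
    # Both requests are in flight before either response is awaited,
    # mirroring two BeginGetCatalog calls followed by WaitAll.
    f1 = pool.submit(fetch_catalog, "svc1")
    f2 = pool.submit(fetch_catalog, "svc2")
    cat1, cat2 = f1.result(), f2.result()    # synchronize here
elapsed = time.monotonic() - start
# elapsed is roughly one latency period, not two
```

As in the VB.NET version, the caller never touches a thread directly; the pool hides the mechanics just as IAsyncResult does.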

If you look at the GetCatalogs function, you will notice that it takes references to two objects. This is because I edited the proxy. Because I didn't want to accidentally lose my changes, I created the proxy using the command-line WSDL.EXE tool. In case you aren't familiar with WSDL.EXE, it does what Add Web Reference in the Microsoft® Visual Studio® IDE does, minus the ability to regenerate the proxy code at the click of a button. In short, if you use WSDL.EXE, Visual Studio cannot accidentally overwrite your alterations to the generated code. To get the proxy, I simply dropped to a command line and ran the following (all as one command):

Wsdl /l:vb /out:DiscoveryBinding.vb

This produced a file called DiscoveryBinding.vb. I added the file to the client application project and went to work. The two functions I edited in the DiscoveryBinding proxy are GetCatalog and EndGetCatalog. The edit was really simple. At the end of both functions, WSDL.EXE generated a line that converts the return value to the Catalog type. The original final line of both functions reads like this:

Return CType(results(0), Catalog)

This causes the framework to take the XML in results(0) and deserialize it into objects. The type definitions in the WSDL were designed to let the developer know what to expect in the XML, and to build code based on that knowledge. For many toolkits, the type information has the added advantage of letting the toolkit absorb the XML into datatypes native to the proxy's own language. I didn't want that feature, although it is still helpful to know what to expect in a message. To fix the two functions so that they left the XML alone, I changed the line that converts the XML to:

Return results(0)

I then changed the return type to Object. Why? That is the declared type of results(0). This return value contains the data covered in the next section. The advantage of this edit is that I now have access to the result elements as XML, and the code avoids converting from XML to some type and then back to XML. As a side note, my first iteration of this code did exactly that; converting the data twice seemed like a waste, and it turned out it was.
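To illustrate the trade-off in another language: a generated proxy normally binds the response XML to typed objects, while the edited proxy hands back the XML itself. A minimal Python sketch (the class, element names, and sample response are invented):

```python
import xml.etree.ElementTree as ET
from dataclasses import dataclass

response = "<Pencil><productID>p1</productID><price>0.10</price></Pencil>"

# What an unedited proxy effectively does: deserialize into a typed object.
@dataclass
class Pencil:                       # hypothetical stand-in type
    productID: str
    price: float

elem = ET.fromstring(response)
typed = Pencil(elem.findtext("productID"), float(elem.findtext("price")))

# What the edited proxy does: return the element untouched, so later
# stages can query it and re-emit it without an extra round trip.
raw = ET.fromstring(response)
print(typed.price, raw.tag)
```

Both views come from the same bytes; the question is only whether you pay for the object binding when all you plan to do is query and re-serialize the XML.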

Merging Data

Now that the catalogs have been retrieved and we have access to the original items, it's time to merge the two catalogs into one XML document that contains the union of all products offered by the vendors we work with. We are going to make the following assumptions about the data we receive:

  • All images of a pencil related to a given product ID are equivalent.
  • Properties of a pencil do not change between catalogs. That is, the pencil type, hardness, length, and width do not change because someone else is providing the pencil.
  • Product IDs are globally unique. They may not be in reality, but they are in this sample.

With these assumptions, I will merge pencils based on product ID. When a product ID matches the product ID of a pencil that has already been stored, the code will keep the pencil data with the highest price. Keeping the highest-priced item lets the catalog display a selling price that guarantees the pencil always sells for more than it costs the "middleman." The overall controlling function is UpdateXMLCatalog. This function oversees the assembly of the data from the Web services. It orchestrates the following tasks:

  • Retrieve the catalogs.
  • Determine when the first of the two catalogs expires (this will be used to expire the cached content).
  • Merge the items.
  • Extract the pencil hardness and manufacturer values (this will be used in the UI later).
  • Store the results in the application cache.
  • Store the namespace manager in the application cache (saves some coding).
Private Sub UpdateXMLCatalog(ByVal itemKey As String, _
    ByVal itemValue As Object, _
    ByVal removedReason As CacheItemRemovedReason)
    ' This is a naive implementation that
    ' updates the entire catalog whenever any
    ' one item needs to be updated. It demonstrates merging the
    ' XML more than anything else.
    Dim cat1XML As Object
    Dim cat2XML As Object
    Dim tempElement As XmlElement

    GetCatalogs(cat1XML, cat2XML)
    tempElement = cat1XML(0)
    Dim tempNav As XPath.XPathNavigator = tempElement.CreateNavigator()
    ' Figure out when to expire the items
    ' Each catalog reports an absolute expiration time; store it
    ' as a span measured from now.
    Dim slidingExpiration As TimeSpan = _
       DateTime.Parse(tempNav.Value).Subtract(DateTime.Now)
    tempNav = cat2XML(0).CreateNavigator()
    Dim tempTS As TimeSpan = _
       DateTime.Parse(tempNav.Value).Subtract(DateTime.Now)

    If (slidingExpiration.CompareTo(tempTS) > 0) Then
        slidingExpiration = tempTS
    End If

    ' Now, copy the data into hashtables.
    Dim htPencils As New Hashtable()
    Dim htHardness As New Hashtable()
    Dim htManufacturers As New Hashtable()
    Dim xmlNT As New NameTable()
    Dim mgr As New XmlNamespaceManager(xmlNT)
    ' PENCIL_NS stands in for the PencilSellers schema namespace
    ' URI declared by the WSDL.
    mgr.AddNamespace("pencil", PENCIL_NS)
    Me.MergeValues(htPencils, htHardness, htManufacturers, _
        cat1XML(1), mgr)
    Me.MergeValues(htPencils, htHardness, htManufacturers, _
        cat2XML(1), mgr)
    Dim theCatalog As New XmlDocument()
    Dim childNode As XmlNode
    Dim nodeCopy As XmlNode
    Dim catNode As XmlNode = theCatalog.CreateNode( _
       XmlNodeType.Element, "Catalog", "")
    For Each childNode In htPencils.Values
        nodeCopy = theCatalog.ImportNode(childNode, True)
        catNode.AppendChild(nodeCopy)
    Next
    theCatalog.AppendChild(catNode)
    ' Save the changes
    Dim onRemove As CacheItemRemovedCallback = _
       New CacheItemRemovedCallback(AddressOf UpdateXMLCatalog)
    UpdateHardness(htHardness, slidingExpiration, onRemove)
    UpdateManufacturer(htManufacturers, slidingExpiration, onRemove)
    Context.Cache.Insert("Catalog", theCatalog, Nothing, _
       Context.Cache.NoAbsoluteExpiration, slidingExpiration, _
       CacheItemPriority.Normal, onRemove)
    Context.Cache.Insert("NSMgr", mgr, Nothing, _
       Context.Cache.NoAbsoluteExpiration, slidingExpiration, _
       CacheItemPriority.Normal, onRemove)
End Sub

The code stores all of its results in the application cache. The cache is seeded by inserting all of the data when the application starts up; from then on, whenever any of the content is about to be removed from the cache, the removal callback rebuilds the entire set.
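The repopulate-on-removal pattern is not specific to ASP.NET. This Python toy (all names invented) mimics a cache with sliding expiration whose removal callback rebuilds the evicted entry:

```python
import time

class SlidingCache:
    """Minimal sketch of a cache whose entries are rebuilt on expiry,
    mirroring the ASP.NET CacheItemRemovedCallback pattern."""
    def __init__(self):
        self._items = {}  # key -> (value, expires_at, ttl, on_removed)

    def insert(self, key, value, ttl, on_removed=None):
        self._items[key] = (value, time.monotonic() + ttl, ttl, on_removed)

    def get(self, key):
        entry = self._items.get(key)
        if entry is None:
            return None
        value, expires_at, ttl, on_removed = entry
        if time.monotonic() >= expires_at:
            del self._items[key]
            if on_removed:
                on_removed(key)          # callback repopulates the cache
            return self._items.get(key, (None,))[0]
        # Sliding expiration: touching the item pushes the deadline out.
        self._items[key] = (value, time.monotonic() + ttl, ttl, on_removed)
        return value

cache = SlidingCache()

def rebuild(key):
    # Stand-in for UpdateXMLCatalog: re-fetch, re-merge, re-insert.
    cache.insert(key, "fresh catalog", ttl=60, on_removed=rebuild)

cache.insert("Catalog", "stale catalog", ttl=0, on_removed=rebuild)
time.sleep(0.01)
print(cache.get("Catalog"))   # the expired entry was rebuilt
```

The real ASP.NET cache does the eviction on its own schedule rather than at read time, but the contract is the same: consumers always find a catalog, and the expensive rebuild runs only when the content goes stale.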

The final interesting bit of code is the merge itself. The MergeValues function takes the XmlElement that contains all of the Pencil data and extracts the information we are looking for. The search happens using the pencil namespace prefix registered for the PencilSellers types. The code uses a Hashtable to find duplicate data, which guarantees that, in the end, we keep only one entry per product ID: the highest-priced version of each pencil.

Private Sub MergeValues(ByRef htPencils As Hashtable, _
    ByRef htHardness As Hashtable, _
    ByRef htManufacturer As Hashtable, _
    ByRef pencilItems As XmlElement, _
    ByRef mgr As XmlNamespaceManager)

    Dim productID As String
    Dim price As Single
    Dim storedValue As Single
    Dim manufacturer As String
    Dim hardness As Long
    Dim elem As XmlNode

    ' Iterate over the nodes
    For Each elem In pencilItems.ChildNodes
        ' Extract the data we are interested in
        productID = elem.SelectSingleNode( _
           "pencil:productID", mgr).FirstChild.Value
        price = Single.Parse(elem.SelectSingleNode( _
           "pencil:price", mgr).FirstChild.Value)
        manufacturer = elem.SelectSingleNode( _
           "pencil:manufacturer", mgr).FirstChild.Value
        hardness = Long.Parse(elem.SelectSingleNode( _
           "pencil:hardness", mgr).FirstChild.Value)

        ' Check to see if the item has been cached yet.
        If htPencils.ContainsKey(productID) Then
            ' We have the item stored, get the price.
            storedValue = Single.Parse( _
              htPencils.Item(productID).SelectSingleNode( _
                  "pencil:price", mgr).FirstChild.Value)

            ' If the current node is more expensive
            ' than the one stored, replace the
            ' pencil that we have stored.
            If (storedValue < price) Then
                htPencils.Item(productID) = elem
            End If
        Else
            ' First time we have seen this product ID.
            htPencils.Add(productID, elem)
        End If

        ' Add the manufacturer and hardness to the lists
        ' if we haven't seen the values yet.
        If Not (htManufacturer.ContainsKey(manufacturer)) Then
            htManufacturer.Add(manufacturer, 0)
        End If
        If Not (htHardness.ContainsKey(hardness)) Then
            htHardness.Add(hardness, 0)
        End If
    Next
End Sub

The last few lines of UpdateXMLCatalog take the merged values and place those items into an XmlDocument. The code shown up to this point could have been put together by creating an array of Pencil items. The downside to not having the XML is that I cannot query the items based on search terms, and would have to walk through all the elements to figure things out. Because the XML can be queried, it will be helpful in displaying the data.

Displaying the Catalog

Now, we get to the real motivation behind merging the items as XML instead of keeping them as objects: I want to be able to search the catalog data without having to write even one line of the code that performs the search. Keeping the data as XML allows me to leverage the XPath query support built into the .NET Framework.

The catalog display is very simple and allows the user to filter based on manufacturer and hardness. When the user clicks a button to update the view, the catalog is refreshed. Figure 1 shows the application at startup.

PencilClient user interface

Figure 1. The PencilClient user interface

All users need to do is select the combination of items they want to filter on, and the search happens on the local machine. Yes, we could also query each catalog supplier directly and merge the results on our end, but the local approach seems simpler for a task that the catalog site will execute frequently. Since the data can be cached, it makes sense to do the searching ourselves. The search itself happens in btnQuery_Click.

Private Sub btnQuery_Click(ByVal sender As System.Object, _
    ByVal e As System.EventArgs) Handles btnQuery.Click

    Dim xmlDoc As XmlDocument = Cache("Catalog")
    Dim childNode As XmlNode
    Dim selHardness As Long = -1
    Dim selManufacturer As String = _
       ddlManufacturer.SelectedItem.Text ' manufacturer drop-down
    Dim bSearchHardness As Boolean = False
    Dim bSearchManufacturer As Boolean = False
    Dim mgr As XmlNamespaceManager = Cache("NSMgr")
    Dim xpathQS As String = "/Catalog/pencil:Pencil"

    ' Determine which things we need to search on.
    If (ddlHardness.SelectedItem.Text <> ANY_ITEM) Then
        selHardness = Long.Parse(ddlHardness.SelectedItem.Text)
        bSearchHardness = True
    End If
    If (selManufacturer <> ANY_ITEM) Then
        bSearchManufacturer = True
    End If

    ' Setup the query string. Note that the string
    ' is already setup to handle the "get everything"
    ' case.
    If (bSearchManufacturer And bSearchHardness) Then
        xpathQS = xpathQS & _
           "[pencil:manufacturer/text() " & _
           " = '" & selManufacturer & _
           "' and pencil:hardness/text() " & _
           " = '" & selHardness & "']"
    ElseIf (bSearchManufacturer) Then
        xpathQS = xpathQS & _
           "[pencil:manufacturer/text() " & _
           " = '" & selManufacturer & "']"
    ElseIf (bSearchHardness) Then
        xpathQS = xpathQS & _
           "[pencil:hardness/text() " & _
           " = '" & selHardness & "']"
    End If

    ' Get the list of nodes that satisfies the search
    ' criteria and display everything.
    Dim nodeList As XmlNodeList = _
       xmlDoc.SelectNodes(xpathQS, mgr)
    For Each childNode In nodeList
        ' DisplayNode is a stand-in name for the helper that copies
        ' the node's data into the results table.
        DisplayNode(childNode)
    Next
End Sub

Once the nodes are found, a simple method grabs the data from the nodes and places it into the table. The ability to query the document served as the primary motivation for keeping everything as XML. A larger application that used the catalog more heavily would continue to reap the benefits of using XML instead of plain old boring objects. The query itself could be written because I trust that the WSDL file accurately describes the format of the data in the SOAP messages. Because I know the format, I can create queries that extract data from the XML.
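As a sketch in another language, ElementTree's limited XPath support can express the same build-a-predicate-per-filter idea (namespaces are omitted and the element names and data are invented for brevity):

```python
import xml.etree.ElementTree as ET

# A toy merged catalog; the real document qualifies each Pencil
# element with the PencilSellers namespace.
doc = ET.fromstring(
    "<Catalog>"
    "<Pencil><manufacturer>Acme</manufacturer><hardness>2</hardness></Pencil>"
    "<Pencil><manufacturer>Acme</manufacturer><hardness>4</hardness></Pencil>"
    "<Pencil><manufacturer>Globex</manufacturer><hardness>2</hardness></Pencil>"
    "</Catalog>")

# Append a predicate only for the filters the user actually picked,
# just as btnQuery_Click appends clauses for manufacturer and hardness.
def query(manufacturer=None, hardness=None):
    path = "Pencil"
    if manufacturer is not None:
        path += f"[manufacturer='{manufacturer}']"
    if hardness is not None:
        path += f"[hardness='{hardness}']"
    return doc.findall(path)

print(len(query()))                                  # all pencils
print(len(query(manufacturer="Acme")))               # only Acme pencils
print(len(query(manufacturer="Acme", hardness=2)))   # Acme with hardness 2
```

The empty path with no predicates handles the "get everything" case automatically, the same trick the VB.NET code uses by pre-seeding xpathQS with the unfiltered query.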


Strongly typed XML is a good thing. As part of a WSDL document, it defines the contract against which message senders and receivers implement their code. As a receiver of this information, you do not always want or need to read the response into proxy objects. Sometimes you just want the XML so that you can search the data. This column showed one reason why: the client application gets catalog data from many sources and needs to display a catalog of the goods available for sale through this seller site. The data is stored as XML instead of objects in order to make searching as easy as possible. The client uses XPath expressions to perform these searches.


At Your Service

Scott Seely is a member of the MSDN Architectural Samples team. Besides his work there, Scott is the author of SOAP: Cross Platform Web Service Development Using XML (Prentice Hall—PTR) and the lead author for Creating and Consuming Web Services in Visual Basic (Addison-Wesley).