JumpStart for Creating a SAX2 Application with C++

 

Microsoft Corporation

Updated November 2000

Download Sax2jumpstart.exe.

Summary: This article provides an introduction to how the Microsoft XML Parser (MSXML) implements the Simple API for XML (SAX2), and helps you get started building applications using the SAX2 interfaces. It also shows you how to quickly build a basic C++ application that reads an XML document and prints the document's tags to the console.

Note To run this tutorial, you need the latest production version of MSXML, available from the MSDN XML/XSL Home Page. The sample code imports msxml3.dll, so MSXML 3.0 must be installed.

Contents

  • Overview of the JumpStart Application (C++)
  • Implementing the ContentHandler
  • Creating the Main Program
  • Complete Code for the Main Program
  • Running the JumpStart Application

Overview of the JumpStart Application (C++)

SAX2 is a push-model API. When SAX2 parses a document, SAXXMLReader (the SAX2 reader) reads the document and pushes a series of events to the event handlers that you choose to implement. The SAX2 reader generates several categories of events, including:

  • Those that occur in the content of an XML document.
  • Those that occur in the document type definition (DTD).
  • Those that occur as errors.

To handle such events, you implement a corresponding handler class that contains methods to process the appropriate events. You need to implement handlers only for the events you want to process. If you do not implement a handler for a specific type of event, the reader simply ignores that event.
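To make the push model and the "unhandled events are ignored" rule concrete, here is a minimal portable sketch. It does not use MSXML: MiniReader, ContentHandler, ErrorHandler, CountingHandler, and their methods are hypothetical stand-ins, and the "document" is just a list of element names. The reader drives the parse and calls whichever handlers are registered; an event whose handler slot is empty is silently skipped.

```cpp
#include <string>
#include <vector>

// Hypothetical stand-in for a content handler interface (not the MSXML API).
struct ContentHandler {
    virtual ~ContentHandler() = default;
    virtual void startElement(const std::wstring& name) = 0;
    virtual void endElement(const std::wstring& name) = 0;
};

// Hypothetical stand-in for an error handler interface.
struct ErrorHandler {
    virtual ~ErrorHandler() = default;
    virtual void error(const std::wstring& message) = 0;
};

// Toy push-model reader: handlers are registered first, then the reader
// walks the input and pushes events to whatever handlers are present.
class MiniReader {
public:
    void putContentHandler(ContentHandler* h) { content_ = h; }
    void putErrorHandler(ErrorHandler* h) { error_ = h; }

    // "Parses" a flat list of element names; an empty name is reported as an error.
    void parse(const std::vector<std::wstring>& elements) {
        for (const auto& name : elements) {
            if (name.empty()) {
                if (error_) error_->error(L"empty element name");  // dropped if no handler
                continue;
            }
            if (content_) content_->startElement(name);  // dropped if no handler
            if (content_) content_->endElement(name);
        }
    }

private:
    ContentHandler* content_ = nullptr;
    ErrorHandler* error_ = nullptr;
};

// A content handler that only counts the events it receives.
struct CountingHandler : ContentHandler {
    int starts = 0;
    int ends = 0;
    void startElement(const std::wstring&) override { ++starts; }
    void endElement(const std::wstring&) override { ++ends; }
};
```

With a CountingHandler registered and no error handler, parsing {L"a", L"", L"b"} delivers two start and two end events, and the error event for the empty name is simply discarded.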

The JumpStart application contains two main components that you must implement.

  • ContentHandler: Implements the ISAXContentHandler interface. The ContentHandler is a class that provides methods for processing the main content of an XML document. When SAX2 parses a document, the reader passes a series of events to the ContentHandler. For example, for each element in a document, the reader passes startElement and endElement events, plus characters events for any text the element contains. To handle these events, you add code to the corresponding ContentHandler methods to process the information passed by the reader.
  • Main program: Does the following:
      1. Creates an instance of the SAX2 reader (SAXXMLReader).
      2. Creates an instance of the ContentHandler.
      3. Registers the ContentHandler with the reader.
      4. Sets the source for the XML document.
      5. Starts the parsing process.

To create the JumpStart application, you first implement a handler class that implements the ISAXContentHandler interface. In C++, this can be an ordinary class rather than a full-featured COM object. In the methods of this class, you tell the application what to do when it receives notification of an event. After you implement the ContentHandler class, you create the main program, which creates an instance of SAXXMLReader, registers the ContentHandler, and starts the parse operation.
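The header file later in this article derives MyContent from a helper class, SAXContentHandlerImpl, rather than directly from ISAXContentHandler. A common reason for such a helper is that it supplies a do-nothing, always-succeeding implementation of every callback, so the derived class overrides only the events it cares about. The portable sketch below shows that pattern; it is an assumption about how such a helper is organized, not the actual contents of SAXContentHandlerImpl.h. The class names, the reduced method set, and the int result codes (0 standing in for S_OK) are hypothetical.

```cpp
#include <string>

// Hypothetical base class: every callback succeeds and does nothing by default.
class ContentHandlerBase {
public:
    virtual ~ContentHandlerBase() = default;
    virtual int startDocument() { return 0; }
    virtual int endDocument() { return 0; }
    virtual int startElement(const std::wstring& /*localName*/) { return 0; }
    virtual int endElement(const std::wstring& /*localName*/) { return 0; }
    virtual int characters(const std::wstring& /*text*/) { return 0; }
};

// Derived handler: overrides only the two callbacks it needs and
// tracks the deepest element nesting seen.
class DepthTracker : public ContentHandlerBase {
public:
    int startElement(const std::wstring&) override {
        ++depth_;
        if (depth_ > maxDepth_) {
            maxDepth_ = depth_;
        }
        return 0;
    }
    int endElement(const std::wstring&) override {
        --depth_;
        return 0;
    }
    int maxDepth() const { return maxDepth_; }

private:
    int depth_ = 0;
    int maxDepth_ = 0;
};
```

Calls to startDocument, characters, and the other methods DepthTracker does not override fall through to the base class and succeed without doing anything, which is the same division of labor MyContent relies on.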

Implementing the ContentHandler

The first step when creating a SAX2 application is to implement the handler classes. This example implements only the ContentHandler, which derives from the ISAXContentHandler interface.

Creating the Header File

When it comes to using SAX2, the most useful handler class is the ContentHandler. You derive this class from the ISAXContentHandler interface.

To use the SAX2 interfaces that come with MSXML, you must import their declarations from the MSXML type library with the following code.

#import <msxml3.dll> raw_interfaces_only
using namespace MSXML2;

Note For the JumpStart application, you declare the interfaces in the StdAfx.h file.

To implement the ContentHandler, create the header file, named "MyContent.h", as shown in the following sample code.

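#if !defined(AFX_MYCONTENT_H__E1B3AF99_0FA6_44CD_82E3_55719F9E3806__INCLUDED_)
#define AFX_MYCONTENT_H__E1B3AF99_0FA6_44CD_82E3_55719F9E3806__INCLUDED_

#include "SAXContentHandlerImpl.h"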

class MyContent : public SAXContentHandlerImpl
{
public:
    MyContent();
    virtual ~MyContent();

    // Called by the reader at the start of each element.
    virtual HRESULT STDMETHODCALLTYPE startElement(
        /* [in] */ wchar_t __RPC_FAR *pwchNamespaceUri,
        /* [in] */ int cchNamespaceUri,
        /* [in] */ wchar_t __RPC_FAR *pwchLocalName,
        /* [in] */ int cchLocalName,
        /* [in] */ wchar_t __RPC_FAR *pwchQName,
        /* [in] */ int cchQName,
        /* [in] */ ISAXAttributes __RPC_FAR *pAttributes);

    // Called by the reader at the end of each element.
    virtual HRESULT STDMETHODCALLTYPE endElement(
        /* [in] */ wchar_t __RPC_FAR *pwchNamespaceUri,
        /* [in] */ int cchNamespaceUri,
        /* [in] */ wchar_t __RPC_FAR *pwchLocalName,
        /* [in] */ int cchLocalName,
        /* [in] */ wchar_t __RPC_FAR *pwchQName,
        /* [in] */ int cchQName);

    // Called once at the beginning of the document.
    virtual HRESULT STDMETHODCALLTYPE startDocument();

private:
    // Prints a string passed as a pointer plus a character count,
    // using pwchFmt as the format.
    void prt(
        /* [in] */ const wchar_t * pwchFmt,
        /* [in] */ const wchar_t __RPC_FAR *pwchVal,
        /* [in] */ int cchVal);

    int idnt;   // current element nesting depth, used for indentation
};

#endif // !defined(AFX_MYCONTENT_H__E1B3AF99_0FA6_44CD_82E3_55719F9E3806__INCLUDED_)

Creating the MyContent Class

After you create the MyContent.h file, the next step is to implement the ContentHandler class, named MyContent.

#include "stdafx.h"
#include "MyContent.h"

//////////////////////////////////////////////////////////////////////
// Construction/Destruction
//////////////////////////////////////////////////////////////////////

MyContent::MyContent()
{
    idnt = 0;
}

MyContent::~MyContent()
{

}

HRESULT STDMETHODCALLTYPE MyContent::startElement(
            /* [in] */ wchar_t __RPC_FAR *pwchNamespaceUri,
            /* [in] */ int cchNamespaceUri,
            /* [in] */ wchar_t __RPC_FAR *pwchLocalName,
            /* [in] */ int cchLocalName,
            /* [in] */ wchar_t __RPC_FAR *pwchQName,
            /* [in] */ int cchQName,
            /* [in] */ ISAXAttributes __RPC_FAR *pAttributes)
{
    HRESULT hr = S_OK;
    int l;
    printf("\n%*s",3 * idnt++, "");
    prt(L"<%s",pwchLocalName,cchLocalName);
    pAttributes->getLength(&l);
    for ( int i=0; i<l; i++ ) {
        wchar_t * ln, * vl; int lnl, vll;
        pAttributes->getLocalName(i,&ln,&lnl);
        prt(L" %s=", ln, lnl);
        pAttributes->getValue(i,&vl,&vll);
        prt(L"\"%s\"", vl, vll);
    }
    printf(">");

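    // A small example of how to abort the parse: returning a failure code
    // (here E_FAIL) from a handler method stops parsing. Any local name that
    // begins with "qu" triggers the abort. The length check keeps wcsncmp from
    // reading past the end of a one-character name, because pwchLocalName is
    // passed with an explicit length (cchLocalName).
    if ( cchLocalName >= 2 && wcsncmp(pwchLocalName,L"qu",2) == 0 ) {
        printf("\n<qu> tag encountered, parsing aborted.");
        hr = E_FAIL;
    }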

    return hr;
}


HRESULT STDMETHODCALLTYPE MyContent::endElement(
            /* [in] */ wchar_t __RPC_FAR *pwchNamespaceUri,
            /* [in] */ int cchNamespaceUri,
            /* [in] */ wchar_t __RPC_FAR *pwchLocalName,
            /* [in] */ int cchLocalName,
            /* [in] */ wchar_t __RPC_FAR *pwchQName,
            /* [in] */ int cchQName)
{
    printf("\n%*s",3 * --idnt, "");
    prt(L"</%s>",pwchLocalName,cchLocalName);
    return S_OK;
}

HRESULT STDMETHODCALLTYPE MyContent::startDocument()
{
    printf("<?xml version=\"1.0\" ?>");
    return S_OK;
}

void MyContent::prt(
            /* [in] */ const wchar_t * pwchFmt,
            /* [in] */ const wchar_t __RPC_FAR *pwchVal,
            /* [in] */ int cchVal)
{
    static wchar_t val[1000];
    cchVal = cchVal>999 ? 999 : cchVal;
    wcsncpy( val, pwchVal, cchVal ); val[cchVal] = 0;
    wprintf(pwchFmt,val);
}
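SAX2 passes each name and value as a pointer plus a character count (for example, pwchLocalName and cchLocalName), so the code relies on the count rather than on a terminating null. That is why prt copies at most cchVal characters into its own buffer and terminates it before printing. The sketch below shows a variant of the same idea, assuming you can use the C++ standard library: it builds a std::wstring from the pointer and count, so there is no fixed 1000-character buffer to truncate against and no shared static state. The helper names toWString and printCounted are hypothetical, and the format string is expected to use %ls, the portable spelling for a wide-string argument.

```cpp
#include <cwchar>
#include <string>

// Hypothetical helper: copy exactly `count` characters from a counted,
// possibly non-null-terminated string into a std::wstring.
std::wstring toWString(const wchar_t* text, int count) {
    if (text == nullptr || count <= 0) {
        return std::wstring();
    }
    return std::wstring(text, static_cast<std::wstring::size_type>(count));
}

// Hypothetical replacement for prt: prints a counted string through a
// wide printf-style format that contains exactly one %ls.
void printCounted(const wchar_t* format, const wchar_t* text, int count) {
    std::wstring value = toWString(text, count);
    std::wprintf(format, value.c_str());
}
```

For example, printCounted(L"<%ls", pwchLocalName, cchLocalName) would print the same text as the first prt call in startElement, without any length limit.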

Creating the Main Program

Finally, you create a main program that does the following:

  • Provides a command prompt interface.
  • Creates a parser by instantiating a class that implements the ISAXXMLReader interface.
  • Creates a ContentHandler by instantiating the MyContent class.
  • Registers the ContentHandler with the parser.

Complete Code for the Main Program

Here's the code for the main program:

#include "stdafx.h"

#include "MyContent.h"
#include "SAXErrorHandlerImpl.h"

int main(int argc, char* argv[])
{
    if (argc<2) {
        printf("\nTry something like\n\ttestSax "
               "file:///drive:/path/file.xml\nGood luck!\n");
        return 0;    // Need URL to read
    }

    CoInitialize(NULL);
    ISAXXMLReader* pRdr = NULL;

    HRESULT hr = CoCreateInstance(
                                __uuidof(SAXXMLReader),
                                NULL,
                                CLSCTX_ALL,
                                __uuidof(ISAXXMLReader),
                                (void **)&pRdr);

    if (SUCCEEDED(hr))
    {
        MyContent * pMc = new MyContent();
        hr = pRdr->putContentHandler(pMc);

        // Not needed in this example; this just illustrates how to set
        // other handlers.
        //=========================================================================
        SAXErrorHandlerImpl * pEc = new SAXErrorHandlerImpl();
        hr = pRdr->putErrorHandler(pEc);
        // SAXDTDHandlerImpl * pDc = new SAXDTDHandlerImpl();
        // hr = pRdr->putDTDHandler(pDc);


        static wchar_t URL[1000];
        mbstowcs( URL, argv[1], 999 );
        wprintf(L"\nParsing document: %s\n", URL);

        hr = pRdr->parseURL(URL);
        printf("\nParse result code: %08x\n\n",hr);

        pRdr->Release();
    }
    else
    {
        printf("\nError %08X\n\n", hr);
    }

    CoUninitialize();
    return 0;
}
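MyContent::startElement stops the parse by returning E_FAIL instead of S_OK, and main then prints the parse result code in hexadecimal. The portable sketch below models that control flow: a hypothetical miniParse function calls a start-element callback for each name, stops at the first callback that returns a failure code, and hands that code back to its caller. HRESULT-style values are modeled as std::int32_t with the standard Windows values S_OK (0) and E_FAIL (0x80004005). The names miniParse, abortOnQu, and printResult are hypothetical, and the sketch makes no claim about exactly which code MSXML's parseURL returns after a handler aborts.

```cpp
#include <cstdint>
#include <cstdio>
#include <functional>
#include <string>
#include <vector>

using Result = std::int32_t;                               // stand-in for HRESULT
const Result kOk = 0;                                      // S_OK
const Result kFail = static_cast<Result>(0x80004005u);     // E_FAIL

// Hypothetical push parser: calls onStart for each element name and stops at
// the first failure (a negative value, like FAILED(hr)), returning that code.
Result miniParse(const std::vector<std::wstring>& elements,
                 const std::function<Result(const std::wstring&)>& onStart) {
    for (const auto& name : elements) {
        Result r = onStart(name);
        if (r < 0) {
            return r;
        }
    }
    return kOk;
}

// Same test as MyContent::startElement: fail on a local name starting with "qu".
Result abortOnQu(const std::wstring& localName) {
    return localName.compare(0, 2, L"qu") == 0 ? kFail : kOk;
}

// Prints the result the way main does.
void printResult(Result r) {
    std::printf("\nParse result code: %08x\n\n", static_cast<unsigned>(r));
}
```

Parsing {L"catalog", L"book", L"quote", L"title"} with abortOnQu stops at "quote", never delivers "title", and returns the E_FAIL value, which printResult shows as 80004005. A one-character name such as "q" does not match, because compare looks only at the characters actually present.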

Running the JumpStart Application

From the command prompt, type:

pathname\debug\cppsaxsample.exe test.xml

In this command, pathname is the name of the folder to which you downloaded the JumpStart application files. The test.xml file is a test file provided with the download. The parsed test.xml file should appear in the command prompt window.
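To show the shape of that console output, the sketch below reproduces MyContent's formatting in portable C++, building a string instead of printing. It starts with the XML declaration that startDocument prints. Each start and end tag then goes on its own line, indented three spaces per nesting level, using the same pre-increment and pre-decrement of the depth counter as startElement and endElement. Attributes are omitted. The Event struct and render function are hypothetical, and the example input is made up, not the provided test.xml.

```cpp
#include <string>
#include <vector>

// Hypothetical event record: isStart distinguishes startElement from endElement.
struct Event {
    bool isStart;
    std::wstring localName;
};

// Mirrors MyContent's output: the declaration from startDocument, then a
// newline plus 3 spaces per nesting level before every start and end tag.
std::wstring render(const std::vector<Event>& events) {
    std::wstring out = L"<?xml version=\"1.0\" ?>";
    int indent = 0;
    for (const Event& e : events) {
        if (e.isStart) {
            // Like printf("\n%*s", 3 * idnt++, "") followed by "<name>".
            out += L"\n" + std::wstring(3 * indent++, L' ') + L"<" + e.localName + L">";
        } else {
            // Like printf("\n%*s", 3 * --idnt, "") followed by "</name>".
            out += L"\n" + std::wstring(3 * --indent, L' ') + L"</" + e.localName + L">";
        }
    }
    return out;
}
```

For the document <root><item/></root>, which produces start and end events for both elements, the declaration appears on the first line, root's tags start in column 0, and item's start and end tags are each indented three spaces on their own lines.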