(Deprecated) IXMLParser Interface


[Deprecated. Do not use. Superseded by SAX2 API/MSXML 3.0]

The parser accepts XML input in several ways (for example, from a stream, from a URL to a document, or as text pushed to it), parses the XML, and sends parse events to a NodeFactory. The parser is divided into two interfaces. The IXMLNodeSource interface defines the parse events and the other information sent to a NodeFactory, such as position information for parse errors. The IXMLParser interface inherits from IXMLNodeSource and adds methods to specify the XML source (stream, URL, or pushed text), set NodeFactories, and handle security and state reporting.

interface IXMLParser : IXMLNodeSource
{
    HRESULT SetURL(
        [in] const WCHAR* pszBaseUrl,
        [in] const WCHAR* pszRelativeUrl,
        [in] BOOL fAsync);
    HRESULT Load(
        [in] BOOL fFullyAvailable,
        [in] IMoniker *pimkName,
        [in] LPBC pibc,
        [in] DWORD grfMode);
    HRESULT SetInput(
        [in] IUnknown *pStm);
    HRESULT PushData(
        [in] const char* pData,
        [in] ULONG ulChars,
        [in] BOOL fLastBuffer);
    HRESULT LoadDTD(
        [in] const WCHAR* pszBaseUrl,
        [in] const WCHAR* pszRelativeUrl);
    HRESULT LoadEntity(
        [in] const WCHAR* pszBaseUrl,
        [in] const WCHAR* pszRelativeUrl,
        [in] BOOL fpe);
    HRESULT ParseEntity(
        [in] const WCHAR* pwcText,
        [in] ULONG ulLen,
        [in] BOOL fpe);
    HRESULT ExpandEntity(
        [in] const WCHAR* pwcText,
        [in] ULONG ulLen);
    HRESULT SetRoot(
        [in] PVOID pRoot);
    HRESULT GetRoot(
        [in] PVOID* ppRoot);
    HRESULT Run(
        [in] long lChars);
    HRESULT GetParserState();
    HRESULT Suspend();
    HRESULT Reset();
    HRESULT SetFlags(
        [in] ULONG iFlags);
    HRESULT SetSecureBaseURL(
        [in] const WCHAR* pszBaseUrl);
    HRESULT GetSecureBaseURL(
        [out] const WCHAR** ppwcBuf);
};

The parser is multithread safe: you can call these methods from any thread, because each call is protected by a critical section.

SetURL

The SetURL method is one of four methods for providing input to the parser; the others are Load, SetInput, and PushData. You pass a base URL and a relative URL. For example, if the base URL is http://www.microsoft.com/xml/test.htm and the relative URL is test.xml, the file that is loaded is http://www.microsoft.com/xml/test.xml.


The example shows that this method is useful when you have the base URL of an HTML page and want to fetch an XML document relative to it without doing any string manipulation. The name "test.htm" is arbitrary; it has nothing to do with the function of the XML parser.

The pszBaseUrl argument is optional; it exists only so that you do not have to resolve relative URLs yourself. Passing a full URL as pszRelativeUrl overrides pszBaseUrl, in which case pszBaseUrl can be NULL. You cannot pass NULL for pszRelativeUrl. You can pass a directory as the base as long as it ends with a "/", for example, http://www.microsoft.com/xml/.
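To make the base/relative combination concrete, here is a minimal C sketch of the rules described above. The function resolve_url is hypothetical and not part of IXMLParser; it uses char strings instead of WCHAR so that it builds anywhere, and it covers only the cases mentioned here: a file name relative to a page, a directory base with a trailing "/", a full URL that overrides the base, and a NULL base. Real URL resolution handles many more cases, such as "..", queries, and fragments.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical helper, not part of IXMLParser: a simplified model of how a
 * base URL and a relative URL are combined. Returns 1 on success, 0 if the
 * result does not fit in out. */
static int resolve_url(const char *base, const char *relative,
                       char *out, size_t out_size)
{
    assert(relative != NULL);               /* the relative URL may not be NULL */
    assert(out != NULL && out_size > 0);

    /* No base, or a full URL in the relative argument: use it unchanged. */
    if (base == NULL || strstr(relative, "://") != NULL) {
        int n = snprintf(out, out_size, "%s", relative);
        return n >= 0 && (size_t)n < out_size;
    }

    /* Keep the base up to and including its last '/', so that both
     * ".../xml/test.htm" and ".../xml/" resolve against ".../xml/". */
    const char *slash = strrchr(base, '/');
    size_t dir_len = slash != NULL ? (size_t)(slash - base) + 1 : 0;
    int n = snprintf(out, out_size, "%.*s%s", (int)dir_len, base, relative);
    return n >= 0 && (size_t)n < out_size;
}
```

With the values from the example, resolve_url("http://www.microsoft.com/xml/test.htm", "test.xml", buf, sizeof buf) produces http://www.microsoft.com/xml/test.xml.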

File URLs have some additional behavior. The base URL file://central/, where central is the name of a server, is not valid. You must also specify the name of the "share" on that server, for example: file://central/public/. Relative file URLs without a base URL load the file from the "current directory" for that process.
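The share requirement can be expressed as a simple check. The C function below is hypothetical and not part of the parser; it accepts only the file://server/share/ form discussed above and rejects a base that names a server without a share. Other file URL forms are outside what it models and are simply reported as not matching.

```c
#include <assert.h>
#include <stdbool.h>
#include <string.h>

/* Hypothetical check: true only for file://server/share... base URLs,
 * e.g. file://central/public/ but not file://central/. */
static bool file_base_has_share(const char *url)
{
    static const char prefix[] = "file://";
    const size_t prefix_len = sizeof prefix - 1;

    assert(url != NULL);
    if (strncmp(url, prefix, prefix_len) != 0)
        return false;                        /* not a file URL */

    const char *server = url + prefix_len;
    const char *slash = strchr(server, '/');
    if (slash == NULL || slash == server)
        return false;                        /* no server name followed by '/' */

    const char *share = slash + 1;           /* text after "server/" */
    return *share != '\0' && *share != '/';  /* a non-empty share segment */
}
```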

The fAsync flag controls whether an HTTP request is handled synchronously or asynchronously. If fAsync is TRUE, Run returns E_PENDING and you must pump your message queue. As data becomes available, URLMON notifies the parser through the message queue; the parser then parses the new data and continues to call your NodeFactory.

Load

The Load method corresponds to IPersistMoniker::Load. The parser calls BindToStorage to get an IStream and loads the XML associated with the given moniker. You can also call GetURL to get the URL that the moniker represents. This method does not work with a moniker created by CreateFileMoniker; use CreateURLMoniker instead.

SetInput

Use the SetInput method if you already have an IStream containing XML. If the stream returns E_PENDING, the parser returns E_PENDING from Run. The caller must then call Run again when more data is available.
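The calling pattern for a stream that is still receiving data is a loop around Run. Everything in the following C sketch is hypothetical: mock_stream stands in for an IStream whose data is still arriving, mock_run stands in for Run(-1), and RUN_PENDING plays the role of E_PENDING. Real code would wait for the stream (for example, by pumping messages) instead of calling mock_arrive.

```c
#include <assert.h>
#include <stddef.h>

/* Stand-ins for S_OK and E_PENDING so the sketch builds without the Windows SDK. */
enum run_status { RUN_OK, RUN_PENDING };

/* Hypothetical stream whose data arrives one chunk at a time. */
typedef struct {
    size_t len;        /* bytes the stream will eventually hold */
    size_t available;  /* bytes that have arrived so far */
    size_t chunk;      /* bytes delivered by each mock_arrive call */
} mock_stream;

static void mock_arrive(mock_stream *s)
{
    size_t next = s->available + s->chunk;
    s->available = next < s->len ? next : s->len;
}

/* Hypothetical Run(-1): while the stream is short of data (it would return
 * E_PENDING), Run reports pending as well. */
static enum run_status mock_run(const mock_stream *s)
{
    return s->available < s->len ? RUN_PENDING : RUN_OK;
}

/* Caller side: call Run again each time more data has arrived.
 * Returns how many times Run was called. */
static int parse_until_done(mock_stream *s)
{
    assert(s->chunk > 0 || s->available >= s->len);  /* avoid an endless loop */
    int calls = 1;
    while (mock_run(s) == RUN_PENDING) {
        mock_arrive(s);   /* real code would wait for the stream here */
        ++calls;
    }
    return calls;
}
```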

PushData

The lowest-level way to provide input is as a raw buffer of bytes. The buffer is not NULL-terminated. You call the PushData method with each buffer, and Run continues to return E_PENDING until you call PushData with the fLastBuffer argument set to TRUE. An XML token can span multiple buffers. For example, the first buffer might end with "<!-- the quick brown fox ", in which case the parser returns E_PENDING; if the next buffer starts with "jumped over the lazy dog -->", the parser completes the token and calls the NodeFactory with the combined COMMENT text "<!-- the quick brown fox jumped over the lazy dog -->".

After you call PushData with fLastBuffer set to TRUE, do not push any more data unless you first reset the parser; otherwise, the behavior is undefined.
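The buffering behavior can be modeled in a few lines. In this hypothetical C sketch, mock_push plays the role of PushData and mock_run the role of Run: a comment split across two buffers is captured only once the second buffer completes it, Run keeps reporting pending until the last buffer arrives, and a push after the last buffer is rejected. That rejection is a choice made by the sketch; the real parser makes no such promise, because the behavior is undefined.

```c
#include <assert.h>
#include <stdbool.h>
#include <string.h>

/* Hypothetical mock, not IXMLParser. */
enum mock_status { ST_OK, ST_PENDING, ST_AFTER_LAST };

typedef struct {
    char buf[256];      /* bytes pushed so far */
    size_t len;
    bool last_seen;     /* set once a buffer was pushed with last == true */
    char comment[256];  /* the completed comment token, once available */
} mock_parser;

/* Like PushData: data is a raw buffer of n bytes with no terminator. */
static enum mock_status mock_push(mock_parser *p, const char *data,
                                  size_t n, bool last)
{
    if (p->last_seen)
        return ST_AFTER_LAST;          /* real parser: undefined, reset first */
    assert(p->len + n < sizeof p->buf);
    memcpy(p->buf + p->len, data, n);
    p->len += n;
    p->buf[p->len] = '\0';             /* our own terminator, for strstr */
    p->last_seen = last;
    return ST_OK;
}

/* Like Run: report a comment once both "<!--" and "-->" are buffered, and
 * return pending until the last buffer has been pushed. */
static enum mock_status mock_run(mock_parser *p)
{
    const char *start = strstr(p->buf, "<!--");
    const char *end = start != NULL ? strstr(start + 4, "-->") : NULL;
    if (end != NULL && p->comment[0] == '\0') {
        size_t n = (size_t)(end + 3 - start);  /* fits: n <= len < 256 */
        memcpy(p->comment, start, n);
        p->comment[n] = '\0';
    }
    return p->last_seen ? ST_OK : ST_PENDING;
}
```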

LoadDTD

After you have started parsing XML, you may want to fetch an external document type definition (DTD). For example, if the XML document contains <!DOCTYPE root SYSTEM "somedtd.dtd">, you can load that DTD by calling LoadDTD on the parser. Your NodeFactory is then called to handle all the DTD nodes, just as it is for the internal DTD subset.
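Before calling LoadDTD, you need the DTD's system identifier to pass as the relative URL. The hypothetical C helper below extracts it from a declaration of the form <!DOCTYPE root SYSTEM "somedtd.dtd">. It ignores single-quoted literals and PUBLIC identifiers, which a complete implementation would also need to handle. The result would then be converted to a wide string and passed to LoadDTD together with the document's base URL.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical helper: copy the double-quoted SYSTEM identifier of a
 * <!DOCTYPE ...> declaration into out. Returns 1 on success, 0 otherwise. */
static int doctype_system_id(const char *decl, char *out, size_t out_size)
{
    assert(decl != NULL && out != NULL && out_size > 0);
    if (strncmp(decl, "<!DOCTYPE", 9) != 0)
        return 0;                                 /* not a DOCTYPE */

    const char *kw = strstr(decl, "SYSTEM");
    if (kw == NULL)
        return 0;                                 /* no external DTD */

    const char *open = strchr(kw + 6, '"');       /* opening quote */
    const char *close = open != NULL ? strchr(open + 1, '"') : NULL;
    if (close == NULL)
        return 0;

    int n = snprintf(out, out_size, "%.*s", (int)(close - open - 1), open + 1);
    return n >= 0 && (size_t)n < out_size;
}
```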

LoadEntity

DTDs can define external entities, for example <!ENTITY bar SYSTEM "ent.xml">. Use the LoadEntity method to fetch those entities. The fpe argument specifies whether the entity is a parameter entity.
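The fpe value follows from how the entity was declared: in XML, a parameter entity has a "%" between <!ENTITY and the entity name, as in <!ENTITY % bar SYSTEM "ent.xml">. The hypothetical C helper below makes that decision from the declaration text; it is a sketch, not a DTD parser.

```c
#include <assert.h>
#include <ctype.h>
#include <stdbool.h>
#include <string.h>

/* Hypothetical helper: true if decl declares a parameter entity,
 * i.e. "<!ENTITY" S "%" S name ... as in the XML grammar. */
static bool entity_is_parameter(const char *decl)
{
    static const char kw[] = "<!ENTITY";
    assert(decl != NULL);
    if (strncmp(decl, kw, sizeof kw - 1) != 0)
        return false;                      /* not an entity declaration */

    const char *p = decl + sizeof kw - 1;
    while (isspace((unsigned char)*p))     /* skip whitespace after <!ENTITY */
        ++p;
    return *p == '%' && isspace((unsigned char)p[1]);
}
```

A caller would then pass entity_is_parameter(decl) ? TRUE : FALSE as the fpe argument.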

ParseEntity

The ParseEntity method parses the value of an internal entity, for example <!ENTITY bar "<x>internal</x>">. CreateNode calls are made for the contents of the entity.

ExpandEntity

The ExpandEntity method expands an entity reference. For example, when a parameter entity reference such as %myEntity; is used in a DTD, you can call this method to provide the replacement text that the parser should insert into the input stream before it continues parsing.
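Conceptually, expanding an entity splices its replacement text into the input in place of the reference. The hypothetical C function below performs that splice as plain string substitution to show the effect; it is not how the parser is implemented. Note also that the XML 1.0 specification requires a parameter entity's replacement text to be padded with one leading and one trailing space when it is included in the DTD, which this sketch omits.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical illustration: replace the first "%name;" in input with
 * replacement, writing the result to out. Returns 1 on success, 0 if the
 * reference is absent or the result does not fit. */
static int splice_entity(const char *input, const char *name,
                         const char *replacement, char *out, size_t out_size)
{
    assert(input != NULL && name != NULL && replacement != NULL);

    char ref[128];
    int r = snprintf(ref, sizeof ref, "%%%s;", name);   /* builds "%name;" */
    if (r < 0 || (size_t)r >= sizeof ref)
        return 0;

    const char *hit = strstr(input, ref);
    if (hit == NULL)
        return 0;                                       /* reference not found */

    int n = snprintf(out, out_size, "%.*s%s%s",
                     (int)(hit - input), input,         /* text before the reference */
                     replacement,                       /* the replacement text */
                     hit + r);                          /* text after the reference */
    return n >= 0 && (size_t)n < out_size;
}
```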

SetRoot, GetRoot

The SetRoot method provides the pNodeParent argument for all root-level CreateNode calls. Because you are responsible for any reference counting this object requires, you may need to call GetRoot after parsing to retrieve it.
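The text says only that reference counting on the root object is your responsibility. Under the reading that the parser stores the pointer without adding a reference, ownership works as in this hypothetical C sketch: root_obj stands in for your root object (for example, a COM object), and mock_set_root and mock_get_root model SetRoot and GetRoot.

```c
#include <assert.h>
#include <stdlib.h>

/* Hypothetical reference-counted root object. */
typedef struct {
    int refs;
} root_obj;

static root_obj *root_new(void)
{
    root_obj *r = malloc(sizeof *r);
    if (r != NULL)
        r->refs = 1;              /* the creator holds the first reference */
    return r;
}

/* Drops one reference and frees the object when none remain. */
static int root_release(root_obj *r)
{
    int left = --r->refs;
    if (left == 0)
        free(r);
    return left;
}

/* Hypothetical parser slot modeled on SetRoot/GetRoot: it keeps an opaque
 * pointer and never changes the object's reference count. */
typedef struct {
    void *root;
} mock_parser;

static void mock_set_root(mock_parser *p, void *root) { p->root = root; }
static void *mock_get_root(const mock_parser *p) { return p->root; }
```

After parsing, the caller retrieves the pointer with mock_get_root and releases the reference it has held all along.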

Run

The Run method parses the specified amount of XML, in characters. It returns E_PENDING if it has not finished, or S_OK when it reaches the end of the input. A value of -1 means parse as much as is available from the input stream. If you are using the PushData method, Run returns E_PENDING until you push the last buffer, and S_OK after that. The number passed to Run is only a hint: Run still succeeds if fewer than the specified number of characters remain in the input buffer.
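The chunked behavior of Run can be modeled as follows. In this hypothetical C sketch, mock_run takes the same kind of hint as lChars: -1 (or, in this sketch, any negative value) parses everything available, and a hint larger than what remains is not an error. It assumes the whole input is already available, so it never models the waiting case.

```c
#include <assert.h>
#include <stddef.h>

/* Stand-ins for S_OK and E_PENDING. */
enum run_status { RUN_OK, RUN_PENDING };

/* Hypothetical parser over a fully available input. */
typedef struct {
    size_t total;    /* characters in the input */
    size_t parsed;   /* characters parsed so far */
} mock_parser;

static enum run_status mock_run(mock_parser *p, long chars)
{
    size_t remaining = p->total - p->parsed;
    /* Negative means "as much as is available"; otherwise chars is only a
     * hint, and asking for more than remains is not an error. */
    size_t step = (chars < 0 || (size_t)chars > remaining) ? remaining
                                                           : (size_t)chars;
    p->parsed += step;
    return p->parsed < p->total ? RUN_PENDING : RUN_OK;
}
```

A client doing just-in-time parsing would call mock_run(&p, 1024) in a loop, doing other work between calls, until it returns RUN_OK.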

If you started an asynchronous download, Run(-1) returns E_PENDING. As long as your application has a message loop, downloading and parsing continue in the background as data becomes available from URLMON. You can check GetParserState periodically; eventually the parser stops returning XMLPARSER_BUSY. To monitor progress, use the Load method, register your own IBindStatusCallback, and watch the OnProgress calls.

GetParserState

GetParserState returns one of the following values indicating the state of the parser.

typedef enum
{
    XMLPARSER_IDLE,       // The parser is in the Reset state
    XMLPARSER_WAITING,    // The input stream returned E_PENDING
    XMLPARSER_BUSY,       // There is data available for parsing
    XMLPARSER_ERROR,      // The parser found an error
    XMLPARSER_STOPPED,    // Abort was called
    XMLPARSER_SUSPENDED,  // Suspend was called
};

An error takes precedence: once the parser has found an error, you still get XMLPARSER_ERROR even if you subsequently call Abort or Suspend. Likewise, Abort takes precedence over Suspend: after an abort, you get XMLPARSER_STOPPED even though the parser is also suspended.
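The precedence rules can be written as a single function. This C sketch is hypothetical: parser_flags is an invented summary of the parser's internal condition, and the enumerators are a local copy of the names listed above. Only the ordering of error, then stopped, then suspended comes from this page; how the remaining three states are ordered is an assumption.

```c
#include <assert.h>
#include <stdbool.h>

/* Local copy of the state names for this sketch. */
typedef enum {
    XMLPARSER_IDLE,
    XMLPARSER_WAITING,
    XMLPARSER_BUSY,
    XMLPARSER_ERROR,
    XMLPARSER_STOPPED,
    XMLPARSER_SUSPENDED
} parser_state;

/* Hypothetical conditions a parser might track internally. */
typedef struct {
    bool error;          /* a parse error was found */
    bool aborted;        /* Abort was called */
    bool suspended;      /* Suspend was called */
    bool reset;          /* in the initial (Reset) state */
    bool input_pending;  /* the input stream returned E_PENDING */
} parser_flags;

/* Error beats abort, abort beats suspend (from the text);
 * the order of the last three checks is assumed. */
static parser_state report_state(parser_flags f)
{
    if (f.error)         return XMLPARSER_ERROR;
    if (f.aborted)       return XMLPARSER_STOPPED;
    if (f.suspended)     return XMLPARSER_SUSPENDED;
    if (f.reset)         return XMLPARSER_IDLE;
    if (f.input_pending) return XMLPARSER_WAITING;
    return XMLPARSER_BUSY;
}
```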

Suspend

The parser can be suspended at any time; to resume parsing, call Run again. This lets clients tune performance through just-in-time parsing.

Reset

The Reset method returns the parser to its initial state so that you can load another XML file. Reset also resets the root object, the root NodeFactory, and all other NodeFactories in the current parser context.

SetFlags

SetFlags sets the flags defined in IXMLNodeSource::GetFlags.

SetSecureBaseURL, GetSecureBaseURL

The secure base URL is used to prevent cross-domain data access. After the secure base URL is set, all XML loads must come from the same domain (unless the user has explicitly enabled cross-domain data access). For example, if the secure base URL is in a domain other than www.microsoft.com, the following call returns E_ACCESSDENIED.

pParser->SetURL(NULL, L"http://www.microsoft.com", FALSE);
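A simplified version of the domain check follows. The hypothetical C function same_scheme_and_host compares only the scheme and host (including any port) of two URLs, ignoring case. The actual rule the parser applies, including its exact notion of a domain and how it honors the user's cross-domain setting, may differ, so treat this only as an illustration of the idea. The example.com host names are placeholders.

```c
#include <assert.h>
#include <ctype.h>
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

/* Length of the "scheme://host" prefix of url, or 0 if there is no "://". */
static size_t origin_length(const char *url)
{
    const char *sep = strstr(url, "://");
    if (sep == NULL)
        return 0;
    const char *host = sep + 3;
    return (size_t)(host - url) + strcspn(host, "/?#");
}

static bool equal_ignore_case(const char *a, const char *b, size_t n)
{
    for (size_t i = 0; i < n; ++i)
        if (tolower((unsigned char)a[i]) != tolower((unsigned char)b[i]))
            return false;
    return true;
}

/* Hypothetical simplification of the secure-base-URL rule: allow a load only
 * if its scheme and host match those of the secure base URL. */
static bool same_scheme_and_host(const char *secure_base, const char *url)
{
    assert(secure_base != NULL && url != NULL);
    size_t a = origin_length(secure_base);
    size_t b = origin_length(url);
    return a != 0 && a == b && equal_ignore_case(secure_base, url, a);
}
```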