Article
11/23/2011

Developing Windows Applications in C++: Working with COM-Using COM Objects

As mentioned in Chapter 2, COM is a binary standard for application-to-application integration. The applications don’t have to be written in the same language, compiled with the same tools, or even running on the same computer. (Distributed COM, or DCOM, enables cross-machine COM calls but is out of scope for this material.) COM is almost 20 years old, and some significant parts of Windows functionality are offered through COM only. That means that even if you wouldn’t choose COM for your own application-to-application integration technique, you may need to learn how to use it so that you can access some Windows functionality.

You saw in Chapter 4 that using functionality from Windows, like controls or message boxes, can save you a tremendous amount of code and effort. Some Windows functionality is available by just calling a function, one of the Windows API functions, like CreateWindow or MessageBox. But for some, you have to make a COM call instead.

COM Concepts

Interfaces

As a binary standard, COM includes capabilities for discovering what code in a COM component you can invoke, as well as actually invoking that code for your application-to-application integration. The key to understanding what a given COM component can do is knowing what interfaces it implements. A COM interface, like those used in other languages or platforms, describes a set of functions and their signatures. Writing one, and putting the definition into a form that COM and other applications can use, shouldn’t be the first COM thing you learn how to do, so for now it’s enough to know that there are interfaces (collections of functions) and that your code talks to a COM object through one or more interfaces. It’s quite common for a COM component to implement multiple interfaces, and for several different COM components to implement the same interface.


Identifiers

COM components are typically distributed in DLLs or EXEs. The COM subsystem knows how to find these because of entries in the Registry. When you want to interact with a specific COM component, you identify it using a GUID, or Globally Unique IDentifier. Typically a single component, also known as a COM class or a coclass, can have several different identifiers associated with it. These have names like CLSID (a GUID that identifies the class) and ProgID (a human-readable string that maps to a CLSID), and so on. The documentation for the component you want to use will tell you the details.
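To make the idea of a GUID concrete, here is a minimal portable sketch. It assumes a stand-in struct, Guid128, and a helper, ToRegistryString; both names are hypothetical and neither comes from the Windows SDK, although the field layout mirrors the SDK's GUID structure. The sample value is the well-known interface ID of IUnknown.

```cpp
#include <cstdint>
#include <cstdio>
#include <string>

// Hypothetical portable stand-in for the Windows GUID struct (same field layout).
struct Guid128 {
    std::uint32_t data1;
    std::uint16_t data2;
    std::uint16_t data3;
    std::uint8_t  data4[8];
};

// Formats the 128-bit value in the braced 8-4-4-4-12 hex form used in the Registry.
std::string ToRegistryString(const Guid128& g) {
    char buffer[39]; // 38 characters plus the terminating null
    std::snprintf(buffer, sizeof(buffer),
                  "{%08X-%04X-%04X-%02X%02X-%02X%02X%02X%02X%02X%02X}",
                  static_cast<unsigned>(g.data1),
                  static_cast<unsigned>(g.data2),
                  static_cast<unsigned>(g.data3),
                  g.data4[0], g.data4[1], g.data4[2], g.data4[3],
                  g.data4[4], g.data4[5], g.data4[6], g.data4[7]);
    return buffer;
}

// The interface ID of IUnknown, written out field by field.
const Guid128 kIidIUnknown = {0x00000000, 0x0000, 0x0000,
                              {0xC0, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x46}};
```

Calling ToRegistryString(kIidIUnknown) produces the same braced form you would see when browsing the Registry. In real code you rarely type these values; the SDK headers supply named constants instead.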

Sidebar: Two important Interfaces

You can call your interfaces whatever you like (everyone likes to start their interface names with I). There are two particular interfaces that are very important in the COM world: IUnknown and IDispatch.

IUnknown

IUnknown is like the “universal base class” in some inheritance hierarchies – it’s the interface that every component must implement. It has two purposes: to manage COM lifetimes through reference counting, and to help you find other interfaces. To that end, it contains three functions:

AddRef – tells the component that someone out there (your code) is still using the component, thus keeping it from being cleaned up. Implemented internally by incrementing a reference count.
Release – tells the component that code which previously called AddRef is now done with the component. Implemented internally by decrementing a reference count, and cleaning up if the count reaches zero.
QueryInterface – returns an interface pointer to some other interface that the component implements, or an error if an invalid interface is requested.

Once you have an IUnknown interface pointer for a particular component, you can use it to get a pointer to any other interface that the component implements. These other interfaces are the ones that do the actual work of the component. That still raises the issue of how to get an IUnknown pointer in the first place, and how to say what component you want to get hold of. Luckily, for Windows functionality that is implemented as COM, there are API calls you can make that will do a lot of this work for you.
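The three functions are easier to picture with a small model. The following is a simplified, portable sketch and not the Windows declaration of IUnknown: the names UnknownLike, SpeakerLike, Speaker, and SpeakThrough are hypothetical, interface IDs are plain integers rather than GUIDs, and QueryInterface returns bool rather than an HRESULT.

```cpp
#include <cstddef>
#include <string>

// Hypothetical integer interface IDs; real COM uses 128-bit IIDs.
enum InterfaceId { kIdUnknown = 0, kIdSpeaker = 1, kIdPrinter = 2 };

// Simplified model of IUnknown.
struct UnknownLike {
    virtual unsigned long AddRef() = 0;
    virtual unsigned long Release() = 0;
    virtual bool QueryInterface(InterfaceId id, void** out) = 0;
    virtual ~UnknownLike() {}
};

// An interface that does actual work, derived from the base interface.
struct SpeakerLike : UnknownLike {
    virtual std::size_t Speak(const std::wstring& text) = 0;
};

class Speaker : public SpeakerLike {
public:
    // A new object starts with one reference, owned by whoever created it.
    explicit Speaker(bool* destroyedFlag) : refCount_(1), destroyed_(destroyedFlag) {}

    unsigned long AddRef() override { return ++refCount_; }

    unsigned long Release() override {
        unsigned long remaining = --refCount_;
        if (remaining == 0) {
            *destroyed_ = true;   // lets a caller observe the cleanup
            delete this;
        }
        return remaining;
    }

    bool QueryInterface(InterfaceId id, void** out) override {
        if (id == kIdUnknown) {
            *out = static_cast<UnknownLike*>(this);
        } else if (id == kIdSpeaker) {
            *out = static_cast<SpeakerLike*>(this);
        } else {
            *out = nullptr;       // interface not supported
            return false;
        }
        AddRef();                 // every pointer handed out counts as a reference
        return true;
    }

    // Pretends to speak and reports how many characters were "spoken".
    std::size_t Speak(const std::wstring& text) override { return text.size(); }

private:
    unsigned long refCount_;
    bool* destroyed_;
};

// Caller-side pattern: ask for the interface, use it, then release it.
std::size_t SpeakThrough(UnknownLike* unknown, const std::wstring& text) {
    void* raw = nullptr;
    if (!unknown->QueryInterface(kIdSpeaker, &raw)) {
        return 0;
    }
    SpeakerLike* speaker = static_cast<SpeakerLike*>(raw);
    std::size_t spoken = speaker->Speak(text);
    speaker->Release();
    return spoken;
}
```

The point of the model is the bookkeeping: QueryInterface adds a reference for each pointer it hands out, and the object deletes itself only when the last reference is released.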

IDispatch

The IDispatch interface is worth mentioning because it is an interface many (but not all) components implement in order to support very late binding for scripting languages. When you use IDispatch, your code can invoke a method of a COM component without knowing the name or signature of the method at compile time. This is especially relevant when you’re compiling an application that will run scripts or macros. The scripts or macros know the names of the COM functions they want to invoke, though your hosting application does not. Through IDispatch (the basis of what is known as OLE Automation), methods can be invoked through their dispatch IDs (DISPIDs) rather than a traditional function call.
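Late binding can be sketched with a toy model. This is not the real IDispatch signature: DispatchLike, GetIdOfName, and CallByName are hypothetical names, and the argument handling is reduced to a single int. The two steps loosely correspond to IDispatch::GetIDsOfNames (name to dispatch ID) and IDispatch::Invoke (call by ID).

```cpp
#include <map>
#include <string>

// Toy model of a component that can be called by method name at run time.
class DispatchLike {
public:
    DispatchLike() : total_(0) {
        ids_[L"Add"] = 1;
        ids_[L"Reset"] = 2;
    }

    // Step 1: translate a method name into a dispatch ID.
    bool GetIdOfName(const std::wstring& name, long* id) const {
        std::map<std::wstring, long>::const_iterator it = ids_.find(name);
        if (it == ids_.end()) {
            return false;
        }
        *id = it->second;
        return true;
    }

    // Step 2: invoke the method identified by the dispatch ID.
    bool Invoke(long id, int argument, int* result) {
        switch (id) {
        case 1: total_ += argument; break;
        case 2: total_ = 0; break;
        default: return false;
        }
        *result = total_;
        return true;
    }

private:
    std::map<std::wstring, long> ids_;
    int total_;
};

// What a script host might do with a method name read from script text.
bool CallByName(DispatchLike& target, const std::wstring& name, int argument, int* result) {
    long id = 0;
    return target.GetIdOfName(name, &id) && target.Invoke(id, argument, result);
}
```

The host never refers to Add or Reset in its own source; the names arrive as data, which is why this style suits scripting.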

Since the samples in this material are written in C++, rather than a scripting language hosted in another application, they don’t need to rely on IDispatch, and can use whatever interfaces a particular COM component offers.

Calling a COM Object from your code

To demonstrate calling Windows functionality that is offered by COM, this chapter will use the Text to Speech API. You will need speakers or headphones on your development machine to confirm that it is working correctly.

Starting Point

To follow along in the code examples in this chapter, start by creating a Windows application similar to the one in Chapter 4, Typical Windows tasks, without the Direct2D drawing code. The App class looks like this:

#pragma once
#define MAX_LOADSTRING 100
#include "resource.h"

class App
{
public:
    App(void);
    ~App(void);
    bool Init(HINSTANCE instance, int cmd);
    int RunMessageLoop();
    ATOM RegisterClass();
    BOOL InitInstance(int);
    static LRESULT CALLBACK WndProc(HWND, UINT, WPARAM, LPARAM);
    static INT_PTR CALLBACK About(HWND, UINT, WPARAM, LPARAM);
    HINSTANCE getInstance() {return hInstance;}
private:
    HINSTANCE hInstance;                  // current instance
    TCHAR szTitle[MAX_LOADSTRING];        // The title bar text
    TCHAR szWindowClass[MAX_LOADSTRING];  // the main window class name
};

The implementations of these methods can be copied from the Second app in Chapter 4, Typical Windows Tasks, removing the Direct2D code. The WndProc is much smaller than in Second, and looks like this:

LRESULT CALLBACK App::WndProc(HWND hWnd, UINT message, WPARAM wParam, LPARAM lParam)
{
    App* pApp;
    if (message == WM_CREATE)
    {
        // Store the App pointer passed at window creation so later messages can retrieve it.
        LPCREATESTRUCT pcs = (LPCREATESTRUCT)lParam;
        pApp = (App *)pcs->lpCreateParams;
        // Cast to LONG_PTR (not PtrToUlong) so the pointer is not truncated in 64-bit builds.
        ::SetWindowLongPtrW(hWnd, GWLP_USERDATA, reinterpret_cast<LONG_PTR>(pApp));
        return TRUE;
    }
    else
    {
        pApp = reinterpret_cast<App *>(static_cast<LONG_PTR>(::GetWindowLongPtrW(hWnd, GWLP_USERDATA)));
        if (!pApp)
            return DefWindowProc(hWnd, message, wParam, lParam);
    }

    int wmId, wmEvent;
    PAINTSTRUCT ps;
    HDC hdc;

    switch (message)
    {
    case WM_COMMAND:
        wmId = LOWORD(wParam);
        wmEvent = HIWORD(wParam);
        // Parse the menu selections:
        switch (wmId)
        {
        case IDM_ABOUT:
            DialogBox(pApp->getInstance(), MAKEINTRESOURCE(IDD_ABOUTBOX), hWnd, pApp->About);
            break;
        case IDM_EXIT:
            DestroyWindow(hWnd);
            break;
        default:
            return DefWindowProc(hWnd, message, wParam, lParam);
        }
        break;
    case WM_PAINT:
        hdc = BeginPaint(hWnd, &ps);
        // No drawing code is needed in this sample.
        EndPaint(hWnd, &ps);
        break;
    case WM_DESTROY:
        PostQuitMessage(0);
        break;
    default:
        return DefWindowProc(hWnd, message, wParam, lParam);
    }
    return 0;
}

Adding a button

Right-click IDM_ABOUT or IDM_EXIT and choose Go To Definition to open resource.h, scrolled to the identifiers. Add a line to identify a new button:

#define IDB_SPEAK 110

Add this call to the WM_CREATE branch at the top of WndProc, before the return statement, to create a button in your user interface:

CreateWindow(L"button", L"Speak", WS_VISIBLE | WS_CHILD , 20, 50, 80, 25, hWnd, (HMENU) IDB_SPEAK, NULL, NULL);

Finally, add a case statement inside the WM_COMMAND case statement that will process commands for IDB_SPEAK:

case IDB_SPEAK:
    {
        //nothing yet
    }
    break;

The basic structure is now in place. Build and test the application if you wish.

Initializing COM

Before you can get hold of a COM object and call its methods, you must initialize COM. COM will also need to be cleaned up on the way out of your application. The safest place to pair these calls is in your main function, outside the App class. The main will now look like this:

int APIENTRY _tWinMain(HINSTANCE hInstance, HINSTANCE hPrevInstance, LPTSTR lpCmdLine, int nCmdShow)
{
    UNREFERENCED_PARAMETER(hPrevInstance);
    UNREFERENCED_PARAMETER(lpCmdLine);

    App theApp;
    if (theApp.Init(hInstance, nCmdShow))
    {
        HRESULT hr = CoInitializeEx(NULL, COINIT_APARTMENTTHREADED | COINIT_DISABLE_OLE1DDE);
        if (SUCCEEDED(hr))
        {
            int QuitParam = theApp.RunMessageLoop();
            // Balance the successful call to CoInitializeEx.
            CoUninitialize();
            return QuitParam;
        }
    }
    return 0;
}

The call to CoInitializeEx takes two parameters. The first must always be NULL. The second is a flag or flags controlling some important aspects of your application’s interaction with COM. It’s always a good idea to include COINIT_DISABLE_OLE1DDE, because it disables some unneeded OLE capability and eliminates the associated overhead. You should combine COINIT_DISABLE_OLE1DDE with a flag to set your application’s threading model: COINIT_APARTMENTTHREADED for single threading, and COINIT_MULTITHREADED, the default, for multi-threaded.

A process can use many different COM components, and may have many threads. If a thread calls CoInitializeEx requesting a single-threaded apartment, an apartment will be created for the thread, and only that thread can interact with COM objects in the apartment. A process could have many different apartments for single-threaded COM objects. If a thread calls CoInitializeEx requesting a multi-threaded apartment, it gains access to the one multi-threaded apartment for the entire process, which will be shared by all threads that initialized COM in this way.

An apartment is just a container for COM components and a way of organizing thread information. You never interact directly with an apartment, but you need to know what apartment the COM component is in so that you know whether methods might be called from multiple threads or not.

As you can probably see, the single-threaded apartment model makes for simpler programming: you don’t need to handle concurrency or concern yourself with communication between objects that are on different threads. You should, however, avoid the use of global variables, or protect them with a thread synchronization method such as a critical section. Using the multi-threaded apartment, sometimes called free threading, requires you to handle thread synchronization issues between your COM objects. You get more control, and possibly better performance, at the cost of programming complexity.

Since the samples in this chapter are not explicitly threaded, there’s no need to take on the complexity of free threading. That’s why the demo app uses a single threaded apartment model. For more on COM’s threading models, see the MSDN article Understanding and Using COM Threading Models.
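The pairing of CoInitializeEx and CoUninitialize described at the start of this section can also be expressed as a scope guard. The sketch below is portable and deliberately generic: ScopedInit is a hypothetical helper, not an SDK class, and it takes the two calls as function objects. It only runs the cleanup when the initialization result is non-negative, which is the same test the SUCCEEDED macro applies to an HRESULT.

```cpp
#include <functional>
#include <utility>

// Hypothetical helper: runs init in the constructor and cleanup in the
// destructor, but only when init reported success (a non-negative result).
class ScopedInit {
public:
    ScopedInit(const std::function<long()>& init, std::function<void()> cleanup)
        : result_(init()), cleanup_(std::move(cleanup)) {}

    ~ScopedInit() {
        if (Succeeded()) {
            cleanup_();
        }
    }

    // Copying would run the cleanup twice, so it is disabled.
    ScopedInit(const ScopedInit&) = delete;
    ScopedInit& operator=(const ScopedInit&) = delete;

    bool Succeeded() const { return result_ >= 0; }
    long Result() const { return result_; }

private:
    long result_;                    // declared first so init() runs before cleanup_ is stored
    std::function<void()> cleanup_;
};
```

In a Windows build, you might construct one of these at the top of the main function with a lambda that calls CoInitializeEx and another that calls CoUninitialize. The guard would then need to outlive every interface pointer the application holds, which is the same ordering constraint the hand-written version has.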

Creating and using a COM component

To use a COM component, you need to ask Windows to help you find the class, and also the particular interface you want to use. They both have IDs – a CLSID for the class and an IID, or interface ID, for the interface. To Windows, these are GUIDs, but typing a GUID in your code is not particularly convenient or productive. Most COM components, and all the ones that provide Windows functionality, are accompanied by a C++ header file that defines macros and constants to help you use the component easily.

Windows offers text to speech capabilities through the Speech API, or SAPI. SAPI actually covers a large variety of speech-related tasks. The ISpVoice interface covers the text to speech functionality. In the documentation for SAPI and the ISpVoice interface you can see three important pieces of information:

The CLSID is CLSID_SpVoice
The method to actually speak some text is Speak()
The header file is sapi.h

In App.cpp, add a line before the constructor to include the SAPI header:

#include <sapi.h>

The case statement for IDB_SPEAK can now get hold of the COM component and call its Speak() method, passing in some hardcoded text. Here’s how that looks:

case IDB_SPEAK:
    {
        ISpVoice* pVoice = NULL;
        HRESULT hr = CoCreateInstance(CLSID_SpVoice, NULL, CLSCTX_ALL, IID_ISpVoice, reinterpret_cast<void**>(&pVoice));
        if( SUCCEEDED( hr ) )
        {
            hr = pVoice->Speak(L"Hello from COM.", 0, NULL);
            pVoice->Release();
            pVoice = NULL;
        }
    }
    break;

This block of code is a good template for code you will write to use any Windows functionality that is offered through COM. It uses a type defined in sapi.h, ISpVoice, to get hold of the required interface. The call to CoCreateInstance initializes this “interface pointer”, and once it is obtained, making COM calls looks just like making an ordinary C++ member function call.

CoCreateInstance is the function that actually creates an instance of the COM object (the Co at the start of the name stands for COM object) and it takes five parameters:

The CLSID. You would get this from the documentation. It identifies the class you want to use.
Another COM object that will contain this one using COM aggregation. NULL means you are not using aggregation in this case.
An execution context, identified by an enum.
The IID: interface ID.
An “interface pointer” that it will set. Because CoCreateInstance can create any kind of interface pointer, pVoice must be cast to a void**.

To understand the execution context, you must know a little about how applications work under Windows. Your application has at least one process. The COM component can be in the same process, or a different one. It can even be on a different machine. The fastest execution happens when the component is in the same process (often called in-process.) The COM component and your application will share memory, which makes things faster but perhaps riskier also. COM components distributed as DLLs are usually implemented as in-process components. Typically, an EXE is an out-of-process COM component. It’s also possible that you could use DCOM to access a COM component on a different computer, a remote context. Each of these preferences can be specified with a particular CLSCTX enum value. Not all COM components can be created in all contexts – the component itself controls how it will be created. The special constant CLSCTX_ALL means “in process, unless the component doesn’t support that, in which case out-of-process, unless the component doesn’t support that, in which case remote.” It’s a safe choice when you aren’t sure what context to specify. If the documentation doesn’t mention a context to use, use CLSCTX_ALL.
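The execution contexts are bit flags, so CLSCTX_ALL is simply several of them combined. The sketch below uses local copies of the flag values so that it compiles without the Windows SDK; the names ContextFlags and DescribeContext are hypothetical, and real code should use the CLSCTX constants from the SDK headers.

```cpp
#include <string>

// Local copies of the CLSCTX bit values, for illustration only.
enum ContextFlags : unsigned {
    kInprocServer  = 0x1,   // CLSCTX_INPROC_SERVER: a DLL loaded into your process
    kInprocHandler = 0x2,   // CLSCTX_INPROC_HANDLER: an in-process handler
    kLocalServer   = 0x4,   // CLSCTX_LOCAL_SERVER: an EXE on the same machine
    kRemoteServer  = 0x10,  // CLSCTX_REMOTE_SERVER: a component on another machine
    kAll = kInprocServer | kInprocHandler | kLocalServer | kRemoteServer
};

// Lists the contexts that a combination of flags allows.
std::string DescribeContext(unsigned flags) {
    std::string text;
    auto append = [&](unsigned bit, const char* name) {
        if (flags & bit) {
            if (!text.empty()) {
                text += ", ";
            }
            text += name;
        }
    };
    append(kInprocServer, "in-process server");
    append(kInprocHandler, "in-process handler");
    append(kLocalServer, "local server");
    append(kRemoteServer, "remote server");
    return text;
}
```

Passing a single flag such as the in-process one restricts creation to that context, which is what the Hilo sample later in this chapter does.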

Sometimes, documentation is incomplete. For example, the documentation for ISpVoice lists the CLSID but not the IID. If you have the header file, you can find what you want easily. First, get Visual Studio to open the header file for you: right-click the file name in the include statement and choose Open Document.

This saves you from having to know where the header file is located on your hard drive. Next, search for the string IID. You can use the Edit, Find menu, but it’s often more convenient to use the Find box on the toolbar.

Type the search string and press Enter. Press F3 to find the next instance, and the next. After finding a few, you can see that the IID constants start IID_ - use that as the search term to find the SpVoice one more quickly. After a few hits, you are likely to guess that the constant you need is IID_ISpVoice – search for that to confirm and you’ll find this line:

EXTERN_C const IID IID_ISpVoice;

This is the constant used for the fourth parameter to CoCreateInstance. If you did not know the CLSID, you could use the same approach to find it, given the header file.

What if you couldn’t find the header file? If you’re looking for a COM component that provides Windows functionality, you can expect to find the header under your SDK installation. If you pause your mouse over the file tab when you have sapi.h open, you’ll see the full path to the file.

You can right click the tab to either copy the path to the file, or to open the folder in which this file is located. Combine that with a Windows 7 search, and you can find any other header file you might need, given the COM interface name.

Notice that this code checks the HRESULT from CoCreateInstance. This is very important. If you ask a class for an interface it doesn’t support, or a context it doesn’t support, the interface pointer will not be usable. In a real application, you would need to cope with this problem somehow, but the demo code just does nothing if it can’t get the interface pointer successfully.
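One modest way to cope is to translate the failure into something you can log or show. An HRESULT is a 32-bit signed value in which negative means failure, so SUCCEEDED amounts to a test for non-negative. The sketch below models that portably: ResultCode, Succeeded, and Explain are hypothetical names, and the constants are local copies of the documented values of S_OK, E_NOINTERFACE, and REGDB_E_CLASSNOTREG.

```cpp
#include <cstdint>
#include <string>

typedef std::int32_t ResultCode;  // stand-in for HRESULT

const ResultCode kOk = 0;                                                     // S_OK
const ResultCode kNoInterface = static_cast<ResultCode>(0x80004002u);         // E_NOINTERFACE
const ResultCode kClassNotRegistered = static_cast<ResultCode>(0x80040154u);  // REGDB_E_CLASSNOTREG

// Same test the SUCCEEDED macro performs: any non-negative value is a success.
inline bool Succeeded(ResultCode rc) { return rc >= 0; }

// Turns the failures most likely from a creation call into readable text.
std::string Explain(ResultCode rc) {
    if (Succeeded(rc)) {
        return "success";
    }
    if (rc == kNoInterface) {
        return "the class does not support the requested interface";
    }
    if (rc == kClassNotRegistered) {
        return "the class is not registered on this machine";
    }
    return "unexpected failure";
}
```

Note that success is not a single value: a code of 1 (S_FALSE) also passes the test, which is why the macro is preferred over comparing against zero.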

Once you are sure that pVoice is a valid interface pointer to an ISpVoice interface implemented as part of Windows, you can just use its methods. The Speak() method has some very good defaults, so if you just pass in a hardcoded string, it will speak it aloud using the default voice on your computer. Having used the interface, you should remember to release it, and then set it to NULL as an indicator that it is no longer a valid interface pointer. In the demo code, it goes out of scope immediately, so setting it to NULL isn’t strictly required, but it’s a good habit.
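Remembering to release can be delegated to a small wrapper class. The following portable sketch shows the idea with hypothetical names (CountedObject stands in for a COM object, InterfacePtr for the wrapper, and Receive for the function that hands out the address to fill in). ATL's CComPtr is a production version of the same idea; the samples in this material do not use ATL, so treat this only as an illustration of the pattern.

```cpp
#include <utility>

// Stand-in for a reference-counted COM object.
struct CountedObject {
    explicit CountedObject(bool* destroyedFlag) : count(1), destroyed(destroyedFlag) {}
    void AddRef() { ++count; }
    void Release() {
        if (--count == 0) {
            *destroyed = true;
            delete this;
        }
    }
    int count;
    bool* destroyed;
};

// Hypothetical wrapper that releases its pointer automatically.
template <typename T>
class InterfacePtr {
public:
    InterfacePtr() : ptr_(nullptr) {}
    InterfacePtr(const InterfacePtr& other) : ptr_(other.ptr_) {
        if (ptr_) {
            ptr_->AddRef();   // a copy is one more user of the object
        }
    }
    InterfacePtr& operator=(InterfacePtr other) {
        std::swap(ptr_, other.ptr_);   // the old pointer is released when other is destroyed
        return *this;
    }
    ~InterfacePtr() { Reset(); }

    // Releases the current pointer, if any, and marks the wrapper as empty.
    void Reset() {
        if (ptr_) {
            ptr_->Release();
            ptr_ = nullptr;
        }
    }

    // Address to pass to a creation function that fills in the pointer.
    T** Receive() {
        Reset();
        return &ptr_;
    }

    T* Get() const { return ptr_; }
    T* operator->() const { return ptr_; }

private:
    T* ptr_;
};
```

With a wrapper like this, the release and the reset to null both happen when the wrapper goes out of scope, so neither can be forgotten on an early return.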

Try building and running this code. When you click the button (remember to have your speakers unmuted), you should hear the sentence.

Improving the demo

This demo code works, but there are a few issues with it. First, it seems wasteful to keep creating and releasing the COM component. Second, the reinterpret_cast in the call to CoCreateInstance is a vulnerability in future versions of your code. What if you change the IID you pass to CoCreateInstance, so it gives you an entirely different interface pointer, but you continue to keep that interface pointer in an ISpVoice?

Using a cached interface pointer

To solve the first problem, move the interface pointer, pVoice, to a member variable of App. You can then call CoCreateInstance once, and Release once, rather than each time the button is clicked. This should improve the performance of your application. You can use a lazy initialization if you like – until someone clicks the button, don’t call CoCreateInstance. You need to be sure that the call to Release happens before the call to CoUninitialize(). The best place to do this is in the handler for the WM_DESTROY message.

App gains a private member variable and a public Speak() method to be called from WndProc. Move the old contents of the IDB_SPEAK case into the Speak() method. You will also need to move the include of sapi.h into App.h so that the interface pointer type is known to the compiler. Add another public method, Cleanup, to App.

Here are the additions to the App class:

class App
{
public:
    //. . .
    void Speak();
    void Cleanup();
private:
    //. . .
    ISpVoice* pVoice;
    //. . .
};

The constructor should initialize pVoice to NULL so that you can test whether CoCreateInstance has been called already:

App::App() : pVoice(NULL)
{
}

Here are the Speak and Cleanup methods:

void App::Speak()
{
    if (!pVoice)
    {
        HRESULT hr = CoCreateInstance(CLSID_SpVoice, NULL, CLSCTX_ALL, IID_ISpVoice, reinterpret_cast<void**>(&pVoice));
        if (FAILED(hr))
        {
            // Without a valid interface pointer there is nothing to call.
            pVoice = NULL;
            return;
        }
    }
    pVoice->Speak(L"Hello from COM.", 0, NULL);
}

void App::Cleanup()
{
    if (pVoice)
    {
        pVoice->Release();
        pVoice = NULL;
    }
}

The case statement for the IDB_SPEAK command becomes only:

case IDB_SPEAK: pApp->Speak(); break;

And the case statement for the WM_DESTROY message must call Cleanup:

case WM_DESTROY: pApp->Cleanup(); PostQuitMessage(0); break;

Now the application should perform better when the button is clicked multiple times. You may even notice a difference between the first time you click it and subsequent times.

Helpful macros

To make the call to CoCreateInstance a little more robust, it would be nice to eliminate the reinterpret_cast. Many errors of thought are prevented by good type checking in C++, and throwing away type checking always feels a little dangerous. If, in a future version, the IID you pass doesn’t match the type of the interface pointer you declared (and pass as the last parameter), the compiler will not be able to warn you.

The IID_PPV_ARGS macro prevents this problem by eliminating the need to pass the IID explicitly. This macro determines what interface ID to ask for by looking at the type of the interface pointer you are passing. You use it in place of the last two parameters (the interface id, or IID, and the pointer-to-pointer-to-void, or PPV, arguments) to CoCreateInstance. In this demo, the revised call to CoCreateInstance looks like this:

HRESULT hr = CoCreateInstance(CLSID_SpVoice, NULL, CLSCTX_ALL, IID_PPV_ARGS(&pVoice));

The macro expands to the same two parameters as before, but because it determines the IID for you, you can’t later change the IID and forget to change the type of the interface pointer, pVoice, or vice-versa.
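The mechanism behind such a macro can be modeled in portable C++. The real IID_PPV_ARGS relies on Microsoft-specific compiler support to look up the IID from the pointer type; the sketch below substitutes a template trait for that lookup. Every name in it (VoiceLike, RecognizerLike, InterfaceIdOf, CreateById, IdFor, ID_AND_PPV_ARGS) is hypothetical, and interface IDs are plain integers.

```cpp
// Hypothetical stand-ins for two unrelated interfaces.
struct VoiceLike      { int id; };
struct RecognizerLike { int id; };

// Maps an interface type to its ID. There is no default, so an unknown type fails to compile.
template <typename T> struct InterfaceIdOf;
template <> struct InterfaceIdOf<VoiceLike>      { static constexpr int value = 10; };
template <> struct InterfaceIdOf<RecognizerLike> { static constexpr int value = 20; };

// Stand-in factory whose two parameters have the shape of the last two
// parameters of CoCreateInstance: an interface ID and a void** to fill in.
bool CreateById(int interfaceId, void** out) {
    static VoiceLike voice = {10};
    static RecognizerLike recognizer = {20};
    if (interfaceId == 10) { *out = &voice; return true; }
    if (interfaceId == 20) { *out = &recognizer; return true; }
    *out = nullptr;
    return false;
}

// Deduces the interface type from the pointer-to-pointer and returns its ID.
template <typename T>
int IdFor(T**) { return InterfaceIdOf<T>::value; }

// Expands to both arguments, so the ID always matches the pointer's type.
#define ID_AND_PPV_ARGS(pp) IdFor(pp), reinterpret_cast<void**>(pp)
```

Because the ID is derived from the declared type of the pointer, changing the pointer's type changes the requested ID with it, and the two can no longer drift apart.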

Memory Issues

This simple example uses a hardcoded string. In any real application, you would more likely be using a string that was kept in memory of some kind, whether a classic char* C-style string or an STL string. Unlike a literal string such as “Hello from COM”, these more realistic cases require you to manage memory and object lifetimes when passing them around.

For example, if you call a COM method that will give you a string, the COM component allocated the memory for the string, and may expect you to free it. In other circumstances the COM component may expect you to allocate the memory, though this is less common for strings since the calling code doesn’t know the size that is needed. You need to know about three COM functions that manage memory:

CoTaskMemAlloc – allocates memory for use by a COM object
CoTaskMemRealloc – reallocates (expands) memory for use by a COM object, possibly changing the pointer as a result
CoTaskMemFree – frees memory allocated by one of the other two functions

You cannot usually work out for yourself how memory will be managed by a COM object. You need to check the documentation. For example, the GetDisplayName method of the IShellItem interface retrieves a string that represents a file name. The remarks in the documentation for IShellItem::GetDisplayName remind you:

It is the responsibility of the caller to free the string pointed to by ppszName when it is no longer needed. Call CoTaskMemFree on *ppszName to free the memory.

When you see a reminder like this, it results in code similar to this:

IShellItem *pItem = NULL;
HRESULT hr = E_FAIL;
// Get pItem somehow, with CoCreateInstance or some other call, and store the result in hr.
if (SUCCEEDED(hr))
{
    PWSTR pszFilePath = NULL;
    hr = pItem->GetDisplayName(SIGDN_FILESYSPATH, &pszFilePath);
    if (SUCCEEDED(hr))
    {
        MessageBox(NULL, pszFilePath, L"File Path", MB_OK);
        // The shell item allocated this string; the caller must free it.
        CoTaskMemFree(pszFilePath);
    }
    pItem->Release();
}

The call to CoTaskMemFree will clear away the memory that the shell item allocated for the string. If you omit it, the memory will never be cleaned up, even when the COM component is released. It’s relying on your calling code to manage the lifetime of that memory.
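The caller-frees rule pairs naturally with a smart pointer that has a custom deleter. The sketch below is a portable model, not Windows code: GetDisplayNameLike is a hypothetical stand-in for a COM method that allocates a string, std::malloc and std::free play the roles of CoTaskMemAlloc and CoTaskMemFree, and the file path is made-up sample data.

```cpp
#include <cstdlib>
#include <cwchar>
#include <memory>
#include <string>

// Stand-in for a COM method that allocates a string and hands it to the caller.
bool GetDisplayNameLike(wchar_t** name) {
    const wchar_t text[] = L"C:\\photos\\cat.jpg";
    *name = static_cast<wchar_t*>(std::malloc(sizeof(text)));
    if (!*name) {
        return false;
    }
    std::wmemcpy(*name, text, sizeof(text) / sizeof(text[0]));  // copies the terminator too
    return true;
}

// Deleter that calls the matching free function (CoTaskMemFree in real code).
struct TaskMemDeleter {
    void operator()(wchar_t* p) const { std::free(p); }
};
typedef std::unique_ptr<wchar_t, TaskMemDeleter> TaskString;

// Fetches the string, copies it into an STL string, and frees the original.
std::wstring ReadName() {
    wchar_t* raw = nullptr;
    if (!GetDisplayNameLike(&raw)) {
        return std::wstring();
    }
    TaskString owned(raw);              // freed automatically when owned goes out of scope
    return std::wstring(owned.get());   // the copy is independent of the original buffer
}
```

The deleter keeps the allocation and the free matched even if the code between them returns early, and the copy into std::wstring shows one way to move the data into an STL type when you need to keep it.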

Notice also that the COM methods do not work directly with STL objects of any kind, including strings. You can convert back and forth among the different string types, but if you’re just going to get a string from a COM call and then pass it to a Windows API call like MessageBox, it’s simpler to keep it in the same format (in this case PWSTR, which is just a wchar_t* pointing to a null-terminated wide-character string).

Realistic uses of COM

The Hilo sample application is a large demo that manages photos, including uploading them to Flickr using web services. It shows off a number of Windows 7 features and, like the demos in this material, does not use ATL or MFC. It includes a class called JumpList that represents a Windows 7 jumplist of documents or other “destinations” an application has opened recently, and tasks a user might want to launch quickly.

Take a look at some of the code for JumpList (you can download all of Hilo from https://code.msdn.microsoft.com/Hilo ) and you’ll find a similar structure to the speaking code just walked through. What may have at first glance seemed unreadable is in fact following a set of idioms and patterns that you may now understand.

//
// Customize the jump list and add a user task to Task category
//
HRESULT JumpList::AddUserTask(const wchar_t* applicationPath, const wchar_t* title, const wchar_t* commandLine)
{
    // Create a destination list
    HRESULT hr = CoCreateInstance(CLSID_DestinationList, nullptr, CLSCTX_INPROC_SERVER, IID_PPV_ARGS(&m_destinationList));

    // Set application Id
    if (SUCCEEDED(hr))
    {
        hr = m_destinationList->SetAppID(m_appId.c_str());
    }

    // Begin to build a session for the jump list
    if (SUCCEEDED(hr))
    {
        UINT cMinSlots;
        hr = m_destinationList->BeginList(&cMinSlots, IID_PPV_ARGS(&m_objectArray));
    }

    // Add User tasks
    if (SUCCEEDED(hr))
    {
        hr = CreateUserTask(applicationPath, title, commandLine);
    }

    // Commit list
    if (SUCCEEDED(hr))
    {
        hr = m_destinationList->CommitList();
    }

    return hr;
}

Start by reading the line that calls CoCreateInstance. You can see that this code is working with a class whose class ID is CLSID_DestinationList. It’s creating the COM component in process, and it’s using the IID_PPV_ARGS macro to make sure it requests the right IID. If you search for the CLSID, you’ll find the topic ICustomDestinationList Interface, which explains how this interface works and the purpose of the SetAppID, BeginList, and CommitList methods. You can even see a conversion from the STL string m_appId to a wchar_t* string, through c_str(), to pass to SetAppID. Notice too that each step runs only if the previous one succeeded, so the first failing HRESULT is the one returned to the caller.

COM programming uses a number of conventions that are quite opaque when you first meet them. They recur from application to application, whether it’s a single-purpose demo that speaks a hardcoded string, or a rich and complex application like Hilo. Once you learn the basic patterns, you can apply them throughout your Windows programming, whenever the capability you want to use is exposed through COM.

Summary

This chapter has shown how to work with COM to access functionality that is offered by Windows. Because the header files are included with the Windows SDK, and the DLLs that implement them are distributed with Windows (in fact, are part of Windows), it’s really easy to use COM this way. You may have read some “introduction to COM” material that looked a lot more complicated, but when you want to access Windows functionality, COM isn’t as hard to use as you might have thought it was.