The benefits of writing applications that support Unicode include enhanced international and multilanguage support, which is increasingly important in today's global economy. While Microsoft has encouraged Unicode support, this was not an easy goal for developers because Windows® 95, Windows 98, and Windows Me do not have native Unicode support (in contrast to Windows NT®). However, as Unicode continued to gain popularity, developers needed to write applications for their customers (many of whom were still on Windows 9x), and it became obvious that a type of translation layer between Unicode APIs on Windows NT and the Windows 9x platforms would be required. As a result, the Microsoft Layer for Unicode (MSLU) on Windows 95, Windows 98, and Windows Me systems was created, and released at the same time as Windows XP RC1.
As of this writing, MSLU can only be used with Visual C++® 7.0. A forthcoming version of the Platform SDK may provide support for Visual C++ 6.0.
Design Criteria for MSLU
The major goal of the MSLU team was to write a Unicode layer for these Windows platforms that would extend the Windows NT set of Unicode APIs on Win32®. Developers could then easily write Unicode applications for Windows 9x with a centralized set of Unicode APIs. Figure 1 shows the architecture surrounding the MSLU. A related and equally important goal of the MSLU team was to promote the adoption of Unicode by enabling developers to write more Unicode applications.
Figure 1 MSLU Architecture
From these goals, we developed a particular set of design criteria, such that MSLU should provide a superset of all APIs needed by the application-specific layers previously written by Microsoft. This set of APIs would be generic enough to be used by a diverse group of developers, and MSLU would be small and easy to use without taking up too many resources. We also made sure MSLU was free of major dependencies (such as components and registration routines) while still working properly with other components. In addition, it needed the ability to be overridden where necessary, and ideally it would not provide any new functionality; that is, it should be a translation layer only.
In designing MSLU, it was important to consider the size of the layer. Although it wraps over 475 APIs, MSLU is less than 165KB. In addition, it does not monopolize system resources by taking up more memory than it needs.
MSLU has no special dependencies other than the operating system functions that it wraps. You do not need to install special versions of any component for it to work properly. No registration routines are necessary, and MSLU does not use the registry in order to store information.
Even though MSLU does not have dependencies, we knew that customer applications might. Not only was it important to support those dependent libraries, but it was also crucial that we didn't cause problems by interfering with other commonly used components. Because of this, a significant amount of work was done to ensure proper support of the Unicode versions of MFC, ATL, and the Visual C++ runtime, and we made sure there would be no corruption of messages from components like RichEdit and the Windows common controls. And, of course, performance was a key issue that we made sure to address.
What MSLU Does and Does Not Address
MSLU was designed to be a translation layer for Windows NT-based Unicode APIs on Windows 9x. The layer, however, is not a complete rewrite of Windows 9x, nor is it some type of Windows NT emulator for the Windows 9x platform. It does not provide support for Unicode-only scripts like Devanagari or Georgian, or for any of the new supplementary characters that have been added to Unicode (specifically UTF-16) as surrogate pairs. It also cannot provide extended international support beyond the platform that it runs on, since it is relying on the operating system for any particular international functionality that developers may want to provide. For example, MSLU does not provide updated versions of the cp_*.nls files that contain support for code pages, nor does it add support on Windows 95 for GetLocaleInfo LCTypes for which the operating system has no specific information.
Instead, developers should consider MSLU as one part of a whole package of international or Unicode-focused components provided by Microsoft. These components include RichEdit, which provides rich text support and can also be used for more comprehensive support with plain text. Windows common controls provide a standard way to implement many commonly used interface components. Uniscribe allows the support and rendering of complex scripts (writing systems that need additional processing prior to display, such as Arabic, Devanagari, and Thai). GDI+ extends Uniscribe to provide easier developer integration and more consistent rendering for mixed scripts. MLang provides many features such as character encoding, string conversion, and font linking support, and the Text Services Framework (TSF) provides a common framework for text input and natural language technologies.
Since all of these components have a Unicode solution, MSLU does not attempt to provide overlapping functionality. With the use of these other components, you can go a step beyond MSLU's support for Unicode and handle multilingual input, processing, and rendering, so that applications can go beyond the codepage-based support provided by Windows 9x.
For more information on these other technologies, what versions to use, and how to implement and deploy them, go to the MSDN® Online homepage.
MSLU integration has only one required step: simply include the new unicows.lib in the list of libraries to which you link. This LIB file contains the custom loader (discussed later) that MSLU uses in your project.
OK, it's not quite that simple. It is important to remember that every function inside of unicows.lib looks exactly like a function in some other LIB file, such as kernel32.lib or gdi32.lib (that is, libraries that you will likely also be including in the library list). These two versions of the same API are easy to distinguish at runtime when you call them on a Windows 9x platform: the MSLU version will work, and the OS version will fail.
In order to ensure that the application uses the correct function, you need to make sure that everything is linked in the right order. The simple rule for the linker is that references are resolved from left to right (and if it cannot find the symbol, it starts back at the beginning, finally using the DEFAULTLIB location if it still cannot find the symbol).
To take advantage of this scheme, you can use the following three-step plan:
1. Include the following in the link list:
2. Include unicows.lib.
3. Include all of the libraries that MSLU (and your application) might need:
Let's examine these steps and the reasons for them, one at a time.
Perform step 1 to ensure that the OS version of the API is not found. This is critical, since the OS version of the call will usually fail, or at best not use the appropriate MSLU behavior. Technically, you can remove libraries from the list that you are not using, but it is important to realize that other LIB files (like those for MFC or ATL) use APIs, and you do not want them to pick up APIs from anything other than unicows.lib. It is therefore safer to include all of them.
Step 2 should require no explanation.
Step 3 will allow MSLU to find the APIs it needs (after all, it often needs to call the OS after it has done its work to convert strings). It will also allow all of the other APIs that MSLU does not wrap to be found by your application. As in step 1, you can remove libraries from this list if you do not use them, but it is safer to include them all, unless you are sure that neither you nor any of your dependent libraries will ever need them.
Experienced developers who feel comfortable with the internals of the linker can devise other ways to proceed here; there are many options. These three steps represent the best practices. If you follow them exactly and never try to insert any LIB files in between them, then the MSLU integration will work properly.
If linking is not implemented properly, some APIs will fail and some will succeed, resulting in a very unstable situation in which your application will not work properly. Thus, it is exceptionally important to implement these steps correctly.
Of course, it is not necessarily easy to diagnose the exact problem. Many tools can be used to determine what functions are imported from each DLL. For example, you can use DumpBin (DUMPBIN.EXE) with the /IMPORTS flag set, or Dependency Walker (DEPENDS.EXE). The problem for MSLU is that neither method recognizes the custom loader, so the result of integrating MSLU will be that all Unicode APIs will appear to have vanished from the list of functions.
This does, however, make an interesting way to determine if you have integrated MSLU properly. The trick is to ensure that none of the layer-provided APIs show up. It is not a perfect solution, but it is relatively easy to spot the difference.
The MSLU Loader
Visual C++ provides a method to delay the loading of a DLL until it's needed. MSLU does not use this delay-load technology, but rather uses a similar technique to provide the same functionality. The MSLU loader will only load unicows.dll if it is running on a Windows 9x platform. If it is not, it simply calls the original operating system versions of the APIs.
This loader therefore gives MSLU a lot of functionality and power, since it actually determines the APIs to call. The loader itself is contained in unicows.lib and is the primary reason that the LIB file is so large (about 2MB) despite how small the DLL itself is. The extra information in the LIB is actually a static version of the same information that other delay-load technologies (such as the Visual C++ DELAYLOAD functionality) dynamically generate.
Before we cause a panic, we should address the concern that MSLU will bloat your applications. Almost all of the code in the LIB is essentially repeated over and over again for each API, and if any type of optimization is done at all, it will strip out that duplicated information to a core set of APIs. This optimization happens because at the level of the linker, every API that takes three parameters looks identical, regardless of whether that API is OpenWaitableTimer, WriteProfileString, or GetStateText. As a result, it can combine a lot of that code during the link phase. The linker will also discard extra information for APIs that aren't used. Therefore, you will find that even large or complex applications that call hundreds of APIs will still increase the total size of your application only slightly. On the complex samples we ran (> 2-3 MB, compiled), the size increased by only about 4KB.
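The loader behavior described at the start of this section can be pictured with a small portable model. This is only a sketch under stated assumptions: the article does not describe the loader's internals, so the lazy-binding thunk, the g_isWin9x flag, and every function name below are hypothetical stand-ins, and plain functions take the place of real DLL loading.

```cpp
#include <cassert>
#include <string>

// Signature shared by every implementation of the (hypothetical) API.
using ApiFn = std::string (*)(const std::string&);

// Hypothetical stand-ins: the native OS entry point and the layer's wrapper.
std::string OsImplementation(const std::string& s)    { return "os:" + s; }
std::string LayerImplementation(const std::string& s) { return "layer:" + s; }

bool g_isWin9x = false;   // assumption: stands in for a real platform check
int  g_layerLoads = 0;    // counts how often the layer "DLL" would be loaded

// Decide once which implementation should service the API.
ApiFn ResolveApi() {
    if (g_isWin9x) {
        ++g_layerLoads;               // models loading the layer DLL on demand
        return &LayerImplementation;
    }
    return &OsImplementation;         // no layer needed: call the OS directly
}

std::string FirstCallThunk(const std::string& s);

// Every call goes through this pointer; it starts out aimed at the thunk.
ApiFn g_api = &FirstCallThunk;

// First call only: resolve, cache the result in g_api, then forward the call.
std::string FirstCallThunk(const std::string& s) {
    g_api = ResolveApi();
    return g_api(s);
}

// What application code calls.
std::string CallApi(const std::string& s) {
    assert(g_api != nullptr);
    return g_api(s);
}
```

In this model, setting g_isWin9x to true before the first CallApi routes every call to LayerImplementation and "loads" the layer exactly once; leaving it false means the layer is never touched.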
More on the APIs
MSLU covers over 480 APIs: 440 wrapper functions and an additional 40 stub functions that developers can use to provide their own wrappers. The supported APIs fall into four categories.
"W" suffix APIs. This first category is by far the largest. For most APIs, it only requires a simple and generic set of steps:
- Convert all input string parameters from Unicode to ANSI (usually via the system default code page, CP_ACP, or sometimes via a different code page, depending on other parameters, as explained below).
- Call the "A" version of the API (supported on Windows 9x).
- If there are any output string parameters, convert them back to Unicode (usually via CP_ACP, or sometimes via a different code page, depending on other parameters, as explained below).
The memory for step 1 is usually allocated on the stack, since that is both faster and easier. Cleanup is automatic when the function returns. In order to provide maximum safety, the function does a quick probe to make sure that there is enough memory available on the stack (thus attempting to avoid stack overflow exceptions). The actual cost of this probe is just an extra 12 CPU instructions, so it is very inexpensive; however, if the probe fails and MSLU has to clean up the stack, this obviously will take a bit longer to complete.
The very first build of unicows.dll was an automatically generated version that simply performed these three steps. From the beginning, however, it was clear that a lot more work needed to be done, since there were many exceptions to this simple plan.
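The three generic steps can be sketched in portable C++. This is an illustration only, not MSLU's code: GetGreetingA is a hypothetical stand-in for an ANSI API, the standard library's wcstombs and mbstowcs stand in for the Win32 code page conversions, and heap-allocated vectors are used where the article describes stack allocation.

```cpp
#include <cassert>
#include <cstdlib>
#include <cstring>
#include <cwchar>
#include <string>
#include <vector>

// Hypothetical "A" API: writes "Hello, <name>" into out, returns the number
// of characters written (excluding the terminator), or 0 on failure.
std::size_t GetGreetingA(const char* name, char* out, std::size_t outSize) {
    std::string s = std::string("Hello, ") + name;
    if (s.size() + 1 > outSize) return 0;
    std::memcpy(out, s.c_str(), s.size() + 1);
    return s.size();
}

// Wide -> narrow using the current C locale (stand-in for a CP_ACP conversion).
std::string ToNarrow(const std::wstring& w) {
    std::vector<char> buf(w.size() * MB_CUR_MAX + 1);
    std::size_t n = std::wcstombs(buf.data(), w.c_str(), buf.size());
    if (n == static_cast<std::size_t>(-1)) return std::string();
    return std::string(buf.data(), n);
}

// Narrow -> wide using the current C locale.
std::wstring ToWide(const std::string& s) {
    std::vector<wchar_t> buf(s.size() + 1);
    std::size_t n = std::mbstowcs(buf.data(), s.c_str(), buf.size());
    if (n == static_cast<std::size_t>(-1)) return std::wstring();
    return std::wstring(buf.data(), n);
}

// The "W" wrapper: convert in, call the "A" version, convert out.
std::size_t GetGreetingW(const wchar_t* name, wchar_t* out, std::size_t outSize) {
    assert(name != nullptr && out != nullptr);
    std::string nameA = ToNarrow(name);                        // step 1
    std::vector<char> outA(outSize * MB_CUR_MAX + 1);
    std::size_t n = GetGreetingA(nameA.c_str(), outA.data(),   // step 2
                                 outA.size());
    if (n == 0) return 0;
    std::wstring wide = ToWide(std::string(outA.data(), n));   // step 3
    if (wide.size() + 1 > outSize) return 0;                   // caller's buffer too small
    std::wmemcpy(out, wide.c_str(), wide.size() + 1);
    return wide.size();
}
```

Calling GetGreetingW(L"World", buf, 32) fills buf with L"Hello, World". The sketch ignores the cases the following paragraphs describe, such as code pages other than the default.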
The first exception relates to the times that CP_ACP (the system default code page) is not the code page used to do the conversion. The OS is often expecting that the strings will not be based on the default system code page, but rather on some other code page, corresponding to a locale parameter, the character set of a device context, or the file system settings (as represented by the AreFileApisANSI API).
MSLU's careful adherence to the way the OS handles these cases allows it to provide a limited form of support for non-CP_ACP applications (basically, any time the OS supports it, MSLU supports it, too). An example of this is GetLocaleInfo, which converts string parameters using the default code page of the locale passed in its locale parameter.
MSLU handles a large number of exceptions. In fact, of the more than 400 APIs in this category, only a little more than half can be described so simply; the rest require further special handling. Some of these exception cases relate to enumeration functions such as EnumDateFormats, which often do not take string parameters, but instead return them to a user-provided callback function.
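Enumeration functions need a different shape of wrapper, because the strings arrive through a callback rather than a parameter. One plausible approach, sketched here in portable C++, is to hand the "A" enumerator an internal callback that converts each string and forwards it to the caller's Unicode callback. Everything in this sketch is an assumption for illustration: EnumFormatsA is a hypothetical enumerator, and the article does not say how MSLU passes the user's callback to its internal one (a thread-local variable is used here because the callback signature has no context parameter).

```cpp
#include <cassert>
#include <cstdlib>
#include <cstring>
#include <string>
#include <vector>

using EnumProcA = bool (*)(const char*);      // return false to stop enumerating
using EnumProcW = bool (*)(const wchar_t*);

// Hypothetical "A" enumerator: reports each format string to the callback.
bool EnumFormatsA(EnumProcA proc) {
    static const char* const kFormats[] = {"M/d/yy", "yyyy-MM-dd"};
    for (const char* f : kFormats) {
        if (!proc(f)) break;
    }
    return true;
}

// The caller's wide callback, parked where the narrow thunk can reach it.
static thread_local EnumProcW t_userProc = nullptr;

// Internal narrow callback: convert the string, then forward to the caller.
static bool NarrowToWideThunk(const char* s) {
    std::vector<wchar_t> buf(std::strlen(s) + 1);
    std::size_t n = std::mbstowcs(buf.data(), s, buf.size());
    if (n == static_cast<std::size_t>(-1)) return false;
    return t_userProc(buf.data());
}

// The "W" wrapper: install the thunk, run the "A" enumerator, restore state.
bool EnumFormatsW(EnumProcW proc) {
    assert(proc != nullptr);
    EnumProcW saved = t_userProc;   // allows nested enumerations
    t_userProc = proc;
    bool ok = EnumFormatsA(&NarrowToWideThunk);
    t_userProc = saved;
    return ok;
}

// Usage example: gather every enumerated string as Unicode.
static thread_local std::vector<std::wstring> t_seen;

std::vector<std::wstring> CollectFormatsW() {
    t_seen.clear();
    EnumFormatsW([](const wchar_t* s) { t_seen.push_back(s); return true; });
    return t_seen;
}
```

CollectFormatsW shows the caller's view: the callback only ever sees wide strings, even though the underlying enumerator produced narrow ones.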
Other APIs provide specific bug fixes or behavior to emulate the Windows NT platform, which had been provided by one or more of the 32 earlier Microsoft layers. An example of this is GetWindowTextLength, whose "A" version never promises to give an exact length on DBCS systems, but whose "W" version does.
Other APIs cannot be wrapped at all. The IsBadStringPtr API is a good example, since its job is to test whether the passed-in parameter is a valid string. Converting in order to call the "A" version would just test the string that MSLU created. If the memory was bad, MSLU would crash trying to convert it, so obviously conversion is not needed or wanted.
Yet other APIs needed a lot of extra work in order to function properly. For example, a lot of work had to go into making sure that a call to our CreateWindowExW would create a "faux-Unicode" window. This is a window that is seen as a Unicode window by MSLU, even though the operating system is quite certain it is not a Unicode window.
Every time the "W" APIs were reexamined during the development process, we found more exceptions, resulting in almost 200 cases that defied simple classification. As a result, a great deal of thought went into each API by the time MSLU was ready to ship. You could almost say that every API had a story!
Unicode functionality. Nothing is ever simple, and it quickly became clear that there were quite a few APIs relating to Unicode support that do not have a suffix. For example, IsWindowUnicode is an API that a caller would reasonably expect to return TRUE if MSLU's CreateWindowExW created the window. Not all examples were quite this straightforward. The DDE functions have parameters that specify CP_WINUNICODE or CP_WINANSI and expect string parameters to match whatever that parameter is. MSLU had to wrap functions such as these in order to do the things that a Unicode application would expect them to do.
APIs that enhance user messaging support. Although there are only a few functions that relate to user messaging that do not have a "W" suffix, it was important for MSLU to wrap them. This provides a consistent user messaging story for "mixed" applications that expect to work properly with both Unicode and ANSI windows and window procedures. In order to handle this, we had to wrap functions such as CallWindowProcA and SetWindowLongA to provide that consistency.
This particular category of APIs did force us to skirt that line of "no new functionality" a few times, but it seemed important to provide at least the same level of support for the "A" and "W" mix of these functions as Windows NT does, in order to allow applications to work properly.
APIs that fix bugs other layers have addressed. Of course, this category ended up being one of the most controversial ones, since it clearly provides new functionality, which is contrary to one of the core design criteria of MSLU. However, if you keep in mind that one of the other design goals of MSLU was to become a layer that internal groups at Microsoft would use for their Unicode support, it was important to pick up their bug fixes (otherwise they would have no real impetus to consider the change). The most well-known of these API fixes are the many bugs that exist in GDI functions on Windows 9x, which were vigorously addressed by Microsoft Office in the Office 95 days. These became so well-known that many external developers noticed (and used) the functions in mso97.dll and mso9.dll with names like MsoExtTextOutW and MsoGetCharWidthW.
Since there was already code written for the vast majority of these cases, it made sense to pick up a few APIs in this category. The changes were already well-tested and would be needed in order to promote adoption of MSLU within Microsoft.
Overriding Individual API Calls
MSLU provides a simple mechanism to allow a developer to override any individual API. There are two reasons for this. First, each of the 32 Microsoft layers had its own unique cases that it targeted. In the end, some of those special cases were not issues we wanted to address in MSLU since they contained bugs on which applications were now depending. MSLU could not realistically require applications to change a great deal of existing code if they were already depending on certain behaviors. Second, the behavior we found in internal Microsoft Windows 9x Unicode solutions is just a small sample of what exists outside of the company, and it was not possible to predict the needs of every Unicode layer.
As a result, a special syntax was provided for overriding any API. Two steps are needed (per API you want to override):
- Write a function with the exact same signature as the original API call.
- Add a single line of code to "set the hook" for the override (the LoadCursorW example that follows shows the syntax).
For example, if you wanted to override MSLU's LoadCursorW API, you could create your own function, as shown in Figure 2.
Now that you have the MyLoadCursorW function, you can force MSLU to use it by adding the following line immediately after it:
extern "C" FARPROC Unicows_LoadCursorW = (FARPROC)&MyLoadCursorW;
The MSLU loader will only call your function in the Windows 9x case. In the case of Windows NT, the loader will call the appropriate OS version of the API. This provides you with the ability to work around any compatibility issues, problems, or even bugs that you find in MSLU's implementation of an individual API.
We mentioned earlier that there were about 40 APIs that had only stub versions. It was explicitly because of this override functionality that these 40 APIs were stubbed (wrapped but not implemented). A developer could use this feature to support an API that MSLU was not explicitly implementing.
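The precedence the loader applies to an override can be summarized in a small portable model. This is a sketch under stated assumptions rather than the loader's actual code: g_overrideLoadCursor only models the role played by a variable such as Unicows_LoadCursorW, the integer return values are placeholders for real cursor handles, and all function names are hypothetical.

```cpp
#include <cassert>
#include <string>

using CursorFn = int (*)(const std::wstring& name);

// Hypothetical stand-ins; the return value identifies which path ran.
int OsLoadCursor(const std::wstring&)    { return 1; }  // native Unicode API
int LayerLoadCursor(const std::wstring&) { return 2; }  // layer's own wrapper
int MyLoadCursor(const std::wstring&)    { return 3; }  // application override

// Models the override hook: null unless the application sets it.
CursorFn g_overrideLoadCursor = nullptr;

// Models the loader's choice for one API.
int DispatchLoadCursor(bool isWin9x, const std::wstring& name) {
    assert(!name.empty());
    if (!isWin9x) return OsLoadCursor(name);      // NT: override is ignored
    if (g_overrideLoadCursor != nullptr) {
        return g_overrideLoadCursor(name);        // 9x with a hook set
    }
    return LayerLoadCursor(name);                 // 9x, no hook: layer default
}
```

With the hook left null, a Windows 9x call takes the layer's wrapper; after g_overrideLoadCursor = &MyLoadCursor, the same call reaches the application's function, while the Windows NT path is unaffected either way.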
Beyond this, you can even "wrap" the MSLU version by doing your own work to call LoadLibrary and GetProcAddress to retrieve the MSLU function pointer. This is something that we would rarely expect people to use, but there is nothing to keep it from working properly.
On a special note, the MyLoadCursorW function shown in Figure 2 is not as stable as the one MSLU provides; for example, it does not properly handle the case where _alloca fails because it's out of stack space.
The MSLU loader has a very simple method for loading unicows.dll: it makes a single call to LoadLibraryA. It does so without any path, so that if you happen to load it first in your own code, then the loader will pick up the one selected. Of course, there is one catch—we did not want the MSLU to become the next DLL to suffer from the problems that often plague shared components. Therefore, you cannot use unicows.dll from the windows or windows\system directories—this will remove the temptation to place them there. You need to make sure that the loader can find the DLL, usually by having it in your own application's process directory or by calling LoadLibraryA yourself. (Of course, be sure not to call any functions that MSLU wraps first.)
Unfortunately, this approach is not always practical. For example, you may be using a component like MFC that actually has to make many Unicode API calls during its initialization. In order to handle this case, MSLU provides a method to override the loading of unicows.dll itself.
Place the following line of code in your project with global scope to use the function LoadIt:
extern "C" HMODULE (__stdcall *_PfnLoadUnicows)(void) = &LoadIt;
LoadIt should return an HMODULE, and you can get that by making your own LoadLibraryA calls. Again, be sure you do not make any Unicode API calls in this function.
MFC does not handle the failure of certain APIs very well. In fact, an MFC application that has all of its Unicode APIs fail will actually crash as it tries to dereference a NULL pointer in its initialization code. One of our early adopters found a brilliant way to deal with this situation. They use the load override just described, and if they are unable to find unicows.dll, they exit the process right away. You can also extend this method to try to get MSLU from a particular location, or ask for your installation CD. (Again, make sure that you do not make any Unicode API calls here!)
Not Sharing Your MSLU
Due to the way LoadLibrary works under Win32, if the DLL is already loaded inside of a process, future attempts to load it, even with a path specified, will not load a second copy; the one already loaded is returned instead. This means that if multiple components in the same process are each using MSLU, the first one to be loaded will provide the DLL for everyone else. This may not be so bad by itself, but could be problematic for developers who went to some length to provide overrides for their particular functions.
It is hard to know how common this scenario will be. This early in the release cycle, it is impossible to determine exactly who will be using MSLU. However, if this does turn out to be a typical scenario, it will be reasonable not to want to depend on the code-writing skills of unknown developers. Therefore, you can take advantage of the behavior of LoadLibrary by renaming unicows.dll to some other name and using the DLL override syntax (given earlier) to load your renamed DLL.
Once this is done, no one else will use your version of MSLU; more importantly, you will not be dependent on anyone else's version. If you have multiple components that use MSLU, you can make sure they all use the same one.
Visual Studio .NET and MSLU
With MSLU coming out in the middle of the Microsoft .NET platform release, many people will wonder if MSLU can be used from C#, Visual Basic® .NET, or managed C++. They'll also want to know how it works with the common language runtime (CLR).
The answer is a cautious yes—but you have to be careful. Here are the issues you should consider if you are working with managed code on the .NET platform and you are thinking about implementing MSLU:
- As best as it can, the CLR tries to make calls to APIs unnecessary, since by their nature, these calls cannot be managed code. Therefore, in many cases you do not need to be using APIs at all.
- The .NET languages do not support the notion of linking to DLLs and functions at compile time. Instead, you use the P/Invoke (Platform Invoke) syntax to do LoadLibrary/GetProcAddress at runtime. Because of this, the MSLU loader is never used and you need to provide your own code to do the important job of the loader (that is, to always call unicows.dll on Windows 9x and always call the operating system on Windows NT).
- The assemblies you create in C#, Visual Basic .NET, or managed C++ keep metadata for every P/Invoke call that you have. As a result, you do not want to add a class or classes that contain over 400 APIs, since this will bloat your assembly. You should only add the APIs that you need and not add the P/Invoke information for every API.
- P/Invoke supports an "Auto" CharSet attribute that does the work to call the Unicode API on Windows NT and the ANSI API on Windows 9x. However, it does not support the ability to call a different DLL in the case of Windows 9x, so it cannot be used with MSLU. In addition, if you choose to use the CharSet.Auto technique, you will miss out on a large number of bug fixes (discussed earlier) that make MSLU more than just a simple wrapper. Therefore, it is worth considering MSLU.
With that said, let's look at a sample for an API using all three languages. The GetSystemDirectory API is one where you pass a buffer and a buffer size, the API fills the buffer, and returns the number of characters put into the buffer. The managed C++ sample is shown in Figure 3, and the C# sample is shown in Figure 4. Finally, Figure 5 shows the Visual Basic .NET sample.
Figure 6 Tamil Unicode in Windows 9x
Notice that despite the minor syntax differences in all three samples, each is doing basically the same thing: performing the work that the MSLU loader does in unmanaged C++ by calling the API that's appropriate for the platform on which you are running. By providing a simple wrapper around the GetSystemDirectory API, SystemAPI.GetSystemDirectory allows full support for GetSystemDirectoryW on all platforms via MSLU. See Figure 6 and Figure 7 for new encoding examples.
Figure 7 Unicode in Windows 98
This example is somewhat contrived, since there really are not any unusual issues on Windows 9x compared to Windows NT, but you should understand the basic idea of how to convert an existing P/Invoke to one that uses MSLU.
Starting with the Windows XP RC1 version of the Platform SDK, redistributables are no longer included in the PSDK itself. Instead, you can download them from the Web (see http://www.microsoft.com/msdownload/platformsdk/sdkupdate/default.htm?p=/msdownload/platformsdk/sdkupdate/psdkredist.htm). You will want to get the unicows.exe redistributable so you can include this small DLL in the setup package for your application. Unicows.exe also contains a PDB file that can be used just as you would use any symbols to help debug your app.
Getting More Information
Having read this far, you have probably been convinced that MSLU is worth looking into. So where can you get more information about this layer?
The Platform SDK documentation is your main source for information. The topic for every API that the layer supports contains at least a note that the Unicode version of the API is supported by MSLU. If there are additional issues to keep in mind, the API topic will describe them as well. (There are also several topics that contain more general information.)
If you have specific questions about MSLU, the best place to ask questions is in its dedicated newsgroup at news:microsoft.public.platformsdk.mslayerforunicode.
The major goal of MSLU was to make it easier to develop Unicode applications for the Windows 9x platform. This mattered because many developers had requested a Unicode translation layer, and because so much new international support in Windows 2000, Windows XP, and beyond depends on the creation of Unicode applications.
By providing this translation layer, another important goal—to further promote the Unicode standard—was met. The future of globalized software rests with Unicode, and MSLU is destined to be an important element in the migration toward this standard.