Unicode in Visual C++ 2

Glossary

  • ANSI C: The standardized C programming language.
  • Run-time library: Functions included with a C compiler that programs can call to perform various basic operations.

Networks and operating systems are not the only pieces of software that have historically limited text to 7-bit ASCII or 8-bit ANSI characters—development tools have too. Some recent compilers contain run-time library support for the Shift-JIS code page, but they have been difficult to use in non-Japanese parts of the Far East or with other Japanese code pages. In the past, developers had to use localized compilers to get support for plaintext C source files that could handle non-ANSI string literals or comments. On Windows, compiling resource files for the Far East or Middle East has traditionally required localized editions of the Windows resource compiler. You can use the US Windows 3.1 SDK tools to create resources for Central and Eastern Europe, Greece, or Turkey, but you still have to run them on the localized operating system so that editing and resource compiling work properly. Obviously, using a number of compilers and operating systems to create foreign-language products gets complicated, not to mention expensive.

Development tools that support universal, language-independent applications on 32-bit Windows are now available. The resource and message compilers that come with Windows NT 3.5, Windows 95, and Visual C++ 2 compile files into Unicode. The Visual C++ 2 run-time libraries and MFC 3 provide support for ANSI, multibyte, and Unicode text processing. To get this support, select the Unicode support option during installation. For explanations of how to add Unicode support using the compiler, use full-text search in the online books.

Another significant improvement to Visual C++ is that critical run-time string-processing and character-processing functions such as isalpha, toupper, printf, strcoll, atof, and strftime are now locale-sensitive. In addition, the Visual C++ 2 run-time libraries contain multibyte and Unicode versions of these and numerous other functions. As in the Win32 API model, you can access these functions through generic prototypes, which can be resolved in three different ways. (See Figure 3-17.) If neither the _UNICODE flag nor the _MBCS flag is defined at compile time, the generic text functions resolve to SBCS functions. If the compile-time flag _UNICODE is defined (similar to the Win32 UNICODE flag but preceded by an underscore), the generic prototypes resolve to wide-character functions. Finally, if the _MBCS flag is defined, the prototypes resolve to multibyte functions. Visual C++ 2 also defines the generic type _TCHAR, which is analogous to Win32's TCHAR, and the text macro _T, which is analogous to Win32's TEXT macro.

Figure 3-17 Using Visual C++ 2's generic text functions.
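
The way a generic name resolves at compile time can be pictured as a set of conditional macros. The following is only a sketch of that mechanism in portable C, not the contents of the Visual C++ 2 header files. The names X_UNICODE, XCHAR, X_T, x_strlen, x_strchr, and PrefixLength are hypothetical stand-ins invented for this illustration (for _UNICODE, _TCHAR, _T, and generic functions such as _tcslen), and the _MBCS case is left out.

```c
#include <stddef.h>
#include <string.h>
#include <wchar.h>

/* X_UNICODE is a hypothetical stand-in for the _UNICODE flag. Define it on
   the compiler command line to build the wide-character variant. */
#ifdef X_UNICODE
typedef wchar_t XCHAR;          /* plays the role of _TCHAR  */
#define X_T(s)    L##s          /* plays the role of _T      */
#define x_strlen  wcslen        /* plays the role of _tcslen */
#define x_strchr  wcschr
#else
typedef char XCHAR;
#define X_T(s)    s
#define x_strlen  strlen
#define x_strchr  strchr
#endif

/* Written once against the generic names, so it compiles for either
   character width. Returns the number of characters that precede the
   first backslash, or the full length if there is no backslash. */
size_t PrefixLength(const XCHAR *psz)
{
    const XCHAR *pch = x_strchr(psz, X_T('\\'));
    return pch ? (size_t)(pch - psz) : x_strlen(psz);
}
```

A call such as PrefixLength(X_T("C:\\dir")) has the same source text in both builds. Only the types and functions behind the generic names change.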

The following is an example of code that can be compiled for either ANSI or Unicode using the definitions in the Visual C++ 2 header files and the C run-time library functions:

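#define _UNICODE // Remove this line to compile for ANSI.
#include <malloc.h>
#include <string.h>
#include <stdio.h>
#include <tchar.h>
#include <windows.h>

// Sample Program: a generic function
// Builds a new string from the text before the first backslash in pszStr,
// then pszSubst, then the text after the last backslash in pszStr.
// Side effect: pszStr is truncated at its first backslash (or loses its
// last character if it contains no backslash).
// The caller frees the result. Returns NULL if the allocation fails.
_TCHAR *ReplaceText(_TCHAR *pszStr, _TCHAR *pszSubst)
{
    // pszStr must contain at least one non-null character.
    _TCHAR *pchStart = pszStr;
    while ( *pchStart )
        if ( *pchStart++ == _T('\\') )
            break;

    // Terminate the leading part of the string.
    pchStart[-1] = 0;

    // Scan backward for the last backslash in the remainder.
    _TCHAR *pchEnd = pchStart + _tcslen(pchStart);
    while ( --pchEnd >= pchStart )
        if ( *pchEnd == _T('\\') )
            break;

    // Step past the backslash (or back to pchStart if none was found).
    pchEnd++;
    _TCHAR *pszNew = (_TCHAR *)malloc(sizeof(_TCHAR) *
                                      (_tcslen(pszStr) +
                                       _tcslen(pszSubst) +
                                       _tcslen(pchEnd) + 1));
    // _stprintf formats into a buffer; _tprintf would write to stdout.
    if ( pszNew != NULL )
        _stprintf(pszNew, _T("%s%s%s"), pszStr, pszSubst, pchEnd);
    return pszNew;
}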

The table below lists the Visual C++ 2 wide-character functions. All have generic equivalents except mbstowcs, mbtowc, wctomb, and wcstombs. Function names in italics are ANSI/ISO-compliant. Function names that are in boldface are locale-sensitive. Function names that are both in italics and boldface carry both properties. The character classification functions and the conversion functions towlower and towupper behave the same way as their single-byte counterparts in the C-compiler default locale, but they follow the character classifications of GetStringTypeW outside the C-compiler default locale.

Process Control: _wexecl, _wexecle, _wexeclp, _wexeclpe, _wexecv, _wexecve, _wexecvp, _wexecvpe, _wspawnl, _wspawnle, _wspawnlp, _wspawnlpe, _wspawnv, _wspawnve, _wspawnvp, _wspawnvpe

File/Path: _waccess, _wchdir, _wchmod, _wcreat, _wfdopen, _wfindfirst, _wfindnext, _wfopen, _wfreopen, _wfsopen, _wfullpath, _wgetcwd, _wgetdcwd, _wgetenv, _wmakepath, _wmkdir, _wmktemp, _wopen, _wpopen, _wremove, _wrename, _wrmdir, _wsopen, _wsplitpath, _wstat, _wunlink

IO: _fgetwchar, _fputwchar, _snwprintf, _vsnwprintf, fgetwc, fgetws, fputwc, fputws, fwprintf, fwscanf, getwc, getwchar, getws, putwc, putwchar, putws, swprintf, swscanf, ungetwc, vfwprintf, vswprintf, vwprintf, wprintf, wscanf

Character Classification: iswalnum, iswalpha, iswascii, iswcntrl, iswctype, iswdigit, iswgraph, iswlower, iswprint, iswpunct, iswspace, iswupper, iswxdigit

Character Conversion: mbstowcs, mbtowc, towlower, towupper, wcstombs, wctomb

String Manipulation: _wcsdec, _wcsdup, _wcslwr, _wcsncnt, _wcsnextc, _wcsnset, _wcspnp, _wcsrev, _wcsset, _wcsupr, wcscat, wcschr, wcscpy, wcscspn, wcslen, wcsncat, wcsncpy, wcspbrk, wcsrchr, wcsspn, wcsstr, wcstod, wcstok, wcstol, wcstoul, wcsxfrm

String Comparison: _wcsicmp, _wcsicoll, _wcsncoll, _wcsnicmp, _wcsnicoll, wcscmp, wcscoll, wcsncmp

Numeric/String Conversion: _itow, _ltow, _ultow, _wtoi, _wtol

Date/Time Functions: _wasctime, _wctime, _wstrdate, _wstrtime, _wutime, wcsftime

Miscellaneous: _wperror, _wsetlocale, _wsystem, _wtempnam, _wtmpnam
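
Because mbstowcs has no generic equivalent, code that needs it calls it directly and then continues with the wide-character functions. The following is a minimal sketch of that pattern using only standard functions from the table. The helper name ToUpperWide is hypothetical, and the result for characters outside ASCII depends on the locale in effect.

```c
#include <stdlib.h>
#include <wchar.h>
#include <wctype.h>

/* Hypothetical helper: converts a multibyte string to wide characters with
   mbstowcs, then uppercases the letters in place with iswalpha and towupper.
   cchDst is the size of pwszDst in wide characters, including room for the
   terminator. Returns the number of wide characters stored (not counting
   the terminator), or (size_t)-1 if the buffer has no room or the input is
   not valid in the current locale's encoding. */
size_t ToUpperWide(const char *pszSrc, wchar_t *pwszDst, size_t cchDst)
{
    size_t cch, i;

    if (cchDst == 0)
        return (size_t)-1;

    cch = mbstowcs(pwszDst, pszSrc, cchDst - 1);
    if (cch == (size_t)-1)
        return cch;

    /* mbstowcs does not write a terminator when it fills the buffer. */
    pwszDst[cch] = L'\0';

    for (i = 0; i < cch; i++)
        if (iswalpha((wint_t)pwszDst[i]))
            pwszDst[i] = (wchar_t)towupper((wint_t)pwszDst[i]);

    return cch;
}
```

Input that does not fit is truncated to cchDst - 1 wide characters rather than reported as an error. That is a design choice of this sketch, not a property of mbstowcs.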

You might notice that some locale-sensitive run-time library functions overlap the Win32 API. For example, tolower and toupper correspond to CharLower and CharUpper. C run-time functions are generally simpler and less flexible than Win32 API calls, but together they provide the same kind of international support as the Win32 API. Some C run-time functions actually call the system to retrieve locale-sensitive information. For example, when running on Win32, strcoll calls the API function CompareString. Calling CompareString directly gives you access to more detailed error information and removes one level of indirection. Simple run-time functions that don't need locale information from the system don't carry the overhead of calling it.
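
To see the locale sensitivity of strcoll from the run-time side, a program selects a locale with setlocale and then compares strings. The following is a minimal sketch using only standard C. The helper name CollateIn and its return convention are hypothetical, and the ordering produced for any locale other than "C" depends on the target system.

```c
#include <locale.h>
#include <string.h>

/* Hypothetical helper: compares two strings with strcoll under the named
   locale and returns -1, 0, or 1. Pass "" for the user's default locale,
   or "C" for the minimal locale, in which strcoll orders strings the way
   strcmp does. Returns 2 if the locale cannot be queried or set. */
int CollateIn(const char *pszLocale, const char *psz1, const char *psz2)
{
    char szSaved[256];
    const char *pszOld = setlocale(LC_COLLATE, NULL);   /* query only */
    int n;

    if (pszOld == NULL || strlen(pszOld) >= sizeof(szSaved))
        return 2;
    /* Copy the name: a later setlocale call may overwrite the string. */
    strcpy(szSaved, pszOld);

    if (setlocale(LC_COLLATE, pszLocale) == NULL)
        return 2;               /* locale not available on this system */

    n = strcoll(psz1, psz2);
    setlocale(LC_COLLATE, szSaved);     /* restore the previous setting */
    return (n > 0) - (n < 0);
}
```

In the "C" locale, CollateIn("C", "Zebra", "apple") returns -1 because uppercase ASCII letters sort before lowercase ones. A language-specific locale may order the same pair differently.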

One substantial advantage of using the C run-time functions is that they consistently support Unicode, whereas 32-bit Windows does not. You can use the wide-character C run-time function calls to do Unicode programming for Win32s or Windows 95. Similarly, you don't need a special Japanese compiler to create Shift-JIS–based applications for Japan. You don't even need a Japanese development environment, because the multibyte C run-time functions are always available. Finally, if your goal is to write code that is portable across platforms, you can limit your calls to the ANSI/ISO C standard run-time functions. Keep in mind that locale-specific results will vary depending on the target system.
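
As an illustration of that last point, the following sketch reworks the idea behind the earlier ReplaceText sample using only standard wide-character functions (wcschr, wcsrchr, wcslen, wcsncpy, and wcscat). It is an assumption-laden variant, not the original sample: the name ReplaceMiddleW is hypothetical, the input is left unmodified, and returning NULL when the input has no backslash is a choice made for this sketch.

```c
#include <stdlib.h>
#include <wchar.h>

/* Hypothetical portable variant: keeps the text before the first backslash
   and the text after the last backslash, and places pwszSubst between them.
   Returns a malloc'ed string that the caller must free, or NULL if the
   allocation fails or the input contains no backslash. */
wchar_t *ReplaceMiddleW(const wchar_t *pwszStr, const wchar_t *pwszSubst)
{
    const wchar_t *pchFirst = wcschr(pwszStr, L'\\');
    const wchar_t *pchLast  = wcsrchr(pwszStr, L'\\');
    size_t cchHead, cchTotal;
    wchar_t *pwszNew;

    if (pchFirst == NULL)
        return NULL;

    cchHead  = (size_t)(pchFirst - pwszStr);
    cchTotal = cchHead + wcslen(pwszSubst) + wcslen(pchLast + 1) + 1;

    pwszNew = (wchar_t *)malloc(cchTotal * sizeof(wchar_t));
    if (pwszNew == NULL)
        return NULL;

    wcsncpy(pwszNew, pwszStr, cchHead);   /* head, without a terminator */
    pwszNew[cchHead] = L'\0';
    wcscat(pwszNew, pwszSubst);           /* replacement text           */
    wcscat(pwszNew, pchLast + 1);         /* tail after last backslash  */
    return pwszNew;
}
```

For example, ReplaceMiddleW(L"C:\\old\\file.txt", L"\\new\\") yields the string C:\new\file.txt. Because the function calls nothing outside the ANSI/ISO C library, it needs no Windows header files.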