Visual Basic Concepts
ANSI, DBCS, and Unicode: Definitions
Visual Basic uses Unicode to store and manipulate strings. Unicode is a character set in which each character is represented by 2 bytes. Some other programs, such as the Windows 95/98 API, use ANSI (American National Standards Institute) or DBCS (double-byte character set) to store and manipulate strings. When you move strings outside of Visual Basic, you may encounter differences between Unicode and ANSI/DBCS. The following table shows which character sets are used in different environments.
|Environment|Character set(s) used|
|---|---|
|32-bit object libraries|Unicode|
|16-bit object libraries|ANSI and DBCS|
|Windows NT API|Unicode|
|Automation in Windows NT|Unicode|
|Windows 95/98 API|ANSI and DBCS|
|Automation in Windows 95/98|Unicode|
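When a string leaves Visual Basic for an ANSI environment, it has to be converted from Unicode to the system code page, and back again when it returns. In Visual Basic itself, `StrConv` with `vbFromUnicode` and `vbUnicode` performs this conversion. The Python sketch below only mimics that round trip to show the byte counts; it is not Visual Basic code. It assumes Windows code page 1252 (Western European) as the ANSI code page, and the helper names `to_ansi`, `from_ansi`, and `unicode_size` are hypothetical.

```python
# Sketch: Unicode <-> ANSI conversion at the Visual Basic boundary.
# Assumption: the ANSI code page is Windows-1252 ("cp1252").
ANSI_CODE_PAGE = "cp1252"


def to_ansi(text: str) -> bytes:
    """Unicode string -> ANSI bytes (1 byte per character in cp1252)."""
    return text.encode(ANSI_CODE_PAGE)


def from_ansi(data: bytes) -> str:
    """ANSI bytes -> Unicode string."""
    return data.decode(ANSI_CODE_PAGE)


def unicode_size(text: str) -> int:
    """Bytes used by the 16-bit Unicode form (UTF-16LE, no byte-order mark)."""
    return len(text.encode("utf-16-le"))


if __name__ == "__main__":
    text = "Hello"
    ansi = to_ansi(text)
    print(unicode_size(text), len(ansi))  # 10 5
    print(from_ansi(ansi) == text)        # True
```

The same five characters take 10 bytes inside Visual Basic but only 5 bytes once converted for an ANSI API.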
ANSI is the most popular character standard used by personal computers. Because the ANSI standard uses only a single byte to represent each character, it is limited to a maximum of 256 character and punctuation codes. Although this is adequate for English, it doesn't fully support many other languages.
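Because a single-byte code page has at most 256 codes, characters outside it cannot survive conversion to ANSI. The Python sketch below, again assuming code page 1252, checks whether a string fits and shows the lossy result when unmappable characters are replaced. The `?` substitute comes from Python's `errors="replace"` option; `fits_in_code_page` and `lossy_ansi` are hypothetical helper names.

```python
# Sketch: a single-byte ANSI code page can hold at most 256 codes.
# Assumption: code page 1252 (Western European) stands in for "ANSI".
CODE_PAGE = "cp1252"


def fits_in_code_page(text: str, code_page: str = CODE_PAGE) -> bool:
    """Return True if every character of text has a code in code_page."""
    try:
        text.encode(code_page)
    except UnicodeEncodeError:
        return False
    return True


def lossy_ansi(text: str, code_page: str = CODE_PAGE) -> bytes:
    """Encode text, replacing each unmappable character with b'?'."""
    return text.encode(code_page, errors="replace")


if __name__ == "__main__":
    print(fits_in_code_page("café"))    # True: 'é' is 0xE9 in cp1252
    print(fits_in_code_page("日本語"))   # False: cp1252 has no Japanese
    print(lossy_ansi("Ωmega"))          # b'?mega'
```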
DBCS is used in Microsoft Windows systems that are distributed in most parts of Asia. It supports many East Asian writing systems, such as Chinese, Japanese, and Korean. DBCS uses the numbers 0 through 127 to represent the ASCII character set. Some numbers greater than 127 function as lead bytes: they are not characters themselves but indicators that the lead byte and the byte after it together form a character from a non-Latin character set. In DBCS, ASCII characters are only 1 byte long, whereas Japanese, Korean, and other East Asian characters are 2 bytes long.
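Because of lead bytes, a program cannot tell where characters begin in a DBCS string without scanning it from the start. The Python sketch below counts characters in Japanese DBCS text, assuming code page 932 (Shift-JIS), whose lead bytes fall in the ranges 0x81–0x9F and 0xE0–0xFC; other DBCS code pages use different ranges. `is_lead_byte` and `count_dbcs_chars` are hypothetical names (the Win32 API offers a comparable check, `IsDBCSLeadByte`).

```python
# Sketch: walking a DBCS byte string by lead bytes.
# Assumption: code page 932 (Shift-JIS), lead bytes 0x81-0x9F and 0xE0-0xFC.
CODE_PAGE = "cp932"


def is_lead_byte(b: int) -> bool:
    """True if b starts a 2-byte character in code page 932."""
    return 0x81 <= b <= 0x9F or 0xE0 <= b <= 0xFC


def count_dbcs_chars(data: bytes) -> int:
    """Count characters, treating a lead byte plus the next byte as one."""
    count = 0
    i = 0
    while i < len(data):
        i += 2 if is_lead_byte(data[i]) else 1
        count += 1
    return count


if __name__ == "__main__":
    data = "Aあ".encode(CODE_PAGE)
    print(data)                               # b'A\x82\xa0'
    print(len(data), count_dbcs_chars(data))  # 3 2
```

Here "A" occupies 1 byte and "あ" occupies 2 (lead byte 0x82, then 0xA0), so the 3-byte string holds 2 characters.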
Unicode is a character-encoding scheme that, in the 16-bit form used by Visual Basic and COM, uses 2 bytes for every character. The International Organization for Standardization (ISO), in its ISO/IEC 10646 standard, which is kept synchronized with Unicode, defines a number in the range 0 to 65,535 (2¹⁶ − 1) for just about every character and symbol in every language, with some values left unassigned for future growth. (Later versions of Unicode extend beyond this range and represent the additional characters with pairs of 16-bit values, called surrogate pairs.) On all 32-bit versions of Windows, Unicode is used by the Component Object Model (COM), the basis for OLE and ActiveX technologies. Windows NT fully supports Unicode. Although both Unicode and DBCS have double-byte characters, their encoding schemes are completely different.
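To make the 2-byte form concrete, the Python sketch below lists each character's code point and the two bytes it occupies in little-endian order, the byte order Windows uses. Python's `utf-16-le` codec stands in for the Visual Basic string layout here; `unicode_units` is a hypothetical helper. A character above 65,535 needs two 16-bit units (a surrogate pair), which the last line checks.

```python
# Sketch: the 16-bit Unicode form, shown with Python's UTF-16LE codec.
# Assumption: little-endian byte order, as used by Windows.


def unicode_units(text: str) -> list[tuple[str, int, bytes]]:
    """Return (character, code point, encoded bytes) for each character."""
    return [(ch, ord(ch), ch.encode("utf-16-le")) for ch in text]


if __name__ == "__main__":
    for ch, code, raw in unicode_units("Aあ"):
        print(ch, hex(code), raw.hex(" "))
    # A 0x41 41 00
    # あ 0x3042 42 30

    # A character above U+FFFF needs a surrogate pair: 4 bytes, not 2.
    print(len("\U0001F600".encode("utf-16-le")))  # 4
```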
Character Code Examples
Figure 16.4 shows the character codes for the same character in each character set. Note the different codes in each byte of the double-byte characters.
Figure 16.4 Character codes for "A" in ANSI, Unicode, and DBCS
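The kind of comparison in Figure 16.4 can also be generated in code. The Python sketch below is an illustration, not the figure's own data: it assumes code page 1252 for ANSI, UTF-16LE for Unicode, and code page 932 (Japanese Shift-JIS) for DBCS, and it adds the fullwidth letter "Ａ" (U+FF21) as an example of a double-byte character. `char_codes` is a hypothetical helper.

```python
# Sketch: codes for one character in each character set.
# Assumptions: ANSI = code page 1252, Unicode = UTF-16LE,
# DBCS = code page 932 (Japanese Shift-JIS).
ENCODINGS = {"ANSI": "cp1252", "Unicode": "utf-16-le", "DBCS": "cp932"}


def char_codes(ch: str) -> dict[str, str]:
    """Map each character set to the hex bytes for ch ('n/a' if unmappable)."""
    codes = {}
    for name, codec in ENCODINGS.items():
        try:
            codes[name] = ch.encode(codec).hex(" ").upper()
        except UnicodeEncodeError:
            codes[name] = "n/a"
    return codes


if __name__ == "__main__":
    for ch in ("A", "\uFF21"):  # "A" and fullwidth "Ａ"
        print(repr(ch), char_codes(ch))
    # 'A' {'ANSI': '41', 'Unicode': '41 00', 'DBCS': '41'}
    # 'Ａ' {'ANSI': 'n/a', 'Unicode': '21 FF', 'DBCS': '82 60'}
```

"A" is a single byte in ANSI and DBCS but two bytes in Unicode, while the double-byte "Ａ" has entirely different byte values in Unicode and DBCS and no code at all in code page 1252.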