Visual Basic Concepts

01/04/2007

ANSI, DBCS, and Unicode: Definitions

Visual Basic uses Unicode to store and manipulate strings. Unicode is a character set that uses 2 bytes to represent each character. Some other environments, such as the Windows 95/98 API, use ANSI (American National Standards Institute) or DBCS (double-byte character set) to store and manipulate strings. When you pass strings from Visual Basic to one of these environments, you may need to account for the differences between Unicode and ANSI/DBCS. The following table shows the character sets used in different environments.

Environment	Character set(s) used
Visual Basic	Unicode
32-bit object libraries	Unicode
16-bit object libraries	ANSI and DBCS
Windows NT API	Unicode
Automation in Windows NT	Unicode
Windows 95/98 API	ANSI and DBCS
Automation in Windows 95/98	Unicode
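To see why these differences matter when a string leaves Visual Basic, the following is a minimal Python sketch, not Visual Basic code. Python's `str` stands in for a Visual Basic (Unicode) string. The code pages are assumptions: code page 1252 represents an ANSI environment and code page 932 (Japanese) represents a DBCS environment. The hypothetical helper `round_trip` converts a string to a code page and back. Characters the code page cannot represent are replaced with "?" by Python's `errors="replace"` option, so they are lost.

```python
def round_trip(text: str, code_page: str) -> str:
    """Encode text into a code page and decode it back (hypothetical helper).

    Characters the code page cannot represent are replaced with "?" by
    Python's errors="replace" option, so they do not survive the trip.
    """
    encoded = text.encode(code_page, errors="replace")
    return encoded.decode(code_page)


if __name__ == "__main__":
    s = "A\u3042"  # "A" followed by HIRAGANA LETTER A
    print(round_trip(s, "cp1252"))      # ANSI (Western European): A?
    print(round_trip(s, "cp932") == s)  # DBCS (Japanese): True
```

The hiragana character survives the trip through the Japanese DBCS code page but not through the Western European ANSI code page. The Unicode string itself could hold both characters.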

ANSI

ANSI is the most popular character standard used by personal computers. Because the ANSI standard uses only a single byte to represent each character, it is limited to a maximum of 256 codes for characters and punctuation. This is adequate for English, but it does not fully support many other languages.
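The single-byte limit can be illustrated with a short Python sketch. Code page 1252 (Western European) is assumed as the ANSI code page, and `encode_ansi` is a hypothetical helper. Every character that the code page defines takes exactly one byte. A character outside the code page's 256 codes cannot be encoded at all.

```python
from typing import Optional


def encode_ansi(text: str, code_page: str = "cp1252") -> Optional[bytes]:
    """Return the single-byte encoding of text (hypothetical helper).

    Returns None if any character has no code in the 256-entry code page.
    """
    try:
        return text.encode(code_page)
    except UnicodeEncodeError:
        return None


if __name__ == "__main__":
    print(encode_ansi("caf\u00e9"))  # b'caf\xe9': 4 characters, 4 bytes
    print(encode_ansi("\u65e5\u672c"))  # None: Japanese has no cp1252 codes
```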

DBCS

DBCS is used in the versions of Microsoft Windows distributed in most parts of Asia. It supports many East Asian writing systems, such as Chinese, Japanese, and Korean. DBCS uses the values 0 through 127 to represent the ASCII character set. Some values greater than 127 act as lead bytes. A lead byte is not a character itself. Instead, it indicates that it combines with the next byte, called the trail byte, to form a single character from a non-Latin character set. In DBCS, ASCII characters are therefore only 1 byte long, whereas Japanese, Korean, and other East Asian characters are 2 bytes long.
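The lead-byte rule can be sketched in Python as a scanner that splits a DBCS byte string into characters. The lead-byte ranges shown (0x81 to 0x9F and 0xE0 to 0xFC) are assumed for Windows code page 932 (Japanese) only. Other DBCS code pages, such as those for Chinese and Korean, use different ranges. `is_cp932_lead_byte` and `split_dbcs` are hypothetical helpers, and the sketch requires Python 3.8 or later for `bytes.hex(" ")`.

```python
from typing import List


def is_cp932_lead_byte(b: int) -> bool:
    # Assumed lead-byte ranges for Windows code page 932 (Japanese).
    # Other DBCS code pages use different ranges.
    return 0x81 <= b <= 0x9F or 0xE0 <= b <= 0xFC


def split_dbcs(data: bytes) -> List[bytes]:
    """Split a code page 932 byte string into one entry per character."""
    chars = []
    i = 0
    while i < len(data):
        # A lead byte plus the following trail byte form one character;
        # any other byte is a single-byte character.
        if is_cp932_lead_byte(data[i]) and i + 1 < len(data):
            width = 2
        else:
            width = 1
        chars.append(data[i:i + width])
        i += width
    return chars


if __name__ == "__main__":
    data = "A\u30bd\u8868B".encode("cp932")  # "A", katakana SO, kanji "table", "B"
    print(len(data))  # 6 bytes for 4 characters
    print([c.hex(" ") for c in split_dbcs(data)])
```

The scanner has to start at a known character boundary and move forward. The trail byte of katakana SO in code page 932 is 0x5C, the same value as the ASCII backslash. A scan that checked each byte in isolation would therefore misread it.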

Unicode

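Unicode is a character-encoding scheme that uses 2 bytes for every character. The Unicode standard, maintained by the Unicode Consortium and kept synchronized with ISO/IEC 10646 from the International Organization for Standardization (ISO), assigns a number in the range 0 to 65,535 (2¹⁶ – 1) to almost every character and symbol in every language. Some values are left unassigned for future growth. Later versions of Unicode also define characters above 65,535. The 2-byte UTF-16 form stores each of these characters as a pair of 2-byte values, called a surrogate pair. On all 32-bit versions of Windows, the Component Object Model (COM), which is the basis for OLE and ActiveX technologies, uses Unicode. Windows NT fully supports Unicode. Although both Unicode and DBCS have double-byte characters, their encoding schemes are completely different.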
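The following Python sketch lists the 16-bit values (code units) that UTF-16 uses to store a string. It assumes that the little-endian UTF-16 form without a byte-order mark matches the in-memory layout on Windows. `utf16_code_units` is a hypothetical helper, and the sketch requires Python 3.8 or later. A character up to U+FFFF takes one code unit (2 bytes). A character above U+FFFF takes a surrogate pair (4 bytes). The last line shows that the same Japanese character has unrelated byte values in Unicode and in DBCS (code page 932).

```python
from typing import List


def utf16_code_units(text: str) -> List[int]:
    """Return the 16-bit code units used to store text in UTF-16."""
    data = text.encode("utf-16-le")  # little-endian, no byte-order mark
    return [int.from_bytes(data[i:i + 2], "little") for i in range(0, len(data), 2)]


if __name__ == "__main__":
    for ch in ["A", "\u3042", "\U0001F600"]:
        print(f"U+{ord(ch):04X}", [hex(u) for u in utf16_code_units(ch)])
    # Same character, different encoding schemes: Unicode vs. DBCS (cp932)
    print("\u3042".encode("utf-16-le").hex(" "), "\u3042".encode("cp932").hex(" "))
```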

Character Code Examples

Figure 16.4 shows an example of the character codes in each character set. Note that the individual bytes of a double-byte character have different values in each character set.

Figure 16.4 Character codes for "A" in ANSI, Unicode, and DBCS
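The figure image is not reproduced here. As a rough stand-in, the following Python sketch prints the byte values of a character in each of the three character sets. Code pages 1252 and 932 are assumed as representative ANSI and DBCS code pages. The fullwidth "Ａ" (U+FF21) is an assumed double-byte example and may differ from the characters shown in the figure. `char_codes` is a hypothetical helper, and the sketch requires Python 3.8 or later.

```python
def char_codes(ch: str) -> dict:
    """Hex byte values for one character in three encodings (hypothetical helper).

    Code pages 1252 and 932 are assumed stand-ins for ANSI and DBCS.
    Characters a code page cannot represent appear as 3f, the code for "?".
    """
    return {
        "ANSI (cp1252)": ch.encode("cp1252", errors="replace").hex(" "),
        "Unicode (UTF-16LE)": ch.encode("utf-16-le").hex(" "),
        "DBCS (cp932)": ch.encode("cp932", errors="replace").hex(" "),
    }


if __name__ == "__main__":
    for ch in ["A", "\uff21"]:  # "A" and FULLWIDTH LATIN CAPITAL LETTER A
        print(f"U+{ord(ch):04X}", char_codes(ch))
```

For "A", the ANSI and DBCS codes are the single byte 41, while Unicode stores 41 00. For the fullwidth "Ａ", Unicode stores 21 ff and code page 932 stores 82 60, which are two unrelated double-byte encodings of the same character.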