Categories of Internationalization for Windows

Glossary

  • Single-byte character set (SBCS): A character encoding in which each character is represented by 1 byte. Single-byte character sets are limited to 256 characters, because 1 byte can hold only 256 distinct values.
  • Multibyte character set (MBCS): A mixed-width character set, in which some characters consist of more than 1 byte. A double-byte character set (DBCS), which is a specific type of multibyte character set, includes some characters that consist of 1 byte and some characters that consist of 2 bytes.
  • Bidirectional (BiDi) text: A mixture of characters that are read from left to right and characters that are read from right to left. Most Arabic and Hebrew characters, for example, are read from right to left, but numbers and quoted Western terms within Arabic or Hebrew text are read from left to right.
  • Latin script: The set of 26 characters (A–Z) inherited from the Roman Empire that, together with later character additions, is used to write languages throughout Africa, the Americas, parts of Asia, Europe, and Oceania. The Windows 3.1 Latin 1 character set covers Western European languages and languages that use the same alphabet. The Latin 2 character set covers Central and Eastern European languages.
  • Traditional Chinese: The set of Chinese characters, used in such countries/regions as Hong Kong SAR, China; Singapore; and Taiwan, that is consistent with the original form of Chinese ideographs that are several thousand years old.
  • Simplified Chinese: The set of Chinese characters used in the People's Republic of China. It consists of several thousand ideographic characters that are simplified versions of traditional Chinese characters.
  • Software Development Kit (SDK): A set of tools and libraries for creating software applications for Windows operating systems.
  • Device Driver Kit (DDK): A set of tools and libraries for creating Windows-based software that runs hardware devices.
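
The difference between the SBCS and DBCS entries above can be seen by encoding a few characters and counting bytes. The following is a minimal sketch, not part of the original text: the sample strings and the helper name byte_widths are assumptions chosen for illustration, and cp1252 and cp932 are the names Python's standard codecs use for the Windows Western European and Windows Japanese code pages.

```python
# Minimal sketch: compare how many bytes each character occupies in a
# single-byte code page versus a double-byte code page.
# The helper name and sample strings are illustrative assumptions.

def byte_widths(text: str, codec: str) -> list[int]:
    """Return the encoded width, in bytes, of each character in text."""
    return [len(ch.encode(codec)) for ch in text]

# cp1252: Windows Western European code page (single byte per character).
latin_widths = byte_widths("Café", "cp1252")

# cp932: Windows Japanese code page (1 or 2 bytes per character).
japanese_widths = byte_widths("Aあ日", "cp932")

print(latin_widths)     # every entry is 1
print(japanese_widths)  # a mix of 1-byte and 2-byte characters
```

In the double-byte case the ASCII letter still takes 1 byte while the kana and the ideograph take 2 bytes each, which is what makes the character set mixed-width.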

Software engineering requirements for neighboring countries/regions are often quite similar. In fact, three broad geographical groups cover almost all markets for Windows-based applications: the Middle East; the Far East; and Europe, Russia, and the Americas. Before you begin planning a new Windows-based product or decide which international markets to target for an existing product, examine the development issues for each of the categories listed in Figure 1-3.

Language        Languages                            Character    Scripts                 Text                     SDK/DDK                         Other
Edition                                              Set Type                             Directionality           Versions                        Issues

European        Western European,                    Single byte  Latin, Greek, Cyrillic  Left to right            English Windows SDK and DDK
                Central and Eastern European,
                Greek, Russian, Turkish, Indonesian

Middle Eastern  Arabic, Hebrew                       Single byte  Arabic, Hebrew, Latin   Bidirectional            Windows SDK and DDK supplement

Far Eastern     Traditional Chinese,                 Multibyte    Kana, hangul,           Horizontal and vertical  Windows SDK                     Input methods
                Simplified Chinese,                               ideographic characters
                Japanese, Korean

Thai            Thai                                 Single byte  Thai                    Left to right            Windows SDK and DDK supplement  Text layout

Figure 1-3 Categories of internationalization for Windows, based on development issues.
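
The text directionality that separates the Middle Eastern category from the others can be inspected with the Unicode bidirectional classes that Python exposes through unicodedata.bidirectional. This is only a sketch under stated assumptions: the helper names and the sample characters are hypothetical, and the check is a rough classification, not an implementation of the full bidirectional layout algorithm.

```python
import unicodedata

# Minimal sketch: classify characters by Unicode bidirectional class.
# Helper names and sample characters are illustrative assumptions.

def bidi_classes(text: str) -> list[str]:
    """Return the Unicode bidirectional class of each character."""
    return [unicodedata.bidirectional(ch) for ch in text]

def has_mixed_direction(text: str) -> bool:
    """Roughly detect text that mixes right-to-left letters with
    left-to-right letters or European digits."""
    classes = set(bidi_classes(text))
    right_to_left = bool(classes & {"R", "AL"})
    left_to_right = bool(classes & {"L", "EN"})
    return right_to_left and left_to_right

print(bidi_classes("A"))        # Latin letter
print(bidi_classes("\u05D0"))   # Hebrew letter alef
print(bidi_classes("\u0627"))   # Arabic letter alef
print(bidi_classes("\u0E01"))   # Thai letter ko kai
print(bidi_classes("7"))        # European digit

# Hebrew word followed by a number: right-to-left letters plus digits.
print(has_mixed_direction("\u05E9\u05DC\u05D5\u05DD 123"))
```

Hebrew and Arabic letters report right-to-left classes, while Latin and Thai letters report the left-to-right class, which matches the Text Directionality column of the figure.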

If you plan intelligently, once you have a solid core feature set and a code base for one or two languages in a particular category, you might determine that the cost of developing another language edition in that category is small compared with the potential return. A good example is the Far East* category (double-byte languages).

Applications written for the Chinese, Japanese, and Korean editions of Windows share common development issues. Once you have a Japanese-language application, the development steps necessary to create an application for the booming Korean market are minimal.

The Windows 95 development team built localized editions of the operating system from separate code bases that correspond to the categories listed in Figure 1-3. There were no code differences between language editions of Windows 95 in each of these categories—only the language of the user interface changed. The Middle Eastern, Far Eastern, and Thai code bases are all supersets of the European code base. (Applications written for single-byte editions of Windows will run on bidirectional or double-byte editions of Windows, but the reverse is not necessarily true.) In fact, to add Middle Eastern functionality to a European edition of Windows, you need only install several additional libraries. Windows 3.x had different code bases for Western European languages, Central and Eastern European languages, Greek, Turkish, Middle Eastern languages, Far Eastern languages, and Thai. The goal for future versions of Windows is to have a single code base for all languages.
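
One reason code written with single-byte assumptions does not necessarily behave correctly on double-byte text is that a byte offset is no longer a character boundary. The following is a minimal sketch of that failure and one way around it; the function names, the sample string, and the 3-byte limit are assumptions for illustration, and cp932 is Python's codec name for the Windows Japanese code page.

```python
# Minimal sketch: truncating double-byte text at an arbitrary byte offset
# can split a character in half. Names and sample data are assumptions.

def truncate_bytes_naive(data: bytes, limit: int) -> bytes:
    """Cut at a byte offset, as code written for SBCS text might do."""
    return data[:limit]

def truncate_bytes_safe(text: str, codec: str, limit: int) -> bytes:
    """Keep only whole characters whose encoded form fits in limit bytes."""
    out = b""
    for ch in text:
        encoded = ch.encode(codec)
        if len(out) + len(encoded) > limit:
            break
        out += encoded
    return out

text = "日本語"
raw = text.encode("cp932")  # each of the three characters takes 2 bytes

# The naive cut leaves a lead byte without its trail byte.
naive = truncate_bytes_naive(raw, 3)
try:
    naive.decode("cp932")
    naive_ok = True
except UnicodeDecodeError:
    naive_ok = False

safe = truncate_bytes_safe(text, "cp932", 3)
print(naive_ok)              # the naive result cannot be decoded
print(safe.decode("cp932"))  # the safe result keeps one whole character
```

The safe version gives up one byte of capacity in exchange for never splitting a character, which is the kind of adjustment double-byte enabling typically requires.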

Some language editions of your product may require changes only in packaging and small software components, such as spell-checkers or sample documents, to be marketable in more than one region. English products are a prime example. As long as a program's design is not culturally biased and its features have been properly internationalized, it can be sold in Australia, Britain, Canada, Hong Kong SAR, China, India, Ireland, New Zealand, South Africa, the United States, and many other countries/regions. Keeping this in mind during the planning stages will help you produce a single program executable that can be shipped worldwide.

Spanish is another language that is spoken widely. With few changes, a Spanish-language program can be shipped throughout Latin America, Spain, and the United States. Though local dialects exist in many Spanish-speaking countries/regions, computer users generally accept software that has been translated using a core vocabulary. Windows supports a number of languages that are used in multiple locales, including the following:

  • Arabic
  • Chinese
  • Dutch
  • English
  • French
  • German
  • Italian
  • Norwegian
  • Portuguese
  • Romanian
  • Russian
  • Spanish
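
The idea of shipping one core translation to several locales can be sketched as a string lookup that falls back from a specific locale to its core language. Everything in this sketch is hypothetical: the STRINGS table, the locale tags, the sample translations, and the lookup function are assumptions made for illustration and do not describe any Windows API.

```python
# Minimal sketch: one core translation per language, with optional
# per-locale overrides. All names and strings here are hypothetical.

STRINGS = {
    "en": {"open": "Open", "computer": "Computer"},
    "es": {"open": "Abrir", "computer": "Computadora"},  # core vocabulary
    "es-ES": {"computer": "Ordenador"},                  # regional override
}

def lookup(key: str, locale: str, default_locale: str = "en") -> str:
    """Resolve a string by trying the exact locale, then its core
    language, then the default locale."""
    candidates = [locale]
    if "-" in locale:
        candidates.append(locale.split("-", 1)[0])
    candidates.append(default_locale)
    for name in candidates:
        table = STRINGS.get(name, {})
        if key in table:
            return table[key]
    raise KeyError(key)

print(lookup("open", "es-MX"))      # no es-MX table, so the core "es" text
print(lookup("computer", "es-ES"))  # regional override wins
print(lookup("open", "fr-CA"))      # no French table, so the default
```

With this arrangement a new locale needs its own table only for the few strings that differ from the core vocabulary.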

Appendix P lists all localized editions of Windows.

* Although the term "Far East" is no longer as commonly acceptable as it once was, Microsoft still uses it when referring to language versions of Windows that require double-byte character sets.