Visual Basic Concepts

International Sort Order and String Comparison

String comparison is widely used in Visual Basic. Using this functionality, however, may yield incorrect results if you overlook certain programming requirements.

Sorting Text

Sorting text means ordering text according to language conventions. Format and font are irrelevant to the sorting process because both involve presentation rather than content. At first glance, sorting text looks simple: a precedes b, b precedes c, and so on. However, many languages have more complex sorting rules. Correct international sorting is not always a simple extension of sorting English text, and it requires a different understanding of the sorting process.

Correct international sorting can imply context-sensitive sorting. Character contraction and expansion are the two important areas of context-sensitive sorting.

  • Character contraction occurs when a two-character combination is treated as a single, unique letter. For example, in the traditional Spanish sort order the two-character combination ch is a single, unique letter and sorts between c and d.

  • Character expansion occurs in cases where one letter represents one character, but that one character sorts as if it were two. For example, ß (eszett) is equivalent to ss in both German/Germany and German/Switzerland locales. However, ß is equivalent to sz in the German/Austria locale.
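The expansion rule can be illustrated with a minimal sketch. The helper name ExpandedKey is hypothetical and not part of Visual Basic; it only mimics the German/Germany rule for ß by building a comparison key by hand. In practice the locale-aware comparison tools described later in this topic apply such rules for you.

```vb
' Hypothetical helper (not a Visual Basic built-in): builds a
' comparison key in which ß is expanded to "ss", following the
' German/Germany rule described above.
Public Function ExpandedKey(ByVal s As String) As String
    ExpandedKey = Replace(s, "ß", "ss", 1, -1, vbBinaryCompare)
End Function

Private Sub DemoExpansion()
    ' Both keys are "Strasse", so the binary comparison reports equality.
    Debug.Print StrComp(ExpandedKey("Straße"), "Strasse", vbBinaryCompare)   ' 0
End Sub
```

A contraction such as the Spanish ch cannot be handled by a simple replacement like this, because the combination must sort as a distinct letter between c and d rather than as two letters.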

Before implementing a sort order, you must consider code pages. A code page is an ordered character set that has a numeric index (code point) associated with each character. Because there are various code pages, a single code point might represent different characters in different code pages. While most code pages share the code points 32 through 127 (part of the ASCII character set), they differ beyond that. Typically, the additional letters in these code pages are not arranged in alphabetic order.
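The following sketch shows one way to observe that a single code point can stand for different characters. The function name DecodeByte is hypothetical. The sketch assumes that LCID 1033 (English/U.S.) maps to code page 1252 and LCID 1049 (Russian) maps to code page 1251, and that both code pages are installed on the system.

```vb
' Hypothetical helper: interprets one byte under the ANSI code page
' of the given locale and returns the resulting Unicode value.
Public Function DecodeByte(ByVal codePoint As Byte, ByVal lcid As Long) As Long
    Dim b(0) As Byte
    b(0) = codePoint
    ' StrConv with vbUnicode converts from the locale's code page.
    DecodeByte = AscW(StrConv(b, vbUnicode, lcid)) And &HFFFF&
End Function

Private Sub DemoCodePages()
    ' Code point 65 lies in the shared range, so it is A in both locales.
    Debug.Print DecodeByte(65, 1033), DecodeByte(65, 1049)

    ' Code point &HE9 lies beyond the shared range. Code page 1252
    ' defines it as é, and code page 1251 defines it as the Cyrillic
    ' letter й, so the two results are expected to differ.
    Debug.Print Hex$(DecodeByte(&HE9, 1033)), Hex$(DecodeByte(&HE9, 1049))
End Sub
```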

For More Information   See "DBCS Sort Order and String Comparison" later in this chapter for more information about working with East Asian languages.

String Comparison in Visual Basic

String comparison rules are different for each locale. Visual Basic provides a number of tools, such as Like and StrComp, which are locale-aware. To use them effectively, however, you must first understand the Option Compare statement.

Comparing Strings with the Option Compare Statement

When using this statement, you must specify a string comparison method, either Binary or Text, for a given module. If you specify Binary, comparisons are done according to a sort order derived from the internal binary representations of the characters. If you specify Text, comparisons are done according to the case-insensitive textual sort order determined by the user's system locale. The default string comparison method is Binary.

In the following code example, the user enters information into two input boxes. The information is then compared and sorted in the appropriate alphabetic order.

Private Sub Form_Click()
    Dim name1 As String, name2 As String
    Dim msg As String

    name1 = InputBox("Enter 1st hardware name here:")
    name2 = InputBox("Enter 2nd hardware name here:")

    ' The < operator follows the module's Option Compare setting.
    If name1 < name2 Then
        msg = "'" & name1 & "' comes before '" & name2 & "'"
    Else
        msg = "'" & name2 & "' comes before '" & name1 & "'"
    End If
    MsgBox msg
End Sub

If this code is run in an English/U.S. locale, the message box will contain the following output if the user enters printer and Screen:

'Screen' comes before 'printer'

This result is based on the fact that the default string comparison method is Binary. Because the internal binary representation of uppercase S (83) is smaller than the one for lowercase p (112), "Screen" sorts before "printer" and the condition name1 < name2 is False. When you add the Option Compare Text statement in the Declarations section of a module, Visual Basic compares the two strings on a case-insensitive basis, resulting in the following output:

'printer' comes before 'Screen'
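As a minimal sketch of where the statement goes, the module below places Option Compare Text ahead of its procedures. The function name ComesFirst is hypothetical and serves only to isolate the comparison; the result shown assumes an English/U.S. locale.

```vb
Option Compare Text   ' must appear before any procedures in the module

' Hypothetical helper: returns whichever string sorts first under
' the module's comparison method.
Public Function ComesFirst(ByVal a As String, ByVal b As String) As String
    If a < b Then
        ComesFirst = a
    Else
        ComesFirst = b
    End If
End Function

Private Sub DemoComesFirst()
    ' Case is ignored, so p is compared with s rather than with S.
    Debug.Print ComesFirst("printer", "Screen")   ' printer
End Sub
```

If the Option Compare Text line is removed, the same call falls back to the Binary method and returns Screen.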

If this code is run in a French/Canada locale, the message box will contain the following output if the user enters imprimante and écran:

'imprimante' comes before 'écran'

Similarly, if you add the Option Compare Text statement to your code, the two terms appear in the right order: écran precedes imprimante. In addition to being case insensitive, the comparison takes accented characters into account: an accented character such as é in French is placed right after its unaccented base character (in this case, e) in the sort order.

If the user had entered ecran and écran, the output would be:

'ecran' comes before 'écran'

For More Information   See "Option Compare Statement" in the Language Reference.

Comparing Strings with the Like Operator

You can use the Like operator to compare two strings and to take advantage of its pattern-matching capabilities. When you write international software, you must be aware of how pattern matching interacts with sort order. When character ranges are used with Like, the specified pattern indicates a range of the sort ordering. For example, under the Binary method for string comparison (by default or by adding Option Compare Binary to your code), the range [A-C] would miss both the uppercase accented A characters and all lowercase characters. Only strings starting with A, B, or C would match. This would not be acceptable in many languages. In German, for instance, the range would miss all the strings beginning with Ä. In French, none of the strings starting with À would be included.

Under the Text method for string comparison, all the accented A and a characters would be included in the interval. In the French/France locale, however, strings starting with Ç or ç would not be included, since Ç and ç appear after C and c in the sort order.

Using the [A-Z] range to check for all strings beginning with an alphabetic character is not a valid approach in certain locales. Under the Text method for string comparison, strings beginning with Ø and ø would not be included in the range if your application is running in a Danish/Denmark locale. Those two characters are part of the Danish alphabet, but they appear after Z. Therefore, you would need to add the letters that come after Z to the pattern. For example, Print "øl" Like "[A-Z]*" would return False, but Print "øl" Like "[A-ZØ]*" would return True with the Option Compare Text statement.
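The Binary-method behavior described earlier in this section can be checked with a short sketch. The procedure name DemoLikeRanges is hypothetical, and the module is assumed to use Option Compare Binary; the results under Option Compare Text depend on the locale, as discussed above.

```vb
Option Compare Binary   ' the default, stated explicitly for clarity

Private Sub DemoLikeRanges()
    ' Uppercase A falls inside the binary range A through C.
    Debug.Print "Apple" Like "[A-C]*"     ' True

    ' Lowercase a has a larger binary value than C, so it is missed.
    Debug.Print "apple" Like "[A-C]*"     ' False

    ' Ä also has a larger binary value than C, so it is missed.
    Debug.Print "Äpfel" Like "[A-C]*"     ' False

    ' Listing Ä explicitly in the character list includes it.
    Debug.Print "Äpfel" Like "[A-CÄ]*"    ' True
End Sub
```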

Comparing Strings with the StrComp Function

The StrComp function returns a value that tells you whether one string is less than (-1), equal to (0), or greater than (1) another string. If you omit the optional compare argument, the result is based on the string comparison method (Binary or Text) you defined with the Option Compare statement, so StrComp may give different results for the same strings depending on that method.
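The following sketch passes the comparison method explicitly through the optional third argument of StrComp, which makes the result independent of the module's Option Compare setting. The procedure name DemoStrComp is hypothetical, and the vbTextCompare results assume an English/U.S. locale.

```vb
Private Sub DemoStrComp()
    ' Binary: uppercase S (83) is smaller than lowercase p (112).
    Debug.Print StrComp("Screen", "printer", vbBinaryCompare)   ' -1

    ' Text: case is ignored, and s sorts after p.
    Debug.Print StrComp("Screen", "printer", vbTextCompare)     ' 1

    ' Text: strings that differ only in case compare as equal.
    Debug.Print StrComp("SCREEN", "screen", vbTextCompare)      ' 0
End Sub
```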

For More Information   See "DBCS Sort Order and String Comparison" later in this chapter for more information about comparing strings in East Asian languages. See also "StrComp Function" in the Language Reference.