Changes to IDN in IE7 to now allow mixing of scripts

Domain names are no longer limited to ASCII, and as the web grows, more and more domain names contain characters from other character sets. Such domain names are called Internationalized Domain Names (IDN); for example, http://ايكيا.com is a domain in Arabic for IKEA. IE7 added support for IDN in Beta 2. We listened to your feedback during Beta 2, and we are changing our IDN rules to accommodate the way customers want to use international characters on the web.

Preventing IDN spoofing by default in IE7 Beta 2

In IE7 Beta 2, if a user navigates to an IDN URL and the scripts present in the URL are not covered by the user’s configured Accept Languages, IE7 converts the URL to Punycode and displays that form in the address bar. IE7 also shows an information bar saying that the website address contains characters that cannot be displayed using the current language settings.

Letters or symbols that cannot be displayed with the current language settings

This design makes IE7 secure by default against URL spoofing attacks that rely on non-ASCII characters. To view a URL in Unicode, the user must add a language that uses that script to the browser’s Accept Languages.
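For readers who want to see what the Punycode form of a host name looks like, here is a minimal Python sketch that uses the standard library's "idna" codec (which implements the IDNA 2003 rules). It only illustrates the encoding and is not how IE7 performs the conversion; the helper names to_punycode and to_unicode and the host bücher.example are made up for this example.

```python
def to_punycode(hostname: str) -> str:
    """Return the ASCII (Punycode) form of an internationalized host name."""
    # The built-in "idna" codec encodes each dot-separated label on its own
    # and prefixes every encoded label with "xn--".
    return hostname.encode("idna").decode("ascii")


def to_unicode(hostname: str) -> str:
    """Return the Unicode form of a Punycode host name."""
    return hostname.encode("ascii").decode("idna")


print(to_punycode("bücher.example"))         # xn--bcher-kva.example
print(to_unicode("xn--bcher-kva.example"))   # bücher.example
```

Labels that are already pure ASCII, such as "example" here, pass through unchanged; only labels with non-ASCII characters get the "xn--" form.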

As discussed previously, another IDN restriction in IE7 Beta 2 was that it did not allow scripts to be mixed within a single label of a URL. (A label is a segment of a domain name, delimited by dots; for example, www.microsoft.com contains three labels: “www”, “microsoft” and “com”.) IE also did not allow a label to mix non-ASCII scripts with ASCII. This step was taken mainly to protect users against homograph spoofing attacks. Consider a user who commonly browses sites with Cyrillic URLs. If that user receives a phishing email with a link to an address that looks like PayPal’s, but in which one of the ‘a’s is ASCII and the other is Cyrillic, the user might believe they are visiting the real PayPal site, which uses only ASCII characters in its domain name. To protect against this spoof, IE7 detects the mixed characters and shows the URL in Punycode rather than misleading the user.
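The homograph check described here can be sketched in a few lines of Python. This is an illustration under simplifying assumptions, not IE7's implementation: Python's unicodedata module has no script property, so the first word of each character's Unicode name ("CYRILLIC", "HANGUL", and so on) stands in for the script, and the function names are hypothetical. One consequence of the shortcut is that accented Latin letters are reported as "LATIN" and therefore count as a different script from ASCII.

```python
import unicodedata


def script_of(ch: str) -> str:
    """Rough script guess: 'ASCII' for ASCII, else first word of the Unicode name."""
    if ord(ch) < 128:
        return "ASCII"
    # Approximation: "CYRILLIC SMALL LETTER A" -> "CYRILLIC".
    return unicodedata.name(ch, "UNKNOWN").split(" ")[0]


def scripts_in_label(label: str) -> set[str]:
    """Collect the (approximate) scripts used by a single domain label."""
    return {script_of(ch) for ch in label}


def is_mixed(label: str) -> bool:
    """True when a label draws characters from more than one script."""
    return len(scripts_in_label(label)) > 1


genuine = "paypal"
spoof = "p\u0430ypal"  # second character is CYRILLIC SMALL LETTER A

print(is_mixed(genuine))                # False
print(is_mixed(spoof))                  # True
print(sorted(scripts_in_label(spoof)))  # ['ASCII', 'CYRILLIC']
```

Under the Beta 2 rule, any label for which is_mixed returns True would be shown in Punycode.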

IDN - displaying URL in punycode

Improving user experience for some mixed script scenarios for IE7

We heard your feedback about how restrictive the feature was by not allowing mixing of ASCII characters with other scripts. For instance, in some locales it is common to have business names that mix ASCII and characters from local languages.

We looked for a way to allow mixed characters in a label without introducing the risk of a spoof. The IE team worked with experts from the Windows Globalization team to investigate which scripts can be mixed safely with ASCII characters.

In the Release Candidate build (post-Beta 3), IE will permit mixing of ASCII with certain scripts and will display the URL in Unicode. However, IE still will not allow the permitted scripts (listed below) to be mixed with each other within a label if they belong to different languages, even if the user has added the languages containing those scripts to their Accept Languages.

Consider the following example, where a URL label contains both Hang and ASCII (the website for LG Korea):

IDN - URL containing both Hang (Hangul) and ASCII

IE will now display this URL in Unicode for a user who has added Korean language support, since the non-ASCII script belongs to the Korean language set and is now on the allowed list of scripts. However, IE will show the raw Punycode encoding for a user who has not added Korean language support.

Here is the list of scripts that IE will permit to be mixed with ASCII:

  • Arab (Arabic),
  • Bali (Balinese),
  • Beng (Bengali),
  • Bugi (Buginese),
  • Deva (Devanagari),
  • Ethi (Ethiopic),
  • Gujr (Gujarati),
  • Guru (Gurmukhi),
  • Hang (Hangul),
  • Hani (Han),
  • Hebr (Hebrew),
  • Hira (Hiragana),
  • Kana (Katakana),
  • Khmr (Khmer),
  • Knda (Kannada),
  • Laoo (Lao),
  • Mlym (Malayalam),
  • Mong (Mongolian),
  • Mymr (Myanmar),
  • Orya (Oriya),
  • Sinh (Sinhala),
  • Syrc (Syriac),
  • Taml (Tamil),
  • Telu (Telugu),
  • Thaa (Thaana),
  • Thai (Thai),
  • Tibt (Tibetan)
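To make the Release Candidate rule concrete, here is a rough Python sketch of the decision for a single label. It is a reading of the rules described in this post, not IE's code. Assumptions: the first word of a character's Unicode name approximates its script, the function display_form is hypothetical, and the mapping from a user's languages to scripts (user_language_scripts) is invented for the example, since the actual mapping IE uses is not given here.

```python
import unicodedata

# ISO 15924 codes from the list above, mapped to the first word of the
# matching Unicode character names (an approximation of the script).
MIXABLE_WITH_ASCII = {
    "Arab": "ARABIC", "Bali": "BALINESE", "Beng": "BENGALI",
    "Bugi": "BUGINESE", "Deva": "DEVANAGARI", "Ethi": "ETHIOPIC",
    "Gujr": "GUJARATI", "Guru": "GURMUKHI", "Hang": "HANGUL",
    "Hani": "CJK", "Hebr": "HEBREW", "Hira": "HIRAGANA",
    "Kana": "KATAKANA", "Khmr": "KHMER", "Knda": "KANNADA",
    "Laoo": "LAO", "Mlym": "MALAYALAM", "Mong": "MONGOLIAN",
    "Mymr": "MYANMAR", "Orya": "ORIYA", "Sinh": "SINHALA",
    "Syrc": "SYRIAC", "Taml": "TAMIL", "Telu": "TELUGU",
    "Thaa": "THAANA", "Thai": "THAI", "Tibt": "TIBETAN",
}
MIXABLE_WORDS = set(MIXABLE_WITH_ASCII.values())


def display_form(label: str, user_language_scripts: dict[str, set[str]]) -> str:
    """Return 'unicode' or 'punycode' for one label (sketch of the RC rule)."""
    has_ascii_letters = any(ch.isascii() and ch.isalpha() for ch in label)
    scripts = {
        unicodedata.name(ch, "UNKNOWN").split(" ")[0]
        for ch in label
        if not ch.isascii()
    }
    if not scripts:
        return "unicode"  # plain ASCII label, nothing to decide
    # Mixing with ASCII is only permitted for scripts on the list.
    if has_ascii_letters and not scripts <= MIXABLE_WORDS:
        return "punycode"
    # All non-ASCII scripts must belong to one language the user has added.
    if any(scripts <= lang for lang in user_language_scripts.values()):
        return "unicode"
    return "punycode"


korean_user = {"ko": {"HANGUL", "CJK"}}   # hypothetical language-to-script map
mixed_label = "abc\ud55c\uae00"           # ASCII letters followed by Hangul

print(display_form(mixed_label, korean_user))                # unicode
print(display_form(mixed_label, {"en": set()}))              # punycode
print(display_form("p\u0430ypal", {"ru": {"CYRILLIC"}}))     # punycode
```

The last call shows why the earlier spoofing example is still caught: Cyrillic is not on the list of scripts that may be mixed with ASCII, so the label falls back to Punycode even for a user who has added Russian.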

In summary, you told us how you planned to use the feature and we listened. We’re very excited that we were able to make this change to allow richer domain names for international sites!

Tariq Sharif
Program Manager