Language detection and system settings, part 1


Once upon a time, a customer got in touch with us about a font rendering problem, where instead of seeing the characters he expected to see inside his applications, he saw question marks.


- What’s your system locale? we asked.

- I’m in Cleveland, he said, in my upstairs home office.


Some flavor of this story has been going around our group for years, and I’d call it an urban legend if it weren’t for the fact that I’ve observed several flavors of the story myself. The truth is that we have way too many ways for users to indicate their language (and region) preferences. The proliferation of settings not only confuses users but also makes it nearly impossible for developers to know which system setting should drive which aspect of the user experience they’re trying to create. Over at Go Global there is some documentation (as well as somewhat older stuff on the old globaldev site) that is designed to help users and developers differentiate between user locale, default user locale, system locale, and input locale, but the fact is that these don’t even constitute the full range of settings available to users on several common installations of Windows. A partial list of the settings that users and/or developers are asked to make sense of:


- User locale

- System locale

- Input locale

- Thread locale

- Default location or geoID

- System UI language

- User UI language
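
To make the list a little more concrete, here is a minimal sketch of how most of these settings can be read from a script. It assumes Python with ctypes on Windows and calls documented Win32 functions directly; the helper name read_windows_language_settings and the dictionary keys are made up for illustration, and the mapping from each setting name to a single API call is my own simplification. On any other platform the helper just returns an empty dictionary.

```python
import ctypes
import sys

GEOCLASS_NATION = 16  # GEOCLASS value that selects country/region GeoIDs


def read_windows_language_settings():
    """Return raw numeric identifiers for several language-related settings.

    Hypothetical helper for illustration only. Returns an empty dict when
    not running on Windows.
    """
    if sys.platform != "win32":
        return {}

    kernel32 = ctypes.windll.kernel32
    user32 = ctypes.windll.user32

    # GetKeyboardLayout returns an HKL handle; its low word is the
    # language identifier of the active input locale for the thread.
    user32.GetKeyboardLayout.restype = ctypes.c_void_p
    hkl = user32.GetKeyboardLayout(0) or 0

    return {
        "user locale": kernel32.GetUserDefaultLCID(),
        "system locale": kernel32.GetSystemDefaultLCID(),
        "thread locale": kernel32.GetThreadLocale(),
        "input locale": hkl & 0xFFFF,
        "location (GeoID)": kernel32.GetUserGeoID(GEOCLASS_NATION),
        "system UI language": kernel32.GetSystemDefaultUILanguage(),
        "user UI language": kernel32.GetUserDefaultUILanguage(),
    }


if __name__ == "__main__":
    for name, value in read_windows_language_settings().items():
        print(f"{name}: {value:#06x}")
```

On a typical machine several of these values will agree, and it is exactly when they disagree that the question of which one to trust becomes hard.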


On top of these, users may also encounter the browser Accept Language, language settings in Office or other productivity suite software (sometimes including separate settings UI for every application in the same productivity suite), and language and region preference UI exposed to them from various web services (even multiple times across different web services provided by the same publisher). From the user’s perspective, she’s stuck entering the same information over and over again, in UIs that are different enough to be non-intuitive but similar enough in goal to make her wonder why she’s repeating the same task every time she goes to a new website or opens a new application.


And one of the worst parts about this confusing proliferation of settings is that not one of them can reliably tell you which language a user cares about at any given point in time; the best they can do is give you a ballpark guess as to a user’s typical intentions or behavior. Throw a little multilingualism into the picture, where a user may regularly interact with a computer in more than one language, and things get even more convoluted.


One of the biggest reasons we introduced ELS language detection is to give developers a way to know which language their user cares about in a specific scenario, rather than in general; developers of any Windows application on Windows 7 can now find out the user’s active computing language in text input or reading scenarios simply by passing the text to ELS. This means that in many cases, developers can stop using system settings to make rough guesses about the user experience (though the user settings may still end up being used as a fallback for language detection, on which more in a future post).
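
The "detect first, fall back to a setting" pattern can be sketched in a few lines. This is not ELS and does not call it; the detector below is a deliberately crude stand-in that only looks at which Unicode script dominates the text, and the function name guess_language, the SCRIPT_TO_LANGUAGE table, and the one-language-per-script mapping are all assumptions made for illustration. A real detector would tell apart languages that share a script.

```python
import unicodedata
from collections import Counter

# Crude, hypothetical mapping from a script name (the first word of a
# character's Unicode name) to a single language tag. Real detection
# would distinguish, say, Russian from Ukrainian.
SCRIPT_TO_LANGUAGE = {
    "CYRILLIC": "ru",
    "GREEK": "el",
    "HIRAGANA": "ja",
    "KATAKANA": "ja",
    "HANGUL": "ko",
    "ARABIC": "ar",
    "HEBREW": "he",
}


def guess_language(text, fallback):
    """Guess a language tag from the text itself.

    Returns `fallback` (for example, a tag derived from a user setting)
    when the text is empty or its dominant script is ambiguous.
    """
    counts = Counter()
    for ch in text:
        if ch.isalpha():
            script = unicodedata.name(ch, "").split(" ")[0]
            counts[script] += 1

    if counts:
        dominant_script, _ = counts.most_common(1)[0]
        if dominant_script in SCRIPT_TO_LANGUAGE:
            return SCRIPT_TO_LANGUAGE[dominant_script]

    # Latin and CJK ideographs are shared by many languages, so this
    # toy detector gives up and defers to the setting-based fallback.
    return fallback


if __name__ == "__main__":
    print(guess_language("Привет, мир", "en-US"))
    print(guess_language("Hello, world", "en-US"))
```

The point of the sketch is the shape of the call: the text the user is actually reading or typing gets the first say, and a system setting is consulted only when the text alone cannot settle the question.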


Next up: Which system setting should you use as a fallback to language detection? More to come.