Windows Confidential: The Evolution of Sorting
The development of sorting algorithms in Windows Explorer has a long and slightly tangled history.
Windows Explorer started out with a very simple sorting algorithm. It just sorted file names according to the “lstrcmpi” function. This performed a locale-specific, case-insensitive comparison. Mind you, even this comparison function was rather complicated at the time. For example, it gave special treatment to hyphens and apostrophes.
As a result, when Windows 95 sorted file names for display in Windows Explorer, it would put file139 ahead of file20. This is perfectly logical from a computer programmer’s point of view. It’s also completely counter-intuitive to normal human beings—thus proving that computer programmers are not normal human beings.
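The file139-before-file20 result falls out of any character-by-character comparison. Here is a minimal sketch in Python, using str.casefold as a rough stand-in for a case-insensitive comparison. It is not lstrcmpi and it ignores locale rules, and the list of names is made up for illustration.

```python
# Hypothetical file names. str.casefold only approximates a
# case-insensitive comparison and knows nothing about locales.
names = ["file20", "File3", "file139"]

# Characters are compared one position at a time, so "1" < "2" < "3"
# settles the order before the length of the digit run matters.
char_sorted = sorted(names, key=str.casefold)
print(char_sorted)
```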
In Windows XP, Microsoft updated the Windows Explorer sorting algorithm to be more in line with what normal human beings expect. It treated digits in file names as numbers instead of sequences of characters. While this improved sorting for most people, there were cases where this change resulted in some surprises.
One example that recurred frequently was file names containing hex values. Under the new number-based sorting, Windows Explorer assumed that a file named “1040A” should sort slightly after “1040,” and nowhere near a file named “103F2.” If for whatever reason you’re in the habit of viewing folders full of files whose names are hex values, you can set the policy “Turn off numerical sorting in Windows Explorer.” This policy changes sorting back to the way it was in versions of Windows prior to Windows XP—namely, character-by-character.
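To make the hex surprise concrete, here is a rough sketch of number-based sorting in Python. It is an illustration of the idea, not Windows Explorer’s actual algorithm: the helper natural_key is hypothetical, and the name 104A0 is an extra example that does not appear above.

```python
import re

def natural_key(name):
    # Split into alternating text and digit-run chunks. The capturing
    # group keeps the digit runs, which always land at odd indexes.
    chunks = re.split(r"(\d+)", name.casefold())
    # Digit runs compare as integers, everything else as text.
    return [int(c) if i % 2 else c for i, c in enumerate(chunks)]

files = ["file139", "file20"]
hex_names = ["104A0", "1040A", "103F2"]  # "104A0" is a made-up extra name

numeric_files = sorted(files, key=natural_key)    # digits read as numbers
numeric_hex = sorted(hex_names, key=natural_key)  # digits read as numbers
char_hex = sorted(hex_names, key=str.casefold)    # character by character

print(numeric_files)
print(numeric_hex)
print(char_hex)
```

Under the number-based key, 1040A is read as the number 1040 followed by a letter, so it lands after 104A0 (the number 104). The character-by-character order keeps these equal-length hex names in their hex order.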
Periods and Spaces
In Windows Vista, there was a tiny tweak made to the comparison algorithm, because periods serve double duty. They’re traditional characters by day, but they also serve as file extension markers by night. As a result, a file named File 1.txt ended up being counter-intuitively sorted after File.txt because the period was being compared against the space and losing.
Microsoft added a new rule to the sorting algorithm in Windows Vista so periods are treated as sorting before spaces instead of after them. If you don’t like this sub-rule, you can disable it by setting NoDotBreakInLogicalCompare, but only in Windows Vista. The setting has no effect on Windows 7.
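The relative order of File 1.txt and File.txt hinges on a single comparison: the space in one name against the period in the other. The sketch below shows how ranking the period before the space flips the pair. It compares by code point, which is only an approximation of what Explorer does, and the helper dot_first_key is hypothetical.

```python
def dot_first_key(name):
    # Swap "." for U+001F, a character that sorts just below the space,
    # so that periods rank before spaces.
    return name.casefold().replace(".", "\x1f")

names = ["File 1.txt", "File.txt"]

# Plain code-point order: space (U+0020) ranks before period (U+002E).
space_first = sorted(names, key=str.casefold)

# With the period ranked before the space, the pair swaps.
dot_first = sorted(names, key=dot_first_key)

print(space_first)
print(dot_first)
```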
There’s another, less-strange case in which character-by-character sorting may be preferable: if you have file names with floating-point numbers in them. The number-based sorting algorithm would put “1.5” ahead of “1.25.” This is correct if the “1.5” and “1.25” refer to a numbered hierarchy, such as a section of legal code. Title 1, chapter 5 would come before title 1, chapter 25.
On the other hand, if “1.5” and “1.25” refer to the dimensions of a machine part in centimeters, then you’d expect “1.25” to come before “1.5.” Because Windows Explorer doesn’t have enough context to know whether any particular string of digits after a decimal point is a hierarchical number or a floating point number, it needs your help.
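The two readings of “1.5” and “1.25” can be sketched as two different sort keys in Python. This is an illustration only: the keys are assumptions made for the example, not how Windows Explorer parses names.

```python
labels = ["1.25", "1.5"]

# Hierarchical reading: each dot-separated part is its own integer,
# so chapter 5 comes before chapter 25.
as_hierarchy = sorted(labels, key=lambda s: tuple(int(p) for p in s.split(".")))

# Decimal reading: the whole label is one floating-point value.
as_decimal = sorted(labels, key=float)

print(as_hierarchy)
print(as_decimal)
```

For these labels, a plain character-by-character sort happens to agree with the decimal reading, which is why falling back to it helps here.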
A more subtle change to sorting algorithms was introduced in Windows 7. One customer observed that when he sorted a folder by Type, two files named something like file1.txt and file2.txt appeared in that order, which is expected behavior.
If he clicked the Type header a second time to reverse the sort, earlier versions of Windows would continue to place file1.txt ahead of file2.txt. However, the files switch places in Windows 7, with file2.txt coming ahead of file1.txt. The customer was confused by this change in behavior because it looked “unreasonable.”
The Type column understands that it’s common for many items to have the same Type, so it indicates that ties should be broken by the item’s name. In addition to making sorting by Type a bit more consistent, this also means that when you reverse the sort order, the items within each Type reverse order too. Many users find this reasonable, because they expect that clicking a column header a second time will reverse the entire list.
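The effect of the tie-breaker on a reversed sort can be sketched in Python. The file list and type names are made up, and the “type only” variant is just one plausible way to reproduce the older behavior, relying on the fact that Python’s sort keeps tied items in their original order even when reversed.

```python
# Hypothetical folder contents as (name, type) pairs.
files = [
    ("file1.txt", "Text Document"),
    ("file2.txt", "Text Document"),
    ("photo.jpg", "JPEG image"),
]

# Type only, descending: ties are left in their existing order,
# so file1.txt stays ahead of file2.txt.
type_only_desc = sorted(files, key=lambda f: f[1], reverse=True)

# Type with name as tie-breaker, descending: reversing the sort also
# reverses the names within each type.
tie_break_desc = sorted(files, key=lambda f: (f[1], f[0]), reverse=True)

print([name for name, _ in type_only_desc])
print([name for name, _ in tie_break_desc])
```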
So the next time you run a sort, remember that while sorting sounds easy and straightforward, there’s a lot going on behind the scenes.
Raymond Chen’s Web site, The Old New Thing, and identically titled book (Addison-Wesley, 2007) deal with Windows history, Win32 programming and giant metal chickens.