Text details in OData

OData is gaining momentum as a way to share data over the web and inside corporations. There are many nice characteristics of the payload formats and conventions; some of these are explicit in the protocol, like the fact that data items have their type clearly labeled, while others are implicit.

Text, for example, is subject to all sorts of cultural nuances and personal preferences. The sort order of strings, for instance, depends on the language they are in, or rather, the language the user chooses to interpret them in. I've written more on this in the past.

OData doesn't define the ordering of strings; that's left up to implementations. In particular, the server gets to control the rules to apply. This is important, because the server should ultimately be in control of how it chooses to spend its time and resources - you can imagine that servers will probably choose their most natural, efficient way of sorting.

The ADO.NET Data Services implementation in particular doesn't dictate anything about how strings are sorted. Ultimately it's left to the underlying store, which in practice is usually going to be either the CLR for in-memory objects or your database server if using the Entity Framework.
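To make "left to the underlying store" a bit more concrete, here is a small Python sketch of how the same strings can come back in a different order under two different comparison rules. The two rules are stand-ins I picked for illustration (plain code point order and a case-insensitive comparison); they are not meant to reproduce what the CLR or any particular database collation actually does, and the sample strings are made up.

```python
# Illustrative only: two comparison rules, standing in for two hypothetical stores.
names = ["apple", "Banana", "cherry"]

# Rule 1: compare by Unicode code point, so uppercase letters sort before lowercase.
code_point_order = sorted(names)

# Rule 2: compare case-insensitively.
case_insensitive_order = sorted(names, key=str.casefold)

print(code_point_order)        # ['Banana', 'apple', 'cherry']
print(case_insensitive_order)  # ['apple', 'Banana', 'cherry']
```

A client that sends the same $orderby to two services backed by different stores could see a difference of this kind, and neither answer would be wrong as far as OData is concerned.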

For a good example of how your mileage may vary, consider the Netflix OData source as it exists today.

If you navigate to http://odata.netflix.com/Catalog/People, you will see the first entries are 'Richard Donat', 'Scott Wiper' and 'Craig Conin'. Presumably these are sorted by the key of the set, so this is an efficient way of enumerating them.

If you navigate to http://odata.netflix.com/Catalog/People?$orderby=Name, you will see the first entries are 'Dandy Warhols', 'Edward E. Thomas', 'Im-ho Yang'. This is kind of surprising, but if you look at the raw data, you'll see that these have a leading blank character - presumably a data entry error - and they are thus sorting first.

Finally, if you navigate to http://odata.netflix.com/Catalog/People?$orderby=trim(Name), where we are trimming off the leading whitespace, we find the first entry is "Action" Dan Harrington. Yes, the "Action" has quotes, and now that sorts first. All of these decisions are ultimately made by the server, although the results here match what most libraries would produce.
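The two surprises above are easy to reproduce locally. The following Python sketch sorts a handful of the names mentioned here, first as-is and then with whitespace trimmed. It assumes plain code point comparison, which is Python's default for strings and not necessarily the rule the Netflix server applies; the leading blank on one entry mimics what the raw data appears to contain.

```python
# A few of the names from the Netflix feed, used purely as sample data.
names = [
    "Richard Donat",
    "Scott Wiper",
    "Craig Conin",
    " Dandy Warhols",           # leading blank, as seen in the raw data
    '"Action" Dan Harrington',  # the name itself starts with a quotation mark
]

# Roughly analogous to $orderby=Name: the space (U+0020) sorts before everything else here.
by_name = sorted(names)

# Roughly analogous to $orderby=trim(Name): the sort key is trimmed, the values are not.
by_trimmed_name = sorted(names, key=str.strip)

print(by_name[0])          # the entry with the leading blank
print(by_trimmed_name[0])  # the entry starting with a quotation mark (U+0022)
```

Both a space and a quotation mark have lower code points than any letter, which is why they float to the top under this kind of comparison.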

You can imagine, for example, a hypothetical Microsoft Zune data service with artist information that sorts the way the device does and doesn't consider a leading "The" when sorting, so "The Beatles" sorts before "Cyndi Lauper". Again, the server chooses the best fit for the data - OData doesn't mandate that.
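As a rough sketch of what such a server-side rule might look like, here is a Python sort key that skips a leading "The". The function name ignore_leading_the and the exact rule (case-insensitive, English "The" only) are my own assumptions for illustration; this is not how any real Zune service is known to work.

```python
def ignore_leading_the(name: str) -> str:
    """Hypothetical sort key: drop a leading "The " and compare case-insensitively."""
    folded = name.casefold()
    prefix = "the "
    if folded.startswith(prefix):
        folded = folded[len(prefix):]
    return folded


artists = ["Cyndi Lauper", "The Beatles"]

plain_order = sorted(artists)                               # 'C' sorts before 'T'
article_aware_order = sorted(artists, key=ignore_leading_the)  # 'beatles' sorts before 'cyndi lauper'

print(plain_order)          # ['Cyndi Lauper', 'The Beatles']
print(article_aware_order)  # ['The Beatles', 'Cyndi Lauper']
```

The client sends the same $orderby in both cases; only the server's idea of "ordered" changes.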