So what's the deal with this whole C# 3.0 / Linq thingy?

09/13/2005

I've been mulling over the best way to talk about the new C# 3.0 stuff we've been working on. I presented the post on how you could use the new C# 3.0 features to go beyond the basic query functionality we've been targeting. That was to help give an appreciation of how we've added strong query support through the addition of several new smaller features that can be used for more than query (although that's the foremost area that we're trying to attack). However, I then realized that it was somewhat interesting that I would present the post on "what *else* you can do with C# 3.0" before anyone even had an idea of what you *could* do with C# 3.0 in the first place.

I could do a fairly detailed drill-down of the new C# features, but I actually thought a more holistic approach would be better in this case. So I'm going to talk about the general problem space we're confronting, and I'll try to provide some running examples to help carry me through this.

So what is Linq? Well, Linq is the culmination of a number of techniques we're producing to help deal with the large disconnect between data programming and general purpose programming languages. Linq stands for Language INtegrated Query, and simply put, it's about taking query, set operations, and transforms and making them first class concepts in the .Net world. This means making them available in the CLR, in .Net programming languages, and in the APIs that you're going to be using to program against data in the future. Through all this you can get a completely unified query experience against objects, XML, and relational data, i.e. the most common forms of data that will appear in your application. And, what's best, if you happen to have your own form of data that doesn't fit into those models, then you can use our extensible system to target that model as well. After all, our XML and relational data access models (called XLinq and DLinq respectively) are just APIs built on top of the core Linq infrastructure. As such, I'm not going to dive too deeply into those specific models. I'm going to let the individual teams who are responsible for them (and who know those APIs far more intimately) give you all the information at their disposal.

So, let's first talk about data access today and how our new approach most likely differs from what you've been used to. If you're accessing a database somewhere in your application, then there's a good chance that you've embedded some bit of SQL somewhere. Maybe you've kept it fairly clean and abstracted away, or maybe you have SqlCommands left, right, and center, all with their own "select *"s or other raw SQL commands stored hither and yon. Of course, when writing this code you had no compile time checking that your SQL strings were well formed, no IntelliSense, etc., because, effectively, you are using two completely different languages in an environment that only understands one. This is pretty bad, but it really only begins to scratch the surface of the deep mismatch between this relational data domain and the object domain.

Through and through you have mismatches between objects and relational data and XML in your system. Different types. Different operations. Different programming models. Your code which works on XML won't work on relational data. Your code which works on relational data won't work on objects, etc. But there's a better way. Now we can allow you to work with all these different data systems right within C# (or VB). This means using the same syntax, the same types, and the same programming models to query and manipulate all these different forms of data in a unified manner. And, because support for these models has been built on top of an extensible system, it means that, if necessary, you can do the same as what we've done and bring this strong query support anywhere you need it to go where we don't currently have an offering.

To ground this discussion a little, let's start looking at a simple example of C# 3.0/Linq in action. (Note: this example might look very familiar. That's because many demos and examples are made to run against the Northwind DB. This allows us to all talk about the same thing and have consistent and clear names for entities). You start with a simple list of Customers:

         Customer[] customers = GetCustomers();

Nothing magic going on here. Nothing up my sleeves. Just a regular .Net array initialized from some source. Now, to make things a little simpler (especially for later examples) we can then write that as:

        Customer[] customers = GetCustomers();
        var custs = customers;

What's going on in that second line? Well, "var" is our way of introducing "local variable type inference". It's a new C# 3.0 feature that allows you to save space by not writing the type of a local variable, while still having the type inferred from the expression that initializes the variable. So, in the above code, "custs" is known at compile time to be a "Customer[]". If you were to write:

        var i = 10;
        var b = true;
        var s = "hello";

then it would be the *exact* same as writing:

        int    i = 10;
        bool   b = true;
        string s = "hello";
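
One way to convince yourself that "var" is purely a compile-time convenience is to ask the inferred variables for their runtime types. Here's a minimal sketch of that; the VarDemo class and the InferredTypeNames method are names made up for this illustration, not anything from the C# 3.0 bits:

```csharp
using System;

public static class VarDemo
{
    // Returns the runtime type names of three 'var' locals.
    public static string[] InferredTypeNames()
    {
        var i = 10;       // inferred as int
        var b = true;     // inferred as bool
        var s = "hello";  // inferred as string

        // Because the types are known statically, members and operators
        // resolve at compile time with no casts.
        int doubled = i * 2;
        string upper = s.ToUpper();
        Console.WriteLine(doubled + " " + upper + " " + !b);

        // s = 42;  // would not compile: s is a string, not an untyped slot

        return new[] { i.GetType().Name, b.GetType().Name, s.GetType().Name };
    }

    public static void Main()
    {
        Console.WriteLine(string.Join(", ", InferredTypeNames()));
    }
}
```

The commented-out assignment is the interesting part: once the type has been inferred, the variable is as strongly typed as if you had spelled the type out yourself.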

We'll see later on why this can be quite a handy thing. Now, let's extend our code a bit further to start querying that array of customers:

        Customer[] customers = GetCustomers();
        var custs = customers.Where(c => c.City == "Seattle");

Here we're simply querying all our customers for the set of customers that are from Seattle, and "custs" will be an IEnumerable<Customer>. We can even carry that a little further into the following query:

        Customer[] customers = GetCustomers();
        var custs = customers.Where(c => c.City == "Seattle").Select(c => c.Name);

Here we're projecting out the names of all our customers from Seattle, so custs will be an IEnumerable<string>. Now, what the heck is this code? This isn't your daddy's C# anymore. What are those funky arrows? And where did the "Where" and "Select" methods come from? They certainly don't seem to be defined on the array type when I look at it in ILDasm! Well, to answer the first question, the funky => arrow is the new C# 3.0 syntax that allows you to create a lambda expression. You can think of a lambda expression as a natural evolution of the anonymous methods introduced in C# 2.0. Lambda expressions benefit from simpler syntax and the ability to use inference. So now you can write:

        c => c.City == "Seattle"
        // instead of
        delegate (Customer c) { return c.City == "Seattle"; }

As you can see, the C# 2.0 anonymous method just drowns you in syntax, which makes it a rather poor choice to use in queries (heck! there's a 2x increase in query size between the two). However, the new C# lambda expression succinctly encapsulates the test we want to perform, with only about 5 characters of overhead.
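
To see that the two forms really are interchangeable, here's a small self-contained sketch that runs the same query both ways. It makes a few assumptions that aren't in this post: a hypothetical Customer class with Name and City properties, made-up sample data standing in for GetCustomers(), and the System.Linq namespace, which is the name the query operators eventually shipped under (the preview bits discussed here use System.Query).

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Hypothetical stand-in for the Northwind Customer type.
public class Customer
{
    public string Name { get; set; }
    public string City { get; set; }
}

public static class LambdaDemo
{
    // Made-up sample data standing in for GetCustomers().
    public static Customer[] GetCustomers()
    {
        return new[]
        {
            new Customer { Name = "Ann", City = "Seattle" },
            new Customer { Name = "Bob", City = "London" },
            new Customer { Name = "Cy",  City = "Seattle" },
        };
    }

    // The query written with C# 3.0 lambda expressions.
    public static List<string> WithLambdas(Customer[] customers)
    {
        return customers.Where(c => c.City == "Seattle")
                        .Select(c => c.Name)
                        .ToList();
    }

    // The same query written with C# 2.0 anonymous methods.
    public static List<string> WithAnonymousMethods(Customer[] customers)
    {
        return customers.Where(delegate (Customer c) { return c.City == "Seattle"; })
                        .Select(delegate (Customer c) { return c.Name; })
                        .ToList();
    }

    public static void Main()
    {
        Customer[] customers = GetCustomers();
        Console.WriteLine(string.Join(", ", WithLambdas(customers)));
        Console.WriteLine(string.Join(", ", WithAnonymousMethods(customers)));
    }
}
```

Both lines print "Ann, Cy". The only difference is how much ceremony you had to type to say so.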

That answers the first question, but what about the second? Where, oh where did "Where" come from? This is an example of another new C# 3.0 feature we call "extension methods". Extensions are a way to allow you to add operations to existing types that aren't under your control. While that may give you the heebie-jeebies, rest assured, you're not actually modifying the type itself. Rather, you're being allowed to use succinct syntax to, in effect, execute a method as if it existed on this type. Specifically, extension methods are static methods that look like so:

        namespace System.Query {
            public static class Sequence {
                public static IEnumerable<T> Where<T>(this IEnumerable<T> e, Predicate<T> p) {
                    foreach (T t in e) {
                        if (p(t)) {
                            yield return t;
                        }
                    }
                }
            }
        }

This declares an "extension method" on the IEnumerable<T> type. When you import the namespace by writing "using System.Query", you gain the ability to call the "Where" method on anything that implements IEnumerable<T> (like arrays). With these extension methods we can now compose powerful query functions together to manipulate data easily.
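
If you want to poke at this yourself, here's a rough sketch that declares an extension method of the same shape and calls it both ways. The ExtensionSketch namespace, the MySequence class, and the two helper methods are hypothetical names chosen for this illustration. The code deliberately doesn't import any query namespace, so the only "Where" in scope is the one declared right here.

```csharp
using System;
using System.Collections.Generic;

namespace ExtensionSketch
{
    // Hypothetical extension class, modeled on the Sequence class above.
    public static class MySequence
    {
        public static IEnumerable<T> Where<T>(this IEnumerable<T> e, Predicate<T> p)
        {
            foreach (T t in e)
            {
                if (p(t))
                {
                    yield return t;
                }
            }
        }
    }

    public static class Program
    {
        public static List<int> EvensViaExtensionSyntax(int[] numbers)
        {
            // Reads as if Where were an instance method on the array...
            return new List<int>(numbers.Where(n => n % 2 == 0));
        }

        public static List<int> EvensViaStaticCall(int[] numbers)
        {
            // ...but it is bound to this ordinary static method call.
            return new List<int>(MySequence.Where(numbers, n => n % 2 == 0));
        }

        public static void Main()
        {
            int[] numbers = { 1, 2, 3, 4, 5, 6 };
            Console.WriteLine(string.Join(", ", EvensViaExtensionSyntax(numbers)));
            Console.WriteLine(string.Join(", ", EvensViaStaticCall(numbers)));
        }
    }
}
```

Both calls print "2, 4, 6". Nothing was added to the array type; the "this" modifier on the first parameter is all it takes to make the instance-style call syntax available.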

So at this point we've seen three new C# 3.0 features (local variable type inference, lambda expressions, and extension methods) that can be used together to build a powerful base for querying objects. In future posts I'll include information about the rest of the new language features, and I'll give a more comprehensive view of how sophisticated our query support is.
