Evolution of a C# Query—Step by step from C# 1.1 to LINQ

Jomo Fisher—The future of C# was recently unveiled at PDC. Object, XML and relational data will be integrated deeply into the language. This isn’t really a new direction for C#, it’s the next step down a path that C# has always been headed. To see this, let’s look at a simple task—filtering a set of objects—and see how C# language features have made it easier and more natural over time.

Queries in the C# 1.1 Era

We have an Employee class:

class Employee {

    public string Name;

    public int Years;

    public string Department;

}

We want to give everyone with at least four years service a gold watch. Here’s how we might get this set in C# 1.1:

static Employee[] GoldWatch(Employee[] employees) {

  int resultCount = 0;

    for (int i = 0; i < employees.Length; ++i) {

        if (employees[i].Years > 3) {

            ++resultCount;

        }

    }

    Employee[] results = new Employee[resultCount];

    for (int i = 0, j = 0; i < employees.Length; ++i) {

        if (employees[i].Years > 3) {

            results[j] = employees[i];

            ++j;

        }

    }

    return results;

}

This method counts the eligible employees, allocates an array, and fills the array with matches. This works, but you’ve probably identified several problems with it: The employees array is iterated twice, the business rule ‘Years>3’ is written twice, and generally this is more complex than we would expect for such a simple task.

 

Here’s one possible simplification (still in C# 1.1):

static Employee[] GoldWatch(Employee[] employees) {

    ArrayList results = new ArrayList();

    for (int i = 0; i < employees.Length; ++i) {

        if (employees[i].Years > 3) {

            results.Add(employees[i]);

        }

    }

    return (Employee[])results.ToArray(typeof(Employee));

}

We’ve gotten it down to just one pass, but we had to give up some type safety because ArrayList holds ‘object’. One consequence is the complicated cast at the end.

 

Both of the approaches above also have a flexibility problem: They require employees to be given as an array. We could write a similar method that takes ArrayList as a parameter but there are other collection types we may want to support too.

 

Here’s another approach:

static Employee[] GoldWatch(IEnumerable employees) {

    ArrayList results = new ArrayList();

    foreach(Employee employee in employees) {

        if (employee.Years > 3) {

            results.Add(employee);

        }

    }

    return (Employee[])results.ToArray(typeof(Employee));

}

This works with arrays of Employee and other collections like ArrayList. We give up some more type safety, however, because the compiler can’t figure out whether the employees IEnumerable truly contains only Employee instances.

 

We’ve done about as much as we can with C# 1.1.

Queries in the C# 2.0 Era

C# 2.0 brings some new tools that we can use to make our query better. First, let’s attack the type safety problem:

static Employee[]

GoldWatch(IEnumerable<Employee> employees) {

    List<Employee> results = new List<Employee>();

    foreach (Employee employee in employees) {

        if (employee.Years > 3) {

            results.Add(employee);

        }

    }

    return results.ToArray();

}

Generics allow complete type safety while keeping a simple implementation.

 

We still have a potential performance problem, however: We’re building up the entire list before returning. What if the result set is very large? What if the caller only needs to look at the first few items?

 

If we may have these calling patterns and we can tolerate returning an IEnumerable<Employee> rather than an array then there is a better solution using iterators (another feature new for C# 2.0).

static IEnumerable<Employee>

GoldWatch(IEnumerable<Employee> employees) {

    foreach (Employee employee in employees) {

        if (employee.Years > 3) {

            yield return employee;

        }

    }

}

Now there’s no temporary copy of employees and the caller gets to decide how many elements it wants. If the caller asks for just the first element then only the work to return one element is done.

 

This code is simple and will be maintainable because the code looks like what it does.

 

Yeah!

 

But what happens when we need to do a similar query? Say that we later discover that we need to return a list of people in the Sales department. We could code up a new SalesForce() method that looks the same except that it has a different filtering condition. In this case, the two methods look similar enough that we want to try to see if we can factor the code better.

 

We might try using delegates to write a generic ‘Filter’ method:

delegate bool Choose(Employee employee);

static IEnumerable<Employee>

Filter(IEnumerable<Employee> employees, Choose choose) {

    foreach (Employee employee in employees) {

        if (choose(employee)) {

            yield return employee;

        }

    }

}

static bool GoldWatchChoose(Employee employee) {

    return employee.Years>3;

}

static bool SalesForceChoose(Employee employee) {

    return employee.Department=="Sales";

}

static IEnumerable<Employee>

GoldWatch(IEnumerable<Employee> s) {

    return Filter(employees, new Choose(GoldWatchChoose));

}

static IEnumerable<Employee>

SalesForce(IEnumerable<Employee> s) {

    return Filter(employees, new Choose(SalesForceChoose));

}

In this case, every time we have a new type of query we have to add two new methods: the query method and a delegate method used to choose which employees to return. This proliferation of methods is a maintainability problem.

 

We could use anonymous delegates—a C# 2.0 feature:

static IEnumerable<Employee>

GoldWatch(IEnumerable<Employee> s) {

    return Filter(employees,

        delegate(Employee employee) {

            return employee.Years>3;

        }

    );

}

static IEnumerable<Employee>

SalesForce(IEnumerable<Employee> s) {

    return Filter(employees,

        delegate(Employee employee) {

return employee.Department=="Sales";

  }

    );

}

Now we only need to add one new method for every new query type. Also, we have the option of calling Filter() directly. In this case, we wouldn’t need to add any new methods. On the downside, seeing a tiny method implementation passed as a parameter to another method is jarring and perhaps reduces the readability of the code.

 

We’ve done about as much as we can with C# 2.0.

Queries in the C# 3.0 Era

With C# 3.0 we’ll have tools to make our query better.

 

Those anonymous delegates are good, but we’d like something a little simpler and more maintainable. C# 3.0 has the concept of a lambda expression. For our purposes, you can think of a lambda expression as a shortcut for an anonymous delegate. Here are the queries rewritten to use lambda expressions.

static IEnumerable<Employee>

GoldWatch(IEnumerable<Employee> employees) {

    return Filter(employees,

        employee => employee.Years>3

    );

}

static IEnumerable<Employee>

SalesForce(IEnumerable<Employee> employees) {

    return Filter(employees,

        employee => employee.Department=="Sales"

    );

}

 

This is simple and will be maintainable. We still have a few issues, however. You can see one problem when you look at how our queries will be called:

GoldWatch(employees);

SalesForce(employees);

Coming from an OO world, we’re used to seeing using a noun.Verb() calling pattern. Ideally, we want to ask a set of employees for its sales force. Something like:

employees.GoldWatch();

employees.SalesForce();

We might try to define a new class called Employees which inherits from IEnumerable<Employee>. The problem here, though, is that we know our callers want to use other implementations of IEnumerable<Employee> such as Employee[] and List<Employee>.

 

C# 3.0 addresses this issue with extension methods. Here’s what it would like like:

static IEnumerable<Employee>

Filter(this IEnumerable<Employee> employees, Choose choose) {

    foreach (Employee employee in employees) {

        if (choose(employee)) {

            yield return employee;

        }

    }

}

static IEnumerable<Employee>

GoldWatch(this IEnumerable<Employee> employees) {

    return employees.Filter(employee => employee.Years>3);

}

static IEnumerable<Employee>

SalesForce(this IEnumerable<Employee> employees) {

    return employees.Filter(

        employee => employee.Department=="Sales");

}

Notice the addition of the this keyword in front of the first parameter of each method. Now we can call these methods as if they were members of the IEnumerable<Employee> class.

employees.GoldWatch();

employees.SalesForce();

employees.Filter(employee => employee.Department=="Sales");

Notice this lets the caller compose more complex queries by chaining:

employees

    .GoldWatch()

    .SalesForce();

Here we get members of the sales force which should receive a gold watch.

 

This looks pretty good, but what happens when we have to start filtering our Customers and our Inventory lists the same way we’ve been filtering our Employee lists?

 

Instead of creating a new Filter implementation for every possible type, we can make the Filter method generic:

delegate bool Choose<T>(T t);

static IEnumerable<T>

Filter<T>(this IEnumerable<T> items, Choose<T> choose) {

    foreach (T item in items) {

        if (choose(item)) {

            yield return item;

        }

    }

}

Now we can filter any type of collection we like.

int [] a = new int [] {1,2,3,4,5};

a.Filter(i => i==1 || i==3);

 

The Filter method is so useful and so general purpose that there will be a built-in implementation of it called Where. Here’s the actual implementation of Where for the code we delivered at PDC:

public delegate T Func<A0, T>(A0 arg0);

public static

IEnumerable<T> Where<T>(this IEnumerable<T> source,

Func<T, bool> predicate) {

    foreach (T element in source) {

        if (predicate(element)) yield return element;

    }

}

If you have installed the ‘LINQ Preview’, you can see the Where method and a bunch of others by opening ‘C:\Program Files\LINQ Preview\docs\Sequence.cs’

 

Next, we'll take this query into SQL Server and see how DLinq helps unify the worlds of objects and relational data.

 

This posting is provided "AS IS" with no warranties, and confers no rights.

kick it on DotNetKicks.com