Task vs IObservable: when to use what?

The Task<T> class was introduced in .NET Framework 4.0 as part of the Task Parallel Library (TPL). The main purpose of the TPL was to make parallel code easier to write, for instance to run code on multiple cores. That is why the class was called Task: it represented a piece of a program that you could run in parallel with other pieces. Task, however, is much more versatile than that. In .NET 4.5 (now RTM, and on MSDN next week), Task will be the primary actor for working with the new async/await language feature.

What Task really is, is a “future”: an object that represents a value which may or may not exist yet. When you run a piece of code using Task.Run, the Task returned represents the return value of that code, which only becomes available once the code has finished running. However, Task is suited to represent any sort of operation that hasn’t completed yet: parallel computation, but also IO. While parallel computing matters in some fields, asynchronous IO is crucial for almost any application (client and server).
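To make the “future” idea concrete, here is a minimal sketch. The class FutureSketch and its two methods are hypothetical names made up for this illustration. The first method wraps a computation with Task.Run; the second uses TaskCompletionSource to build a task that is completed by a callback, roughly the way an IO completion would do it, with no thread dedicated to waiting.

```csharp
using System;
using System.Threading.Tasks;

public static class FutureSketch
{
    // A future for a computation: the Task<int> stands for the value
    // the lambda will eventually return.
    public static Task<int> SumAsync(int n)
    {
        return Task.Run(() =>
        {
            int sum = 0;
            for (int i = 1; i <= n; i++) sum += i;
            return sum;
        });
    }

    // A future with no code running behind it: the task completes when
    // the callback handed to 'register' is invoked (hypothetical adapter
    // for a callback-based API).
    public static Task<string> FromCallback(Action<Action<string>> register)
    {
        var tcs = new TaskCompletionSource<string>();
        register(value => tcs.TrySetResult(value));
        return tcs.Task;
    }
}
```

In the second case the task is purely a piece of data: it stays incomplete until someone invokes the callback, and nothing is blocked in the meantime.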

Reactive Extensions (Rx) is a project that was started by Erik Meijer before the TPL even existed, although it was only released recently (2010). Rx is built around an interface, IObservable<T>, whose instances can be combined by means of operators, the same way you can use LINQ on IEnumerable<T>.

Historically, the async/await feature did not exist yet when the Rx project started. As a result, the two now overlap a great deal in what they can do. That doesn’t mean that one must disappear: there are situations where one or the other is better suited. Let’s now see which one is better suited for what.

Reasons for using Task<T>

Let’s say you are turning a synchronous function that returns int into an asynchronous one. You could do one of two things: return IObservable<int> or return Task<int>.

Rx provides you with a bunch of operators, one of which is Observable.Return. It returns an observable that pushes a single value, then completes immediately. My opinion is that returning Task<int> is better, and here are the reasons why:

  1. Task is included in .NET, whereas the reactive extensions are not. Your code will require the reactive extensions if you choose to return IObservable.
  2. In C# 5.0, a method using the async/await language feature can only return Task or Task<T> (or void), so you can use that feature directly and don’t need any conversion.
  3. The IObservable<T> contract is very broad: it just represents a stream of values. It could be zero values, one value, or multiple values. And after a value has been pushed, it could complete immediately, or later. On the other hand, a Task<T> always represents a single value of type T. When the value is available, it becomes the result of the task, and the task is considered completed. At this point, the task is immutable, and you know it will never change state again. The contract is much simpler.
    If someone consuming your function looks at the signature, it is clear that it returns a single value if you return Task<T>. If you return IObservable<T>, however, there are many things it could return.
  4. The threading model is more straightforward with Task<T>. When you are handed a task and want to attach a continuation to it, you are in complete control of which thread it will run on, no matter where that task comes from. If you don’t specify any scheduler for your continuation, it will run on the current scheduler, or on the default scheduler (the thread pool) if there is no current scheduler. That behavior is always true, and completely independent of the task you are attaching a continuation to.
    In the reactive extensions, on the other hand, when you subscribe to an observable, the scheduler on which the callback will run depends on how the observable was constructed. This allows IObservable to have better performance, since it doesn’t switch threads at each step of the computation. On the other hand, the abstraction leaks, and a piece of code working with an IObservable could behave differently depending on which scheduler that IObservable runs on.
  5. This is an extension of the previous point, but with the reactive extensions, depending on how the observable is constructed, your consuming code (the code in the callback) could be blocking the producing code.
    Task, on the other hand, is just a piece of data. There is no producing code associated with it. This model is easier to reason about.
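Points 4 and 5 can be illustrated with a small sketch that uses only the IObservable<T> and IObserver<T> interfaces from the base class library, not the Rx operators. Everything named here (ThreadingSketch, SynchronousRange, CallbackObserver) is hypothetical and exists only for this example. The observable pushes its values synchronously from inside Subscribe, so the callback runs on the subscriber's thread and the producer has to wait for it, while the task continuation runs wherever the scheduler passed to ContinueWith decides.

```csharp
using System;
using System.Threading;
using System.Threading.Tasks;

public static class ThreadingSketch
{
    // Minimal observer built from a delegate (hypothetical helper, not an Rx type).
    private sealed class CallbackObserver<T> : IObserver<T>
    {
        private readonly Action<T> _onNext;
        public CallbackObserver(Action<T> onNext) { _onNext = onNext; }
        public void OnNext(T value) { _onNext(value); }
        public void OnError(Exception error) { }
        public void OnCompleted() { }
    }

    private sealed class NoopDisposable : IDisposable
    {
        public void Dispose() { }
    }

    // Hypothetical observable that pushes its values inside Subscribe,
    // on whatever thread the subscriber called from.
    private sealed class SynchronousRange : IObservable<int>
    {
        private readonly int _count;
        public SynchronousRange(int count) { _count = count; }

        public IDisposable Subscribe(IObserver<int> observer)
        {
            for (int i = 0; i < _count; i++)
            {
                // The producer cannot move on until this callback returns.
                observer.OnNext(i);
            }
            observer.OnCompleted();
            return new NoopDisposable();
        }
    }

    // Returns true when every callback ran on the thread that called Subscribe.
    public static bool ObservableCallbackRunsOnSubscribingThread()
    {
        int subscriber = Thread.CurrentThread.ManagedThreadId;
        bool same = true;
        new SynchronousRange(3).Subscribe(new CallbackObserver<int>(
            _ => same &= Thread.CurrentThread.ManagedThreadId == subscriber));
        return same;
    }

    // Returns true when the continuation ran on a thread-pool thread,
    // which is what the explicit TaskScheduler.Default asks for.
    public static bool ContinuationRunsOnThreadPool()
    {
        Task<int> source = Task.FromResult(21);
        Task<bool> continuation = source.ContinueWith(
            t => Thread.CurrentThread.IsThreadPoolThread,
            TaskScheduler.Default);
        return continuation.Result;
    }
}
```

A different observable implementation could just as well push its values from a background thread, and the consuming code would have no way to tell from the IObservable<int> type alone. The continuation, by contrast, goes where its own scheduler argument sends it, regardless of how the source task was produced.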

Reasons for using IObservable<T>

As you saw in the previous section, Task is a simpler abstraction and easier to reason about, but there are situations where Task<T> is not the best tool for the job. Task<T> represents a future for a single value of type T. Sometimes, you need to return a stream of values. Here is an example: let’s imagine you are writing an asynchronous function that reads a database table and returns it. If you return a Task, you will have to return something like Task<List<Row>>, which means you will be buffering all the rows in memory and completing the task only when the whole table is buffered. You can probably see why this can become a scalability issue. A better option is to return IObservable<Row>. Each row, as it is read from the database, is pushed through the IObservable. The consumer can then process rows as they become available, and the producer never needs to hold more than one row in memory at a time.
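The difference between the two signatures can be sketched as follows. This is only an illustration under stated assumptions: Row, TableSketch, ReadRows and IdSummingObserver are hypothetical names, the “table” is a generated sequence rather than a real database, and error handling and unsubscription are left out to keep the example short.

```csharp
using System;
using System.Collections.Generic;
using System.Threading.Tasks;

// Hypothetical row type standing in for a database record.
public sealed class Row
{
    public int Id;
    public Row(int id) { Id = id; }
}

public static class TableSketch
{
    // Fake table reader; a real one would pull rows from a database cursor.
    private static IEnumerable<Row> ReadRows(int count)
    {
        for (int i = 1; i <= count; i++) yield return new Row(i);
    }

    // Buffering version: the task completes only once every row is in the list.
    public static Task<List<Row>> ReadTableAsync(int count)
    {
        return Task.Run(() => new List<Row>(ReadRows(count)));
    }

    // Streaming version: each row is handed to the observer as soon as it is read.
    public static IObservable<Row> ReadTable(int count)
    {
        return new RowObservable(count);
    }

    private sealed class RowObservable : IObservable<Row>
    {
        private readonly int _count;
        public RowObservable(int count) { _count = count; }

        public IDisposable Subscribe(IObserver<Row> observer)
        {
            // Read on the thread pool so that Subscribe returns immediately.
            Task.Run(() =>
            {
                foreach (Row row in ReadRows(_count)) observer.OnNext(row);
                observer.OnCompleted();
            });
            return new NoopDisposable();
        }
    }

    private sealed class NoopDisposable : IDisposable
    {
        public void Dispose() { }
    }
}

// Consumer that keeps only a running total, never the rows themselves.
public sealed class IdSummingObserver : IObserver<Row>
{
    private readonly TaskCompletionSource<long> _done = new TaskCompletionSource<long>();
    private long _sum;

    // Completes with the total once the stream has ended.
    public Task<long> Completion { get { return _done.Task; } }

    public void OnNext(Row row) { _sum += row.Id; }
    public void OnError(Exception error) { _done.TrySetException(error); }
    public void OnCompleted() { _done.TrySetResult(_sum); }
}
```

With the buffering version the caller receives the whole list at once. With the streaming version, a consumer such as IdSummingObserver sees each row once and can let it go immediately, which is what keeps memory use flat for large tables.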

Note that if, as a consumer, you still want to buffer all the rows, you can easily convert the IObservable<Row> to a Task<IList<Row>> using this function:

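    // Collects every pushed value, then completes the task with a read-only view of the list.
    public static Task<IList<T>> BufferAllAsync<T>(this IObservable<T> observable)
    {
        List<T> result = new List<T>();
        object gate = new object();
        TaskCompletionSource<IList<T>> finalTask = new TaskCompletionSource<IList<T>>();

        // Subscribe with the Rx lambda overload: onNext, onError, onCompleted.
        observable.Subscribe(
            value =>
            {
                lock (gate) { result.Add(value); }
            },
            exception => finalTask.TrySetException(exception),
            () => finalTask.SetResult(result.AsReadOnly())
        );

        return finalTask.Task;
    }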

To wrap up, I would suggest using Task whenever you can, and using IObservable when you get into a more complex situation where you need an entire collection to be turned asynchronous.