October 2017

Volume 32 Number 10

[C++]

From Algorithms to Coroutines in C++

By Kenny Kerr

There’s a C++ Standard Library algorithm called iota that has always intrigued me. It has a curious name and an interesting function. The word iota is the name of a letter in the Greek alphabet. It’s commonly used in English to mean a very small amount, often in the negative sense of not even the least amount, a usage derived from a quote in the New Testament Book of Matthew. This idea of a very small amount speaks to the function of the iota algorithm. It’s meant to fill a range with values that increase by a small amount, as the initial value is stored and then incremented until the range has been filled. Something like this:

#include <numeric>

int main()
{
  int range[10];
  // Range: Random missile launch codes

  std::iota(std::begin(range), std::end(range), 0);
  // Range: { 0, 1, 2, 3, 4, 5, 6, 7, 8, 9 }
}

It’s often said that C++ developers should expunge all naked for loops and replace them with algorithms. Certainly, the iota algorithm qualifies as it takes the place of the for loop that any C or C++ developer has undoubtedly written thousands of times. You can imagine what your C++ Standard Library implementation might look like:

namespace std
{
  template <typename Iterator, typename Type>
  void iota(Iterator first, Iterator last, Type value)
  {
    for (; first != last; ++first, ++value)
    {
      *first = value;
    }
  }
}

So, yeah, you don’t want to be caught in a code review with code like that. Unless you’re a library developer, of course. It’s great that the iota algorithm saves me from having to write that for loop, but you know what? I’ve never actually used it in production. The story usually goes something like this: I need a range of values. This is such a fundamental thing in computer science that there must be a standard algorithm for it. I again scour the list over at bit.ly/2i5WZRc and I find iota. Hmm, it needs a range to fill with values. OK, what’s the cheapest range I can find … I then print the values out to make sure I got it right using … a for loop:

#include <numeric>
#include <stdio.h>

int main()
{
  int range[10];
  std::iota(std::begin(range), std::end(range), 0);

  for (int i : range)
  {
    printf("%d\n", i);
  }
}

To be honest, the only thing I like about this code is the range-based for loop. The problem is that I simply neither need nor want that range. I don’t want to have to create some container just to hold the values so that I can iterate over them. What if I need a lot more values? I’d much rather just write the for loop myself:

#include <stdio.h>

int main()
{
  for (int i = 0; i != 10; ++i)
  {
    printf("%d\n", i);
  }
}

To add insult to injury, this involves a lot less typing. It sure would be nice, however, if there were an iota-like function that could somehow generate a range of values for a range-based for loop to consume without having to use a container. I was recently browsing a book about the Python language and noticed that it has a built-in function called range. I can write the same program in Python like this:

for i in range(0, 10):
  print(i)

Be careful with that indentation. It’s how the Python language represents compound statements. I read that Python was named after a certain British comedy rather than the nonvenomous snake. I don’t think the author was kidding. Still, I love the succinct nature of this code. Surely, I can achieve something along these lines in C++. Indeed, this is what I wish the iota algorithm would provide but, alas. Essentially, what I’m looking for is a range algorithm that looks something like this:

template <typename T>
generator<T> range(T first, T last)
{
  return{ ... };
}

int main()
{
  for (int i : range(0, 10))
  {
    printf("%d\n", i);
  }
}

To my knowledge, no such function exists, so let's go and build it. The first step is to approximate the algorithm with something reliable that can act as a baseline for testing. The C++ standard vector container comes in handy in such cases:

#include <vector>

template <typename T>
std::vector<T> range(T first, T last)
{
  std::vector<T> values;

  while (first != last)
  {
    values.push_back(first++);
  }

  return values;
}

It also does a good job of illustrating why you don't want to build a container in the first place, or even figure out how large it should be, for that matter. Why should there even be a cap? Still, this is useful because you can easily compare the output of this range generator to a more efficient alternative. Well, it turns out that writing a more efficient generator isn’t that difficult. Have a look at Figure 1.

Figure 1 A Classical Generator

template <typename T>
struct generator
{
  T first;
  T last;

  struct iterator{ ... };

  iterator begin()
  {
    return{ first };
  }

  iterator end()
  {
    return{ last };
  }
};

template <typename T>
generator<T> range(T first, T last)
{
  return{ first, last };
}

The range function simply creates a generator initialized with the same pair of bounding values. The generator can then use those values to produce lightweight iterators via the conventional begin and end member functions. The most tedious part is spitting out the largely boilerplate iterator implementation. The iterator can simply hold a given value and increment it as needed. It must also provide a set of type aliases to describe itself to standard algorithms. This isn't strictly necessary for the simple range-based for loop, but it pays to include this as a bit of future-proofing:

template <typename T>
struct generator
{
  struct iterator
  {
    T value;

    using iterator_category = std::input_iterator_tag;
    using value_type = T;
    using difference_type = ptrdiff_t;
    using pointer = T const*;
    using reference = T const&;

Incrementing the iterator can simply increment the underlying value. The post-increment form can safely be deleted:

iterator& operator++()
{
  ++value;
  return *this;
}

iterator operator++(int) = delete;

The other equally important function provided by an iterator is that of comparison. A range-based for loop will use this to determine whether it has reached the end of the range:

bool operator==(iterator const& other) const
{
  return value == other.value;
}

bool operator!=(iterator const& other) const
{
  return !(*this == other);
}

Finally, a range-based for loop will want to dereference the iterator to return the current value in the range. I could delete the member access operator (operator->), because it isn’t needed for the range-based for loop, but that would needlessly limit the utility of generators to other algorithms:

T const& operator*() const
{
  return value;
}

T const* operator->() const
{
  return std::addressof(value);
}

It might be that the generator and associated range function are used with number-like objects rather than simple primitives. In that case, you might also want to use the std::addressof helper, should the number-like object be playing tricks with operator& overloading. And that’s all it takes. My range function now works as expected:

template <typename T>
generator<T> range(T first, T last)
{
  return{ first, last };
}

int main()
{
  for (int i : range(0, 10))
  {
    printf("%d\n", i);
  }
}

Of course, this isn’t particularly flexible. I’ve produced the iota of my dreams, but it’s still just an iota of what would be possible if I switched gears and embraced coroutines. You see, with coroutines you can write all kinds of generators far more succinctly and without having to write a new generator class template for each kind of range you’d like to produce. Imagine if you only had to write one more generator and then have an assortment of range-like functions to produce different sequences on demand. That’s what coroutines enable. Instead of embedding the knowledge of the original iota generation into the generator, you can embed that knowledge directly inside the range function and have a single generator class template that provides the glue between producer and consumer. Let’s do it.

I begin by including the coroutine header, which provides the definition of the coroutine_handle class template:

#include <experimental/coroutine>

I’ll use the coroutine_handle to allow the generator to interact with the state machine represented by a coroutine. The generator will query and resume the coroutine as needed, allowing a range-based for loop—or any other loop, for that matter—to direct its progress, producing a pull rather than push model of data consumption. The generator is in some ways similar to the classical generator in Figure 1. The big difference is that rather than updating values directly, it merely nudges the coroutine forward. Figure 2 provides the outline.

Figure 2 A Coroutine Generator

template <typename T>
struct generator
{
  struct promise_type{ ... };

  using handle_type = std::experimental::coroutine_handle<promise_type>;

  handle_type handle{ nullptr };

  struct iterator{ ... };

  iterator begin()
  {
    ...
    handle.resume();
    ...
  }

  iterator end()
  {
    return nullptr;
  }
};

So, there's a little more going on here. Not only is there an iterator that allows the range-based for loop to interact with the generator from the outside, but there's also a promise_type that allows the coroutine to interact with the generator from the inside. First, some housekeeping: Recall that the function generating values won't construct the generator explicitly, but will instead allow a developer to use co_yield statements to forward values from the coroutine, through the generator, and to the call site. Consider the simplest of generators:

generator<int> one_two_three()
{
  co_yield 1;
  co_yield 2;
  co_yield 3;
}

Notice how the developer never explicitly creates the coroutine return type. That’s the role of the C++ compiler as it stitches together the state machine represented by this code. Essentially, the C++ compiler looks for the promise_type and uses that to construct a logical coroutine frame. Don’t worry: in some cases the coroutine frame will disappear entirely once the C++ compiler is done optimizing the code. Anyway, the promise_type is then used to initialize the generator that gets returned to the caller. Given the promise_type, I can get the handle representing the coroutine so that the generator can drive it from the outside in:

generator(promise_type& promise) :
  handle(handle_type::from_promise(promise))
{
}

Of course, the coroutine_handle is a pretty low-level construct and I don’t want a developer holding onto a generator to accidentally corrupt the state machine inside of an active coroutine. The solution is simply to implement move semantics and prohibit copies. Something like this (first, I’ll give it a default constructor and expressly delete the special copy members):

generator() = default;
generator(generator const&) = delete;
generator &operator=(generator const&) = delete;

And then I’ll implement move semantics simply by transferring the coroutine’s handle value so that two generators never point to the same running coroutine, as shown in Figure 3.

Figure 3 Implementing Move Semantics

generator(generator&& other) : handle(other.handle)
{
  other.handle = nullptr;
}

generator &operator=(generator&& other)
{
  if (this != &other)
  {
    handle = other.handle;
    other.handle = nullptr;
  }

  return *this;
}

Now, given the fact that the coroutine is being driven from the outside, it's important to remember that the generator also has the responsibility of destroying the coroutine:

~generator()
{
  if (handle)
  {
    handle.destroy();
  }
}

This actually has more to do with the result of final_suspend on the promise_type, but I’ll save that for another day. That’s enough bookkeeping for now. Let’s now look at the generator’s promise_type. The promise_type is a convenient place to park any state such that it will be included in any allocation made for the coroutine frame by the C++ compiler. The generator is then just a lightweight object that can easily move around and refer back to that state as needed. There are only two pieces of information that I really need to convey from within the coroutine back out to the caller. The first is the value to yield and the second is any exception that might have been thrown:

#include <variant>

template <typename T>
struct generator
{
  struct promise_type
  {
    std::variant<T const*, std::exception_ptr> value;

Although it isn’t strictly necessary, I tend to wrap exception_ptr objects inside std::optional because the implementation of exception_ptr in Visual C++ is a little expensive. Even an empty exception_ptr calls into the CRT during both construction and destruction. Wrapping it inside optional neatly avoids that overhead. A more precise state model is to use a variant, as I just illustrated, to hold either the current value or the exception_ptr because they’re mutually exclusive. The current value is merely a pointer to the value being yielded inside the coroutine. This is safe to do because the coroutine will be suspended while the value is read and whatever temporary object may be yielded up will be safely preserved while the value is being observed outside of the coroutine.
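
For comparison, the optional-based bookkeeping I just mentioned might look something like the following sketch. This isn’t the state model used in the rest of this article, and promise_state is just an illustrative name:

#include <exception>
#include <optional>

template <typename T>
struct promise_state
{
  // Points at the value currently being yielded, if any.
  T const* value{ nullptr };

  // Only constructs an exception_ptr if a failure actually occurs.
  std::optional<std::exception_ptr> error;
};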

When a coroutine initially returns to its caller, it asks the promise_type to produce the return value. Because the generator can be constructed by giving a reference to the promise_type, I can simply return that reference here:

promise_type& get_return_object()
{
  return *this;
}

A coroutine producing a generator isn’t your typical concurrency-enabling coroutine and it’s often just the generator that dictates the lifetime and execution of the coroutine. As such, I indicate to the C++ compiler that the coroutine must be initially suspended so that the generator can control stepping through the coroutine, so to speak:

std::experimental::suspend_always initial_suspend()
{
  return {};
}

Likewise, I indicate that the coroutine will be suspended upon return, rather than having the coroutine destroy itself automatically:

std::experimental::suspend_always final_suspend()
{
  return {};
}

This ensures that I can still query the state of the coroutine, via the promise_type allocated within the coroutine frame, after the coroutine completes. This is essential to allow me to read the exception_ptr upon failure, or even just to know that the coroutine is done. If the coroutine destroyed itself automatically when it completed, I wouldn’t even be able to query the coroutine_handle, let alone the promise_type, following the resumption that brought the coroutine to its final suspension point. Capturing the value to yield is now quite straightforward:

std::experimental::suspend_always yield_value(T const& other)
{
  value = std::addressof(other);
  return {};
}

I simply use the handy std::addressof helper again. A promise_type must also provide a return_void or return_value function. Even though it isn’t used in this example, it hints at the fact that co_yield is really just an abstraction over co_await:

void return_void()
{
}

More on that later. Next, I’ll add a little defense against misuse just to make it easier for the developer to figure out what went wrong. You see, a generator yielding values implies that unless the coroutine completes, a value is available to be read. If a coroutine were to include a co_await expression, then it could conceivably suspend without a value being present and there would be no way to convey this fact to the caller. For that reason, I prevent a developer from writing a co_await statement, as follows:

template <typename Expression>
Expression&& await_transform(Expression&& expression)
{
  static_assert(sizeof(expression) == 0, 
    "co_await is not supported in coroutines of type generator");
  return std::forward<Expression>(expression);
}
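
With that defense in place, a coroutine that tries to sneak in a co_await expression, such as this hypothetical one of my own, fails to compile with exactly that message rather than silently producing a generator with no value to observe:

generator<int> broken()
{
  // Error: "co_await is not supported in coroutines of type generator"
  co_await std::experimental::suspend_always{};
  co_yield 1;
}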

Wrapping up the promise_type, I just need to take care of catching, so to speak, any exception that might have been thrown. The C++ compiler will ensure that the promise_type’s unhandled_exception member is called:

void unhandled_exception()
{
  value = std::current_exception();
}

I can then, just as a convenience to the implementation, provide a handy function for optionally rethrowing the exception in the appropriate context:

void rethrow_if_failed()
{
  if (value.index() == 1)
  {
    std::rethrow_exception(std::get<1>(value));
  }
}
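
To see the whole exception story in action, consider this small coroutine of my own (not from the article) that fails partway through. The exception is captured by unhandled_exception, parked in the variant, and rethrown at the call site once the iterator machinery shown next resumes the coroutine past the failure. The sketch assumes the finished generator plus <stdexcept> and <stdio.h>:

generator<int> fail_after_two()
{
  co_yield 1;
  co_yield 2;
  throw std::runtime_error("something went wrong");
}

int main()
{
  try
  {
    for (int i : fail_after_two())
    {
      printf("%d\n", i);
    }
  }
  catch (std::exception const& e)
  {
    // Prints 1 and 2, then: caught: something went wrong
    printf("caught: %s\n", e.what());
  }
}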

Enough about the promise_type. I now have a functioning generator—but I’ll just add a simple iterator so that I can easily drive it from a range-based for loop. As before, the iterator will have the boilerplate type aliases to describe itself to standard algorithms. However, the iterator simply holds on to the coroutine_handle:

struct iterator
{
  using iterator_category = std::input_iterator_tag;
  using value_type = T;
  using difference_type = ptrdiff_t;
  using pointer = T const*;
  using reference = T const&;

  handle_type handle;

Incrementing the iterator is a little more involved than it was for the simpler iota-style iterator, as this is the primary point at which the generator interacts with the coroutine. Incrementing the iterator implies that the iterator is valid and may in fact be incremented. Because the "end" iterator holds a nullptr handle, I can simply provide an iterator comparison, as follows:

bool operator==(iterator const& other) const
{
  return handle == other.handle;
}

bool operator!=(iterator const& other) const
{
  return !(*this == other);
}

Assuming it’s a valid iterator, I first resume the coroutine, allowing it to execute and yield up its next value. I then need to check whether this execution brought the coroutine to an end, and if so, propagate any exception that might have been raised inside the coroutine:

iterator &operator++()
{
  handle.resume();

  if (handle.done())
  {
    promise_type& promise = handle.promise();
    handle = nullptr;
    promise.rethrow_if_failed();
  }

  return *this;
}

iterator operator++(int) = delete;

When the coroutine does complete, the iterator is considered to have reached its end and its handle is cleared so that it compares equal to the end iterator. Care needs to be taken to clear the coroutine handle prior to rethrowing any captured exception, to prevent anyone from accidentally resuming the coroutine at the final suspension point, as this would lead to undefined behavior. The generator’s begin member function performs much the same logic, to ensure that I can consistently propagate any exception that’s thrown prior to reaching the first suspension point:

iterator begin()
{
  if (!handle)
  {
    return nullptr;
  }

  handle.resume();

  if (handle.done())
  {
    handle.promise().rethrow_if_failed();
    return nullptr;
  }

  return handle;
}

The main difference is that begin is a member of the generator, which owns the coroutine handle, and therefore must not clear the coroutine handle. Finally, I can implement iterator dereferencing simply by returning a reference to the current value stored within the promise_type:

T const& operator*() const
{
  return *std::get<0>(handle.promise().value);
}

T const* operator->() const
{
  return std::addressof(operator*());
}
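
One detail these excerpts leave out is how an iterator gets constructed in the first place. Because begin returns either the coroutine handle or nullptr, and end returns nullptr, the iterator presumably needs a pair of converting constructors along these lines. This is my assumption about the missing glue; the complete example on Compiler Explorer has the full definition:

iterator(std::nullptr_t) : handle(nullptr)
{
}

iterator(handle_type other) : handle(other)
{
}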

And I’m done. I can now write all manner of algorithms, producing a variety of generated sequences using this generalized generator. Figure 4 shows what the inspirational range generator looks like.

Figure 4 The Inspirational Range Generator

template <typename T>
generator<T> range(T first, T last)
{
  while (first != last)
  {
    co_yield first++;
  }
}

int main()
{
  for (int i : range(0, 10))
  {
    printf("%d\n", i);
  }
}

Who needs a limited range, anyway? As I now have a pull model, I can simply have the caller decide when they've had enough, as you can see in Figure 5.

Figure 5 A Limitless Generator

template <typename T>
generator<T> range(T first)
{
  while (true)
  {
    co_yield first++;
  }
}

int main()
{
  for (int i : range(0))
  {
    printf("%d\n", i);

    if (...)
    {
      break;
    }
  }
}
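
The same class template handles sequences that aren’t simple counting at all. Here’s a Fibonacci generator of my own devising, built on nothing more than the generator developed in this article:

generator<int> fibonacci()
{
  int a = 0;
  int b = 1;

  while (true)
  {
    co_yield a;

    int next = a + b;
    a = b;
    b = next;
  }
}

int main()
{
  for (int i : fibonacci())
  {
    printf("%d\n", i);

    if (i > 100) // As before, the caller decides when enough is enough
    {
      break;
    }
  }
}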

The possibilities are endless! There is, of course, more to generators and coroutines and I’ve only just scratched the surface here. Join me next time for more on coroutines in C++. You can find the complete example from this article over on Compiler Explorer: godbolt.org/g/NXHBZR.


Kenny Kerr is an author, systems programmer, and the creator of C++/WinRT. He is also an engineer on the Windows team at Microsoft where he is designing the future of C++ for Windows, enabling developers to write beautiful high-performance apps and components with incredible ease.

Thanks to the following technical expert for reviewing this article: Gor Nishanov

