More thoughts on Iterators

So I had a short email exchange with a collegue about iterators.  He mentioned that I should put in in my blog.  Seemed reasonable, so here it is:


Iterator generated code

I'm looking at porting a hand-coded enumerator to use iterators (this is for a sample and maybe an article I have in mind to write about moving a production-quality collection to Whidbey)

I'm decompiling the generated code with the latest .net reflector, as my IL isn't really up to much.

My hand-written code tries hard to implement the full contract from the MSDN docs for IEnumerator, i.e. stuff like throwing InvalidOperationException if the underlying collection changes in both Reset and MoveNext and throwing in Current if it is past the last element.

I was wondering a couple of things 

a) Will the past last element code get added to Current (i.e. state == -1)

b) Will a Reset implementation be added?
I can imagine that in many cases, the conditionals required to get to the yield return would be sufficient as a test in MoveNext and Reset, but I'm sure there are cases when it isn't.

However I guess this would require that any initialiser code for user state would have to move into the constructor for this to work, as docs for Reset specify that it throws if the collection has changed between creation and Reset

Subject: RE: Iterator generated code

a) No.  The language designers specifically wanted the Current property to be inlineable, and NEVER to throw, even if it is accessed in an invalid state (before calling MoveNext, after MoveNext has returned false, when the collection has changed, etc.).  The general reason is that most people use foreach (or some equivalent) and there should be no reason to punish them performance-wise fo rhte few people who try to call Current some other way.

b) Nope.  Again Anders mainly thought that enumerables are meant to be throw-away object and should never be reset.  He even pushed the generic form of IEnumerator to not have a Reset method (I think he would have removed it from the non-generic IEnumerator if it was possible).

I've often wished we had some nice syntax to say "run this code xyz for each iteration".  The classic example is checking so see if the underlying collection has changed.  Without such syntax the user has to write boiler plate code (hopefully just calling a single method) before or after every yield.


Subject: RE: Iterator generated code

Would be good to position iterators as a performance-optimized foreach-server rather than a super-strict implementation of the IEnumerable contract.

This got me thinking some more about iterators.  The frameworks often use an int as a version field to detect when a collection has changed.  This seems like a bit of overkill and isn't 100% correct because it does suffer from wrap-around (although that is almost entirely impossible short of malicious code).  So I was wondering how you could write 100% correct code and still use an iterator, and have it be clean and maintainable code.

My first thought was that you shouldn't need to increment the version all the time, only when there are live enumerator objects.  My next thought was that to have maintainable code the version checking and throwing should be encapsulated into a single method.  In C/C++ I would have used a macro to wrap up the version checking with the yield statement, but in C# I'm forced to just remeber to call my checker every place I yield.

Well, for the method that checks the version and throws, I have 2 choices: on the outer collection class, or as an anonymous method inside the iterator.  Somehow the anonymous methods seems cleaner (and cooler) because it fits with the notion that an iterator effectively creates an anonymous class.

Now back to detecting changes.  Instead of having an int, why not have an event that fires when the collection changes.  The iterator could subscribe to it with an anonymous method as well.  Then when there are no active iterators, there is little overhead (which is faster: a null check or an interlocked increment?).

So here's some quick slapped together pseudo-code:

public class SomeCollection {    private event EventHandler CollectionChanged;    private void OnCollectionChanged() {        EventHandler h = CollectionChanged;        if (h != null)            h(this, EventArgs.Empty);    }    delegate void SimpleDelegate();    public IEnumerable<SomeType> MyIterator() {        bool changed = false;        EventHandler changedHandler = null;        SimpleDelegate preCondition = delegate {            if (changed)                throw new InvalidOperationException(...);            // Possibly some other validation code here???        };        changedHandler = delegate {            changed = true;            // Once it's changed, we don't need to know about future changes            CollectionChanged -= changedHandler;            changed Handler = null;        };        try {            CollectionChanged += changedHandler;            ...            yield return someValue;            preCondition();            ...            yield return someValue;            preCondition();            ...        }        finally {            if (changedHandler != null) {                CollectionChanged -= changedHandler;                changedHandler = null;            }        }    }}

So what do you think?  It's still not perfect because the event isn't subscribed to until the first call into MoveNext, which may happen significantly after the call to GetEnumerator.  The only way around that would be to wrap the iterator itself with another method that created the changedHandler, subscribed it and then passed in the changedHandler/preCondition pair to the real iterator method.  The down-side to that would be that if somebody called GetEnumerator twice on the same IEnumerable, both IEnumerators would share the same changedHandler/preCondition.

It still does seem like an awful lot of boiler-plate code.  I might keep the notion of using an anonymous method as a way to pre and post conditions in an iterator, but using the int as a version for collections does seem to have fewer draw-backs (and a lot less boiler-plate code)...