Tweaking PEP-234 to improve Duck Typing

Sun Apr 6 13:59:17 EDT 2008

I'd like to raise an issue that was partially discussed in
2006 ( 
http://groups.google.co.uk/group/comp.lang.python/browse_thread/thread/1811df36f2a131fd/435ba1cae670aecf?lnk=st&q=python+iterators+duck+typing#435ba1cae670aecf 
) with the half-promise that it would be revisited before Python 3000.
Now's the last chance.

What is Duck Typing?    Ultimately, the goal is that if you do something 
stupid, Python will give you a big fat error message
fairly soon after the stupid code was executed.  Without effective
duck typing, we'd be forced to put in lots of test code
everywhere, something like

	assert isinstance(x, list)

Doing so would be bad because our Python code would become cluttered
and less polymorphic and reusable.   Nuff said.

Now, where does duck typing fail in modern Python?   In this case:

def foo(x):
	for i in x:
		doSomething(i)
	for i in x:
		somethingElse(i)

Function foo() is unsafe as part of any API because you never
know whether someone is going to pass it a list or an iterator.
For me, doing scientific programming, this is a *very* common
use case.   doSomething() may collect statistics or look for
bad data, then somethingElse() does the main computation.

Now, if foo() is somehow passed an iterator, the first loop exhausts
it and the second loop silently does nothing, leading to much hair
pulling and gnashing of
teeth.   Some might say "serves you right for making a mistake!",
but I've always suspected that such people go around insulting
victims of traffic accidents.
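
To make the failure concrete, here is a minimal sketch of foo() in which two lists (first_pass and second_pass, names made up for this illustration) stand in for doSomething() and somethingElse(). It only demonstrates current behaviour; it is not part of the proposal.

```python
def foo(x):
    # Hypothetical stand-ins for doSomething() and somethingElse():
    # each pass just records the items it sees.
    first_pass = []
    second_pass = []
    for i in x:
        first_pass.append(i)
    for i in x:
        second_pass.append(i)
    return first_pass, second_pass


list_result = foo([1, 2, 3])
iter_result = foo(iter([1, 2, 3]))

print(list_result)  # ([1, 2, 3], [1, 2, 3])
print(iter_result)  # ([1, 2, 3], [])
```

With a list, both passes see the data. With an iterator, the second pass sees nothing and no error is raised.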

Of *course* there are ways to work around the problem.
Using Java is one, adding assert statements is another,
writing detailed docstrings is a third.  However, none are nearly
as good as duck typing.    Adding "x=list(x)" near the top of the
function should work, but at a horrible cost in efficiency
if it's a big list.

It seems that the 2006 discussion barely missed the right solution:

1) Create a new standard exception IteratorExhausted; it will
be a subclass of StopIteration.

2) StopIteration is raised when the iterator runs out of data.
If it.next() is called again, then IteratorExhausted should be raised.

3) For loops will be set to trap IteratorExhausted and raise
an error (perhaps raise a TypeError, "Iterator used in two for loops").
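
The three steps above can be emulated in today's Python, roughly as follows. This is only a sketch under assumptions: StrictIterator and strict_for() are hypothetical helpers invented for this illustration (the real proposal would change the built-in iterators and the for statement itself), and IteratorExhausted does not exist in any released Python.

```python
class IteratorExhausted(StopIteration):
    """Step 1: hypothetical exception from the proposal, not a real builtin."""


class StrictIterator:
    """Hypothetical wrapper that follows step 2 for any iterable."""

    def __init__(self, iterable):
        self._it = iter(iterable)
        self._exhausted = False

    def __iter__(self):
        return self

    def __next__(self):
        if self._exhausted:
            # The data already ran out once; any further call is an error.
            raise IteratorExhausted("next() called on an exhausted iterator")
        try:
            return next(self._it)
        except StopIteration:
            # First time the data runs out: plain StopIteration, as today.
            self._exhausted = True
            raise


def strict_for(iterable, body):
    """Hypothetical emulation of step 3: a for loop that traps IteratorExhausted."""
    it = iter(iterable)
    while True:
        try:
            item = next(it)
        except IteratorExhausted:
            # Must be tested before StopIteration, since it is a subclass.
            raise TypeError("Iterator used in two for loops") from None
        except StopIteration:
            return
        body(item)


it = StrictIterator([1, 2])
seen = []
strict_for(it, seen.append)  # first loop consumes the data normally
try:
    strict_for(it, seen.append)  # second loop over the same iterator
    error = None
except TypeError as exc:
    error = str(exc)

print(seen)   # [1, 2]
print(error)  # Iterator used in two for loops
```

Passing a list to strict_for() twice still works, because each call to iter() on a list returns a fresh iterator.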

POSITIVE IMPACT:
This will ease the transition to Python 3.0, where zip() and other
functions change from returning lists to returning iterators.

Any code of the form

foo(zip(a,b))  or foo(map(...)) or foo(filter(...))

or a few other things would become silently wrong in Python 3.0.   With 
this modification, it will be noisily wrong.   (Much better!)
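
A small illustration of the zip() case under Python 3 semantics. The function count_twice() is a made-up stand-in for any two-pass function like foo(); wrapping the zip in list() reproduces the old list-returning behaviour.

```python
def count_twice(x):
    # Hypothetical two-pass function: counts the items on each pass.
    first = sum(1 for _ in x)
    second = sum(1 for _ in x)
    return first, second


a = [1, 2, 3]
b = ["x", "y", "z"]

as_list = count_twice(list(zip(a, b)))  # what a list-returning zip() gave
as_iter = count_twice(zip(a, b))        # Python 3: zip() returns an iterator

print(as_list)  # (3, 3)
print(as_iter)  # (3, 0)
```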

Since IteratorExhausted is a subclass of StopIteration, normal uses of
StopIteration will be unaffected.     Code that sticks to the current
PEP-234 will continue to work absolutely unchanged.

NEGATIVE IMPACT:

Code in the form below will fail noisily if it was intended to be
used with current PEP-234 iterators and if the upper loop does not 
terminate early.   (But it will work correctly if handed a list.)

def bar(x):
	for i in x:
		if someThing(i):
			break
	for i in x:
		anotherThing(i)

However, note that this code will give different results depending
on whether it is passed an iterator or a list, so it's somewhat
dangerous anyway.    I suspect this is a rare case compared to all the
Python 3.0 upheaval. However, it can be fixed fairly easily and
efficiently by simply putting a try...except statement around the
second "for" loop.
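
To see why bar() is already dangerous, here is a sketch of how it behaves today. The extra parameters some_thing and another_thing are added only for this illustration, so the example is self-contained; they stand in for someThing() and anotherThing().

```python
def bar(x, some_thing, another_thing):
    for i in x:
        if some_thing(i):
            break
    for i in x:
        another_thing(i)


from_list = []
bar([1, 2, 3, 4], lambda i: i == 2, from_list.append)

from_iter = []
bar(iter([1, 2, 3, 4]), lambda i: i == 2, from_iter.append)

print(from_list)  # [1, 2, 3, 4]
print(from_iter)  # [3, 4]
```

With a list, the second loop starts over from the beginning. With an iterator, it resumes after the item that triggered the break.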

I believe that it will add no silent failures to 2.5 code run on
Python 3.0 and will convert many silent failures into noisy failures.
In my book, that's a Good Thing.  Overall, I believe it will reduce
the pain of Python 3.0 and increase the uptake rate.

Comments appreciated.  (Not that I could avoid them, anyway!)