[Python-Dev] The iterator story

Greg Ewing greg@cosc.canterbury.ac.nz
Mon, 22 Jul 2002 16:50:02 +1200 (NZST)


Ka-Ping Yee <ping@zesty.ca>:

> When you write the for-loop, you decide whether you want
> to consume the sequence.

As someone pointed out, it's pretty rare that you actually *want* to
consume the sequence. Usually the choice is between "I don't care" and
"The sequence must NOT be consumed".
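The example below shows the distinction in present-day Python 3. The snippets in this message use Python 2 syntax, and for-from is only a proposal. A list can be iterated any number of times, but an iterator is used up by the first pass. The function name `total_twice` is made up for illustration:

```python
def total_twice(s):
    # Sum the elements of s in two separate passes.
    first = sum(x for x in s)
    second = sum(x for x in s)
    return first, second

data = [1, 2, 3]
print(total_twice(data))        # (6, 6): a list survives iteration
print(total_twice(iter(data)))  # (6, 0): the first pass consumed the iterator
```

Nothing in the loop itself shows which case applies. That depends entirely on what the caller passes in.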

Of the two varieties of for-loop in your proposal, for-in
obviously corresponds to the "must not be consumed" case,
leading one to suppose that you intend for-from to be used in
the don't-care case. 

But now you seem to be suggesting that library routines
should always use for-in, and that the caller should
convert an iterator to a sequence if he knows it's okay
to consume it:

> Since for-in is non-destructive, it is safer, and it is also
> more common to have a sequence than an iterator.
> ...
> If y is an iterator, in my world you would not be able to
> call "printout(y)".  You would say "printout(consume(y))".

Okay, that seems reasonable -- explicit is better than
implicit. But... consider the following two library
routines:

  def printout1(s):
    for x in s:
      print x

  def printout2(s):
    for x in s:
      for y in s:
        print x, y

Clearly it's okay to call printout1(consume(s)), but it's
NOT okay to call printout2(consume(s)). So we need to document
these requirements:

  def printout1(s):
    "s may be an iterator or sequence"
    for x in s:
      print x

  def printout2(s):
    "s MUST be a sequence, NOT an iterator!"
    for x in s:
      for y in s:
        print x, y

But now there's nothing to enforce these requirements -- no
exception will be raised if you call printout2(consume(s))
by mistake.
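The following Python 3 analogue of printout2 collects pairs instead of printing them. Passed an iterator, it silently produces a truncated result and raises no error. The guard in `pairs_checked` is a hypothetical, heuristic check and not part of either proposal. It relies on the iterator protocol convention that `iter()` applied to an iterator returns the same object, while a list returns a fresh one. It catches ordinary one-shot iterators, but it cannot detect every object that is destroyed by iteration.

```python
def pairs(s):
    # Python 3 analogue of printout2: return the pairs instead of printing.
    result = []
    for x in s:
        for y in s:
            result.append((x, y))
    return result

def pairs_checked(s):
    # Hypothetical guard: an iterator's __iter__ returns the iterator itself,
    # so "iter(s) is s" flags a one-shot iterator before it is consumed.
    if iter(s) is s:
        raise TypeError("pairs_checked() needs a re-iterable sequence, not an iterator")
    return pairs(s)

print(len(pairs([1, 2, 3])))   # 9 pairs
print(pairs(iter([1, 2, 3])))  # [(1, 2), (1, 3)]: wrong, but no exception
```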

To get any safety benefit from your proposed arrangement,
it seems to me that you'd need to write printout1 as

  def printout1(s):
    "s must be an iterator"
    for x from s:
      print x

and then in the (overwhelmingly most common) case of passing it a
sequence, you would need to call it as printout1(iter(s)) -- unless
you allow the for-from protocol to automatically obtain an iterator
from a sequence if possible, the way for-in currently does.
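The following Python 3 sketch spells out what for-in does implicitly: it calls `iter()` on its operand, then calls `next()` until `StopIteration`. A for-from that converted sequences automatically would presumably make the same `iter()` call. It would therefore accept both sequences and iterators, just as for-in does today. The helper name `collect_from` is hypothetical:

```python
def collect_from(s):
    # iter() returns s itself if s is already an iterator; for a sequence
    # it returns a new iterator, which is the conversion for-in performs.
    it = iter(s)
    out = []
    while True:
        try:
            x = next(it)
        except StopIteration:
            break
        out.append(x)
    return out

seq = [1, 2, 3]
it = iter(seq)
print(iter(it) is it)      # True: an iterator is its own iterator
print(collect_from(seq))   # [1, 2, 3]
next(it)                   # advance the shared iterator by one
print(collect_from(it))    # [2, 3]: picks up where the iterator left off
```

With the automatic conversion in place, the call sites would look the same as they do with for-in. The safety gain would then come only from the choice of loop keyword inside the library routine.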

> Greg Ewing wrote:
> > Given suitable values for x and y, it's possible for evaluating "x+y"
> > to be a destructive operation.  Does that mean we should revise the
> > "+" protocol somehow to prevent this from happening? I don't think so.
> 
> Augh!  I'm just not getting through here.

Sorry, I wrote that before I saw your full proposal. I
understand your point of view much better now, and
even sympathise with it to some extent -- something
like the for-from syntax actually passed through my
mind shortly before I saw it in your post. 

There's no doubt that it's very elegant theoretically,
but in thinking through the implications, I'm not sure it
would be all that helpful in practice, and might even
turn out to be a nuisance if it requires putting in a
lot of iter(x) and/or consume(x) calls.

Greg Ewing, Computer Science Dept, +--------------------------------------+
University of Canterbury,	   | A citizen of NewZealandCorp, a	  |
Christchurch, New Zealand	   | wholly-owned subsidiary of USA Inc.  |
greg@cosc.canterbury.ac.nz	   +--------------------------------------+