[Python-Dev] The iterator story

Ka-Ping Yee ping@zesty.ca
Sun, 21 Jul 2002 19:55:22 -0700 (PDT)


I'm in a bit of a bind.  I know at this point that Guido's already
made up his mind so there's nothing further to be gained by debating
the issue; yet i feel compelled to respond as long as people keep
missing the idea or saying things that don't make sense.

So: this is a clarification, not a push.

I am going to reply to a few messages at once, to reduce the number
of messages that i'm sending on this topic.  If you're planning to
reply on this thread, please read the whole message before replying.

                *               *               *

On Mon, 22 Jul 2002, Greg Ewing wrote:
> This shows up a problem with Ping's proposal, I think:
> The place where you write the for-loop isn't the place
> where you know whether something will be iterated over
> more than once or not.

When you write the for-loop, you decide whether you want
to consume the sequence.  You use the convention and expect
the implementor of the sequence object to adhere to it.

> How is a library routine going
> to know whether a sequence passed to it is going to
> be used again later?

You've got this backwards.  You write the library routine
the way that makes sense, and then you document whether
the sequence gets destroyed or not.  That declaration
becomes part of your interface, and users of your routine
can then determine how to use it safely for their needs.

(Analogy: how does the implementor of file.close() know
whether the caller wants to use the file again later?
Answer: it's not the implementor's job to know that.  We
document what file.close() does, and people only *decide*
to call file.close() when they don't need the file anymore.)

Without a convention to distinguish between destruction
and non-destruction, you can't establish what the library
routine does; so you can't document it; so you can't use
it safely *even* if you trust the implementor.  No
implementation would ever make it possible for your library
routine to claim that it "does <blah> with the elements of
a given sequence without destroying the sequence".

Now if you do have a convention -- yes, you still have to
trust implementors to follow the convention -- but if they
do so, you're okay.

                *               *               *

> This appears to leave the library writer with two
> choices: (1) Use for-in, to be on the safe side,
> in case the user doesn't want the sequence destroyed --
> but then it can't be used on a destructive iterator,

No, it can.  The documentation for the library routine
will state that it wants a sequence.  If the caller wants to
use x and x is an iterator, it passes in seq(x).  No problem.
The caller has thereby declared that it's okay to destroy x.

To make it more obvious what is going on, i should have chosen
a better name; 'seq' was poor.  Let's rename 'seq' to 'consume'.

    consume(i) returns an object x such that iter(x) is i.

So calling 'consume' implies that you are consuming an iterator.
All right.  Then consider:

    for x in consume(y):
        print x

The above is clear that y is being destroyed.  Now consider:

    def printout(sequence):
        for x in sequence:
            print x

If y is an iterator, in my world you would not be able to
call "printout(y)".  You would say "printout(consume(y))",
thus making it clear that y is being destroyed.

> (2) use for-from, and force everyone who calls it to
> adapt sequences to iterators before calling.

Since for-in is non-destructive, it is safer, and it is also
more common to have a sequence than an iterator.  So i would
usually choose option 1 rather than 2.

But sure, you can write for-from, if you want.  I mean, if you
decide to accept strings, then users who want to pass in integers
will have to str() them first.  If you decide to accept integers,
then users who want to pass in strings will have to int() them
first.  This is no great dilemma.  We actually like this.

                *               *               *

Hereafter i'll stick to existing syntax, because the business of
introducing syntax isn't really the main point.  I'll use the
alternative i proposed, which is to use the built-in instead.
So we'd say

    for i in consume(it): ...

instead of

    for i from it: ...

Tim Delaney wrote:
> I think this is the crux of the matter. You see for: loops as inherently
> non-destructive - that they operate on containers. I (and presumably
> Guido, though I would never presume to channel him ;) see for: loops as
> inherently destructive - that they operate on iterators. That they obtain
> an iterator from a container (if possible) is a useful convenience.

I believe your interpretation of opinions is correct on all counts.
Except i would point out that for-loops are not always destructive;
most of the time, they are not, and that is why i consider the
destructive behaviour surprising and worth making visible.

> Perhaps the terminology is confusing. Consider a queue.
>
> for each person in the queue:
>     service the person
>
> Is there anyone who would *not* consider this to be destructive (of the
> queue)?

Well, the only reason you can tell is that you can see the context
from the meanings of the words "queue" and "service".  If you said

    for person in consume(queue):
        service(person)

then that would truly be clear, even if you used different variable
names, because the 'consume' built-in expresses that the queue will
be consumed.

                *               *               *

Greg Ewing wrote:
> Given suitable values for x and y, it's possible for evaluating "x+y"
> to be a destructive operation.  Does that mean we should revise the
> "+" protocol somehow to prevent this from happening? I don't think so.

Augh!  I'm just not getting through here.

We all know that the Python philosophy is to trust the implementors of
protocols instead of enforcing behaviour.  That's not the point.

Of course it's POSSIBLE for "x + y" to be destructive.  That doesn't
mean it SHOULD be.  We all know that "x + y" is normally not
destructive, and that's what counts.  That understanding enables me to
implement __add__ in a way that will not screw you over when you use it.

All i'm saying is that there should be a way to *express* safe iteration
(and safe "element in container" tests).

Guido's pronouncement is "Nope.  Don't need it."

Although i disagree, i am willing to respect that.  But please don't
confuse a lack of enforcement with a lack of convention.  Convention
is all we have.


-- ?!ng