[Python-Dev] Iterators (PEP 234)

Tue, 6 Feb 2001 03:04:34 -0800 (PST)

On 5 Feb 2001, M.-A. Lemburg wrote:
> > > The .iterator() method would have to return an object which
> > > provides an iterator API (at C level to get the best performance).
> > 
> > Okay, provide an example.  Write this iterator() method in Python.
> > Now answer: how does 'for' know whether the thing to the right of
> > 'in' is an iterator or a sequence?
> 
> Simple: have the for-loop test for a type slot and have
> it fallback to __getitem__ in case it doesn't find the slot API.

For the third time: write an example, please.  It will help a lot.

> Sorry, Ping, I didn't know you have a PEP for iterators already.

I posted it on this very boutique (i mean, mailing list) a week ago
and messages have been going back and forth on its thread since then.

On 31 Jan 2001, Ka-Ping Yee wrote:
| Okay, i have written a draft PEP that tries to combine the
| "elt in dict", custom iterator, and "for k:v" issues into a
| coherent proposal.  Have a look:
| 
|     http://www.lfw.org/python/pep-iterators.txt
|     http://www.lfw.org/python/pep-iterators.html

Okay.  I apologize for my impatient tone, as it comes from the
expectation that anyone would have read the document before trying
to discuss it.  I am very happy to get *new* information, the
discovery of new errors in my thinking, better and interesting
arguments; it's just that it's exasperating to see arguments
repeated that were already made, or objections raised that were
already carefully thought out and addressed.  From now on, i'll
stop resisting the urge to paste the text of proposals inline
(instead of politely posting just URLs) so you won't miss them.

> Done. Didn't know it exists, though (why isn't the PEP#
> in the subject line ?).

It didn't have a number at the time i posted it.  Thank you
for updating the subject line.

> Since the object can have multiple methods to construct
> iterators, all you need is *one* iterator API. You don't
> need a slot which returns an iterator object -- leave
> that decision to the programmer, e.g. you can have:
> 
> for key in dict.xkeys():
> for value in dict.xvalues():
> for items in dict.xitems():

Three points:

1.  We have syntactic support for mapping creation and lookup,
and syntactic support for mapping iteration should mirror it.

2.  IMHO

    for key:value in dict:

is much easier to read and explain than

    for (key, value) in dict.xitems():

(Greg?  Could you test this claim with a survey question?)

To the newcomer, the former is easy to understand at a surface
level.  The latter exposes the implementation (an implementation
that is still there in PEP 234, but that the programmer only has
to worry about if they are going deeper and writing custom
iteration behaviour).  This separates the work of learning into
two small, digestible pieces.

3. Furthermore, this still doesn't solve the backward-compatibility
problem that PEP 234 takes great care to address!  If you write
your for-loops

    for (key, value) in dict.xitems():

then you are screwed if you try to replace dict with any kind of
user-implemented dictionary-like replacement (since you'd have to
go back and implement the xitems() method on everything).

If, in order to maintain compatibility with the existing de-facto
dictionary interface, you write your for-loops

    for (key, value) in dict.items():

then now you are screwed if dict is a built-in dictionary, since
items() is supposed to construct a list, not an iterator.

> for entry in matrix.xrow(1):
> for entry in matrix.xcolumn(2):
> for entry in matrix.xdiag():

These are fine, since matrices are not core data types with
syntactic support or a de-facto emulation protocol.

> for i,element in sequence.xrange():

This is just as bad as the xitems() issue above -- probably worse --
since nobody implements xrange() on sequence-like objects, so now
you've broken compatibility with all of those.

We want this feature to smoothly extend and work with existing objects
with a minimum of rewriting, ideally none.  PEP 234 achieves this ideal.

> Since for-loops can check for the type slot, they can use an
> optimized implementation which avoids the creation of
> temporary integer objects and leave the state-keeping to the
> iterator which can usually provide a C based storage for it with
> much better performance.

This statement, i believe, is orthogonal to both proposals.

> Note that with this kind of interface, there is no need to
> add "Mapping Iterators" or "Sequence Iterators" as special
> cases, since these are easily implemented using the above
> iterators.

I think this really just comes down to one key difference
between our points of view here.  Correct me if you disagree:

    You seem to be suggesting that we should only consider a
    protocol for sequences, whereas PEP 234 talks about both
    sequences and mappings.

I argue that consideration for mappings is worthwhile because:

    1. Dictionaries are a built-in type with syntactic and
       core implementation support.

    2. Iteration over dictionaries is very common and should
       be spelled in an easily understood fashion.

    3. Both sequence and mapping protocols are formalized in
       the core (with PySequenceMethods and PyMappingMethods).

    4. Both sequence and mapping protocols are documented and
       used in Python (__getitem__, keys, values, etc.).

    5. There are many, many sequence-like and mapping-like
       objects out there, implemented both in Python and in C,
       which adhere to these protocols.

(There is also the not-insignificant side benefit of finally
having a decent way to get the indices while you're iterating
over a sequence, which i've wanted fairly often.)

-- ?!ng