[Python-Dev] Iterators (PEP 234)

Ka-Ping Yee ping@lfw.org
Tue, 6 Feb 2001 04:25:58 -0800 (PST)


On Tue, 6 Feb 2001, M.-A. Lemburg wrote:
> > For the third time: write an example, please.  It will help a lot.
> 
> Ping, what do you need an example for ? The above sentence says
> it all:

*sigh*  I give up.  I'm not going to ask again.

Real examples are a good idea when considering any proposal.

    (a) When you do a real example, you usually discover
        mistakes or things you didn't think of in your design.

    (b) We can compare it directly to other examples to see
        how easy or hard it is to write and understand code
        that uses the new protocol.

    (c) We can come up with interesting cases in practice to
        see if there are limitations in any proposal.

Now that you have a proposal in slightly more detail, a few
missing pieces are evident.

How would you implement a *Python* class that supports iteration?
For instance, write something that has the effect of the FileLines
class in PEP 234.
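For concreteness, here is a minimal sketch of the kind of answer being asked for. It is not taken from either proposal as quoted here: the method names `__iter__` and `__next__` and the class name `LineIterator` are assumptions for illustration, and the class only approximates the idea of iterating over a file's lines.

```python
import io

class LineIterator:
    """Hypothetical wrapper that yields lines from a file-like object."""

    def __init__(self, fileobj):
        self.fileobj = fileobj

    def __iter__(self):
        # The object acts as its own iterator, so it can be consumed only once.
        return self

    def __next__(self):
        line = self.fileobj.readline()
        if not line:
            # An empty string from readline() means end of file.
            raise StopIteration
        return line

# Usage: collect the lines of an in-memory file.
lines = list(LineIterator(io.StringIO("a\nb\n")))
```

Even a sketch this small makes the comparison concrete: one can count how many methods a Python class author has to write under each protocol.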

How would you implement an object that can be iterated over more
than once, at the same time or at different times?  It's not clear
to me how the single tp_nextitem slot can handle that.
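One way to picture the concern: if the iteration state lives in a separate iterator object, nested or repeated loops over the same container do not interfere with each other. The sketch below assumes hypothetical `Countdown` and `CountdownIterator` classes and the `__iter__`/`__next__` method names; it shows one possible design, not how either proposal specifies it.

```python
class CountdownIterator:
    """Hypothetical iterator holding its own position."""

    def __init__(self, current):
        self.current = current

    def __iter__(self):
        return self

    def __next__(self):
        if self.current <= 0:
            raise StopIteration
        value = self.current
        self.current -= 1
        return value

class Countdown:
    """Hypothetical container: every iter() call returns a fresh iterator."""

    def __init__(self, start):
        self.start = start

    def __iter__(self):
        return CountdownIterator(self.start)

c = Countdown(3)
# Two loops over the same object at the same time.
pairs = [(a, b) for a in c for b in c]
# Two loops over the same object at different times.
first, second = list(c), list(c)
```

If the position were stored on the container itself, the inner loop above would exhaust the outer one.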

> Since the for-loop can avoid creating temporary integers,
> iterations will generally run a lot faster than before. Also,
> iterators have access to the object's internal representation,
> so data access is also faster.

Again, this is completely orthogonal to both proposals.  Regardless of
the protocol, if you're implementing the iterator in C, you can
use raw integers and internal access to make it fast.

> > 2.  IMHO
> > 
> >     for key:value in dict:
> > 
> > is much easier to read and explain than
> > 
> >     for (key, value) in dict.xitems():
[...]
> Tuples are well-known basic Python types. Why should 
> (key,value) be any harder to understand than key:value.

It's mainly the business of calling the method and rearranging
the data that I'm concerned about.

Example 1:

    dict = {1: 2, 3: 4}
    for (key, value) in dict.items():

Explanation:

    The "items" method on the dict converts {1: 2, 3: 4} into
    a list of 2-tuples, [(1, 2), (3, 4)].  Then (key, value) is
    matched against each item of this list, and the two parts
    of each tuple are unpacked.
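The steps in this explanation can be spelled out as runnable code. This is a sketch under one assumption: the explicit `list(...)` call is added here so the intermediate list of 2-tuples is visible on any Python version, and the name `d` replaces `dict` to avoid shadowing the built-in.

```python
d = {1: 2, 3: 4}

# Step 1: convert the dictionary into a list of (key, value) 2-tuples.
pairs = list(d.items())

# Step 2: match (key, value) against each item and unpack the two parts.
seen = []
for (key, value) in pairs:
    seen.append((key, value))
```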

Example 2:

    dict = {1: 2, 3: 4}
    for key:value in dict:

Explanation:

    The "for" loop iterates over the key:value pairs in the
    dictionary, which you can see are 1:2 and 3:4.

> What would you tell a newbie that writes:
> 
> for key:value in sequence:
>     ....
> 
> where sequence is a list of tuples and finds that this doesn't
> work ?

"key:value doesn't look like a tuple, does it?"

> Besides, the items() method has been around for ages, so switching
> from .items() to .xitems() in programs will be just as easy as
> switching from range() to xrange().

It's not the same.  xrange() is a built-in function that you call;
xitems() is a method that you have to *implement*.

> >     for (key, value) in dict.xitems():
> > 
> > then you are screwed if you try to replace dict with any kind of
> > user-implemented dictionary-like replacement (since you'd have to
> > go back and implement the xitems() method on everything).
> 
> Why is that ? You'd just have to add .xitems() to UserDict

...and cgi.FieldStorage, and dumbdbm._Database, and rfc822.Message,
and shelve.Shelf, and bsddbmodule, and dbmmodule, and gdbmmodule,
to name a few.  Even if you expect (or force) people to derive all
their dictionary-like Python classes from UserDict (which they don't,
in practice), you can't derive C objects from UserDict.
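To illustrate the point, here is a sketch of iteration that relies only on what such mapping-like objects already tend to provide, `keys()` and item lookup, so that nothing new has to be implemented on each of them. The helper `iter_pairs` and the class `TinyMapping` are hypothetical names made up for this example.

```python
def iter_pairs(mapping):
    """Hypothetical helper: yield (key, value) using only keys() and mapping[key]."""
    for key in mapping.keys():
        yield key, mapping[key]

class TinyMapping:
    """Hypothetical dictionary-like class that does not derive from UserDict."""

    def __init__(self, data):
        self._data = dict(data)

    def keys(self):
        return list(self._data)

    def __getitem__(self, key):
        return self._data[key]

# The same loop works on a user-implemented mapping and on a built-in dict.
from_custom = list(iter_pairs(TinyMapping({1: 2})))
from_builtin = list(iter_pairs({3: 4}))
```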

> >     for (key, value) in dict.items():
> > 
> > then now you are screwed if dict is a built-in dictionary, since
> > items() is supposed to construct a list, not an iterator.
> 
> I'm not breaking backward compatibility -- the above will still
> work like it has before since lists don't have the tp_nextitem
> slot.

What I mean is that Python programmers would no longer know how to
write their 'for' loops.  Should they use 'xitems', thus dooming
their loop never to work with the majority of user-implemented
mapping-like objects?  Or should they use 'items', thus dooming
their loop to run inefficiently on built-in dictionaries?

> > We want this feature to smoothly extend and work with existing objects
> > with a minimum of rewriting, ideally none.  PEP 234 achieves this ideal.
> 
> Again, you are trying to achieve forward compatibility. If people
> want better performance, then they will have to add new functionality
> to their types -- one way or another.

Okay, I agree, it's forward compatibility.  But it's something
worth going for when you're trying to come up with a protocol.


-- ?!ng

"There's no point in being grown up if you can't be childish sometimes."
    -- Dr. Who