[Python-Dev] Iterators (PEP 234)

M.-A. Lemburg mal@lemburg.com
Tue, 06 Feb 2001 14:16:22 +0100


Ka-Ping Yee wrote:
> 
> On Tue, 6 Feb 2001, M.-A. Lemburg wrote:
> > > For the third time: write an example, please.  It will help a lot.
> >
> > Ping, what do you need an example for ? The above sentence says
> > it all:
> 
> *sigh*  I give up.  I'm not going to ask again.
> 
> Real examples are a good idea when considering any proposal.
> 
>     (a) When you do a real example, you usually discover
>         mistakes or things you didn't think of in your design.
> 
>     (b) We can compare it directly to other examples to see
>         how easy or hard it is to write and understand code
>         that uses the new protocol.
> 
>     (c) We can come up with interesting cases in practice to
>         see if there are limitations in any proposal.
> 
> Now that you have a proposal in slightly more detail, a few
> missing pieces are evident.
> 
> How would you implement a *Python* class that supports iteration?
> For instance, write something that has the effect of the FileLines
> class in PEP 234.

I was just throwing in ideas, not a complete proposal. If that's
what you want I can write up a complete proposal too and maybe
even a patch to go with it. Exposing the tp_nextitem slot in
Python classes via a __nextitem__ slot wouldn't be much of a 
problem.
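
A minimal sketch of how a FileLines-like class might look if such a
__nextitem__ hook were exposed to Python classes. The method name
follows the idea above; the end-of-iteration signal (IndexError) and
the for_each helper, which stands in for what the for-loop would do
internally, are assumptions made purely for illustration.

```python
import io


class FileLines:
    """Iterate over the lines of a file-like object."""

    def __init__(self, file):
        self.file = file

    def __nextitem__(self):
        # Hypothetical hook: return the next item, or raise
        # IndexError when exhausted (an assumed convention).
        line = self.file.readline()
        if not line:
            raise IndexError("no more lines")
        return line


def for_each(obj, body):
    """Emulate 'for item in obj: body(item)' using __nextitem__."""
    while True:
        try:
            item = obj.__nextitem__()
        except IndexError:
            break
        body(item)


lines = []
for_each(FileLines(io.StringIO("spam\neggs\n")), lines.append)
print(lines)
```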

What I wanted to get across is the general idea behind my
view of an iteration API, and I believe that this idea has
been made clear: I want a low-level API, with all the
complicated, object-specific details moved into separate
iterator objects.

I don't see a point in trying to add complicated
machinery to Python just to be able to iterate fast over
some of the builtin types by special-casing each object type.

Let's please not add more special cases to the core.
 
> How would you implement an object that can be iterated over more
> than once, at the same time or at different times?  It's not clear
> to me how the single tp_nextitem slot can handle that.

Put all that logic into the iterator objects. These can
be as complicated as needed: they can try to work in
generic ways, be special-cased for some builtin types, or be
specific to a single type.
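
A rough sketch of that separation, assuming a hypothetical iterator
class (SeqIterator) with a hypothetical nextitem() method: each
iterator object keeps its own position, so the same underlying object
can be iterated over several times, concurrently or not, without the
object itself tracking any state.

```python
class SeqIterator:
    """Iterator object holding its own position in a sequence."""

    def __init__(self, seq):
        self.seq = seq
        self.pos = 0

    def nextitem(self):
        # Hypothetical method name; the IndexError raised by the
        # sequence doubles as the assumed end-of-iteration signal.
        item = self.seq[self.pos]
        self.pos += 1
        return item


data = ["a", "b", "c"]
first = SeqIterator(data)
second = SeqIterator(data)

# The two iterations advance independently over the same object.
seen = [first.nextitem(), first.nextitem(), second.nextitem()]
print(seen)
```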
 
> > Since the for-loop can avoid creating temporary integers,
> > iterations will generally run a lot faster than before. Also,
> > iterators have access to the object's internal representation,
> > so data access is also faster.
> 
> Again, completely orthogonal to both proposals.  Regardless of
> the protocol, if you're implementing the iterator in C, you can
> use raw integers and internal access to make it fast.
> 
> > > 2.  IMHO
> > >
> > >     for key:value in dict:
> > >
> > > is much easier to read and explain than
> > >
> > >     for (key, value) in dict.xitems():
> [...]
> > Tuples are well-known basic Python types. Why should
> > (key,value) be any harder to understand than key:value.
> 
> It's mainly the business of calling the method and rearranging
> the data that i'm concerned about.
> 
> Example 1:
> 
>     dict = {1: 2, 3: 4}
>     for (key, value) in dict.items():
> 
> Explanation:
> 
>     The "items" method on the dict converts {1: 2, 3: 4} into
>     a list of 2-tuples, [(1, 2), (3, 4)].  Then (key, value) is
>     matched against each item of this list, and the two parts
>     of each tuple are unpacked.
> 
> Example 2:
> 
>     dict = {1: 2, 3: 4}
>     for key:value in dict:
> 
> Explanation:
> 
>     The "for" loop iterates over the key:value pairs in the
>     dictionary, which you can see are 1:2 and 3:4.

Again, if you prefer the key:value notation, fine. This is 
orthogonal to the iteration API though and really only touches 
the case of mappings.
 
> > Besides, the items() method has been around for ages, so switching
> > from .items() to .xitems() in programs will be just as easy as
> > switching from range() to xrange().
> 
> It's not the same.  xrange() is a built-in function that you call;
> xitems() is a method that you have to *implement*.

You can put all that special logic into special iterators,
e.g. an xitems iterator (see the end of my post).
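
One way such a generic xitems iterator could be sketched. The XItems
and TinyMapping names are hypothetical, and the code is written
against the __iter__/__next__ protocol of present-day Python as a
stand-in for the tp_nextitem slot discussed here. It only assumes
that the mapping offers keys() and __getitem__, so the mapping class
itself needs no new method.

```python
class XItems:
    """Lazily yield (key, value) pairs from a mapping-like object."""

    def __init__(self, mapping):
        self.mapping = mapping
        # Only keys() and __getitem__ are required of the mapping.
        self.keys = iter(mapping.keys())

    def __iter__(self):
        return self

    def __next__(self):
        key = next(self.keys)  # StopIteration here ends the loop
        return key, self.mapping[key]


class TinyMapping:
    """A dictionary-like class that does not implement xitems()."""

    def __init__(self, data):
        self._data = dict(data)

    def keys(self):
        return list(self._data)

    def __getitem__(self, key):
        return self._data[key]


pairs = [(k, v) for k, v in XItems(TinyMapping({1: 2, 3: 4}))]
print(pairs)
```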
 
> > >     for (key, value) in dict.xitems():
> > >
> > > then you are screwed if you try to replace dict with any kind of
> > > user-implemented dictionary-like replacement (since you'd have to
> > > go back and implement the xitems() method on everything).
> >
> > Why is that ? You'd just have to add .xitems() to UserDict
> 
> ...and cgi.FieldStorage, and dumbdbm._Database, and rfc822.Message,
> and shelve.Shelf, and bsddbmodule, and dbmmodule, and gdbmmodule,
> to name a few.  Even if you expect (or force) people to derive all
> their dictionary-like Python classes from UserDict (which they don't,
> in practice), you can't derive C objects from UserDict.

The same applies to your proposed interface: people will have
to write new code in order to be able to use the new technology.
I don't see that as a problem, though.
 
> > >     for (key, value) in dict.items():
> > >
> > > then now you are screwed if dict is a built-in dictionary, since
> > > items() is supposed to construct a list, not an iterator.
> >
> > I'm not breaking backward compatibility -- the above will still
> > work like it has before since lists don't have the tp_nextitem
> > slot.
> 
> What i mean is that Python programmers would no longer know how to
> write their 'for' loops.  Should they use 'xitems', thus dooming
> their loop never to work with the majority of user-implemented
> mapping-like objects?  Or should they use 'items', thus dooming
> their loop to run inefficiently on built-in dictionaries?

Hey, people who care will be aware of this difference. It is very
easy to test for interfaces in Python, so detecting the best
method (in case it matters) is simple.
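
A small sketch of such an interface test, assuming a hypothetical
xitems() method on some mappings and a hypothetical helper named
iter_items: the helper prefers the lazy variant when the object
offers it and otherwise falls back to the long-standing items().

```python
def iter_items(mapping):
    """Return the cheapest available source of (key, value) pairs."""
    xitems = getattr(mapping, "xitems", None)
    if xitems is not None:
        return xitems()     # hypothetical lazy variant
    return mapping.items()  # long-standing fallback


class LazyMapping(dict):
    def xitems(self):
        # Hypothetical method: hand out pairs one at a time.
        return iter(self.items())


plain = sorted(iter_items({1: 2, 3: 4}))
lazy = sorted(iter_items(LazyMapping({1: 2, 3: 4})))
print(plain, lazy)
```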
 
> > > We want this feature to smoothly extend and work with existing objects
> > > with a minimum of rewriting, ideally none.  PEP 234 achieves this ideal.
> >
> > Again, you are trying to achieve forward compatibility. If people
> > want better performance, then they will have to add new functionality
> > to their types -- one way or another.
> 
> Okay, i agree, it's forward compatibility.  But it's something
> worth going for when you're trying to come up with a protocol.

Sure, but is adding special cases everywhere really worth it ?
From the Python programmer's perspective, this discussion boils down
to (e.g. for mappings):

    for key:value in mapping:

vs.

    for key, value in mapping.xitems():

Programmers will already know and use the second variant, so
switching to it won't be a big deal.

-- 
Marc-Andre Lemburg
______________________________________________________________________
Company:                                        http://www.egenix.com/
Consulting:                                    http://www.lemburg.com/
Python Pages:                           http://www.lemburg.com/python/