[Python-Dev] Iterators (PEP 234) (re: Sets BOF / for in dict)

M.-A. Lemburg mal@lemburg.com
Mon, 05 Feb 2001 23:36:55 +0100


Ka-Ping Yee wrote:
> 
> On Mon, 5 Feb 2001, M.-A. Lemburg wrote:
> > Slices and dictionary enclose the two parts in brackets -- this
> > places the colon into a visible context. for ... in ... : does
> > not provide much of a context.
> 
> For crying out loud!  '\':' requires that you tokenize the string
> before you know that the colon is part of the string.  Triple-quotes
> force you to tokenize carefully too.  There is *nothing* that this
> stay-away-from-colons argument buys you.
> 
> For a human skimming over source code -- i repeat, the visual hint
> is "colon at the END of a line".

Oh well, perhaps you are right and we should just call things like
key:value an association and be done with it.
 
> > > Because there's no good answer for "what does iterator() return?"
> > > in this design.  (Trust me; i did think this through carefully.)
> > > Try it.  How would you implement the iterator() method?
> >
> > The .iterator() method would have to return an object which
> > provides an iterator API (at C level to get the best performance).
> 
> Okay, provide an example.  Write this iterator() method in Python.
> Now answer: how does 'for' know whether the thing to the right of
> 'in' is an iterator or a sequence?

Simple: have the for-loop test for a type slot and have
it fall back to __getitem__ in case it doesn't find the slot API.
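
A minimal Python sketch of that dispatch follows. The real test would
sit at C level inside the for-loop implementation; the attribute name
"next_item", the helper run_for_loop() and the CountDown class are
made up here purely for illustration and are not taken from the PEP
or from any existing API.

```python
# Sketch only: "next_item" is a hypothetical stand-in for the type slot.

def run_for_loop(obj, body):
    """Call body(element) for each element of obj, emulating 'for x in obj'."""
    next_item = getattr(obj, "next_item", None)
    if next_item is not None:
        # Iterator path: call the slot without arguments until IndexError.
        while True:
            try:
                element = next_item()
            except IndexError:
                break
            body(element)
    else:
        # Fallback path: classic sequence protocol via __getitem__.
        index = 0
        while True:
            try:
                element = obj[index]
            except IndexError:
                break
            body(element)
            index += 1


class CountDown:
    """Hypothetical iterator: yields n, n-1, ..., 1 through next_item()."""

    def __init__(self, n):
        self.n = n

    def next_item(self):
        if self.n <= 0:
            raise IndexError("iteration finished")
        self.n -= 1
        return self.n + 1


seen_iter, seen_seq = [], []
run_for_loop(CountDown(3), seen_iter.append)   # takes the slot path
run_for_loop(["a", "b"], seen_seq.append)      # falls back to __getitem__
```

The loop body runs outside the try block, so an IndexError raised by
the body itself is not mistaken for the end of the iteration.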

> > For dictionaries, this object could carry the needed state
> > (current position in the dictionary table) and use the PyDict_Next()
> > for the internals. Matrices would have to carry along more state
> > (one integer per dimension) and could access the internal
> > matrix representation directly using C functions.
> 
> This is already exactly what the PEP proposes for the implementation
> of sq_iter.

Sorry, Ping, I didn't know you already had a PEP for iterators.

...reading it...

> > This would give us: speed, flexibility and extensibility
> > which the syntax hacks cannot provide;
> 
> The PEP is not just about syntax hacks.  It's an iterator protocol.
> It's clear that you haven't read it.
> 
> *PLEASE* read the PEP before continuing to discuss it.  I quote:
> 
> | Rationale
> |
> |     If all the parts of the proposal are included, this addresses many
> |     concerns in a consistent and flexible fashion.  Among its chief
> |     virtues are the following three -- no, four -- no, five -- points:
> |
> |     1. It provides an extensible iterator interface.
> |
> |     2. It resolves the endless "i indexing sequence" debate.
> |
> |     3. It allows performance enhancements to dictionary iteration.
> |
> |     4. It allows one to provide an interface for just iteration
> |        without pretending to provide random access to elements.
> |
> |     5. It is backward-compatible with all existing user-defined
> |        classes and extension objects that emulate sequences and
> |        mappings, even mappings that only implement a subset of
> |        {__getitem__, keys, values, items}.
> 
> I can take out the Monty Python jokes if you want.  I can add more
> jokes if that will make you read it.  Just read it, i beg you.

Done. I didn't know it existed, though (why isn't the PEP number
in the subject line?).

Even after reading it, I still don't get the idea behind adding
"Mapping Iterators" and "Sequence Iterators" when both of these
are only special implementations of the single "Iterator" 
interface.

Since the object can have multiple methods to construct
iterators, all you need is *one* iterator API. You don't
need a slot which returns an iterator object -- leave
that decision to the programmer, e.g. you can have:

    for key in dict.xkeys():
    for value in dict.xvalues():
    for items in dict.xitems():
    for entry in matrix.xrow(1):
    for entry in matrix.xcolumn(2):
    for entry in matrix.xdiag():
    for i,element in sequence.xrange():

All of these method calls return special iterators for one
specific task, and all of them provide a slot which is callable
without arguments and yields the next element of the iteration.
Iteration is terminated by raising an IndexError, just like
with __getitem__.
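
As a rough illustration of such an iterator object, here is a pure
Python sketch. It assumes the "slot" can be modelled as __call__; the
class names XIterator and XDict and the helper drain() are
hypothetical, and a C implementation would keep a position in the
dictionary table rather than a snapshot list.

```python
class XIterator:
    """Hypothetical one-shot iterator: calling it returns the next element."""

    def __init__(self, elements):
        # Snapshot for simplicity; a C version would track a table position.
        self._elements = list(elements)
        self._pos = 0

    def __call__(self):
        if self._pos >= len(self._elements):
            raise IndexError("iteration finished")
        element = self._elements[self._pos]
        self._pos += 1
        return element


class XDict(dict):
    """dict extended with the hypothetical x-methods used in the examples."""

    def xkeys(self):
        return XIterator(self.keys())

    def xvalues(self):
        return XIterator(self.values())

    def xitems(self):
        return XIterator(self.items())


def drain(iterator):
    """Collect elements until the iterator signals the end with IndexError."""
    result = []
    while True:
        try:
            result.append(iterator())
        except IndexError:
            return result


d = XDict(a=1, b=2)
keys = drain(d.xkeys())
values = drain(d.xvalues())
items = drain(d.xitems())
```

Each x-method hands out a fresh iterator, so several independent
iterations over the same dictionary can be in progress at once.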

Since for-loops can check for the type slot, they can use an
optimized implementation which avoids the creation of
temporary integer objects and leaves the state-keeping to the
iterator, which can usually provide C-based storage for it with
much better performance.

Note that with this kind of interface, there is no need to
add "Mapping Iterators" or "Sequence Iterators" as special
cases, since these are easily implemented using the above
iterators.
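
For instance, the (index, element) iteration from the
sequence.xrange() line could be sketched with the same calling
convention. IndexedIterator is a hypothetical name, and the use of
__call__ as the "slot" is an assumption of this sketch.

```python
class IndexedIterator:
    """Hypothetical iterator yielding (index, element) pairs of a sequence."""

    def __init__(self, sequence):
        self._sequence = sequence
        self._index = 0

    def __call__(self):
        # sequence[index] raises IndexError by itself once we run off the end,
        # which doubles as the end-of-iteration signal.
        element = self._sequence[self._index]
        pair = (self._index, element)
        self._index += 1
        return pair


it = IndexedIterator(["x", "y"])
pairs = []
while True:
    try:
        pairs.append(it())
    except IndexError:
        break
```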

> > e.g. how would you
> > specify to iterate backwards over a sequence using that notation
> > or diagonal for a matrix ?
> 
> No differently from what you are suggesting, at the surface:
> 
>     for item in sequence.backwards():
>     for item in matrix.diagonal():
> 
> The difference is that the thing on the right of 'in' is always
> considered a sequence-like object.  There is no ambiguity and
> no magic rule for deciding when it's a sequence and when it's
> an iterator.

-- 
Marc-Andre Lemburg
______________________________________________________________________
Company:                                        http://www.egenix.com/
Consulting:                                    http://www.lemburg.com/
Python Pages:                           http://www.lemburg.com/python/