Alternative iterator syntax

Wed Feb 21 18:21:10 EST 2001

Wed, 21 Feb 2001 21:51:15 GMT, Ben Wolfson <wolfson at midway.uchicago.edu> pisze:

> One could also specify that, when appropriate, an iterator calls
> __reset__ before raising an IndexError.

It doesn't work. I can have several iterators walking over a single
sequence at once.

Obtaining a stateful iterator is a stateful operation. You can't make
the iterator once and keep it with the sequence, but you must produce
an iterator each time an iteration is requested - unless its usage
is functional, i.e. for the next element you don't call a method
with side effects keeping the same iterator handle, but rebind the
iterator handle after each iteration.

If such operation is spelled as an attribute access, it must still be a
function call internally. Disadvantages are twofold: The implementation
must use __getattr__ to let attribute access have function call
semantics. And it's confusing to programmers, which expect that such
syntax accesses the same object each time (at least logically the same)
- but they are stateful and have different identities.

<ftp://ftp.netcom.com/pub/hb/hbaker/Iterator.html>

For comparison, in the tradition of functional languages linked lists
are very common. They are their own iterators, i.e. it is not needed
to have a separate iterator concept.

Elements are not accessed by indices (unless someone really wants it),
because such element access is O(n). Instead recursive functions are
written to traverse lists. There are functions which wrap common
patterns of such iteration, e.g. map, filter, reduce.

Iteration relies on the fact that obtaining a tail of a list (the
same list but without its first element) is O(1). It is safe to not
perform any copying, because lists are generally immutable. In lazy
languages lists are of course lazy. Functional programming is really
about immutability; functions are only the second important thing.

I would strongly advise to not invent a complicated stateful iterator
framework, with all C++ features like stepping backwards, stepping
the specified number of elements, computing a distance between two
iterators over the same sequence, sorting the part of a sequence
between two iterators, and all weird rules about which iterators are
invalidated when you modify the sequence in interesting ways.

Instead, if you want to have lazy reading of files expressed as an
iteration over the sequence of lines, lazy iteration over dictionary
keys without building their physical list, etc. - either design a
simple iterator protocol to keep close to the current Python, or build
a rich framework for lazy lists. In lazy languages all data structures
are lazy, but for Python it should be enough to have lazy lists.

Please don't stress mutability of such sequences. You should not
mutate a sequence you are iterating over.

In any case, the thing a 'for' loop is looping over should be
kind-of-sequence. Not iterator (a new concept for Python) nor a pair
of them (ugh!). Various things can be expressed as kind-of-sequences,
e.g. reversing adaptor which does not make a physical copy, or a
subsequence, or things discussed before: range(), readlines(), keys().

By kind-of-sequence I mean that it's not sufficient to use current
sequence interface. It does not allow to distinguish between sequential
access and random index access. It can be extended to primarily have
sequential access in mind.

-- 
 __("<  Marcin Kowalczyk * qrczak at knm.org.pl http://qrczak.ids.net.pl/
 \__/
  ^^                      SYGNATURA ZASTĘPCZA
QRCZAK