[Python-3000] Iterators for dict keys, values, and items == annoying :)

Fri Mar 24 01:36:41 CET 2006

On 3/23/06, Ian Bicking <ianb at colorstudy.com> wrote:
> Guido van Rossum wrote:
> > On 3/23/06, Ian Bicking <ianb at colorstudy.com> wrote:
> > [Guido]
> >>>Testing whether an iterator is empty or not is an oxymoron; the only
> >>>legit way is to call next() and see whether it raises StopIteration.
> >>>This is the fundamental confusion I am talking about. It is NOT
> >>>"natural enough". It reveals a fundamental misunderstanding of the
> >>>design of the iterator protocol.
> >>
> >>I'm talking about a use case, not the protocol.  Where iterators are
> >>used, it is very common that you also want to distinguish between zero
> >>and some items.
> >
> > Really? Methinks you are thinking of a fairly specific context -- when
> > presenting database query results to a user. The problem IMO lies in
> > SQLObject (which I admit I've never used) or perhaps in SQL itself, or
> > the specific underlying DB. In most other situations, you have an
> > honest-to-god container (e.g. a dict) which you can test for emptiness
> > before even asking for an iterator over its items. When all you have
> > is a query represented as an iterator this doesn't fly. That's why
> > some DB API implementations return the number of results as the
> > non-standard return value of the query API (at least that's what I
> > recall -- it's been a while since I used the DB API).
>
> In SQLObject it came about due to a desire to lazily load objects out of
> a query.  The lazy behavior had other problems (mostly introducing
> concurrency where you wouldn't expect).  In addition, the query is only
> run when you start iterating.  I'm not sure if that is good or bad
> design -- that queries are iterable doesn't seem that bad, except that
> the query is only invoked with iter() and that doesn't give very good
> access to the actual executed-query object; it's all too implicit.

I'm becoming more and more doubtful about the design of SQLObject;
perhaps it's just not a good example since the issues seem to be
caused by its specific design more than by the language features it's
using.

> I don't know if the same issues exist for .items/.keys; I guess it would
> only be an issue if you passed one of the iterators to some routine that
> didn't have access to the original dict.

But again that's an API design issue -- if the routine needed to know
ahead of time whether the underlying collection was empty it should be
given access to the collection. OTOH if you have an API that knows it
can be given *any* iterator, then the "empty" flag pattern that I
mentioned earlier is the only reliable way to differentiate between an
empty and a non-empty container. (Note that I refuse to say "empty
iterator"!)
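
For concreteness, here is a minimal sketch of that "empty" flag pattern.
The function name report() and its output format are made up for
illustration; the only thing assumed about the argument is that it can
be iterated over once.

```python
def report(items):
    """Format each item, or a placeholder line if there were none.

    `items` may be any iterable, including a one-shot iterator or a
    generator, so it cannot be tested for emptiness up front.
    """
    empty = True  # stays True unless the loop body runs at least once
    lines = []
    for item in items:
        empty = False
        lines.append("item: %s" % (item,))
    if empty:
        lines.append("(no items)")
    return lines
```

The flag is only consulted after the loop, so the iterator is consumed
exactly once and nothing has to be peeked at or pushed back.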

> The identical problem does exist for all generators.  Using ad hoc flags
> in for loops isn't a great solution.  It's all somewhat similar to the
> repr() problem as well.

Not all generators. A fair number of generators are methods on
collections that implement various iterators.

OTOH generators are one of the reasons that the iterator protocol is
as restricted as it is.

> Coming back around to the idea of implementing __getitem__ and such, I
> suppose a list-like iterator wrapper could be useful.  That would
> consume and retain the results of the iterator lazily to satisfy the
> things done to the object.

I'm not sure that's all that useful. It reminds me of early
pseudo-iterators that were implemented as lazy lists using
__getitem__; these were eventually replaced by true iterators.

The two extremes of the spectrum are already taken care of: use
list(it) if you need truly random access; or iterate over the iterator
exactly once if you can handle sequential access (like reading a
file).
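
A small sketch of those two extremes, using a hypothetical generator
squares() as the source of items:

```python
def squares(n):
    """Hypothetical one-shot source of items."""
    for i in range(n):
        yield i * i

# Random access: materialize the iterator into a list first.
values = list(squares(5))
third = values[2]      # indexing works on the list
count = len(values)    # so does len()

# Sequential access: walk the iterator exactly once, keeping nothing.
total = 0
for value in squares(5):
    total += value
```

The first form pays for storing every item; the second never holds more
than one item at a time but cannot go back.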

> That would be kind of interesting; I
> implemented several such methods on the select result object in
> SQLObject for that purpose, and that aspect actually works pretty well.
>   There's some predictability problems, though.  bool(obj) would only
> have to consume one item, but len(obj) would consume the entire thing,
> and usually len() is a pretty innocuous function to use.

Which is why I think it's a bad idea to go down this lane.

> If this was done, it would be nice if an iterator could give hints, like
> a faster implementation of __len__ than the fallback behavior that only
> can use .next().

That's what __len__ on iterators was intended for in 2.4. In 2.5 it
will be reincarnated as __sizehint__ (I believe that's the name we
settled on).
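
As a rough illustration of such a hint: in current Python 3 the hook is
spelled __length_hint__, and operator.length_hint() reads it. The
Countdown class below is hypothetical and exists only to show the hook;
the hint is an estimate, not a guaranteed length.

```python
import operator


class Countdown:
    """Hypothetical iterator that knows how many items remain."""

    def __init__(self, n):
        self.remaining = n

    def __iter__(self):
        return self

    def __next__(self):
        if self.remaining <= 0:
            raise StopIteration
        self.remaining -= 1
        return self.remaining

    def __length_hint__(self):
        # Cheap estimate; callers must not rely on it being exact.
        return self.remaining


it = Countdown(3)
hint_before = operator.length_hint(it)  # asks the hook, consumes nothing
first = next(it)
hint_after = operator.length_hint(it)

# A generator offers no hint, so the supplied default comes back.
gen_hint = operator.length_hint((x for x in range(10)), -1)
```

Note that the iterator still has no __len__, so len(it) raises
TypeError; the hint is advisory only.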

--
--Guido van Rossum (home page: http://www.python.org/~guido/)