[Python-3000] Iterators for dict keys, values, and items == annoying :)

Fri Mar 24 02:00:17 CET 2006

Guido van Rossum wrote:
>>In SQLObject it came about due to a desire to lazily load objects out of
>>a query.  The lazy behavior had other problems (mostly introducing
>>concurrency where you wouldn't expect).  In addition, the query is only
>>run when you start iterating.  I'm not sure if that is good or bad
>>design -- that queries are iterable doesn't seem that bad, except that
>>the query is only invoked with iter() and that doesn't give very good
>>access to the actual executed-query object; it's all too implicit.
> 
> 
> I'm becoming more and more doubtful about the design of SQLObject;
> perhaps it's just not a good example since the issues seem to be
> caused by its specific design more than by the language features it's
> using.

I'm just outlining the specific problems I found looking back on the 
design there, where I tried some of these techniques, with different 
levels of success or frustration.  I haven't argued that those decisions 
were all good decisions.

>>I don't know if the same issues exist for .items/.keys; I guess it would
>>only be an issue if you passed one of the iterators to some routine that
>>didn't have access to the original dict.
> 
> 
> But again that's an API design issue -- if the routine needed to know
> ahead of time whether the underlying collection was empty it should be
> given access to the collection. OTOH if you have an API that knows it
> can be given *any* iterator, then the "empty" flag pattern that I
> mentioned earlier is the only reliable way to differentiate between an
> empty and a non-empty container. (Note that I refuse to say "empty
> iterator"!)

Empty iterator or iterator that produced no items -- from the outside 
it's the same use case.

Iterators look a lot like containers.  Often I only use a list by 
iterating over it; if that's all I do then I can't tell the difference.  At 
that point it is ambiguous.  I'm not even sure if a "sequence" means a 
list-like object or an iterable.  That's ambiguous too.  So I'm only 
pointing out an existing ambiguity, and a place where that ambiguity 
causes problems.

Right now this is how I would iterate over a container, special-casing 
an empty container:

   if container:
       for item in container: ...
   else:
       ...

In this case I am testing if the container is empty, and this generally 
works.  Then an iterator is introduced, and my code breaks.  So, I have 
to choose -- do I convert the iterator to a container with list() (and 
maybe needlessly copy a container), or do I switch to only using the 
iterable aspect of the container, like:

   empty = True
   for item in container:
       empty = False
       ...
   if empty:
       ...

If using the iterable interface in this case felt as natural as using 
the container interface, then I'd probably have used the iterable form 
from the beginning and I wouldn't have a problem.   But it doesn't feel 
as natural, so I don't.

I can't say *everyone* makes the same choice as me, so I am using the 
first person in this argument.  But I think most people do the same as I 
do: because the language does not make the iterable form very pretty, 
people end up using the container interface (i.e., __nonzero__) even 
though they don't really need to.
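
One way to keep the iterable form without an ad hoc flag is to pull the 
first item off and then chain it back on.  This is only a sketch: the 
helper name peek and the (is_empty, iterator) return shape are 
assumptions for illustration, not anything the language provides.

```python
import itertools

_MISSING = object()  # sentinel, so that None can be a legitimate item


def peek(iterable):
    """Return (is_empty, iterator) for any iterable.

    The returned iterator yields every item of the original, including
    the one consumed here to find out whether anything is there.
    """
    it = iter(iterable)
    first = next(it, _MISSING)
    if first is _MISSING:
        return True, iter(())
    return False, itertools.chain([first], it)
```

The caller then writes "empty, items = peek(container)" and branches on 
empty before looping over items.  It works for lists and generators 
alike, at the cost of consuming one item up front.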

>>The identical problem does exist for all generators.  Using ad hoc flags
>>in for loops isn't a great solution.  It's all somewhat similar to the
>>repr() problem as well.
> 
> 
> Not all generators. A fair number of generators are methods on
> collections that implement various iterators.
> 
> OTOH generators are one of the reasons that the iterator protocol is
> as restricted as it is.

I'm not arguing for adding __nonzero__ to iterators, only for addressing 
this use case where currently I make use of __nonzero__.  Or, 
alternately, having whatever d.keys() returns implement __nonzero__, or 
otherwise be an iterable and not an iterator.
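
As a rough sketch of that last alternative, here is an object that is 
iterable but is not itself an iterator, and that can be truth-tested. 
The class name KeysView is hypothetical and this is not a claim about 
what d.keys() will actually return; __nonzero__ is spelled __bool__ in 
later Python versions, so both names are defined.

```python
class KeysView(object):
    """Hypothetical stand-in for the object returned by d.keys()."""

    def __init__(self, d):
        self._d = d

    def __iter__(self):
        # A fresh iterator on every call, so the view can be looped
        # over more than once.
        return iter(self._d)

    def __len__(self):
        return len(self._d)

    def __bool__(self):
        # True when the underlying dict has at least one key.
        return len(self._d) > 0

    __nonzero__ = __bool__  # older spelling of the same hook
```

With this, "if keys: ... else: ..." keeps working and no copy of the 
keys is made.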

-- 
Ian Bicking  /  ianb at colorstudy.com  /  http://blog.ianbicking.org