[Python-3000] Iterators for dict keys, values, and items == annoying :)

Guido van Rossum guido at python.org
Thu Mar 23 21:42:25 CET 2006


On 3/23/06, Ian Bicking <ianb at colorstudy.com> wrote:
> Jim Fulton wrote:
> > Looking over some of the messages in the archives, I saw a reference
> > to making dict keys, items, and values methods return iterators.  I've heard
> > Guido mention this in the past.

It's been one of the first things I've always wanted in Python 3000,
ever since we added iterators in 2.2.

I've read and re-read Jim's message, and I'm not sure I understand it.
It seems he's working in an interactive session but I'm not sure I
understand the problem he has with adding list() around an expression
(unless he hasn't got readline, in which case he's got worse
problems). IMO the feature he asks for (getting a list back) already
exists: just add list() around the expression. (I actually suspect
that in many cases set() would be a more useful choice.)
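
A minimal sketch of the list() and set() wrapping described above,
assuming keys() returns a lazy iterable as proposed. The dict `d` and
its contents are made up for illustration.

```python
# Hypothetical sample data; any dict would do.
d = {"a": 1, "b": 2, "c": 3}

# Wrap the call in list() when an actual list is wanted.
keys_list = list(d.keys())

# Wrap it in set() when membership tests or set algebra are the goal.
keys_set = set(d.keys())

has_b = "b" in keys_set
common = keys_set & {"b", "c", "z"}
```

The set() form is often the better fit because most code that grabs
the keys only wants to test membership or compare two key collections.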

> I saw this too in the archives, and thought shit, that's going to mess
> up a lot of my code.  I would assume (though it's a separate point of
> discussion) that Python 3k should still try hard to keep backward
> compatibility.  Backward compatibility isn't a requirement, but it's
> still clearly a feature.

You seem to be misunderstanding what Python 3000 is. The whole point
of Python 3000 is to *not* be bound by backwards compatibility
constraints, but instead make the best decisions possible (without
making it a different language).

> For an instance of code that would be broken:
>
>    for key in d.keys():
>        if something(key):
>            del d[key]
>
> If I didn't want a list, I probably would have iterated over d, wouldn't
> I?

Depends on whether you wrote that code before or after Python 2.2. In
2.1 and before, you *couldn't* iterate over d, so you were forced to
use d.keys() whether you wanted it or not.
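
For the cases where the list really is wanted, such as the deleting
loop quoted above, a sketch of the explicit-copy spelling might look
like this. The predicate something() and the sample keys are
hypothetical stand-ins.

```python
def something(key):
    # Hypothetical predicate standing in for the one in the quoted loop.
    return key.startswith("tmp_")

d = {"tmp_a": 1, "keep": 2, "tmp_b": 3}

# Take an explicit snapshot of the keys so the dict can be mutated
# safely inside the loop.
for key in list(d.keys()):
    if something(key):
        del d[key]
```

The list() call makes the copy visible at the point where it matters,
instead of hiding it inside keys().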

> Items is a little fuzzier, but I do a lot of:
>
>    items = d.items()
>    items.sort()
>
> Not as big an issue, because these days I can already do
> sorted(d.items()) for the same effect.  Still, the change doesn't seem
> that interesting or useful to me, in comparison to the effect it will
> have on so much code.

It's interesting to me because there's a bunch of APIs that currently
have two versions: one to get a list and one to get an iterator. It
would be cleaner if only the iterator version existed, and the way to
get a list was to put an explicit list() around it. Building the list
is expensive, and often not needed (a lot of algorithms don't mutate
the dict).
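
A rough sketch of the two situations, assuming items() returns a lazy
iterable rather than a list. The sample dict is invented for
illustration.

```python
# Hypothetical sample data.
d = {"pear": 3, "apple": 1, "fig": 2}

# Read-only pass over the dict: no list is needed, so none is built.
total = sum(value for key, value in d.items())

# Here a list is needed; sorted() builds one from any iterable, so the
# old "items = d.items(); items.sort()" idiom becomes a single call.
items = sorted(d.items())
```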

> One idea I had after reading a post of Brett's was a dual-use attribute;
> if you do d.keys you get an iterable (not an iterator, of course), and
> if you call that iterable you get a list.  This is backward compatible,
> arguably prettier anyway to make it a property (since there's no side
> effects and getting an iterable isn't expensive, the method call seems
> somewhat superfluous).

You gotta be kidding about calling something pretty which allows a
common mistake (leaving out the () brackets) to turn into such a subtle
bug (not making a copy).
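
To make the failure mode concrete, here is one way the dual-use
attribute could be sketched. KeysProxy and DualDict are hypothetical
names invented for this illustration; nothing like them was actually
proposed in code.

```python
class KeysProxy:
    """Hypothetical dual-use object: iterable, and callable to get a list."""

    def __init__(self, mapping):
        self._mapping = mapping

    def __iter__(self):
        # Iterating reads the live dict.
        return iter(self._mapping)

    def __call__(self):
        # Calling makes a real copy.
        return list(self._mapping)


class DualDict(dict):
    @property
    def keys(self):
        return KeysProxy(self)


d = DualDict(a=1, b=2)
snapshot = d.keys()   # with (): an independent list
live = d.keys         # without (): still tied to the dict
d["c"] = 3

snapshot_after = list(snapshot)
live_after = list(live)
```

Both spellings work without error, but only the first one is a copy;
the second silently picks up the later insertion.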

> One can argue that this adds redundancy.
>
> But anyway, a conceptual argument against .items() returning an
> iterator: .items() reads as a request for a concrete object to me.  That
> is, it doesn't read as "give me a promise that later you can give me the
> items from this object", it reads as "give me the items, right now and
> right here".  If it was a set-like object instead of a list, that'd be
> fine (maybe better -- avoid arbitrary ordering entirely).  But that's a
> separate conversation.

Actually that's a very interesting conversation. Last year I wrote a
large body of Java code that used the Java collections package a lot,
and I ended up liking some of its choices. Its maps have methods to
return keys, values and items, but these return neither new lists nor
iterators; they return "views" which obey set (or multiset, in the
case of items) semantics. The effect of mutating one or the other is
carefully defined to allow an efficient implementation and foolproof
use (e.g. you can delete an item from the keys set and it will remove
the corresponding item from the underlying map, but you can't insert
an item into the keys set because there's no value to map it to in the
map). The views can then be iterated over as many times as you want to.
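
A minimal Python sketch of such a keys view, written only to
illustrate the semantics described above. The class name KeysView and
its method names are assumptions made for this example, not an
existing or proposed API.

```python
class KeysView:
    """Hypothetical live view over a dict's keys (no copy, not an iterator)."""

    def __init__(self, mapping):
        self._mapping = mapping

    def __iter__(self):
        # A fresh iterator each time, so the view can be re-iterated.
        return iter(self._mapping)

    def __len__(self):
        return len(self._mapping)

    def __contains__(self, key):
        return key in self._mapping

    def remove(self, key):
        # Deleting through the view deletes from the underlying map.
        del self._mapping[key]

    def add(self, key):
        # There is no value to map the new key to, so refuse.
        raise TypeError("cannot add a key without a value")


m = {"a": 1, "b": 2, "c": 3}
keys = KeysView(m)
keys.remove("b")

first_pass = sorted(keys)
second_pass = sorted(keys)
```

The view holds only a reference to the map, so creating it is cheap
and it always reflects the map's current contents.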

I'd like to explore this as an alternative to making keys() etc.
return iterators. It also might make keys() etc. more similar to the
new range(), which will behave like the current xrange(): it doesn't
return an iterator, but an "iterator well". That's the behavior
range() would have had in the first place if I had thought of
iterators earlier.
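
The difference between an iterator and a re-iterable object can be
sketched as follows, assuming a range() that already behaves the way
described above (as it does when this is run under Python 3).

```python
r = range(3)

# The range object can be iterated over any number of times.
first = list(r)
second = list(r)

# An iterator drawn from it is exhausted after one pass.
it = iter(r)
once = list(it)
again = list(it)
```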

--
--Guido van Rossum (home page: http://www.python.org/~guido/)

