[Python-3000] Iterators for dict keys, values, and items == annoying :)

Thu Mar 30 20:40:58 CEST 2006

On Thu, 2006-03-30 at 20:23 +1000, Nick Coghlan wrote:
> Robert Brewer wrote:
> > Nick Coghlan wrote:
> >> There are three big use cases:
> >>
> >>    dict.keys
> >>    dict.values
> >>    dict.items
> >>
> >> Currently these all return lists, which may be expensive in 
> >> terms of copying. They all have iter* variants which while
> >> memory efficient, are far less convenient to work with.
> > 
> > I'm still wondering what "far less convenient" means. Is it simply the 4
> > extra key presses? I find the iter* variants to be a great solution.
> 
> An iterator has some serious limitations as a view of a container:
> 
> 1. Can only iterate once

Call the iter generator again to get a fresh iter.  

> 2. Can't check number of items

len( dict ) 

> 3. Truth value testing doesn't work

What does true and false for an iter mean? 

> 4. Containment tests don't work

Put the iter down and use the view.  

'x' in dict
(1,'abc') in dict.items
'albatross' in dict.values

> 5. String representation is well-nigh useless

print list( mydata ) 

> 6. Value-based comparison doesn't work

You mean 

d1 = {...
d2 = {...

d1.keys() == d2.keys()

Eww, you are depending on a lot on the internal operation of the dict.
It is possiable to generate different orders for the same key set (hash
tables start at 8 slots; any 5 items x & 0x7 == y & 0x7 will
collide ... 

Watch: 

>>> d = {}
>>> d[0] = 0
>>> d[8] = 8
>>> d.keys()
[0, 8]
>>> d = {}
>>> d[8] = 8
>>> d[0] = 0
>>> d.keys()
[8, 0]

> 
> The source object supports all of these things, but the iterator doesn't. This 
> is why iteritems and friends are poor substitutes for their list based 
> equivalents in many situations.

I feel strongly about this.  Sorry in advance about the rant ... 

What you describe in this paragraph is a special case.  A dict.key isn't
a list; while it was implemeneted as a list, and the list form is
currently understood, it isn't really a list because dict's have no
ordering.  That we call it a list is a product of our own inertia

At risk of being sacrcastic the semantics (however not the performance)
of dict.keys is basically:

random.shuffle( list(dict.iter()) ) 

Basically, part of what I'm advocating for is the functions associated
with a datastore to do the minimum amount of work to support their
underlying semantics.  

There seemed to be a concensus in the community on the size of the view
proposal, and I'm reimplementing the PEP to reflect that.  But what I
can't resolve is the other anciliary issue: "To list or iter."  I'm not
yet ready to resolve that issue.  The views don't resolve it either, and
by their nature are biased towards the iter approach.  They provide
__iter__ because its light weight to do, but there is no way a light
weight view can provide you with ordering information from an unordered
datastore.  Now, as a means of resolving this conflict, I'm open to the
notion of a view implementing both __iter__ and an explicit .list method
to avoid any extra overhead in generating a list from an iter instead of
directly from the dict as we do now. 

Personally, I think returning a list as a default is disgusting.  Given
any two operations, one which is cheap and the other expensive, I feel
its the expensive one that your should have to work for. 

We have two choices:

* Force the iter club to build that nasty list and iterate off of it.
There are times when you *can't* do this ... when your dict is big and
memory is limited.  The integral( d_time, memory ) is a lot worse here,
and this will be a killer on memory constrained devices.

* Force the list club to get a nasty iter that they have to build a list
from.  The cost: list( ... )  -- they have the same overall O(n) cost,
perhaps with a small constant overhead, they had before.

(Lots of pain upfront, or a little pain as you go, oh, I can see all
sorts of classical economic "utility of" arguments.  The value of your
n-st byte of ram vs. your m-th.)

If given the requirement that the language implement only one, I feel
that it must be an iter.  If the programmer feels that the iter is a bad
substitute for the list, then they can cast to a list.  

If the programmer feels that the list is a bad substitute for the iter,
well, they could  cast to a list when their machine is done paging or
their telephone reboots.

Insistence on a list represents a certain degree of hubris with respect
to the available resources.  Some users, particularly the embedded
market, have significant space constrains.   

Cheers - Adam