[Python-3000] Iterators for dict keys, values, and items == annoying :)
Adam DePrince
adam.deprince at gmail.com
Thu Mar 30 20:40:58 CEST 2006
On Thu, 2006-03-30 at 20:23 +1000, Nick Coghlan wrote:
> Robert Brewer wrote:
> > Nick Coghlan wrote:
> >> There are three big use cases:
> >>
> >> dict.keys
> >> dict.values
> >> dict.items
> >>
> >> Currently these all return lists, which may be expensive in
> >> terms of copying. They all have iter* variants which while
> >> memory efficient, are far less convenient to work with.
> >
> > I'm still wondering what "far less convenient" means. Is it simply the 4
> > extra key presses? I find the iter* variants to be a great solution.
>
> An iterator has some serious limitations as a view of a container:
>
> 1. Can only iterate once
Call the iter generator again to get a fresh iter.
> 2. Can't check number of items
len( dict )
> 3. Truth value testing doesn't work
What does true and false for an iter mean?
> 4. Containment tests don't work
Put the iter down and use the view.
'x' in dict
(1,'abc') in dict.items
'albatross' in dict.values
> 5. String representation is well-nigh useless
print list( mydata )
> 6. Value-based comparison doesn't work
You mean
d1 = {...
d2 = {...
d1.keys() == d2.keys()
Eww, you are depending on a lot on the internal operation of the dict.
It is possiable to generate different orders for the same key set (hash
tables start at 8 slots; any 5 items x & 0x7 == y & 0x7 will
collide ...
Watch:
>>> d = {}
>>> d[0] = 0
>>> d[8] = 8
>>> d.keys()
[0, 8]
>>> d = {}
>>> d[8] = 8
>>> d[0] = 0
>>> d.keys()
[8, 0]
>
> The source object supports all of these things, but the iterator doesn't. This
> is why iteritems and friends are poor substitutes for their list based
> equivalents in many situations.
I feel strongly about this. Sorry in advance about the rant ...
What you describe in this paragraph is a special case. A dict.key isn't
a list; while it was implemeneted as a list, and the list form is
currently understood, it isn't really a list because dict's have no
ordering. That we call it a list is a product of our own inertia
At risk of being sacrcastic the semantics (however not the performance)
of dict.keys is basically:
random.shuffle( list(dict.iter()) )
Basically, part of what I'm advocating for is the functions associated
with a datastore to do the minimum amount of work to support their
underlying semantics.
There seemed to be a concensus in the community on the size of the view
proposal, and I'm reimplementing the PEP to reflect that. But what I
can't resolve is the other anciliary issue: "To list or iter." I'm not
yet ready to resolve that issue. The views don't resolve it either, and
by their nature are biased towards the iter approach. They provide
__iter__ because its light weight to do, but there is no way a light
weight view can provide you with ordering information from an unordered
datastore. Now, as a means of resolving this conflict, I'm open to the
notion of a view implementing both __iter__ and an explicit .list method
to avoid any extra overhead in generating a list from an iter instead of
directly from the dict as we do now.
Personally, I think returning a list as a default is disgusting. Given
any two operations, one which is cheap and the other expensive, I feel
its the expensive one that your should have to work for.
We have two choices:
* Force the iter club to build that nasty list and iterate off of it.
There are times when you *can't* do this ... when your dict is big and
memory is limited. The integral( d_time, memory ) is a lot worse here,
and this will be a killer on memory constrained devices.
* Force the list club to get a nasty iter that they have to build a list
from. The cost: list( ... ) -- they have the same overall O(n) cost,
perhaps with a small constant overhead, they had before.
(Lots of pain upfront, or a little pain as you go, oh, I can see all
sorts of classical economic "utility of" arguments. The value of your
n-st byte of ram vs. your m-th.)
If given the requirement that the language implement only one, I feel
that it must be an iter. If the programmer feels that the iter is a bad
substitute for the list, then they can cast to a list.
If the programmer feels that the list is a bad substitute for the iter,
well, they could cast to a list when their machine is done paging or
their telephone reboots.
Insistence on a list represents a certain degree of hubris with respect
to the available resources. Some users, particularly the embedded
market, have significant space constrains.
Cheers - Adam
More information about the Python-3000
mailing list