Why is dictionary.keys() a list and not a set?

bonono at gmail.com bonono at gmail.com
Thu Nov 24 12:02:39 CET 2005


Fredrik Lundh wrote:
> bonono at gmail.com wrote:
>
> > > creates just over one million objects.  In your "equivalent" example,
> > > you're calling d.items() twice to produce two million objects, none
> > > of which you really care about.
> >
> > This is what I get from the doc :
> > a.items()  a copy of a's list of (key, value) pairs  (3)
> > a.keys() a copy of a's list of keys (3)
> > a.values() a copy of a's list of values
> >
> > I can't derive what you mean by "two list objects" vs "million
> > objects".
>
> The "copy of a's list of" is a list object.
>
> The "pairs" are tuples, which are objects too.  For a dictionary with
> one million items, the "items" method has to create one million "pair"
> tuples before it can return them to you...  (the difference between
> "items" and "iteritems" is that the latter doesn't create all of them
> up front; it still has to create them, though).
>
> In Python, everything that you can put in a variable or otherwise save
> for a later time is an object.  All objects must be created.  Creating a
> new object is often very cheap, but the cost is never zero.
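
The distinction Fredrik draws (one new list versus one new tuple per pair) can be checked directly. This is only a rough sketch, and it assumes a current Python 3 interpreter rather than the Python 2.4 used in this thread: there, list(d.keys()) and list(d.items()) stand in for the list-returning d.keys() and d.items(). The variable names are made up for the illustration.

```python
d = dict.fromkeys(range(1000, 1005))

# Two calls give two distinct list objects...
keys_a = list(d.keys())
keys_b = list(d.keys())
lists_are_distinct = keys_a is not keys_b

# ...but the elements are the key objects the dict already holds,
# so no new object is created per element.
keys_are_shared = all(a is b for a, b in zip(keys_a, keys_b))

# For items, every (key, value) pair needs a freshly created tuple,
# so two calls give equal but distinct tuple objects.
items_a = list(d.items())
items_b = list(d.items())
pairs_are_equal = items_a == items_b
pairs_are_shared = any(a is b for a, b in zip(items_a, items_b))

print(lists_are_distinct, keys_are_shared, pairs_are_equal, pairs_are_shared)
```

So keys() and values() each cost one new list that merely points at existing objects, while items() costs one new list plus one new tuple per entry.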

I have redone the timeit test:

D:\Python24\Lib>..\python timeit.py -s "d=dict.fromkeys(range(100000))" "zip(d.keys(),d.values())"
10 loops, best of 3: 158 msec per loop

D:\Python24\Lib>..\python timeit.py -s "d=dict.fromkeys(range(100000))" "d.items()"
10 loops, best of 3: 129 msec per loop

D:\Python24\Lib>..\python timeit.py -s "d=dict.fromkeys(range(100000))" "d.keys()"
10 loops, best of 3: 33.2 msec per loop

D:\Python24\Lib>..\python timeit.py -s "d=dict.fromkeys(range(100000))" "d.values()"
100 loops, best of 3: 19.6 msec per loop
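
The same comparison can also be run from a script through the timeit module instead of the command line. This is a sketch only, assuming a current Python 3 interpreter, where each call is wrapped in list() to force the list that Python 2.4 builds by itself. The helper name best_msec_per_loop is made up here, and the numbers will differ from the ones above.

```python
import timeit

SETUP = "d = dict.fromkeys(range(100000))"
STATEMENTS = [
    "list(zip(d.keys(), d.values()))",
    "list(d.items())",
    "list(d.keys())",
    "list(d.values())",
]
LOOPS = 10


def best_msec_per_loop(stmt, setup=SETUP, loops=LOOPS, repeat=3):
    """Return the best per-loop time in milliseconds over `repeat` runs."""
    timings = timeit.repeat(stmt, setup=setup, repeat=repeat, number=loops)
    return min(timings) / loops * 1000.0


results = {stmt: best_msec_per_loop(stmt) for stmt in STATEMENTS}
for stmt, msec in results.items():
    print("%-35s %.2f msec per loop" % (stmt, msec))
```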

These results make more sense. However, I am still puzzled:

1. Why would d.keys()/d.values() return only one list object? Why
isn't it counted as a list of 1M elements, of either the keys or the
values, the way the result of items() is?

2. Back to the original question of the guaranteed sequence of
keys/values

If I don't need them in sync, the guarantee is a moot point and in that
case, I would use keys/values when appropriate (or the iter version).

If I need them in sync (meaning I would like to zip() them later), the
zip() time is longer than just taking items() and I don't gain any
performance advantage.

So unless I use keys()/values() only as k[i]/v[i], that is, keeping my
own index, I would not get the speed (and probably memory) advantage.
This however seems to be more difficult to maintain than just using the
item tuples.
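
To make the "in sync" case concrete, here is a small sketch of the two styles compared above: parallel keys/values lists with my own index versus the item tuples. It assumes a current Python 3 interpreter (list() is needed to get real lists there), and the dict contents are made up. It relies on the documented guarantee that keys, values and items correspond as long as the dict is not modified in between.

```python
d = {"spam": 1, "eggs": 2, "ham": 3}

k = list(d.keys())
v = list(d.values())
pairs = list(d.items())

# With no modification in between, the orders correspond,
# so zipping keys and values rebuilds the items list.
in_sync = list(zip(k, v)) == pairs

# Style 1: parallel lists, keeping my own index.
by_index = []
for i in range(len(k)):
    by_index.append((k[i], v[i]))

# Style 2: the item tuples, no index to maintain.
by_tuple = []
for key, value in pairs:
    by_tuple.append((key, value))

print(in_sync, by_index == by_tuple)
```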

And would it be best to just use iteritems() in 90% of cases, unless
of course I need to update the dict or I need a snapshot? iteritems()
doesn't create the list and only retrieves the items when needed. But
then, there may be a performance penalty compared with creating the
list up front if I need them all. And of course, I cannot slice it
with [:].
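
A sketch of the trade-offs just listed, again assuming a current Python 3 interpreter: there, d.items() iterates lazily much like iteritems() in 2.4, and list(d.items()) plays the role of the snapshot. The use of itertools.islice as a replacement for [:] is my own suggestion and not something from this thread.

```python
import itertools

d = {"a": 1, "b": 2, "c": 3}

# Snapshot: the list is built up front, so the dict can be
# updated freely while looping over it.
snapshot = list(d.items())
for key, value in snapshot:
    d[key.upper()] = value

# Lazy iteration: adding keys while iterating is an error.
changed_during_iteration = False
try:
    for key, value in d.items():
        d[key + "_copy"] = value
except RuntimeError:
    changed_during_iteration = True

# An iterator cannot be sliced with [:] ...
lazy = iter(d.items())
try:
    lazy[:]
    sliceable = True
except TypeError:
    sliceable = False

# ... but islice can take the first few items without building
# the whole list.
first_two = list(itertools.islice(d.items(), 2))

print(changed_during_iteration, sliceable, first_two)
```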



