[Tutor] List comprehension for dicts?
Steven D'Aprano
steve at pearwood.info
Thu Aug 19 19:20:12 CEST 2010
On Fri, 20 Aug 2010 01:40:54 am Wayne Werner wrote:
> > age_dict = dict([(key.upper(), value) for key,value in
> > age_dict.items()])
>
> This is a bad place to use a list comprehension. This will create a
> list of values first and then create a dict from that list, so now
> you have a list floating around that you didn't need.
How do you know that dict() doesn't need the entire list at once?
Unless you're an expert on the implementation, for all you know it
looks something like this:
class dict:
    def __new__(cls, arg):
        ...
        try:
            n = len(arg)
        except TypeError:
            # no len() -- probably a generator expression
            arg = list(arg)
            n = len(arg)
        allocate_memory_for_items(n)
        ...
(only written in C). I'm not saying it does, or that it doesn't, but
you're assuming a pattern of behaviour which might not be the case.
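(For the record, a generator expression doesn't have a length at all,
so any implementation that wanted a size hint up front would have no
choice but to build the list itself. A quick check at the interactive
prompt:)
>>> gen = (str(n) for n in xrange(3))
>>> len(gen)
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
TypeError: object of type 'generator' has no len()
>>> len(list(gen))   # materialise it just to count it
3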
Here's a similarly common idiom: which of these is faster?
' '.join(gen_expr)
' '.join(list_comp)
[steve at sylar ~]$ python -m timeit "' '.join(str(n) for n in xrange(300000))"
10 loops, best of 3: 437 msec per loop
[steve at sylar ~]$ python -m timeit "' '.join([str(n) for n in xrange(300000)])"
10 loops, best of 3: 401 msec per loop
The list comprehension is consistently faster, because join() works more
efficiently if it knows how many items it needs to pre-allocate memory
for.
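(If you'd rather measure this without leaving the Python prompt, here's
a rough sketch using the timeit module -- the same statements as above;
the exact numbers will of course vary from machine to machine.)
import timeit

# Time the generator expression against the list comprehension version
# of the same join; smaller is better.
gen_time = timeit.timeit(
    "' '.join(str(n) for n in xrange(300000))", number=10)
list_time = timeit.timeit(
    "' '.join([str(n) for n in xrange(300000)])", number=10)
print "genexp: %.3f s  listcomp: %.3f s" % (gen_time, list_time)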
> Generator expressions, OTOH, generate the values on the fly, only as
> they're needed, so there's no extra list left over once the dict is
> created.
And sometimes that's a win, and sometimes it's not.
Generator expressions are more computationally expensive than lists --
they're functions which remember their internal state, so you can pause
them and restart them at will. That doesn't happen for free -- it takes
memory and time. The reason Python has generator expressions is that,
for large amounts of data, the saving from not having to produce the
entire list all at once more than makes up for that extra cost. For
small amounts of data, that's not always the case:
[steve at sylar ~]$ python -m timeit "dict((k,k+1) for k in xrange(2))"
100000 loops, best of 3: 5.89 usec per loop
[steve at sylar ~]$ python -m timeit "dict([(k,k+1) for k in xrange(2)])"
100000 loops, best of 3: 4.78 usec per loop
Here, using a generator expression is a pessimization, not an
optimization.
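(As an aside, since the subject line asks about comprehensions for
dicts: if you're on Python 2.7 or 3.x, there's also a dict
comprehension syntax that builds the dict directly, with neither the
intermediate list nor the generator machinery. A sketch, reusing the
age_dict example quoted above:)
# Needs Python 2.7+ or 3.x; same result as the dict([...]) call above.
age_dict = {key.upper(): value for key, value in age_dict.items()}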
> Sure it will eventually be garbage collected, but "waste not, want
> not", as my grandmother used to say.
Does your grandmother have a box labelled "Pieces of string, too short
to use" as well?
--
Steven D'Aprano