Default arg for dict() (was Re: Sorting distionary by value)
John Machin
sjmachin at lexicon.net
Fri Mar 29 15:37:41 EST 2002
"Chris Gonnerman" <chris.gonnerman at newcenturycomputers.net> wrote in message news:<mailman.1017410389.28258.python-list at python.org>...
>
> Writing a class to simulate a dictionary with default value is simple,
> and left as an exercise for the reader :-)
>
Unfortunately when building a dictionary of frequencies this approach
takes about 2.6x the clock() time of the best non-subclassing
approach, dropping down to 2.3x if you have a specialised subclass
with a hard-coded default value (see below).
Even more unfortunately the subclassing approach penalises *ALL*
__getitem__ accesses -- after building a dict of frequencies, you
don't need/want the default any longer when accessing the dict, but
with the subclassing approach, you're stuck with it unless you come up
with some pas-devant-les-enfants hack like replacing the __getitem__
method in mid-stream.
# best subclassing approach
class Defdict0(dict):
def __getitem__(self, key):
return self.setdefault(key, 0)
def build_freq_8(wlist):
freq = Defdict0()
for word in wlist:
freq[word] = freq[word] + 1
return freq
# best non-subclassing approach
def build_freq_5(wlist):
freq = {}
freq_has_key = freq.has_key
for word in wlist:
if freq_has_key(word):
freq[word] = freq[word] + 1
else:
freq[word] = 1
return freq
wlist is a list of strings matching [A-Za-z][A-Za-z]+ in the order
they were extracted from files.
Yes, the += caper is actually very slightly slower.
Yes, the has_key() approach actually beats the get() approach (1.05x)
and the try/except approach() (1.3x). Full details on request.
Roadtests done on an AMD Athlon processor running win32 Python 2.2.0
with -O. YMMV. Above relative times are based on a case with 1.8M
surname and given-name words (174K unique i.e. each word appears 10.4
times on average but the distribution is very skewed). Cases with a
much higher average frequency would probably tend to favour the
try/except approach.
More information about the Python-list
mailing list