Default arg for dict() (was Re: Sorting distionary by value)

John Machin sjmachin at lexicon.net
Fri Mar 29 15:37:41 EST 2002


"Chris Gonnerman" <chris.gonnerman at newcenturycomputers.net> wrote in message news:<mailman.1017410389.28258.python-list at python.org>...
> 
> Writing a class to simulate a dictionary with default value is simple,
> and left as an exercise for the reader :-)
> 

Unfortunately when building a dictionary of frequencies this approach
takes about 2.6x the clock() time of the best non-subclassing
approach, dropping down to 2.3x if you have a specialised subclass
with a hard-coded default value (see below).

Even more unfortunately the subclassing approach penalises *ALL*
__getitem__ accesses -- after building a dict of frequencies, you
don't need/want the default any longer when accessing the dict, but
with the subclassing approach, you're stuck with it unless you come up
with some pas-devant-les-enfants hack like replacing the __getitem__
method in mid-stream.

# best subclassing approach
class Defdict0(dict):
   def __getitem__(self, key):
      return self.setdefault(key, 0)
def build_freq_8(wlist):
   freq = Defdict0()
   for word in wlist:
      freq[word] = freq[word] + 1
   return freq

# best non-subclassing approach
def build_freq_5(wlist):
   freq = {}
   freq_has_key = freq.has_key
   for word in wlist:
      if freq_has_key(word):
         freq[word] = freq[word] + 1
      else:
         freq[word] = 1
   return freq

wlist is a list of strings matching [A-Za-z][A-Za-z]+ in the order
they were extracted from files.

Yes, the += caper is actually very slightly slower.
Yes, the has_key() approach actually beats the get() approach (1.05x)
and the try/except approach() (1.3x). Full details on request.
Roadtests done on an AMD Athlon processor running win32 Python 2.2.0
with -O. YMMV. Above relative times are based on a case with 1.8M
surname and given-name words (174K unique i.e. each word appears 10.4
times on average but the distribution is very skewed). Cases with a
much higher average frequency would probably tend to favour the
try/except approach.



More information about the Python-list mailing list