a clean way to define dictionary

Thu Jun 19 14:00:09 EDT 2003

mis6 at pitt.edu (Michele Simionato) writes:

> Alexander Schmolck <a.schmolck at gmx.net> wrote in message news:<yfsof0v880w.fsf at black132.ex.ac.uk>...
> > Maybe I'm a bit blinkered, but right now I can't see how
> > 
> >     dict(foo=1, bar='sean')
> > 
> > is so much better/more convinient than
> > 
> >     {'foo':1, bar:'sean'}
> > 
> > that it justifies forcing people to learn a new redundant and less general
> > dictionary creation syntax that at least hinders customizing dictionary
> > instantiation like in
> > 
> >   counts = dict(default=0)
> >   for item in seq:
> >       counts[item] += 1
> > 
> > To get the first behavior you have to write a 1-line function, for the second
> > you have to subclass (and presumably incur a noticeable performance penalty).
> > Also the first precludes adding further constructor options, while the second
> > doesn't.
> > 
> > 
> > 'as
> 
> I do think you have a point here. Nevertheless, I wonder about the
> "noticeable performance penality" due to subclassing. I tried this
> dictionary:
> 
> class mydict(dict):
>     def __new__(cls,d,default=0): 
>         #absolutely trivial, doing nothing with the default argument
>         return super(mydict,cls).__new__(cls,d)
> 
> d=dict([(i,str(i)) for i in range(1000)])
> m=mydict([(i,str(i)) for i in range(1000)])
> 
> And I have measured the access times for 'd' and 'm' with timeit. There is no
> noticeable difference for getting a value (<1%); setting a value is slower
> by a 7-8%. Not a big deal. I get more or less the same numbers for bigger
> or smaller dictionaries. Nevertheless, I haven't tried many dictionaries,
> maybe there are cases where the performance is worse.

Well, your timings are not all that meaningful because your code does nothing
and you only instance creation and not item access (which obviously also needs
to be overridden with python code). Anywhere, here is a real DefaultDict class
and some ad hoc timings, which show *10* fold slowdown for creation and 4 fold
for item access (the copy.copy call seems harmless performance-wise).

class DefaultDict(dict):
    r"""Dictionary with a default value for unknown keys."""
    def __init__(self, default, noCopies=False):
        self.default = default
        if noCopies:
            self.noCopies = True
        else:
            self.noCopies = False
    def __getitem__(self, key):
        r"""If `self.noCopies` is `False` (default), a **copy** of
           `self.default` is returned by default.
        """              
        if key in self: return self.get(key)
        if self.noCopies: return self.setdefault(key, self.default)
        else:             return self.setdefault(key, copy.copy(self.default))

timings for python2.2

In [58]: timeCall(nTimes, 10000, dict)
Out[58]: 0.018522977828979492

In [59]: timeCall(nTimes, 10000, DefaultDict, 1)
Out[59]: 0.11231005191802979

In [47]: timeCall(nTimes, 10000, {'foo':0}.__getitem__, 'foo')
Out[47]: 0.012811064720153809

In [48]: timeCall(nTimes, 10000, DefaultDict(1).__getitem__, 'foo')
Out[48]: 0.045480012893676758

In [65]: timeCall(nTimes, 10000, DefaultDict(1,0).__getitem__, 'foo')
Out[65]: 0.04657900333404541

> 
> If somebody wants to experiment and report here few numbers, it would be
> interesting.
> 
>                                                    Michele

What are your results for instance creation with this class? Maybe 2.3 is
noticeably faster?

'as