[Baypiggies] Notes on defaultdict

Rami Chowdhury rami.chowdhury at merton.oxon.org
Fri Aug 27 19:54:59 CEST 2010


Hey Glen,

Great notes -- thanks for posting them, I'm glad to see them as I
wasn't able to be at the meeting.

Did you mention the __missing__ magic method, in Python 2.5 and up,
which can be used for defaultdict-like (and other fun) behavior?

Cheers,
Rami

On Fri, Aug 27, 2010 at 12:16, Glen Jarvis <glen at glenjarvis.com> wrote:
> Here are notes with code that I used in my Newbie Nugget presentation today.
> This will give you time to review the code line by line (good suggestion
> Wesley) and get a feel for what is happening.
>
> As a reminder:
> The input is:
> input = [('yellow', 1), ('blue', 2), ('yellow', 3), ('blue', 4), ('red', 1)]
> And, we want an output dictionary so that something like this is reported:
> output['red'] = [1]
> output['blue'] = [2, 4]
> output['yellow'] = [1, 3]
>
> There are lots of ways to do this, and I invite other suggestions/input.
> If we were to start with these few lines, we'll see some limited success:
> Python 2.5 (r25:51918, Sep 19 2006, 08:49:13)
> [GCC 4.0.1 (Apple Computer, Inc. build 5341)] on darwin
> Type "help", "copyright", "credits" or "license" for more information.
>>>>
>>>> input = [('yellow', 1), ('blue', 2), ('yellow', 3), ('blue', 4), ('red',
>>>> 1)]
>>>>
>>>> output = {}
>>>>
>>>> output['yellow'] = [1]
>>>> output['yellow'].append(3)
>>>>
>>>> print output
> {'yellow': [1, 3]}
> But, how do we sort this out so that we always initialize the value to be a
> list value If we forget, we'll see something like this
> [continuing from above]
>>>> print output
> {'yellow': [1, 3]}
>>>> output['blue'].append(2)
> Traceback (most recent call last):
>   File "<stdin>", line 1, in <module>
> KeyError: 'blue'
> We can't append something to the stuff that is associated with the 'blue'
> key, because nothing exists yet for the blue key. We need to initialize it
> before we can append to it.
> So, what we really want is to have some type of default behavior given to
> us. For example,
> Python 2.5 (r25:51918, Sep 19 2006, 08:49:13)
> [GCC 4.0.1 (Apple Computer, Inc. build 5341)] on darwin
> Type "help", "copyright", "credits" or "license" for more information.
>>>>
>>>> input = [('yellow', 1), ('blue', 2), ('yellow', 3), ('blue', 4), ('red',
>>>> 1)]
>>>>
>>>> output = {}
>>>>
>>>> for color, value in input:
> ...     if output.has_key(color):
> ...         output[color].append(value)
> ...     else:
> ...         output[color] = [value]
> ...
>>>> print output
> {'blue': [2, 4], 'red': [1], 'yellow': [1, 3]}
>
> One could argue that this is just fine -- this is very clear, and it works.
> And, frankly, there's merit to that argument.
> However, if we did this many times with many different types of
> dictionaries, and we couldn't really make a generic routine that we could
> use to handle this, we'll have this same pattern over and over:
> if something.has_key(…):
>     something[…] = <normal behavior>
> else:
>     something[…] = <some initializing behavior>
>
> What if there was another design pattern to do this? What if you could
> somehow set a 'default' behavior so that, if the key didn't exist, you
> automatically got, in our example, a list so you could append to it. That
> is, we'd have code that looked like this:
>     output[color].append(value)
> This does fit our brain, because we really just want to append values as
> we're going and not have to think of the special case of (oh, if the key
> exists do this; or if it doesn't, do that; etc.)
> The collections.defaultdict object does the above. However, before we talk
> about it -- there will be some in the audience that will say, "Wait a
> minute…  This is what setdefault does." And, they'd be right.
> So, let's take a break from this newbie nugget to give you another newbie
> nugget -- the setdefult (unless you already know about it).
> The set default method can be used on a dictionary to set the default for a
> key. And, it can be done multiple times without causing a problem. So, for
> my output dictionary, if I set the default to a list (for the key yellow,
> for example), then whenever I access the output dictionary through the
> yellow key, I get a dictionary automatically. And, thus, we can tack an
> 'append' on to that list that's given to us by default. Here's an example:
> Python 2.5 (r25:51918, Sep 19 2006, 08:49:13)
> [GCC 4.0.1 (Apple Computer, Inc. build 5341)] on darwin
> Type "help", "copyright", "credits" or "license" for more information.
>>>>
>>>>
>>>> input = [('yellow', 1), ('blue', 2), ('yellow', 3), ('blue', 4), ('red',
>>>> 1)]>>>
>>>> output = {}
>>>>
>>>> for color, value in input:
> ...     output.setdefault(color, []).append(value)
> ...
>>>> print output
> {'blue': [2, 4], 'red': [1], 'yellow': [1, 3]}
> For each key in the dictionary, the default value is a list and the value is
> appended to it. Although, it does seem we're having to set the default for
> every single key -- not for any key in the dictionary.
> But, that's a good transition to the collections.defaultdict because that's
> exactly what it does.
>>>> from collections import defaultdict
>>>>
>>>> input = [('yellow', 1), ('blue', 2), ('yellow', 3), ('blue', 4), ('red',
>>>> 1)]
>>>>
>>>> output = defaultdict(list)
>>>>
>>>> for color, value in input:
> ...     output[color].append(value)
> ...
>>>> print output
> defaultdict(<type 'list'>, {'blue': [2, 4], 'red': [1], 'yellow': [1, 3]})
> Don't be scared about this extra output -- just notice that the colors have
> the right lists. If you wanted to print these out by keys, you'll still get
> the results that you want:
>>>> print output['blue']
> [2, 4]
> What a default dictionary type is, it's simply a subclassed version of the
> dictionary that does a few special things.
> You give it a basic 'factory function' -- in our case a list -- and that's
> what's initialized for a key the first time it is accessed. We don't have to
> worry if the key was previously initialized, etc..
> Some will argue that this is less explicit -- and I would probably have been
> one of them when I first started preparing this newbie nugget. But, after
> doing over this a few times to explain it, I've finally started to come on
> board to the idea. We're explicitly stating the default type when we
> initialize the defaultdict variable.
> I especially like it that the default dict gives a default type for any key
> -- not for a specific key as setdefault seems to do (correct me if I
> misspeak here).
> But, more than anything, it's *much faster* than doing it the other ways..
> But, I want you to prove that to yourself and to me… I have created a 1 GB
> file with colors and numbers. I'll give you this file so that you can read
> it in and create the dictionary of your choice, with the method of your
> choice.
> You can time how long it takes for dictionary creation, with the list
> appending, and everything.
> Remember that your dictionary will be in memory and will be about a gig of
> ram also. If your machine doesn't have those resources, read only half the
> file. In fact, it would be interesting to do this for a random number of
> records in the file, multiple times, to get a feel for how this grows for
> each data type.
> Come on, it'll be good practice if you've never done this before… Lets
> compare notes when you're finished.
> --
> Whatever you can do or imagine, begin it;
> boldness has beauty, magic, and power in it.
>
> -- Goethe
>
> _______________________________________________
> Baypiggies mailing list
> Baypiggies at python.org
> To change your subscription options or unsubscribe:
> http://mail.python.org/mailman/listinfo/baypiggies
>



-- 
Rami Chowdhury
"Never assume malice when stupidity will suffice." -- Hanlon's Razor
408-597-7068 (US) / 07875-841-046 (UK) / 0189-245544 (BD)


More information about the Baypiggies mailing list