<div>Here are notes with code that I used in my Newbie Nugget presentation today. This will give you time to review the code line by line (good suggestion Wesley) and get a feel for what is happening.</div><div><br></div><div>
<br></div><div>As a reminder:</div><div><br></div><div>The input is:</div><div>input = [('yellow', 1), ('blue', 2), ('yellow', 3), ('blue', 4), ('red', 1)]</div><div><br></div><div>
And, we want an output dictionary so that something like this is reported:</div><div><br></div><div>output['red'] = [1]</div><div>output['blue'] = [2, 4]</div><div>output['yellow'] = [1, 3]</div><div>
<br></div><div><br></div><div>There are lots of ways to do this, and I invite other suggestions/input.</div><div><br></div><div>If we were to start with these few lines, we'll see some limited success:</div><div><br></div>
<div>Python 2.5 (r25:51918, Sep 19 2006, 08:49:13) </div><div>[GCC 4.0.1 (Apple Computer, Inc. build 5341)] on darwin</div><div>Type "help", "copyright", "credits" or "license" for more information.</div>
<div>>>> </div><div>>>> input = [('yellow', 1), ('blue', 2), ('yellow', 3), ('blue', 4), ('red', 1)]</div><div>>>> </div><div>>>> output = {}</div>
<div>>>> </div><div>>>> output['yellow'] = [1]</div><div>>>> output['yellow'].append(3)</div><div>>>> </div><div>>>> print output</div><div>{'yellow': [1, 3]}</div>
<div><br></div><div>But, how do we sort this out so that we always initialize the value to be a list value If we forget, we'll see something like this </div><div><br></div><div>[continuing from above]</div><div>>>> print output</div>
<div>{'yellow': [1, 3]}</div><div>>>> output['blue'].append(2)</div><div>Traceback (most recent call last):</div><div> File "<stdin>", line 1, in <module></div><div>KeyError: 'blue'</div>
<div><br></div><div>We can't append something to the stuff that is associated with the 'blue' key, because nothing exists yet for the blue key. We need to initialize it before we can append to it.</div><div><br>
</div><div>So, what we really want is to have some type of default behavior given to us. For example, </div><div><br></div><div>Python 2.5 (r25:51918, Sep 19 2006, 08:49:13) </div><div>[GCC 4.0.1 (Apple Computer, Inc. build 5341)] on darwin</div>
<div>Type "help", "copyright", "credits" or "license" for more information.</div><div>>>> </div><div>>>> input = [('yellow', 1), ('blue', 2), ('yellow', 3), ('blue', 4), ('red', 1)]</div>
<div>>>> </div><div>>>> output = {}</div><div>>>> </div><div>>>> for color, value in input:</div><div>... if output.has_key(color):</div><div>... output[color].append(value)</div>
<div>... else:</div><div>... output[color] = [value]</div><div>... </div><div>>>> print output</div><div>{'blue': [2, 4], 'red': [1], 'yellow': [1, 3]}</div><div><br></div><div>
<br></div><div>One could argue that this is just fine -- this is very clear, and it works. And, frankly, there's merit to that argument.</div><div><br></div><div>However, if we did this many times with many different types of dictionaries, and we couldn't really make a generic routine that we could use to handle this, we'll have this same pattern over and over:</div>
<div><br></div><div>if something.has_key(…):</div><div> something[…] = <normal behavior></div><div>else:</div><div> something[…] = <some initializing behavior></div><div><br></div><div><br></div><div>What if there was another design pattern to do this? What if you could somehow set a 'default' behavior so that, if the key didn't exist, you automatically got, in our example, a list so you could append to it. That is, we'd have code that looked like this:</div>
<div><br></div><div> output[color].append(value)</div><div><br></div><div>This does fit our brain, because we really just want to append values as we're going and not have to think of the special case of (oh, if the key exists do this; or if it doesn't, do that; etc.)</div>
<div><br></div><div>The collections.defaultdict object does the above. However, before we talk about it -- there will be some in the audience that will say, "Wait a minute… This is what setdefault does." And, they'd be right.</div>
<div><br></div><div>So, let's take a break from this newbie nugget to give you another newbie nugget -- the setdefult (unless you already know about it).</div><div><br></div><div>The set default method can be used on a dictionary to set the default for a key. And, it can be done multiple times without causing a problem. So, for my output dictionary, if I set the default to a list (for the key yellow, for example), then whenever I access the output dictionary through the yellow key, I get a dictionary automatically. And, thus, we can tack an 'append' on to that list that's given to us by default. Here's an example:</div>
<div><br></div><div>Python 2.5 (r25:51918, Sep 19 2006, 08:49:13) </div><div>[GCC 4.0.1 (Apple Computer, Inc. build 5341)] on darwin</div><div>Type "help", "copyright", "credits" or "license" for more information.</div>
<div>>>> </div><div>>>> </div><div>>>> input = [('yellow', 1), ('blue', 2), ('yellow', 3), ('blue', 4), ('red', 1)]>>> </div><div>>>> output = {}</div>
<div>>>> </div><div>>>> for color, value in input:</div><div>... output.setdefault(color, []).append(value)</div><div>... </div><div>>>> print output</div><div>{'blue': [2, 4], 'red': [1], 'yellow': [1, 3]}</div>
<div><br></div><div>For each key in the dictionary, the default value is a list and the value is appended to it. Although, it does seem we're having to set the default for every single key -- not for any key in the dictionary.</div>
<div><br></div><div>But, that's a good transition to the collections.defaultdict because that's exactly what it does.</div><div><br></div><div>>>> from collections import defaultdict</div><div>>>> </div>
<div>>>> input = [('yellow', 1), ('blue', 2), ('yellow', 3), ('blue', 4), ('red', 1)]</div><div>>>> </div><div>>>> output = defaultdict(list)</div><div>>>> </div>
<div>>>> for color, value in input:</div><div>... output[color].append(value)</div><div>... </div><div>>>> print output</div><div>defaultdict(<type 'list'>, {'blue': [2, 4], 'red': [1], 'yellow': [1, 3]})</div>
<div><br></div><div>Don't be scared about this extra output -- just notice that the colors have the right lists. If you wanted to print these out by keys, you'll still get the results that you want:</div><div><br>
</div><div>>>> print output['blue']</div><div>[2, 4]</div><div><br></div><div>What a default dictionary type is, it's simply a subclassed version of the dictionary that does a few special things.</div>
<div><br></div><div>You give it a basic 'factory function' -- in our case a list -- and that's what's initialized for a key the first time it is accessed. We don't have to worry if the key was previously initialized, etc..</div>
<div><br></div><div>Some will argue that this is less explicit -- and I would probably have been one of them when I first started preparing this newbie nugget. But, after doing over this a few times to explain it, I've finally started to come on board to the idea. We're explicitly stating the default type when we initialize the defaultdict variable.</div>
<div><br></div><div>I especially like it that the default dict gives a default type for any key -- not for a specific key as setdefault seems to do (correct me if I misspeak here).</div><div><br></div><div>But, more than anything, it's *much faster* than doing it the other ways.. But, I want you to prove that to yourself and to me… I have created a 1 GB file with colors and numbers. I'll give you this file so that you can read it in and create the dictionary of your choice, with the method of your choice.</div>
<div><br></div><div>You can time how long it takes for dictionary creation, with the list appending, and everything.</div><div><br></div><div>Remember that your dictionary will be in memory and will be about a gig of ram also. If your machine doesn't have those resources, read only half the file. In fact, it would be interesting to do this for a random number of records in the file, multiple times, to get a feel for how this grows for each data type.</div>
<div><br></div><div>Come on, it'll be good practice if you've never done this before… Lets compare notes when you're finished.</div><br>-- <br>Whatever you can do or imagine, begin it;<br>boldness has beauty, magic, and power in it.<br>
<br>-- Goethe <br>