[Tutor] request for comments regarding my function

Mon Oct 26 03:56:52 CET 2009

Thank you very much! You certainly gave me plenty to think and read
about. I had not known about functools or collections.

On Sun, Oct 25, 2009 at 7:09 AM, Rich Lovely <roadierich at googlemail.com> wrote:
> 2009/10/25 Isaac <hyperneato at gmail.com>:
>> Hello all,
>> I wrote a function to organize a list of sorted objects ( django blog
>> entries ); each object has a Python datetime attribute ( called
>> pub_date ). I am posting a message to the tutor mailing list to get
>> feedback regarding better ways to accomplish the same task. I have a
>> feeling I could use recursion here, instead of looping through the
>> entire _query list 3 times. Here is my code:
>>
>> def create_archive_list(_query):
>>    """
>>    given a sorted queryset, return a list in the format:
>>
>>    archive_list = [{'2009': [{'December': ['entry1', 'entry2']},
>>                              {'January': ['entry3', 'entry4']}
>>                              ]
>>                     },
>>
>>                    {'2008': [{'January': ['entry5', 'entry6']},
>>                              ]
>>                     }
>>                    ]
>>    """
>>
>>    archive_list = []
>>    tmp_year_list = []
>>
>>    # make a list of Python dictionaries; the keys are the year, as in
>> '2009', and the value
>>    # is an empty list.
>>    for item in _query:
>>        if item.pub_date.year not in tmp_year_list:
>>            tmp_year_list.append(item.pub_date.year)
>>            tmp_dict = {}
>>            tmp_dict[item.pub_date.year] = []
>>            archive_list.append(tmp_dict)
>>        else:
>>            pass
>>
>>    # for every list in the archive_list dictionaries, append a
>> dictionary with the
>>    # blog entry's month name as a key, and the value being an empty list
>>    for entry in _query:
>>        for item in archive_list:
>>            _year = entry.pub_date.year
>>            if _year in item:
>>                _tmp_month = entry.pub_date.strftime("%B")
>>                # make a dictionary with the month name as a key and
>> an empty list as a value
>>                tmp_dict = {}
>>                tmp_dict[_tmp_month] = []
>>
>>                if tmp_dict not in item[_year]:
>>                    item[_year].append(tmp_dict)
>>
>>    # append the blog entry object to the list if the pub_date month
>> and year match
>>    # the dictionary keys in the archive_list dictionary/list tree.
>>    for entry in _query:
>>        # dict in list
>>        for item in archive_list:
>>            # year of entry
>>            _year = entry.pub_date.year
>>            if _year in item:
>>                _year_list = item[_year]
>>                for _dict_month in _year_list:
>>                    _tmp_month = entry.pub_date.strftime("%B")
>>                    if _tmp_month in _dict_month:
>>                        _dict_month[_tmp_month].append(entry)
>>
>>    return archive_list
>> _______________________________________________
>> Tutor maillist  -  Tutor at python.org
>> To unsubscribe or change subscription options:
>> http://mail.python.org/mailman/listinfo/tutor
>>
>
> Instant first comment, after just a glance at your code, Can't you use
> an int for the year and months, rather than strings?  (This is a
> principle of the MVC (Model, View, Controller) structure:  you keep
> your model (The actual data) as easy to handle as possible, and using
> ints as keys makes conversion between the query and the output and
> sorting easier.  Conversion from {2009:{12: ...}} to "December 2009:
> ..." should be handled by the view: either the GUI, or immediatly
> before it's displayed.  (This may be a point of contention, it could
> be argued that the view should only display it, and all manipulation
> should be handled by the controller level.)
>
> This isn't a case for recursion:  recursion is best used when you have
> one thing containing a container, which contains a container, which
> contains another, which contains another, etc, to (usually) unknown
> depths.  (Yes, there are numeric uses for recursion, but they are
> usually best replaced with an iterative approach).
>
> It would be more pythonic to format your output as a dict of dicts (of
> lists), rather than lists of single-item dicts of lists of single-item
> dicts:
> archive_list = \
> {2009: {12: ['entry1', 'entry2'],
>             1: ['entry3', 'entry4']}
>            },
> '2008': {1: ['entry5', 'entry6']}
> }
>
> You loose the ordering, but that isn't a problem when your keys have
> implicit ordering.
>
> Look up collections.defaultdict.  To generate this structure, you can
> use dict as the factory function with defaultdict:
>
> d = collections.defaultdict(dict)
>
> or, more usefully,
> d = collections.defaultdict(functools.partial(collections.defaultdict, list))
> This will give a defaultdict of defaultdicts of lists.  (The
> functools.partial just returns a callable, which is what defaultdict
> needs to create its default value.
>
> Then, as you iterate through the report, you simply assign to the dict
> as follows:
>
> for entry in month_record:
>    d[year][month].append(entry)
>
> It avoids all the messing around with tmp_year_list and so on.
>
> Another way, using only builtins, is to use dict.setdefault():
>
> d = {}
> d.setdefault(year, {}).setdefault(month, [])
>
> This is essentially what the defaultdict does for you.
>
> Doing it like this will give you the advantage of being easy to read:
> I'm struggling to follow what your code does at the moment:  you've
> got far too many levels of nesting and temporary variables.
>
> I don't know the structure of your query result, but using pseudo-functions:
>
> import collections, functools
>
> def create_archive_list(query):
>    d = collections.defaultdict(functools.partial(collections.defaultdict,
> list))
>    for entry in query: #iterate over blog posts, rather than years,
> months etc - therefore using a single pass
>        d[entry.year()][entry.month()].append(entry.data())
>    return d
>
> And finally, a little utility function, which DOES use recursion:
>
> def mapToDict(d):
>    """"recursively convert defaultdicts into regular dicts"""
>    d = dict(d)
>    d.update((k, map_to_dict(v)) for k,v in d.items() if isinstance(v,
> collections.defaultdict))
>    return d
>
>
> This function is more for personal comfort than regular use (although
> it does clean up debugging output):  it's poor programming if code
> that works with a regular dict fails if given a defaultdict.  If you
> DO discover any, I highly recommend reporting it as a bug.
>
> Like I said, I've not seen your data structure, so I'm making some big
> guesses here, but this should give you enough to look into towards
> improving your code.
>
> --
> Rich "Roadie Rich" Lovely
>
> There are 10 types of people in the world: those who know binary,
> those who do not, and those who are off by one.
>