Matching Directory Names and Grouping Them

Neil Cerutti horpner at yahoo.com
Fri Jan 12 17:02:28 CET 2007


On 2007-01-11, J <wilder.usenet at gmail.com> wrote:
> Steve-
>
> Thanks for the reply. I think what I'm trying to say by similar
> is pattern matching. Essentially, walking through a directory
> tree starting at a specified root folder, and returning a list
> of all folders that matches a pattern, in this case, a folder
> name containing a four digit number representing year and a
> subdirectory name containing a two digit number representing a
> month. The matches are grouped together and written into a text
> file. I hope this helps.

Here's a solution using itertools.groupby, just because this is
the first programming problem I've seen that seemed to call for
it. Hooray!

from itertools import groupby

def print_by_date(dirs):
    r""" Group a directory list according to date codes.

    >>> data = [
    ...     "<root>/Input2/2002/03/",
    ...     "<root>/Input1/2001/01/",
    ...     "<root>/Input3/2005/05/",
    ...     "<root>/Input3/2001/01/",
    ...     "<root>/Input1/2002/03/",
    ...     "<root>/Input3/2005/12/",
    ...     "<root>/Input2/2001/01/",
    ...     "<root>/Input3/2002/03/",
    ...     "<root>/Input2/2005/05/",
    ...     "<root>/Input1/2005/12/"]
    >>> print_by_date(data)
    <root>/Input1/2001/01/
    <root>/Input2/2001/01/
    <root>/Input3/2001/01/
    <BLANKLINE>
    <root>/Input1/2002/03/
    <root>/Input2/2002/03/
    <root>/Input3/2002/03/
    <BLANKLINE>
    <root>/Input2/2005/05/
    <root>/Input3/2005/05/
    <BLANKLINE>
    <root>/Input1/2005/12/
    <root>/Input3/2005/12/
    <BLANKLINE>

    """
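    def date_key(path):
        # The last 8 characters are the "YYYY/MM/" tail of each path.
        return path[-8:]
    # groupby only merges adjacent items, so sort by the same key first.
    by_date = sorted(dirs, key=date_key)
    groups = [list(g) for _, g in groupby(by_date, date_key)]
    for g in groups:
        # Parenthesized so the line runs under both Python 2 and Python 3.
        print('\n'.join(sorted(g)))
        print('')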

if __name__ == "__main__":
    import doctest
    doctest.testmod()

I really wanted nested join calls for the output, to suppress
that trailing blank line, but I kept getting confused and
couldn't sort it out.
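
For what it's worth, the nested joins might look something like the
sketch below: the inner join puts one path per line within a group,
and the outer join puts a blank line between groups, so nothing
trails after the last one. The name format_by_date is made up for
this illustration, and the key assumes every path ends in "YYYY/MM/".

```python
from itertools import groupby

def format_by_date(dirs):
    """Return the grouped listing as one string, with no trailing blank line."""
    def date_key(path):
        # Assumption: every path ends in an 8-character "YYYY/MM/" tail.
        return path[-8:]
    # groupby only merges adjacent items, so sort by the same key first.
    ordered = sorted(dirs, key=date_key)
    groups = (sorted(g) for _, g in groupby(ordered, date_key))
    # Inner join: one path per line. Outer join: blank line between groups.
    return "\n\n".join("\n".join(g) for g in groups)
```

Printing the returned string gives the same listing as above, minus
the final blank line.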

It would be better to use the os.path module, but I couldn't find
the function in there that lets me pull out path tails.
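
One possible approach, sketched here rather than tested against a
real tree, is to call os.path.split twice: each call peels off the
last path component. The trailing slash has to be stripped first,
because otherwise the tail comes back empty. The tuple return value
is a choice made for this sketch, not something from the code above.

```python
import os.path

def date_key(path):
    """Return (year, month) taken from the last two path components."""
    # Strip the trailing slash so split() does not return an empty tail.
    head, month = os.path.split(path.rstrip("/"))
    _, year = os.path.split(head)
    return year, month
```

A tuple key like this sorts and groups the same way the string slice
does, and it does not depend on the tail being a fixed width.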

I didn't filter out stuff that didn't match the date path
convention you used.
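
A filter along these lines could go in front of the grouping step.
This is only a sketch: the names DATE_TAIL and matching_dirs are
invented here, the pattern assumes forward-slash separators, and it
checks the digit counts only, not whether the month is in 01-12.

```python
import re

# A four-digit directory followed by a two-digit directory at the end
# of the path, with an optional trailing slash.
DATE_TAIL = re.compile(r"/\d{4}/\d{2}/?$")

def matching_dirs(dirs):
    """Keep only the paths that end in a YYYY/MM date tail."""
    return [d for d in dirs if DATE_TAIL.search(d)]
```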

-- 
Neil Cerutti


