"groupby" is brilliant!

Alex Martelli aleax at mac.com
Wed Jun 14 21:37:22 EDT 2006


James Stroud <jstroud at ucla.edu> wrote:

> Alex Martelli wrote:
> > James Stroud <jstroud at ucla.edu> wrote:
> >    ...
> > 
> >>def doit(rows, doers, i=0):
> >>   for r, alist in groupby(rows, itemgetter(i)):
> >>     if len(doers) > 1:
> >>       doit(alist, doers[1:], i+1)
> >>     doers[0](r)
> > 
> > 
> > Isn't this making N useless slices (thus copies, for most kinds of
> > sequences) for a doers of length N?  Since you're passing i anyway, it
> > seems to me that:
> > 
> > def doit(rows, doers, i=0):
> >     for r, alist in groupby(rows, itemgetter(i)):
> >       if len(doers) > i+1:
> >          doit(alist, doers, i+1)
> >       doers[i](r)
> > 
> > is equivalent to your code, but avoids these slices (thus copies).
> > 
> > 
> > Alex
> 
> Actually, I remember why I wrote it that way--because the itemgetter 
> argument may not start at zero (admitting that I haven't yet played with
> itemgetter at all). Maybe pop(0) is better than a copy?
> 
> 
> def doit(rows, doers, i=0):
>    for r, alist in groupby(rows, itemgetter(i)):
>      doer = doers.pop(0)
>      if doers:                  # empty list tests as False
>         doit(alist, doers, i+1)
>      doer(r)

No, doers.pop(0) is O(N), just like the slicing doers[1:].

To support the indexing into itemgetter possibly being different from
that into doers (under the hypothesis that efficiency matters here, of
course!-) I'd rather do something like:

def doit(rows, doers, i=0):
    L = len(doers)
    def _aux(rows, j):
        if L <= j: return
        doer = doers[j]
        for r, alist in groupby(rows, itemgetter(i+j)):
            _aux(alist, j+1)
            doer(r)
    _aux(rows, 0)

haven't tested this, but it seems a straightforward semantics-preserving
transformation -- the hoisting out of the recursion of the computation
of len(), and out of the loop of the termination-test and indexing,
being trivial optimizations (which I'm not averse to using, even though
trivial, because I think they make the code clearer.

Whether all this is worth it, or not, is in a sense besides the point --
I like this anyway, even just as an example of how the best way to
introduce recursion may often be, not in the "official" function (the
one that gets called by client code), but in a "private" auxiliary
function inside the "official" one.


Alex



More information about the Python-list mailing list