DSU pattern (was Re: Trouble sorting lists (unicode/locale related?))

Peter Otten __peter__ at web.de
Tue Sep 23 04:27:09 EDT 2003


Duncan Booth wrote:

> Note that if anyone proposes this seriously, it should generate a 3-tuple
> (mapping(item), index, item) rather than the 2-tuple you suggest.
> 
> This is because the mapping function could reasonably be used to impose an
> ordering on objects that have no natural order, so you need to be sure
> that the comparison never falls through the the original object even where
> the mapping compares equal. It also has the side effect of making the sort
> stable (although if stability is a goal you might want another option to
> reverse the sort which would use '-index' as the second element and call
> .reverse() on the result).

So my demo implementation was faulty :-(
Let me try again:

    def sort(self, cmpfunc=None, mapfunc=None):
        if mapfunc:
            list.sort(self, lambda x, y: cmp(mapfunc(x), mapfunc(y)))
        else:
            list.sort(self, cmpfunc)

Seriously, if sort() were to be thus extended, the dsu pattern would rather
act as a performance enhancement under the hood. I prefer plain C 

struct {
    PyObject *decorator;
    int index; /* iff stability is not guaranteed by the sort algorithm */
    PyObject *data;
}

over n-tuples and would hope to reuse the sorting infrastructure already
there in listobject.c to compare only the decorators. I am not familiar
with the C source, so be gentle if that is not feasible.

> FWIW, I think something like this belongs in the standard library rather
> than as a method on lists or a new builtin.
 
If you think of the comparison as a two-step process, 

(1) extract or calculate an attribute
(2) wrap it into a comparison function

consider the following use cases:

(a) lambda x, y: cmp(y, x)
(b) lambda x, y: cmp(x.attr.lower(), y.attr.lower())

All code following the latter pattern could be rewritten (more clearly, I
think) as

alist.sort(mapfunc=lambda x: attr.lower())

and would automatically benefit from dsu (which could even be turned off
transparently for the client code for very large lists, where the memory
footprint becomes a problem).

The litmus test whether to put it into a utility function or into an already
existing method that needs only a minor and completely backwards-compatible
interface change would then be:

Are (b)-style comparison functions nearly as or more common than (a)-style
functions.

Peter

PS: You and Alex Martelli dismissed my somewhat unfocused idea faster than I
could sort it out. I hope you will find the above useful, though.





More information about the Python-list mailing list