looking for speed-up ideas

Anders J. Munch andersjm at dancontrol.dk
Tue Feb 4 08:09:09 EST 2003


"Ram Bhamidipaty" <ramb at sonic.net> wrote:
>
> My script reads the file and prints out the 200 largest files.
[...]
> I am willing to write C code, but I would first like to get the python
> code faster.

A partial sort is potentially a lot faster (lower complexity): just
sort until the first 200 entries are in place, then stop.  Enclosed
below is Python source code.  It's slow for two reasons: partly
because it does redundant comparisons and list copying (you could
rewrite it using indices and in-place partitioning, quicksort style),
and partly because we're comparing against the builtin sort, and the
builtin sort is just so damn good!
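As a sketch of the same idea with the standard library (note: the
heapq helpers below may not exist in every Python version, so treat
this as an assumption about your interpreter, and the file list here
is made up for illustration):

```python
import heapq

# Hypothetical (size, path) pairs; in the real script these would be
# gathered by walking the filesystem.
files = [(120, 'a.log'), (5000, 'core'), (42, 'x.txt'), (999, 'b.dat')]

# heapq.nlargest keeps only a k-element heap while scanning, so it is
# a partial sort: roughly O(n log k) instead of O(n log n).
top = heapq.nlargest(2, files)
# top is now the two biggest entries, biggest first:
# [(5000, 'core'), (999, 'b.dat')]
```

For the original problem you'd pass 200 instead of 2.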

Using C++'s std::partial_sort[_copy] would be fast, though.  Either
load your data into a std::vector first or, if your C++ skills are up
to it, write an iterator class that modifies a PyList in-place.

- Anders



def sort(sequence, cmpfunc=None):
    """sort a sequence, returning a new list; if given, cmpfunc(x,y) -> -1, 0,
1"""
    sorted = list(sequence)
    if cmpfunc is None:
        sorted.sort()
    else:
        sorted.sort(cmpfunc)
    return sorted

def first(lis, count):
    """
    Same as sort(lis)[:count], only doesn't sort more than necessary.
    This primitive implementation is not necessarily faster than
    sort(lis)[:count].
    """
    if len(lis) <= count:
        return sort(lis)
    else:
        (pivot,others) = lis[0], lis[1:]
        lower = [elem for elem in others if elem < pivot]
        higher = [elem for elem in others if not (elem < pivot)]
        if len(lower) >= count:
            return first(lower, count)
        else:
            return sort(lower)+[pivot]+first(higher, count-len(lower)-1)
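To tie this back to the original question: first() returns the
*smallest* count entries, so for the 200 largest files you'd negate
the sizes.  A self-contained sketch (the function is restated here in
newer-Python form so the example runs on its own, and the file data is
invented for illustration):

```python
def first(lis, count):
    """Same as sorted(lis)[:count], but stops once the first
    count entries are in place (quickselect style)."""
    if len(lis) <= count:
        return sorted(lis)
    pivot, others = lis[0], lis[1:]
    lower = [elem for elem in others if elem < pivot]
    higher = [elem for elem in others if not (elem < pivot)]
    if len(lower) >= count:
        return first(lower, count)
    return sorted(lower) + [pivot] + first(higher, count - len(lower) - 1)

# Hypothetical (size, path) pairs standing in for the real directory scan.
files = [(42, 'x.txt'), (5000, 'core'), (120, 'a.log'), (999, 'b.dat')]

# Negate sizes so "smallest" means "largest file", then undo it.
top = first([(-size, path) for (size, path) in files], 2)
largest = [(-negsize, path) for (negsize, path) in top]
# largest == [(5000, 'core'), (999, 'b.dat')]
```

For the actual script, replace 2 with 200 and build the list from the
directory walk.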





