How to make this unpickling/sorting demo faster?

Thu Apr 17 17:10:14 EDT 2008

Steve Bergman <sbergman27 at gmail.com> writes:
> Anything written in a language that is > 20x slower (Perl, Python,
> PHP) than C/C++ should be instantly rejected by users on those grounds
> alone.

Well, if you time it from when you sit down at the computer and start
programming until the sorted array is output, Python might be 20x
faster than C/C++ and 100x faster than assembler.

> I've challenged someone to beat the snippet of code below in C, C++,
> or assembler, for reading in one million pairs of random floats and
> sorting them by the second member of the pair.  I'm not a master
> Python programmer.  Is there anything I could do to make this even
> faster than it is?

1. Turn off the cyclic garbage collector during the operation since
you will have no cyclic garbage.

2. See if there is a way to read the array directly (using something
like the struct module or ctypes) rather than a pickle.

3. Use psyco and partition the array into several smaller ones with a
quicksort-like partitioning step, then sort the smaller arrays in
parallel using multiple processes, if you have a multicore CPU.

4. Write your sorting routine in C or assembler and call it through
the C API.  If the sorting step is a small part of a large program and
the sort uses a lot of CPU, this is a good approach, since the other
parts of the program still get the safety and productivity gains of
Python.
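
As a rough sketch of suggestions 1 and 2 (not the original poster's
code): assuming the pairs are stored as raw native-endian doubles in a
flat binary file, the array module can load them without a pickle, and
the cyclic collector can be paused around the load and sort.  The file
layout and the names gc_paused, load_pairs and load_and_sort are
made up for illustration.

```python
import gc
import os
from array import array
from contextlib import contextmanager
from operator import itemgetter


@contextmanager
def gc_paused():
    """Temporarily disable the cyclic garbage collector."""
    was_enabled = gc.isenabled()
    gc.disable()
    try:
        yield
    finally:
        # Restore the previous state only if GC was on to begin with.
        if was_enabled:
            gc.enable()


def load_pairs(path):
    """Read a flat file of native doubles into a list of (a, b) tuples.

    Assumed layout: a0, b0, a1, b1, ... with no header.
    """
    flat = array("d")
    n_items = os.path.getsize(path) // flat.itemsize
    with open(path, "rb") as f:
        flat.fromfile(f, n_items)
    # Even-indexed values are first members, odd-indexed are second members.
    return list(zip(flat[0::2], flat[1::2]))


def load_and_sort(path):
    """Load the pairs and sort them by the second member, with GC paused."""
    with gc_paused():
        pairs = load_pairs(path)
        pairs.sort(key=itemgetter(1))
    return pairs
```

Pausing the collector covers the load as well as the sort, because
building a million tuples is where the allocations happen.  Whether
this helps measurably depends on the Python version, so it is worth
timing both ways.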

> Also, if I try to write the resulting list of tuples back out to a
> gdbm file, 

I don't understand what you're doing with gdbm.  Just use a binary
file.
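
A minimal sketch of the binary-file approach, assuming each pair is
written as one fixed-size record of two little-endian doubles.  The
record layout and the names PAIR, write_pairs and read_pairs are
illustrative choices, not anything from the original code.

```python
import struct

# Assumed record layout: two little-endian doubles, 16 bytes per pair.
PAIR = struct.Struct("<dd")


def write_pairs(path, pairs):
    """Write each (a, b) pair as one fixed-size binary record."""
    with open(path, "wb") as f:
        for a, b in pairs:
            f.write(PAIR.pack(a, b))


def read_pairs(path):
    """Read all records back as a list of (a, b) tuples."""
    with open(path, "rb") as f:
        data = f.read()
    # iter_unpack requires len(data) to be a multiple of PAIR.size.
    return list(PAIR.iter_unpack(data))
```

Because every record has the same size, record i starts at byte offset
i * PAIR.size, so a single pair can be fetched with seek() without
reading the whole file.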