sorting 1172026 entries

Dan Stromberg drsalists at gmail.com
Sun May 6 18:30:25 EDT 2012


How much physical RAM (not the virtual memory, but the physical memory)
does your machine have available?  We know the number of elements in your
dataset, but how big are the individual elements?  If a sort is never
completing, you're probably swapping.

list.sort() is preferrable to sorted(list), if you need to conserve memory.

A self-sorting datastructure like a B Tree or whatever is rarely faster
than using a good sorting algorithm, assuming that you only need to sort
the data once per program invocation.  If you're sorting the data in a
loop, then yes, a self-sorting datastructure is most likely a much better
idea.  I've started a page about python trees and heaps at
http://stromberg.dnsalias.org/~strombrg/python-tree-and-heap-comparison/ -
but again, unless you're sorting in a loop, you're probably better off with
a good sorting algorithm, and Python's built-in Timsort is excellent for
all-in-RAM sorting.

If you have too much data to sort in RAM all at once, it's best to sort
sublists that -do- fit in memory, using something like list_.sort() (that's
Timsort by another name in CPython), and then write the sorted sublists to
disk for subsequent merging using the merge step of a mergesort.

If you just need something quick-to-code, GNU sort(1) is very highly
optimized (for a text-based sort), and will sort datasets that are larger
than your physical memory efficiently using an algorithm similar to (not
identical) what I described in the previous paragraph.  It's /usr/bin/sort
on most Linux systems, and is available for a large number of other
operating systems including windows via http://cygwin.com/ or
https://github.com/bmatzelle/gow/wiki/

On Sun, May 6, 2012 at 8:57 AM, J. Mwebaze <jmwebaze at gmail.com> wrote:

> I have several lists with approx 1172026 entries. I have been trying to
> sort the records, but have failed.. I tried lists.sort() i also trired
> sorted python's inbuilt method. This has been running for weeks.
>
> Any one knows of method that can handle such lists.
>
> cheers
>
>
>
> --
> *Mob UG: +256 (0) 70 1735800 | NL +31 (0) 6 852 841 38 | Gtalk: jmwebaze
> |  skype: mwebazej | URL: www.astro.rug.nl/~jmwebaze<http://www.astro.rug.nl/%7Ejmwebaze>
>
> /* Life runs on code */*
>
>
> --
> http://mail.python.org/mailman/listinfo/python-list
>
>
-------------- next part --------------
An HTML attachment was scrubbed...
URL: <http://mail.python.org/pipermail/python-list/attachments/20120506/46183a6d/attachment.html>


More information about the Python-list mailing list