random writing access to a file in Python

Claudio Grondi claudio.grondi at freenet.de
Sat Aug 26 18:19:14 EDT 2006


Paul Rubin wrote:
> Claudio Grondi <claudio.grondi at freenet.de> writes:
> 
>>Is there a ready to use (free, best Open Source) tool able to sort
>>lines (each line appr. 20 bytes long) of a XXX GByte large text file
>>(i.e. in place) taking full advantage of available memory to speed up
>>the process as much as possible?
> 
> 
> Try the standard Unix/Linux sort utility.  Use the --buffer-size=SIZE
> to tell it how much memory to use.
I am on Windows, and it seems that the Windows XP SP2 'sort' can handle 
the file, but not without a temporary file and space for the resulting 
file, so roughly triple the size of the file to be sorted must be 
available on disk.
Windows XP 'sort' uses a constant approx. 300 MByte of memory and cannot 
keep the CPU at 100% all the time, probably because of I/O over USB 
(25 MByte/s is the top data transfer speed I have observed).
I can't tell yet whether it succeeded, as the sorting of the approx. 
80 GByte file with fixed-length records of 20 bytes is still in progress 
(eleven hours of CPU time / 18 hours of wall-clock time so far).
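As a rough sanity check of my own (assuming the sort makes two full 
read-and-write passes over the data, one to build sorted runs and one to 
merge them, which I have not verified), the pure sequential I/O over the 
25 MByte/s USB link alone would come to roughly:

# Rough I/O-only estimate; my own assumption of an external sort doing
# two full read+write passes over the data, all over the same USB link.
file_size_mb = 80 * 1024                  # appr. 80 GByte, in MByte
usb_mb_per_s = 25                         # observed top transfer speed
passes = 2                                # run creation + final merge
moved_mb = file_size_mb * 2 * passes      # each pass reads and writes it all
print("pure I/O lower bound: about %.1f hours" % (moved_mb / usb_mb_per_s / 3600.0))

That comes out at under four hours of raw sequential I/O, so if those 
assumptions hold, the rest of the 18 hours would be going into seeking 
between input, temporary and output file and into the sorting itself.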
I am not sure whether writing my own code would be much faster than the 
system's own sort (I haven't yet tried setting the amount of memory to 
use via the options, e.g. to 1.5 GByte, as the sort help says it is 
better not to specify it). My machine is a Pentium 4, 2.8 GHz with 
2.0 GByte of RAM.
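If I were to try it myself, a minimal sketch of an external merge sort 
could look roughly like the following (the file names, the ~1 GByte run 
size and treating the 20-byte records as plain byte strings to compare 
are my own assumptions):

import heapq, os

RECORD = 20                              # fixed record length in bytes
RUN_BYTES = 1024 * 1024 * 1024           # raw record data per in-memory run
RUN_BYTES -= RUN_BYTES % RECORD          # keep it a multiple of the record size

def make_runs(src, prefix):
    """Split src into individually sorted temporary run files."""
    runs = []
    inp = open(src, 'rb')
    while True:
        chunk = inp.read(RUN_BYTES)
        if not chunk:
            break
        records = [chunk[i:i + RECORD] for i in range(0, len(chunk), RECORD)]
        records.sort()                   # in-memory sort of one run
        name = '%s.%d' % (prefix, len(runs))
        open(name, 'wb').write(b''.join(records))
        runs.append(name)
    inp.close()
    return runs

def merge_runs(runs, dst):
    """K-way merge of the sorted run files into dst using a heap."""
    files = [open(name, 'rb') for name in runs]
    heap = []
    for i, f in enumerate(files):
        rec = f.read(RECORD)
        if rec:
            heapq.heappush(heap, (rec, i))
    out = open(dst, 'wb')
    while heap:
        rec, i = heapq.heappop(heap)
        out.write(rec)
        nxt = files[i].read(RECORD)
        if nxt:
            heapq.heappush(heap, (nxt, i))
    out.close()
    for f in files:
        f.close()

if __name__ == '__main__':
    runs = make_runs('records.dat', 'run')      # input file name made up
    merge_runs(runs, 'records.sorted')
    for name in runs:
        os.remove(name)

The Python objects for one run would of course need much more memory 
than the raw 1 GByte of record data, so the run size would have to be 
tuned down to what the 2 GByte machine can really hold.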
I would be glad to hear whether the sorting time I am currently seeing 
is what should be expected for this kind of task, or whether there is 
still much room for improvement.

Claudio Grondi


