sorting a file

Beema shafreen beema.shafreen at gmail.com
Sun Jun 15 08:38:23 CEST 2008


Thanks a lot for your valuable suggestions.

On Sun, Jun 15, 2008 at 4:04 AM, Dennis Lee Bieber <wlfraed at ix.netcom.com>
wrote:

> On Sat, 14 Jun 2008 12:45:47 +0530, "Beema shafreen"
> <beema.shafreen at gmail.com> declaimed the following in
> gmane.comp.python.general:
>
>        Strange: I don't recall seeing this on comp.lang.py, just the first
> responder; and a search on message ID only found it on gmane...
>
> > Hi all,
> >
> > I have a file with three columns, and I need to sort the file with
> > respect to the third column. How do I do it using Python? I tried the
> > Linux sort command for this, but I was not able to do it.
> > Can anybody suggest a way?
>
> Question 1:     Will the file fit completely within the memory of a running
> Python program?
>
> Question 2:     How are the columns defined? Fixed width, known in advance;
> tab separated; comma separated.
>
> If #1 is true, I'd read the file into a list of tuples/sublists (if line
> is fixed width columns, read line, manually split on column widths; if
> TSV or CSV use the proper options with the CSV module to read the file).
> Define a sort key function to extract the key column and use the
> built-in list sort method:
>
>        data.sort(key=lambda x: x[2])  # warning, I'm not skilled at lambda
>
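A minimal sketch of the in-memory approach described above, assuming
Python 3 and a tab-separated file. The function name and the `numeric`
switch are my own additions for illustration; `x[2]` on its own sorts in
text order, so "200" comes before "30".

```python
import csv

def sort_rows_by_third_column(lines, delimiter="\t", numeric=False):
    """Parse delimited lines and return the rows sorted on the third column.

    `lines` is any iterable of strings, e.g. an open file object.
    """
    # Skip empty rows so row[2] is always present (assumes >= 3 columns).
    rows = [row for row in csv.reader(lines, delimiter=delimiter) if row]
    if numeric:
        # Assumption: the third column holds numbers.
        rows.sort(key=lambda row: float(row[2]))
    else:
        # Plain text order, as in data.sort(key=lambda x: x[2]).
        rows.sort(key=lambda row: row[2])
    return rows
```

With a real file this would be called as
sort_rows_by_third_column(open("data.tsv", newline="")), where
"data.tsv" is a placeholder name.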
> Actually, if text sort order (not numeric value order) is okay, and the
> lines are fixed width columns, no need to manually split the columns
> into tuples; just read all lines into a list and define a key function
> that picks out the columns needed:
>
>        data.sort(key=lambda x: x[colstart:colend])
>
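The fixed-width variant could look like the sketch below (Python 3
assumed). The column positions 10 and 15 and the sample records are
hypothetical. The key column is zero-padded here, so text order and
numeric order happen to agree.

```python
def sort_fixed_width(lines, colstart, colend):
    """Return the lines sorted by the text found in line[colstart:colend]."""
    return sorted(lines, key=lambda line: line[colstart:colend])

# Hypothetical layout: a 10-character name field, then a 5-character key.
records = [
    "alpha".ljust(10) + "00030",
    "beta".ljust(10) + "00004",
    "gamma".ljust(10) + "00200",
]
ordered = sort_fixed_width(records, 10, 15)
```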
>
> If #1 is FALSE (too big for memory) you will need to create a sort-merge
> procedure in which you read n-lines of the file; sort them, write to
> temporary file; alternating among 2+ temporary files keeping the same
> n-lines (except for the last packet). Then merge the 2+ temporaries over
> the n-lines in the batch to a new temporary file; after the first n
> lines have been merged (giving n*2+ lines in the batch) switch to
> another temporary file for the next batch.... When all original batches
> are merged, repeat the merge using batches of size n*2+... Repeat until
> only one temporary file is left (i.e., only one long merge batch is
> written).
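
A rough sketch of a sort-merge for files too big for memory, assuming
Python 3.5 or later. Note that this is not the alternating
temporary-file scheme described above but a simpler variant: each sorted
batch goes to its own temporary file, and all of them are merged in a
single pass with heapq.merge. The names external_sort and third_column
are my own, and the key assumes whitespace-separated columns with at
least three fields per line.

```python
import heapq
import itertools
import tempfile

def third_column(line):
    # Assumption: whitespace-separated columns, compared in text order.
    return line.split()[2]

def external_sort(in_path, out_path, chunk_lines=100000, key=third_column):
    """Sort in_path into out_path, holding at most chunk_lines lines in memory."""
    chunks = []
    try:
        with open(in_path) as src:
            while True:
                # Read the next batch of at most chunk_lines lines.
                batch = list(itertools.islice(src, chunk_lines))
                if not batch:
                    break
                # Make sure a final line without a newline still ends in one.
                batch = [l if l.endswith("\n") else l + "\n" for l in batch]
                batch.sort(key=key)
                tmp = tempfile.TemporaryFile("w+")
                tmp.writelines(batch)
                tmp.seek(0)  # rewind so the merge can read it back
                chunks.append(tmp)
        with open(out_path, "w") as dst:
            # heapq.merge reads the sorted batches lazily, line by line.
            dst.writelines(heapq.merge(*chunks, key=key))
    finally:
        for tmp in chunks:
            tmp.close()
```

Every batch keeps one temporary file open during the merge, so a very
small chunk_lines on a huge file can run into the operating system's
limit on open files. The multi-pass scheme above avoids that.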
>
>        Or figure out how to call whatever system sort command is available
> with whatever parameters are needed -- after all, why reinvent the wheel
> if you can reach outside the snake and grab what is already in the snake
> pit ("outside the snake" => os.system(...); "snake pit" => the OS
> environment). Even WinXP has a command line sort command; as long as you
> don't need a multikey sort it can handle the simple text record sorting
> with limitations on memory size to use.
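
One way to call the system sort from Python is sketched below. It uses
subprocess.run (Python 3.5+) rather than os.system, and it assumes a
POSIX-style sort on the PATH that accepts -k and -o. The WinXP sort
command takes different options, so this would not work there
unchanged. The function names are hypothetical.

```python
import subprocess

def build_sort_command(in_path, out_path, column=3, numeric=False):
    """Build the argument list for a POSIX sort on a single column."""
    key = "-k{0},{0}".format(column)  # restrict the key to this one field
    if numeric:
        key += "n"  # compare the field as a number instead of as text
    return ["sort", key, "-o", out_path, in_path]

def system_sort(in_path, out_path, column=3, numeric=False):
    """Run the external sort; raises CalledProcessError if it fails."""
    subprocess.run(build_sort_command(in_path, out_path, column, numeric),
                   check=True)
```

Passing the arguments as a list avoids shell quoting problems with file
names that contain spaces.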
>
> --
>        Wulfraed        Dennis Lee Bieber               KD6MOG
>        wlfraed at ix.netcom.com           wulfraed at bestiaria.com
>                HTTP://wlfraed.home.netcom.com/
>        (Bestiaria Support Staff:               web-asst at bestiaria.com)
>                HTTP://www.bestiaria.com/
> --
> http://mail.python.org/mailman/listinfo/python-list
>



-- 
Beema Shafreen