Sort Big File Help

Jonathan Gardner jgardner at jonathangardner.net
Wed Mar 3 17:16:24 EST 2010


On Wed, Mar 3, 2010 at 8:19 AM, John Filben <johnfilben at yahoo.com> wrote:
> I am new to Python but have used many other (mostly dead) languages in the
> past. I want to be able to process *.txt and *.csv files. I can now read
> them and change them as needed, mostly just taking a column and doing some
> if-then logic to create a new variable. My problem is sorting these files:
>
> 1.) How do I sort file1.txt by position and write out file1_sorted.txt?
> For example, if all the records are 100 bytes long and there is a
> three-digit id in positions 0-2, here would be some sample data:
>
> a. 001JohnFilben……
>
> b. 002Joe  Smith…..
>
> 2.) How do I sort file1.csv by column name? For example, if all the
> records have three column headings, “id”, “first_name”, “last_name”, here
> would be some sample data:
>
> a. Id, first_name,last_name
>
> b. 001,John,Filben
>
> c. 002,Joe, Smith
>
> 3.) What if I have millions of records and I am processing on a laptop
> with a large external drive? Basically, are there space considerations,
> and what are the workarounds?
>
> Any help would be appreciated. Thank you.
>
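
A minimal Python sketch for questions 1 and 2, assuming the whole file fits in memory, one record per line, and zero-padded ids so that string order matches numeric order. The function names `sort_fixed_width` and `sort_csv_by_column` are hypothetical; `csv.DictReader` with `skipinitialspace=True` copes with the stray spaces after commas in the sample data.

```python
import csv


def sort_fixed_width(src, dst, start=0, end=3):
    """Sort a fixed-width text file by the characters in positions [start, end)."""
    with open(src) as f:
        # Strip line endings so a final line without "\n" still sorts cleanly.
        records = [line.rstrip("\n") for line in f]
    records = [rec for rec in records if rec]  # skip blank lines
    # rec[0:3] is the three-digit id; zero padding makes string order numeric.
    records.sort(key=lambda rec: rec[start:end])
    with open(dst, "w") as f:
        f.writelines(rec + "\n" for rec in records)


def sort_csv_by_column(src, dst, column):
    """Sort a CSV file that has a header row by the column named `column`."""
    with open(src, newline="") as f:
        # skipinitialspace drops the blanks after commas, e.g. ", first_name".
        reader = csv.DictReader(f, skipinitialspace=True)
        rows = sorted(reader, key=lambda row: row[column])
        fieldnames = reader.fieldnames
    with open(dst, "w", newline="") as f:
        writer = csv.DictWriter(f, fieldnames=fieldnames)
        writer.writeheader()
        writer.writerows(rows)
```

Usage would be `sort_fixed_width("file1.txt", "file1_sorted.txt")` and `sort_csv_by_column("file1.csv", "file1_sorted.csv", "Id")`. The column name must match the header exactly (the sample header says "Id", not "id"); for numeric rather than textual order, use a key such as `int(row[column])`.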

You may also want to look at the GNU tools "sort" and "cut". If your
job is to process files, I'd recommend using tools designed for that
task.
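
On question 3: GNU sort already copes with input larger than memory by sorting runs into temporary files and merging them. The same idea in Python is an external merge sort. The sketch below is a minimal version, assuming newline-terminated text records and a key function that works on whole lines; `external_sort`, `chunk_lines` and `tmp_dir` are hypothetical names. Pointing `tmp_dir` at the external drive keeps the temporary runs off the laptop's internal disk.

```python
import heapq
import itertools
import os
import tempfile


def external_sort(src, dst, key, chunk_lines=1_000_000, tmp_dir=None):
    """Sort a text file too large for memory: sort chunks, then k-way merge."""
    chunk_paths = []
    try:
        with open(src) as f:
            while True:
                # Hold at most chunk_lines lines in memory at a time.
                chunk = list(itertools.islice(f, chunk_lines))
                if not chunk:
                    break
                # Ensure every line ends in "\n" so merged lines never run together.
                chunk = [ln if ln.endswith("\n") else ln + "\n" for ln in chunk]
                chunk.sort(key=key)
                # Write the sorted run to a temp file (tmp_dir=None: system default).
                fd, path = tempfile.mkstemp(dir=tmp_dir, text=True)
                with os.fdopen(fd, "w") as out:
                    out.writelines(chunk)
                chunk_paths.append(path)
        # heapq.merge lazily merges the already-sorted runs, one line at a time.
        files = [open(p) for p in chunk_paths]
        try:
            with open(dst, "w") as out:
                out.writelines(heapq.merge(*files, key=key))
        finally:
            for fh in files:
                fh.close()
    finally:
        # Remove the temporary runs even if sorting or merging failed.
        for p in chunk_paths:
            os.remove(p)
```

For the fixed-width file that might be `external_sort("file1.txt", "file1_sorted.txt", key=lambda line: line[0:3], tmp_dir="/path/on/external/drive")`, where the path is a placeholder. Peak memory is roughly one chunk, while disk use during the merge is roughly twice the input (the runs plus the output). With GNU sort, `-T DIR` chooses the temporary directory and `-S SIZE` sets the main-memory buffer.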

-- 
Jonathan Gardner
jgardner at jonathangardner.net


