python and very large data sets???

Chris Liechti cliechti at gmx.net
Wed Apr 24 16:24:10 EDT 2002


Fernando Pérez <fperez528 at yahoo.com> wrote in
news:aa6q9k$sf6$1 at peabody.colorado.edu: 
> Rad wrote:
> 
>> I am preparing myself to work on extracting data from 4 text files
>> (fixed width format) which combined size is about 80GB.  Considering
> 
> May I suggest that you also pick up some basic (4 hrs worth) grep/awk 
> knowledge? Sometimes I find that for quickly extracting a few tagged
> data fields from a large file, awk is much faster than python. I'll
> make the awk call in my python code and then keep operating for the
> more complex stuff in python.

I agree with that. I use awk to extract records from 4GB and larger files
and then do the post-processing in Python.
It's hard to beat awk's speed, but such pre-processing is only useful if
it reduces the dataset. If the awk pass only drops 20% of the data, it
might have been faster to process everything in Python directly.
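As a minimal sketch of that workflow (assuming a POSIX awk is on PATH; the file contents, helper name, and pattern here are made up for illustration):

```python
import os
import subprocess
import tempfile

def awk_filter(path, pattern):
    """Hypothetical pre-filter: run awk to keep only lines matching
    pattern, returning the surviving records for Python post-processing."""
    result = subprocess.run(
        ["awk", "/%s/ { print }" % pattern, path],
        capture_output=True, text=True, check=True,
    )
    return result.stdout.splitlines()

# Demo on a small made-up sample; a real run would point at the big file.
sample = "AAA 001 keep\nBBB 002 drop\nCCC 003 keep\n"
with tempfile.NamedTemporaryFile("w", suffix=".txt", delete=False) as f:
    f.write(sample)
    name = f.name

lines = awk_filter(name, "keep")   # awk does the fast scan...
os.unlink(name)
for record in lines:               # ...Python handles the complex logic
    fields = record.split()
```

The point is that awk streams through the big file quickly and hands Python only the records worth keeping.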

Note that awk is not well suited to binary files but is very powerful on
text files. You should really get a sample data file to play with first.

chris


-- 
Chris <cliechti at gmx.net>

