Large File Parsing

Tim Roberts timr at probo.com
Tue Jun 17 06:49:39 CEST 2003


Robert S Shaffer <r.shaffer9 at verizon.net> wrote:
>
>I have up to a 3 million record file to parse, remove duplicates from, and
>sort by size then numeric value. Is this the best way to do this in
>Python?

In my opinion, no; the best way would be to use a simple chain of
command-line filters:

  cut -f 1 -d , inputfile | sort -n | uniq > outputfile

There is no need to reinvent the wheel when perfectly good solutions exist.

Even if you are using Windows, you can download either Cygwin or the
UnxUtils, either of which provides all of these tools.
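
If installing those tools is not an option, the same three stages (take the
first comma-separated field, drop duplicates, sort numerically) could be
sketched in Python roughly as follows. This is only an illustration under
assumptions, not a tuned solution: the function names are made up for the
example, every first field is assumed to be an integer, and the whole set of
unique values is assumed to fit in memory.

```python
def unique_sorted_first_fields(lines, delimiter=","):
    """Return the unique first fields of `lines`, sorted by numeric value.

    Hypothetical helper that mirrors the cut/sort/uniq stages of the
    shell pipeline for an iterable of text lines.
    """
    # Keep only the text before the first delimiter; the set drops duplicates.
    values = {line.rstrip("\n").split(delimiter, 1)[0] for line in lines}
    # Assumption: every first field parses as an integer.
    return sorted(values, key=int)


def write_unique_sorted(in_path, out_path):
    """Read `in_path`, write the unique sorted first fields to `out_path`."""
    with open(in_path) as infile, open(out_path, "w") as outfile:
        for value in unique_sorted_first_fields(infile):
            outfile.write(value + "\n")
```

Calling write_unique_sorted("inputfile", "outputfile") would then play the
role of the pipeline above. The set holds every distinct value at once, so
memory use grows with the number of unique records.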
-- 
- Tim Roberts, timr at probo.com
  Providenza & Boekelheide, Inc.



