Fastest way to read / process / write text?

Jörg Baumann joerg.baumann at stud.informatik.uni-erlangen.de
Wed Jun 5 06:55:37 EDT 2002


Brent Miller wrote:

...
> This works beautifully most of the time (it's super fast), except when I
> pipe large files (>50megs) to it and then it usually dies half way
> through complaining of memory errors because it ran out of ram.
...

Since you alter only text inside single lines, you can process the file
line by line instead of loading it into memory as a whole.
(See the Python docs for xreadlines(), xrange(), ...)

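#!/usr/bin/python2

import sys, re

# The output file name is the only argument; input arrives on stdin.
try:
  outfilename = sys.argv[1]
except IndexError:
  sys.exit("usage: %s <output file>" % sys.argv[0])

# Compile the pattern once instead of on every line.
pattern = re.compile(r'[.](?=\d\d\d\d\d\d)|[.](?=\w+[ ][<|>])|[.](?=\w+[:])|[ ](?!0x)')

infile = sys.stdin
outfile = open(outfilename, "w")

# Iterating over the file reads one line at a time (Python 2.2+);
# on older versions use infile.xreadlines() instead.
for line in infile:
  outfile.write(pattern.sub('\t', line))

outfile.close()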
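
To see what the substitution does to one line, here is a minimal sketch that wraps the same pattern in a generator and feeds it an in-memory stream. The sample line is made up (the original input is not shown in this thread), and the names PATTERN and retab are placeholders, not part of the script.

```python
import io
import re

# Same pattern as in the script above, compiled once.
PATTERN = re.compile(
    r'[.](?=\d\d\d\d\d\d)|[.](?=\w+[ ][<|>])|[.](?=\w+[:])|[ ](?!0x)')

def retab(lines):
    """Yield each input line with the matched separators replaced by tabs."""
    for line in lines:
        yield PATTERN.sub('\t', line)

# Hypothetical input; any iterable of lines works, including sys.stdin.
source = io.StringIO(u"12:00:01.123456 10.0.0.1.1234 > 10.0.0.2.80: ack 0x1f\n")
result = list(retab(source))
print(repr(result[0]))
```

On this sample, the dot before the six-digit fraction, the dot before each trailing number (the one followed by " >" and the one followed by ":") and every space except the one in front of 0x become tabs. Because retab is a generator, only one line is held in memory at a time, however large the input is.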




