FileInput too slow
Gabriel Genellina
gagsl-py2 at yahoo.com.ar
Mon Jan 4 20:01:39 EST 2010
On Mon, 04 Jan 2010 19:35:02 -0300, wiso <gtu2003 at alice.it> wrote:
> I'm trying the fileinput module, and I like it, but I don't understand
> why it's so slow... look:
>
> from time import time
> from fileinput import FileInput
>
> file = ['r1_200907.log', 'r1_200908.log', 'r1_200909.log',
>         'r1_200910.log', 'r1_200911.log']
>
> def f1():
>     n = 0
>     for f in file:
>         print "new file: %s" % f
>         ff = open(f)
>         for line in ff:
>             n += 1
>         ff.close()
>     return n
>
> def f2():
>     f = FileInput(file)
>     for line in f:
>         if f.isfirstline(): print "new file: %s" % f.filename()
>     return f.lineno()
>
> def f3(): # f2 simpler
>     f = FileInput(file)
>     for line in f:
>         pass
>     return f.lineno()
Yes, the fileinput module is A LOT slower than normal file processing.
You may use itertools.chain instead:
import itertools

def f4():
    # Open each file lazily and chain their lines into a single stream.
    f = itertools.chain.from_iterable(open(fn) for fn in file)
    n = 0
    for line in f:
        n += 1
    return n
I get timings similar to those of f1() above.
Known major issues of this "poor man's" implementation:
- no lineno()/filelineno()/isfirstline() methods
- close() is implicit (each file is closed only when its file object is
  garbage collected)
- only for reading; inplace and backup don't work
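The first two issues listed above can be worked around with a small generator. What follows is only a sketch, not anything provided by fileinput or itertools: chained_lines and count_lines are hypothetical names made up for illustration. It yields the file name and the per-file line number along with each line, and closes every file explicitly. It is still read-only, so inplace and backup remain unsupported, and I have not timed it.

```python
def chained_lines(filenames):
    """Yield (filename, line_number_within_file, line) for every line.

    Hypothetical helper for illustration; not part of the standard library.
    """
    for name in filenames:
        # The with block closes each file as soon as it is exhausted,
        # or when the generator itself is closed early.
        with open(name) as handle:
            for file_lineno, line in enumerate(handle, 1):
                yield name, file_lineno, line


def count_lines(filenames):
    """Count all lines, announcing each non-empty file like f2() does."""
    total = 0
    for name, file_lineno, line in chained_lines(filenames):
        if file_lineno == 1:  # stands in for FileInput.isfirstline()
            print("new file: %s" % name)
        total += 1
    return total
```

Calling count_lines(file) with the list from the original message should return the same total as f1(). As with f2(), an empty file produces no lines, so it is never announced.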
--
Gabriel Genellina