readlines() and "binary" files
Justin
e_fax_t at hotmail._ZAPME_.com
Wed Sep 25 12:06:10 EDT 2002
> I think readlines() is just a shortcut for a very common task. Since your
> task isn't quite as common, I think it would be a better idea to use
> read() to read the whole thing, splitting the lines up by the 0x0d 0x0a
> pair (CR NL).
Good suggestion, Jeff, thanks. I did not realise that even data
read from a 'binary' stream could be split()'ed.
Maybe someone else will have a use for this, and I wanted to correct
what I had said: it's extra newlines and not carriage returns that
the data contains.
f = open('testdata.csv','rb')
# swallow everything in memory
block = f.read()
lines = block.split( "\r\n" )
f.close()
# free our in-core file copy asap
del block
for line in lines:
    # get rid of the odd extra NL
    line = line.replace( "\n", "_" )
    fields = line.split(';')
    print len(fields) # now constant
My biggest data file is about 8 megs, so reading it in one swoop
is doable. Still I only actually need to look at about 120 bytes
at a time, so that's rather overkill. If anyone can think of a more
economical way of doing it (short of defining my own iterator), I'd
be interested.
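One possible way to avoid holding the whole file in memory, sketched
here under the assumption of a modern Python 3 interpreter (not what
the code above was written for): the built-in open() accepts a newline
argument, and with newline="\r\n" a text-mode file treats only the
CR NL pair as a line terminator, so a stray NL stays inside the record.
The function name iter_records and the latin-1 encoding are
illustrative assumptions, not part of the original data description.

```python
def iter_records(path, encoding="latin-1"):
    """Yield one list of fields per CR NL terminated record, lazily."""
    # newline="\r\n": only the CR NL pair ends a line, and the
    # terminator is returned untranslated. A lone NL is ordinary data.
    # encoding="latin-1" is an assumption; it maps every byte to a
    # character, so decoding cannot fail on unknown input.
    with open(path, "r", encoding=encoding, newline="\r\n") as f:
        for line in f:
            # Strip the record terminator if present (the last record
            # may lack one).
            if line.endswith("\r\n"):
                line = line[:-2]
            # Replace the odd embedded NL, then split into fields.
            yield line.replace("\n", "_").split(";")
```

Iterating with "for fields in iter_records('testdata.csv'):" then reads
the file through the normal buffered I/O layer, one record at a time,
instead of building the full list of lines first.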
Thanks.