readlines() and "binary" files
jdavis at empires.org
Tue Sep 24 21:37:09 EDT 2002
I think readlines() is just a shortcut for a very common task. Since your
task isn't quite as common, I think it would be a better idea to use
read() to read the whole file and split the records yourself on the
0x0d 0x0a pair (CR LF). That way a lone CR inside a field stays part of
its field instead of ending the line.
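Roughly what I mean, as a sketch only. The names split_records and
read_records are made up for illustration, the code is written for a
current Python (bytes literals, "with") rather than the interpreter of
the day, and it assumes every record ends in CR+LF and that no field
contains a literal ';' or a CR+LF pair.

```python
def split_records(data, record_sep=b'\r\n', field_sep=b';'):
    """Split raw bytes into records on CR+LF, then each record into fields.

    Hypothetical helper: a lone CR inside a field is left untouched
    because only the full two-byte pair counts as a record separator.
    """
    records = data.split(record_sep)
    # A file that ends with the separator leaves one empty piece; drop it.
    if records and records[-1] == b'':
        records.pop()
    return [record.split(field_sep) for record in records]


def read_records(path):
    """Read the whole file in binary mode and split it into field lists."""
    # Binary mode, so the runtime does no newline translation of its own.
    with open(path, 'rb') as f:
        return split_records(f.read())
```

With that, every list returned by read_records('testdata.csv') should have
the same length, as long as the assumptions above hold for your data.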
If it's a really large amount of data, you can process it in chunks
instead, carrying the incomplete record at the end of each chunk over
to the next one.
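A chunked version could look something like this. Again it is only a
sketch: iter_records and the 64 KB default chunk size are my own choices,
not anything from the standard library, and it assumes the file object
was opened in binary mode.

```python
def iter_records(f, record_sep=b'\r\n', chunk_size=64 * 1024):
    """Yield CR+LF-delimited records from a binary file, one chunk at a time.

    Hypothetical helper: the tail of each chunk is held back because it
    may be an incomplete record, or end in a CR whose LF has not been
    read yet.
    """
    buf = b''
    while True:
        chunk = f.read(chunk_size)
        if not chunk:
            break
        buf += chunk
        pieces = buf.split(record_sep)
        # The last piece has no separator after it yet; keep it for later.
        buf = pieces.pop()
        for piece in pieces:
            yield piece
    # Whatever is left is a final record with no trailing separator.
    if buf:
        yield buf
```

Each record it yields can then be split on ';' exactly as in your loop.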
> I have excel data with occasional multi-line fields,
> which when dumped to CSV translates to embedded CR's
> within a line, whereas the records/lines themselves
> are delimited by the CR+NL pair (this is MS-land).
> What I'd like to do is read those files and split every
> line apart on the semi-colon field separator. But it
> seems that whether the file is opened as text or not,
> (x)readlines() still considers the lone CR as a line
> delimiter and so not all my lines end up with the same
> number of fields as they should. Is there a way to handle
> this, or is readlines just not meant to work with anything
> but proper text files?
> f = open('testdata.csv','rb')
> for line in f.xreadlines():
>     fields = line.split(';')
>     print len(fields) # should always be the same value