readlines() and "binary" files

Wed Sep 25 12:06:10 EDT 2002

> I think readlines() is just a shortcut for a very common task. Since your
> task isn't quite as common, I think it would be a better idea to use
> read() to read the whole thing, splitting the lines up by the 0x0d 0x0a
> pair (CR NL).

Good suggestion, Jeff, thanks. I did not realise that even data
read from a 'binary' stream could be split().
Maybe someone else will have a use for this. I also want to correct
what I said earlier: the data contains extra newlines, not extra
carriage returns.

        f = open('testdata.csv','rb')
        # read the whole file into memory
        block = f.read()
        f.close()
        # records end in CR NL; a stray NL stays inside its record
        lines = block.split( "\r\n" )
        # free our in-core file copy asap
        del block

        for line in lines:
            # get rid of the odd extra NL
            line = line.replace( "\n", "_" )
            fields = line.split(';')
            print len(fields)    # now constant

My biggest data file is about 8 MB, so reading it in one go
is doable. Still, I only need to look at about 120 bytes
at a time, so that's rather overkill. If anyone can think of a more
economical way of doing it (short of defining my own iterator), I'd
be interested.
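
For what it's worth, here is a rough sketch of one possibility: read the
file in fixed-size chunks and carry the incomplete tail over to the next
read. It does amount to a small generator, so it may not count as
"short of defining my own iterator", and it is written for Python 3
(bytes literals), unlike the snippet above. The names iter_records and
count_fields and the 4096-byte chunk size are made up for illustration.

```python
import io


def iter_records(stream, sep=b"\r\n", chunk_size=4096):
    """Yield sep-terminated records from a binary stream, chunk by chunk."""
    pending = b""
    while True:
        chunk = stream.read(chunk_size)
        if not chunk:
            break
        pending += chunk
        # Everything before the last separator is complete. The tail may be
        # a partial record (or half a separator), so keep it for later.
        *records, pending = pending.split(sep)
        for record in records:
            yield record
    if pending:
        # Last record when the data does not end in CR NL.
        yield pending


def count_fields(stream, delimiter=b";"):
    """Yield the field count of each record, masking stray newlines."""
    for record in iter_records(stream):
        record = record.replace(b"\n", b"_")
        yield len(record.split(delimiter))


if __name__ == "__main__":
    # Invented sample data; a real file would be opened with open(path, "rb").
    sample = io.BytesIO(b"a;b\nc;d\r\ne;f;g\r\n")
    print(list(count_fields(sample)))
```

Only one chunk plus the unfinished record is held in memory at a time, so
memory use depends on the chunk size and the longest record rather than on
the size of the file.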

Thanks.