open() in binary vs. text mode
hal_wine at yahoo.com
Fri Mar 21 04:03:15 CET 2003
Bob Roberts wrote:
> I just finished tracking down a cross-platform bug. The problem was
> that I didn't open() a file in binary ("rb") mode. What exactly does
> the binary flag do on Windows? What is its purpose?
As others explained, sometimes the binary flag causes different
behavior w.r.t. line endings. However, it can (should) always
serve as documentation that the file is not a text file native to
the current platform. (I use the rule "all files are binary
unless provably text".)
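To make the line-ending difference concrete, here is a minimal sketch. It assumes a current Python 3, where text mode translates line endings on input by default; the helper name read_both_ways is made up for illustration.

```python
import os
import tempfile

def read_both_ways(raw):
    """Write raw bytes to a temp file, then read them back in both modes."""
    fd, path = tempfile.mkstemp()
    try:
        with os.fdopen(fd, "wb") as f:
            f.write(raw)
        # Binary mode: the bytes come back exactly as stored.
        with open(path, "rb") as f:
            as_binary = f.read()
        # Text mode (default newline handling): '\r\n' and '\r' become '\n'.
        with open(path, "r", encoding="ascii") as f:
            as_text = f.read()
    finally:
        os.remove(path)
    return as_binary, as_text

as_binary, as_text = read_both_ways(b"one\r\ntwo\n")
print(repr(as_binary))
print(repr(as_text))
```

The binary read keeps the CR byte; the text read does not. That silent translation is exactly what corrupts data that is not really text.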
Since you mention cross platform work, let me clarify some
terminology from one of the other posts. It's a bit pedantic,
but a distinction that will never steer you wrong.
'\n', '\r' and the like are not characters. They are escape
sequences (meta characters) whose bit pattern is defined by the C
compiler used to build Python. (Same issue in Perl, Tcl, etc.)
If you find yourself writing code that cares about the bit value
representation of an escape character, you are dealing with a
binary file, and should not use escape characters.
If you need to refer to specific bit patterns in your strings,
use a constant you define, e.g.
CRLF = '\x0d\x0a'
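As a rough sketch of that rule in use (assuming Python 3 bytes objects; the constant names CR and LF and the function count_line_endings are invented for this example), named hex constants make it obvious that the code is inspecting bit patterns rather than "lines":

```python
# Explicit byte values, not escape sequences.
CR = b"\x0d"
LF = b"\x0a"
CRLF = CR + LF

def count_line_endings(data):
    """Count CRLF pairs, lone CRs and lone LFs in a bytes object."""
    crlf = data.count(CRLF)
    # Every CRLF pair contains one CR and one LF, so subtract the pairs.
    lone_cr = data.count(CR) - crlf
    lone_lf = data.count(LF) - crlf
    return {"CRLF": crlf, "CR": lone_cr, "LF": lone_lf}

print(count_line_endings(b"a\x0d\x0ab\x0dc\x0ad\x0d\x0a"))
```

A mixed result from a check like this is a hint that the file passed through more than one platform on its way to you.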
--Hal (who learned this the hard way years ago on a platform
where different compiler vendors had different ideas of the
internal bit pattern of \n ...)
P.S. if you know the file is a text file (but perhaps from
another platform), you can normalize the input string thusly:
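contents = open("foo", "rb").read()
# replace() returns a new object, so the result must be assigned back.
# The b'...' literals match the bytes that "rb" mode returns on Python 3.
contents = contents.replace(b'\x0d\x0a', b'\n')  # CRLF first
contents = contents.replace(b'\r', b'\n')  # then any lone CR
Now contents has every line ending normalized to '\n'. (I _think_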
all the platforms that used LFCR as a separator are long dead now...)