open() in binary vs. text mode

Hal Wine hal_wine at yahoo.com
Thu Mar 20 22:03:15 EST 2003


Bob Roberts wrote:
> I just finished tracking down a cross-platform bug.  The problem was
> that I didn't open() a file in binary ("rb") mode.  What exactly does
> the binary flag do on Windows?  What is its purpose?

As others explained, on some platforms the binary flag changes 
how line endings are handled. However, it can (should) always 
serve as documentation that the file is not a text file native to 
the current platform. (I use the rule "all files are binary 
unless provably text".)
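
To make the line-ending difference concrete, here is a minimal sketch. It 
assumes a modern Python 3 interpreter, which handles text mode itself rather 
than leaving it to the C library as the Python of this thread did. The 
temporary file and the variable names are illustrative only.

```python
import os
import tempfile

# Write raw bytes containing a CR LF pair (hex 0d 0a) to a scratch file.
fd, path = tempfile.mkstemp()
with os.fdopen(fd, "wb") as f:
    f.write(b"one\x0d\x0atwo")

# Binary mode: the bytes come back exactly as they were stored.
with open(path, "rb") as f:
    raw = f.read()

# Text mode: with the default newline handling, Python 3 translates
# CR LF (and lone CR) to '\n' while reading.
with open(path, "r", encoding="ascii") as f:
    text = f.read()

os.remove(path)
```

Here raw keeps all eight bytes, while text is one character shorter because 
the CR LF pair was collapsed during the read.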

Since you mention cross platform work, let me clarify some 
terminology from one of the other posts. It's a bit pedantic, 
but it is a distinction that will never steer you wrong.

'\n', '\r' and the like are not characters. They are escape 
sequences (meta characters) whose bit pattern is defined by the C 
compiler used to generate Python. (Same issue in Perl, Tcl etc.)

If you find yourself writing code that cares about the bit value 
representation of an escape character, you are dealing with a 
binary file, and should not use escape characters.

If you need to refer to specific bit patterns in your strings, 
use a constant you define, e.g.
	CRLF = '\x0d\x0a'
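
A slightly fuller sketch of the same idea, written for Python 3, where data 
read in binary mode is a bytes object. The names CR, LF and 
split_crlf_records are hypothetical, chosen only for this illustration.

```python
# Named constants with explicit hex values, so the byte patterns are
# visible in the source (Python 3 bytes literals).
CR = b"\x0d"
LF = b"\x0a"
CRLF = CR + LF


def split_crlf_records(data):
    """Split raw bytes on CR LF pairs only (hypothetical helper).

    A lone CR or LF is left inside the record it appears in.
    """
    return data.split(CRLF)


records = split_crlf_records(b"GET / HTTP/1.0" + CRLF + b"Host: example" + CRLF)
```

Because the data ends with a separator, the last element of records is an 
empty bytes object.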

--Hal (who learned this the hard way years ago on a platform 
where different compiler vendors had different ideas of the 
internal bit pattern of \n ...)

P.S. if you know the file is a text file (but perhaps from 
another platform), you can normalize the input string thusly:
	contents = open( "foo", "rb").read()
	# replace() returns a new string, so rebind the result
	contents = contents.replace( '\x0d\x0a', '\n' )
	contents = contents.replace( '\r', '\n' )
Now contents looks like a native platform text string. (I _think_ 
all the platforms that used LFCR as a separator are long dead now...)
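
For Python 3, where a binary read returns bytes, the same normalization 
might be sketched as follows. This version rewrites everything to the LF 
byte rather than to whatever '\n' means locally, which is an assumption on 
my part. The name normalize_newlines is hypothetical, and decoding the 
result to text is a separate step.

```python
CR = b"\x0d"
LF = b"\x0a"
CRLF = CR + LF


def normalize_newlines(data):
    """Return data with CR LF pairs and lone CRs rewritten as LF."""
    # Order matters: handle CR LF first, otherwise each pair would
    # turn into two LF bytes.
    data = data.replace(CRLF, LF)
    return data.replace(CR, LF)


sample = b"dos" + CRLF + b"mac" + CR + b"unix" + LF
normalized = normalize_newlines(sample)
```

The input is never modified in place, since bytes objects are immutable. 
The function returns a new object each time.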




