[Python-porting] bytes != str ... a few notes

John Machin sjmachin at lexicon.net
Mon Dec 15 12:03:35 CET 2008


In Python 3, b'abc' != 'abc' (and rightly so). I have scribbled down 
some notes (below) which may be useful to others. Is there a "porting 
tips" wiki or other public place for posting this kind of thing?

A couple of minor points on other topics:

1. It would have been nice if 3.x ord(a_byte) just quietly returned 
a_byte; porters would not have needed to change anything.

2. bytes.join() and bytearray.join() exist and work (on an iterable 
which may contain a mixture of bytes and bytearray objects) just as 
extrapolation from str.join would lead you to expect, but the help needs 
a little fixing and there's no mention of them in the Library Reference 
Manual. I've raised a bug report: http://bugs.python.org/issue4669
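
A rough 3.x sketch of both points, for anyone who wants to try them at the 
interpreter prompt. Note that byte_ord is a hypothetical helper name made up 
for this illustration (it is not in the standard library); it only shows one 
possible workaround for the ord() issue.

```python
# byte_ord is a hypothetical helper, not a stdlib function.
def byte_ord(b):
    """Return the integer value of a byte.

    Accepts an int (what indexing a bytes object gives in 3.x) or a
    length-1 bytes/str object (what indexing gives in 2.x).
    """
    return b if isinstance(b, int) else ord(b)

data = b"\xD0\xCF"
first = byte_ord(data[0])        # data[0] is already an int in 3.x
also_first = byte_ord(data[:1])  # a length-1 bytes object goes through ord()

# join() accepts a mixture of bytes and bytearray objects;
# the type of the result follows the type of the separator.
joined_bytes = b"-".join([b"ab", bytearray(b"cd")])
joined_array = bytearray(b"-").join([b"ab", bytearray(b"cd")])
```

In 3.x a bare ord(data[0]) raises TypeError because data[0] is an int, which 
is why the helper checks the type first.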

Cheers,
John

=== Comparing bytes objects with str objects ===

In Python 3.x, a bytes object will never ever compare equal to a str object.

Porter's problem (example):

data = open(fpath, "rb").read(8)
OLE2_SIG = "\xD0\xCF\x11\xE0\xA1\xB1\x1A\xE1"
if data == OLE2_SIG:
     # This point is unreachable in 3.x, because data is bytes (has been
     # read from a file opened in binary mode) and OLE2_SIG is a str
     # object.
     pass

Solution for "simple" porting:
OLE2_SIG = b"\xD0\xCF\x11\xE0\xA1\xB1\x1A\xE1"
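
To see the difference without needing a real OLE2 file, here is a small 3.x 
sketch. The io.BytesIO object and its contents are stand-ins invented for the 
illustration; they take the place of open(fpath, "rb").

```python
import io

# Stand-in for open(fpath, "rb"): an in-memory binary file with made-up content.
fake_file = io.BytesIO(b"\xD0\xCF\x11\xE0\xA1\xB1\x1A\xE1" + b"rest of file")
data = fake_file.read(8)

STR_SIG = "\xD0\xCF\x11\xE0\xA1\xB1\x1A\xE1"     # str: never equal to bytes in 3.x
BYTES_SIG = b"\xD0\xCF\x11\xE0\xA1\xB1\x1A\xE1"  # bytes: compares as intended

str_match = (data == STR_SIG)      # False in 3.x, and no exception is raised
bytes_match = (data == BYTES_SIG)  # True
```

The first comparison fails silently, which is what makes this problem easy 
to miss.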

A tentative solution when maintaining one codebase which runs as is on 
2.x and from which 3.x code is generated:

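# ... excerpt from "include file"
import sys
python_version = sys.version_info[:2]
if python_version >= (3, 0):
     def STR2BYTES(x, encoding='latin1'):
         # 3.x: str -> bytes; latin1 maps code points 0-255 to the same byte values
         return x.encode(encoding)
else:
     def STR2BYTES(x, encoding='latin1'):
         # 2.x: str is already a byte string; encoding is accepted but ignored
         return x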

# ... changed code
OLE2_SIG = STR2BYTES("\xD0\xCF\x11\xE0\xA1\xB1\x1A\xE1")
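
The 'latin1' default in STR2BYTES is not arbitrary: latin1 maps each code 
point in the range 0-255 to the single byte with the same value, so the \x 
escapes survive unchanged. A quick 3.x check (the variable names here are 
just for illustration):

```python
sig_text = "\xD0\xCF\x11\xE0\xA1\xB1\x1A\xE1"
expected = b"\xD0\xCF\x11\xE0\xA1\xB1\x1A\xE1"

as_latin1 = sig_text.encode("latin1")  # one byte per character, values preserved
as_utf8 = sig_text.encode("utf-8")     # non-ASCII characters become two bytes each

latin1_ok = (as_latin1 == expected)
utf8_ok = (as_utf8 == expected)
```

Here latin1_ok is True and utf8_ok is False, so an encoding such as UTF-8 
would quietly produce the wrong signature.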

How to find cases of this problem:
1. Can't be detected by the Python 2.6 -3 option.
2. Can't be detected/handled by the 2to3 script.
3. Is detected by Python 3.x -b (warn) and -bb (error) command-line 
options to check for bytes/str [in]equality comparisons. [Aside: these 
options are documented in the expected place but not mentioned in the 
porting notes 
(http://docs.python.org/dev/py3k/whatsnew/3.0.html#porting-to-python-3-0)]
4. Should be detected by your tests but the point where the test fails 
may be some distance from the actual comparison.
5. Search your code for bytesy things like \x and \0.
6. Read your code (but turn 2.x mindset off because if you don't, the 
code will look just fine!).
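
Point 3 above can be tried out with a sketch like the following, which runs 
a one-line comparison in a child 3.x interpreter under -b and then -bb. 
SNIPPET and run_with are names invented for this illustration; it assumes 
Python 3.5 or later for subprocess.run.

```python
import subprocess
import sys

# A deliberately wrong comparison between bytes and str.
SNIPPET = "print(b'abc' == 'abc')"

def run_with(flag):
    """Run SNIPPET in a child interpreter with the given flag.

    Returns (returncode, stderr text).
    """
    proc = subprocess.run(
        [sys.executable, flag, "-c", SNIPPET],
        stdout=subprocess.PIPE,
        stderr=subprocess.PIPE,
        universal_newlines=True,
    )
    return proc.returncode, proc.stderr

warn_code, warn_err = run_with("-b")     # BytesWarning on stderr, script still finishes
error_code, error_err = run_with("-bb")  # BytesWarning raised as an exception
```

With -b the child exits normally after printing a warning; with -bb the 
comparison raises BytesWarning and the child exits with a non-zero status, 
which makes -bb handy for a test run.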

=== end of screed ===