[Python-porting] bytes != str ... a few notes
John Machin
sjmachin at lexicon.net
Mon Dec 15 12:03:35 CET 2008
In Python 3, b'abc' != 'abc' (and rightly so). I have scribbled down
some notes (below) which may be useful to others. Is there a "porting
tips" wiki or other public place for posting this kind of thing?
A couple of minor points on other topics:
1. It would have been nice if 3.x ord(a_byte) just quietly returned
a_byte; porters would not have needed to change anything.
2. bytes.join() and bytearray.join() exist and work (on an iterable
which may contain a mixture of bytes and bytearray objects) just as
extrapolation from str.join would lead you to expect, but the help needs
a little fixing and there's no mention of them in the Library Reference
Manual. I've raised a bug report: http://bugs.python.org/issue4669
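A minimal sketch of both points, assuming Python 3.x (the variable names are made up for illustration):

```python
# Point 1: indexing a bytes object in 3.x yields an int, and ord() on an
# int raises TypeError rather than quietly returning it.
data = b"\xD0\xCF"
first = data[0]          # an int in 3.x (a 1-character str in 2.x)
try:
    ord(first)
    ord_accepts_int = True
except TypeError:
    ord_accepts_int = False

# Point 2: join() accepts an iterable mixing bytes and bytearray objects;
# the type of the result follows the type of the separator.
parts = [b"ab", bytearray(b"cd")]
joined_bytes = b"-".join(parts)
joined_bytearray = bytearray(b"-").join(parts)
```

In 2.x the same ord(data[0]) call works, which is why code that indexes a byte string and then calls ord() needs attention when porting.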
Cheers,
John
=== Comparing bytes objects with str objects ===
In Python 3.x, a bytes object will never ever compare equal to a str object.
Porter's problem (example):
data = open(fpath, "rb").read(8)
OLE2_SIG = "\xD0\xCF\x11\xE0\xA1\xB1\x1A\xE1"
if data == OLE2_SIG:
    # This point is unreachable in 3.x, because data is bytes (it was
    # read from a file opened in binary mode) and OLE2_SIG is a str
    # object.
    pass
Solution for "simple" porting:
OLE2_SIG = b"\xD0\xCF\x11\xE0\xA1\xB1\x1A\xE1"
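The difference can be checked without a real file. The following is only a sketch, assuming Python 3.x; io.BytesIO stands in for the file opened in binary mode, and the names STR_SIG and BYTES_SIG are invented for the comparison:

```python
import io

# Stand-in for open(fpath, "rb"): a stream that starts with the signature.
fake_file = io.BytesIO(b"\xD0\xCF\x11\xE0\xA1\xB1\x1A\xE1" + b"rest of file")
data = fake_file.read(8)

STR_SIG = "\xD0\xCF\x11\xE0\xA1\xB1\x1A\xE1"     # str: never equal to bytes
BYTES_SIG = b"\xD0\xCF\x11\xE0\xA1\xB1\x1A\xE1"  # bytes: compares as intended

matches_str = (data == STR_SIG)      # False in 3.x, silently
matches_bytes = (data == BYTES_SIG)  # True
```

Note that the failing comparison raises nothing by default; it just evaluates to False, which is what makes the problem easy to miss.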
A tentative solution when maintaining one codebase which runs as is on
2.x and from which 3.x code is generated:
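# ... excerpt from "include file"
import sys
python_version = sys.version_info[:2]  # assumed definition; not shown in the excerpt
if python_version >= (3, 0):
    def STR2BYTES(x, encoding='latin1'):
        # latin1 maps code points 0-255 one-to-one onto byte values
        return x.encode(encoding)
else:
    def STR2BYTES(x, encoding='latin1'):
        # 2.x str is already a byte string; encoding is accepted and ignored
        return x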
# ... changed code
OLE2_SIG = STR2BYTES("\xD0\xCF\x11\xE0\xA1\xB1\x1A\xE1")
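The choice of latin1 as the default encoding matters here. A small sketch, assuming Python 3.x, of what happens to the signature under latin1 versus UTF-8 (sig_text, as_latin1 and as_utf8 are illustrative names):

```python
sig_text = "\xD0\xCF\x11\xE0\xA1\xB1\x1A\xE1"

# latin1 maps each code point 0-255 to the single byte with the same value,
# so the escapes in the str literal come out as the intended bytes.
as_latin1 = sig_text.encode("latin1")

# UTF-8 encodes every code point above 0x7F as two bytes, so the result is
# longer and no longer matches the on-disk signature.
as_utf8 = sig_text.encode("utf-8")
```

Here as_latin1 is the 8-byte signature, while as_utf8 is 14 bytes long (six of the eight characters are above 0x7F).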
How to find cases of this problem:
1. Can't be detected by Python 2.6 -3 option.
2. Can't be detected/handled by 2to3 script.
3. Is detected by Python 3.x -b (warn) and -bb (error) command-line
options to check for bytes/str [in]equality comparisons. [Aside: these
options are documented in the expected place but not mentioned in the
porting notes
(http://docs.python.org/dev/py3k/whatsnew/3.0.html#porting-to-python-3-0)]
4. Should be detected by your tests but the point where the test fails
may be some distance from the actual comparison.
5. Search your code for bytesy things like \x and \0.
6. Read your code (but turn 2.x mindset off because if you don't, the
code will look just fine!).
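Item 3 can be tried out as follows. This is a sketch only, assuming a Python 3.7+ interpreter (for subprocess.run with capture_output) that still accepts the -b and -bb options; the helper name run and the one-line snippet are made up for the demonstration:

```python
import subprocess
import sys

# A one-line program containing a bytes/str equality comparison.
snippet = "print(b'abc' == 'abc')"

def run(flags):
    """Run the snippet in a child interpreter with the given options."""
    return subprocess.run(
        [sys.executable] + flags + ["-c", snippet],
        capture_output=True,
        text=True,
    )

plain = run([])        # no option: prints False, no complaint
warn = run(["-b"])     # -b: prints False, BytesWarning on stderr
strict = run(["-bb"])  # -bb: the warning becomes an error
```

With -b the program still runs to completion, so the warning has to be looked for in stderr; with -bb the child interpreter exits with a non-zero status, which is the more useful setting for a test run.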
=== end of screed ===