[Tutor] Python 2 & 3 and unittest
Steven D'Aprano
steve at pearwood.info
Thu Sep 5 01:11:50 CEST 2013
On Wed, Sep 04, 2013 at 06:30:12AM -0700, Albert-Jan Roskam wrote:
> Hi,
>
> I am trying to make my app work in Python 2.7 and Python 3.3 (one
> codebase) and I might later also try to make it work on Python 2.6 and
> Python 3.2 (if I am not too fed up with it ;-). I was very happy to
> notice that the 'b' prefix for bytes objects is also supported for
> byte strings in Python 2.7. Likewise, I just found out that the "u"
> has been re-introduced in Python 3.3 (I believe this is the first
> Python 3 version where this re-appears).
>
> I am now cursing myself for having used doctest for my tests.
Well that's just silly :-)
Doc tests and unit tests have completely different purposes, and
besides, in general doc tests are easier to make version independent.
Many of your doc tests will still continue to work, and those that don't
can almost always be adapted to be version-independent.
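One common adaptation, sketched here with a hypothetical `greeting` function: the repr of a unicode string differs between versions (u'lalala' on Python 2, 'lalala' on Python 3), so a doc test that prints the value, rather than echoing its repr, produces the same output on both.

```python
import doctest


def greeting():
    """Return a fixed greeting as a Unicode string.

    >>> print(greeting())
    lalala
    """
    # Echoing greeting() at the prompt would show u'lalala' on Python 2 but
    # 'lalala' on Python 3, so the doc test above prints the value instead.
    return u"lalala"


# Collect and run only this function's examples. Passing globs explicitly
# makes the name "greeting" visible to the example however the file is run.
finder = doctest.DocTestFinder()
runner = doctest.DocTestRunner()
for test in finder.find(greeting, "greeting", globs={"greeting": greeting}):
    runner.run(test)
failed, attempted = runner.summarize(verbose=False)
```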
> So I am planning to rewrite everything in unittest.
> Is the try-except block below the best way to make this test work in
> Python 2.6 through 3.3?
>
> import unittest
> import blah # my package
>
>
> class test_blah(unittest.TestCase):
>     def test_someTest(self):
>         try:
>             expected = [u"lalala", 1] # Python 2.6>= & Python 3.3>=
>         except SyntaxError:
>             expected = ["lalala", 1] # Python 3.0, 3.1, 3.2
That cannot work. try...except catches *run time* exceptions.
SyntaxError occurs at *compile time*, before the try...except gets a
chance to run.
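To make the distinction concrete: a SyntaxError can only be caught by code that is already running when some *other* source gets compiled, for example through the built-in compile(). A minimal sketch, using a deliberately broken string (an unclosed bracket) as a stand-in for syntax the running interpreter does not accept; the names are just for illustration.

```python
# Stand-in for source this interpreter cannot parse (an unclosed bracket).
BAD_SOURCE = "expected = [u'lalala', 1\n"
GOOD_SOURCE = "expected = [u'lalala', 1]\n"


def compiles(source):
    """Return True if `source` compiles, False if it raises SyntaxError."""
    try:
        compile(source, "<test>", "exec")
    except SyntaxError:
        # Caught only because this compilation happens at run time, inside
        # the try block, not when this module itself was compiled.
        return False
    return True


print(compiles(BAD_SOURCE))   # False
print(compiles(GOOD_SOURCE))  # True
```

The try...except in the original test never gets that chance: the parser rejects the whole module before any of it runs.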
Unfortunately, there is no good way to write version-independent code
involving string literals across all of Python 2.x and 3.x. If the only
Python 3 versions you support are 3.3 and better, it is simple, since u""
literals work there as well as in 2.x; otherwise you're stuck with various
nasty workarounds, none of which is ideal.
Probably the least worst for your purposes is to create a helper
function in your unit test:
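import sys

if sys.version_info[0] < 3:
    def u(astr):
        # Python 2: turn the (ASCII-only) byte string into unicode.
        return unicode(astr)
else:
    def u(astr):
        # Python 3: str is already Unicode, so return it unchanged.
        return astr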
Then, everywhere you want a Unicode string, use:
u("something")
The two problems with this are:
1) It is slightly slower, but for testing purposes that doesn't really
matter; and
2) You cannot write non-ASCII literals in your strings. Or at least not
safely.
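Putting the helper together with the test from the original question, a test module might look roughly like this. `make_record` is a hypothetical stand-in for whatever the blah package actually returns, and only ASCII text goes inside u(...), for the reason just given.

```python
import sys
import unittest

# Version-dependent helper: on Python 2, unicode() turns an ASCII-only
# byte-string literal into a unicode object; on Python 3, str literals are
# already Unicode, so the argument is returned unchanged.
if sys.version_info[0] < 3:
    def u(astr):
        return unicode(astr)  # the name "unicode" exists only on Python 2
else:
    def u(astr):
        return astr


def make_record():
    # Hypothetical stand-in for the function under test in "blah".
    return [u("lalala"), 1]


class TestBlah(unittest.TestCase):
    def test_some_test(self):
        # The same expression works on 2.6, 2.7 and every 3.x release.
        expected = [u("lalala"), 1]
        self.assertEqual(make_record(), expected)
```

Saved as a module, it can be run with `python -m unittest` followed by the module name.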
> Another, though related question. We have Python 2.7 in the office and
> eventually we will move to some Python3 version. The code does not
> generally need to remain Python2 compatible. What is the best
> strategy: [a] write forward compatible code when using Python 2.7.
> (using a 'b' prefix for byte strings, parentheses for the print
> *statement*, sys.exc_info()[1] for error messages, etc.). [b] totally
rely on 2to3 script and don't make the Python2 code less readable and
> less idiomatic before the upgrade.
Option a, but not the way you say it. Start by putting
from __future__ import division, print_function
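at the top of your 2.x code. There is also
from __future__ import unicode_literals
which makes unprefixed string literals unicode, but it can cause
surprises with 2.x code that expects byte strings, so you may prefer to
get used to writing u"" and b"" strings by hand.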
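With the division and print_function imports in effect, Python 2.7 behaves like Python 3 for division and printing. A small sketch that runs unchanged on either version (the variable names are just for illustration):

```python
from __future__ import division, print_function

# With the future import, "/" is true division on Python 2.7 as well as
# 3.x; use "//" explicitly when floor division is what you want.
half = 7 / 2      # 3.5 on both versions
floored = 7 // 2  # 3 on both versions

# print is a function, so keyword arguments such as sep work on both.
print(half, floored, sep=" | ")  # 3.5 | 3
```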
You can also do:
from future_builtins import *
range = xrange
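which replaces map, filter, zip, hex, oct and ascii with their Python 3
versions (map, filter and zip then return iterators), while range = xrange
gives you a lazy range like Python 3's.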
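future_builtins only exists on Python 2.6 and 2.7. If, as an assumption, you also want the same module to import cleanly on Python 3 during the transition, one possible sketch guards the import:

```python
try:
    # Python 2.6/2.7 only: Python 3 style map, filter, zip, hex, oct, ascii.
    from future_builtins import *
    range = xrange  # the name "xrange" exists only on Python 2
except ImportError:
    # Python 3: the builtins already behave this way.
    pass

# Under Python 3 semantics map() returns a lazy iterator rather than a
# list, so materialise it explicitly when a list is needed.
squares = map(lambda n: n * n, range(5))
print(list(squares))  # [0, 1, 4, 9, 16]
```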
Be prepared for a torrid time getting used to Unicode strings. Not
because Unicode is hard, it isn't, but because you'll have to unlearn a
lot of things that you thought you knew. The first thing to unlearn is
this: there is no such thing as "plain text".
Unfortunately Python 2 tries to be helpful when dealing with text versus
bytes, and that actually teaches bad habits. This is the sort of thing
I'm talking about:
[steve@ando ~]$ python2.7 -c "print 'a' + u'b'"
ab
That sort of implicit conversion between byte strings and Unicode
strings is actually a bad idea, and Python 3 prohibits it:
[steve@ando ~]$ python3.3 -c "print(b'a' + u'b')"
Traceback (most recent call last):
File "<string>", line 1, in <module>
TypeError: can't concat bytes to str
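In Python 3 the cure is to decide explicitly which side to convert. A minimal sketch, assuming the bytes are ASCII or UTF-8 (the variable names are just for illustration):

```python
raw = b"a"    # bytes, e.g. data read from a file opened in binary mode
text = u"b"   # str, i.e. Unicode text

# Python 3 will not guess an encoding, so convert one side explicitly:
combined_text = raw.decode("ascii") + text     # "ab" (str)
combined_bytes = raw + text.encode("utf-8")    # b"ab" (bytes)

# Mixing the two types directly still fails, as in the traceback above.
try:
    raw + text
    mixed_ok = True
except TypeError:
    mixed_ok = False

print(combined_text, combined_bytes, mixed_ok)  # ab b'ab' False
```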
The other bad thing about Unicode is that, unless you are lucky enough
to be generating all your own textual data, you'll eventually have to
deal with cross-platform text issues, and text generated by people who
didn't understand Unicode and therefore produce rubbish data containing
mojibake and worse.
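Mojibake usually comes from decoding bytes with a different encoding from the one used to write them. A small Python 3 illustration, assuming UTF-8 data that some program wrongly read as Latin-1:

```python
original = u"caf\u00e9"   # "cafe" with an acute e, as Unicode text

# Encode as UTF-8, then wrongly decode as Latin-1: the two UTF-8 bytes
# for the accented e (0xC3 0xA9) come back as two separate characters.
garbled = original.encode("utf-8").decode("latin-1")
assert garbled == u"caf\u00c3\u00a9"   # displays as "cafÃ©"

# When you know exactly which mistake was made, it can be undone.
repaired = garbled.encode("latin-1").decode("utf-8")
print(repaired == original)   # True
```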
But the good thing is, the Unicode model actually isn't hard to
understand, and once you learn the language of "encodings", "code
points" etc. it makes great sense.
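The core distinction is that a string is a sequence of code points, and an encoding is one particular way of turning those code points into bytes. A short Python 3 sketch using the euro sign:

```python
euro = u"\u20ac"                   # EURO SIGN: one code point, U+20AC
print(len(euro), hex(ord(euro)))   # 1 0x20ac

# The same code point becomes different bytes under different encodings:
print(euro.encode("utf-8"))        # b'\xe2\x82\xac'
print(euro.encode("utf-16-le"))    # b'\xac ' (bytes 0xAC, 0x20)
print(euro.encode("cp1252"))       # b'\x80'

# Some encodings cannot represent it at all:
try:
    euro.encode("latin-1")
except UnicodeEncodeError:
    print("Latin-1 has no euro sign")
```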
Unless you're working with binary data, you are much better off learning
how to use Unicode u"" strings now. Just be aware that unless you are
careful, Python 2 will try to be helpful, and you don't want that.
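One widely used way to follow this advice is "decode early, encode late": turn bytes into text as soon as they enter the program, work with text internally, and encode only when writing out. A sketch, assuming your files are UTF-8; io.open accepts an encoding argument on Python 2.6+ and 3.x alike, and the file name is made up for the example.

```python
import io
import os
import tempfile

# Hypothetical file used only for this illustration.
path = os.path.join(tempfile.mkdtemp(), "names.txt")

# Output edge: write text; io.open encodes it to UTF-8 bytes for us.
with io.open(path, "w", encoding="utf-8") as f:
    f.write(u"Zo\u00eb\n")

# Input edge: read the bytes back and decode them to text immediately.
with io.open(path, "r", encoding="utf-8") as f:
    name = f.read().strip()

# For comparison, the raw bytes on disk (U+00EB is 0xC3 0xAB in UTF-8).
with io.open(path, "rb") as f:
    raw = f.read()

print(name == u"Zo\u00eb")  # True
```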
--
Steven