[Tutor] Python 2 & 3 and unittest
Albert-Jan Roskam
fomcl at yahoo.com
Thu Sep 5 12:20:23 CEST 2013
----- Original Message -----
> From: Steven D'Aprano <steve at pearwood.info>
> To: tutor at python.org
> Cc:
> Sent: Thursday, September 5, 2013 1:11 AM
> Subject: Re: [Tutor] Python 2 & 3 and unittest
>
> On Wed, Sep 04, 2013 at 06:30:12AM -0700, Albert-Jan Roskam wrote:
>> Hi,
>>
>> I am trying to make my app work in Python 2.7 and Python 3.3 (one
>> codebase) and I might later also try to make it work on Python 2.6 and
>> Python 3.2 (if I am not too fed up with it ;-). I was very happy to
>> notice that the 'b' prefix for bytes objects is also supported for
>> byte strings in Python 2.7. Likewise, I just found out that the "u"
>> has been re-introduced in Python 3.3 (I believe this is the first
>> Python 3 version where this re-appears).
>>
>> I am now cursing myself for having used doctest for my tests.
>
> Well that's just silly :-)
>
> Doc tests and unit tests have completely different purposes, and
> besides, in general doc tests are easier to make version independent.
> Many of your doc tests will still continue to work, and those that don't
> can almost always be adapted to be cross-platform.
Hmm, maybe the page below was exaggerating. I hope so. Given that quite a few __repr__ methods have changed (bytes objects, views, ...) I still fear that a whole bunch of tests need to be modified.
http://python3porting.com/problems.html:
"Running doctests - One of the more persistently annoying problems you may encounter are doctests. Personally I think doctests are brilliant for testing documentation, but there has been a recommendation in some circuits to make as many tests as possible doctests. This becomes a problem with Python 3 because doctests rely on comparing the output of the code. That means they are sensitive to changes in formatting and Python 3 has several of these. This means that if you have doctests you will get many, many failures. Don’t despair! Most of them are not actual failures, but changes in the output formatting. 2to3 handles that change in the code of the doctests, but not in the output.
If you are only porting to Python 3, the solution is simple and boring. Run the doctests and look at each failure to see if it is a real failure or a change in formatting. This can sometimes be frustrating, as you can sit and stare at a failure trying to figure out what actually is different between the expected and the actual output. On the other hand, that’s normal with doctests, even when you aren’t porting to Python 3, which of course is one of the reasons that they aren’t suitable as the main form of testing for a project.
It gets more tricky if you need to continue to support Python 2, since you need to write output that works in both versions and that can be difficult and in some cases impossible for example when testing for exceptions, see below."
>> So I am planning to rewrite everything in unittest.
>> Is the try-except block below the best way to make this test work in
>> Python 2.6 through 3.3?
>>
>> import unittest
>> import blah # my package
>>
>>
>> class test_blah(unittest.TestCase):
>>     def test_someTest(self):
>>         try:
>>             expected = [u"lalala", 1] # Python 2.6>= & Python 3.3>=
>>         except SyntaxError:
>>             expected = ["lalala", 1] # Python 3.0, 3.1, 3.2
>
> That cannot work. try...except catches *run time* exceptions.
> SyntaxError occurs at *compile time*, before the try...except gets a
> chance to run.
>
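To illustrate the point about compile time versus run time: a SyntaxError can only be caught when compilation itself happens at run time, for example through the built-in compile(). The snippet below is a minimal sketch; the invalid source string is made up for the illustration and stands in for a u"" literal on Python 3.0 to 3.2.

```python
# Deliberately invalid source, compiled at run time rather than as part
# of this module, so the resulting SyntaxError can actually be caught.
bad_source = "x = = 1\n"

try:
    compile(bad_source, "<example>", "exec")
    caught = False
except SyntaxError:
    caught = True

# Had the invalid line been written directly in this file, the whole
# file would have failed to compile and no try...except would have run.
```
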
> Unfortunately, there is no good way to write version-independent code
> involving strings across Python 2.x and 3.x. If you just support 3.3 and
> better, it is simple, but otherwise you're stuck with various nasty work
> arounds, none of which are ideal.
>
> Probably the least worst for your purposes is to create a helper
> function in your unit test:
>
> from sys import version  # version string, e.g. '2.7.5 (default, ...)'
>
> if version < '3':
>     def u(astr):
>         return unicode(astr)
> else:
>     def u(astr):
>         return astr
>
> Then, everywhere you want a Unicode string, use:
>
> u("something")
>
> The two problems with this are:
>
> 1) It is slightly slower, but for testing purposes that doesn't really
> matter; and
>
> 2) You cannot write non-ASCII literals in your strings. Or at least not
> safely.
>
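One possible way around the second problem is to keep the source ASCII-only and spell non-ASCII characters as \u escapes, which the helper then decodes on Python 2. This is only a sketch of a variant of the helper above, not the version from the quoted message, and it assumes the strings contain no literal backslashes other than the escapes themselves.

```python
import sys

if sys.version_info[0] < 3:
    def u(astr):
        # Python 2: the byte string holds the six characters "\u00e9"
        # literally; unicode_escape turns them into one code point.
        return astr.decode("unicode_escape")
else:
    def u(astr):
        # Python 3: the literal is already text and the escape has
        # already been interpreted by the compiler.
        return astr

word = u("caf\u00e9")
```

On both major versions `word` should end up as a four-character text string. The cost is that any backslash meant literally has to be handled with care.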
>
>> Another, though related question. We have Python 2.7 in the office and
>> eventually we will move to some Python3 version. The code does not
>> generally need to remain Python2 compatible. What is the best
>> strategy: [a] write forward compatible code when using Python 2.7.
>> (using a 'b' prefix for byte strings, parentheses for the print
>> *statement*, sys.exc_info()[1] for error messages, etc.). [b] totally
>> rely on 2to3 script and don't make the Python2 code less readable and
>> less idiomatic before the upgrade.
>
> Option a, but not the way you say it. Start by putting
>
> from __future__ import division, print_function
Assuming I never use the arguments of the print function, why also import print_function? print("something") works no matter if 'print' is a statement or a function.
> at the top of your 2.x code. I don't believe there is a way to make
> string literals unicode, you just have to get used to writing u"" and
> b"" strings by hand.
>
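A short sketch of why the print_function import matters even when only the parentheses seem to be needed. Without the import, Python 2 parses print("spam", "eggs") as a print statement applied to a tuple and shows ('spam', 'eggs'); keyword arguments such as file= are a SyntaxError there. The sketch also uses the unicode_literals future import (available since Python 2.6), which does make unprefixed literals unicode within the module, although it brings trade-offs of its own with APIs that expect byte strings.

```python
from __future__ import print_function, unicode_literals

import sys

# With print_function, this prints "spam eggs" on 2.6+ and 3.x alike.
print("spam", "eggs")

# Keyword arguments only work when print is a function.
print("warning: something happened", file=sys.stderr)

# With unicode_literals, an unprefixed literal is a text string,
# i.e. the same type as an explicit u"" literal (valid on 2.x and 3.3+).
same_type = type("abc") is type(u"abc")
```
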
> You can also do:
>
> from future_builtins import *
> range = xrange
>
> which will replace a bunch of builtins with Python3 compatible versions.
>
> Be prepared for a torrid time getting used to Unicode strings. Not
> because Unicode is hard, it isn't, but because you'll have to unlearn a
> lot of things that you thought you knew. The first thing to unlearn is
> this: there is no such thing as "plain text".
Switching back and forth between Python 2 and 3 creates quite a bit of mental overhead indeed. The association of str() with the concept "byte strings" is hard-coded somewhere in my medulla oblongata. And now it suddenly means unicode string! Nice implementation of the Stroop effect ;-)) http://en.wikipedia.org/wiki/Stroop_effect
> Unfortunately Python 2 tries to be helpful when dealing with text versus
> bytes, and that actually teaches bad habits. This is the sort of thing
> I'm talking about:
>
> [steve at ando ~]$ python2.7 -c "print 'a' + u'b'"
> ab
The ascii_with_complaints codec seems to offer some help here. I have not yet tried it though.
> That sort of implicit conversion of bytes and strings is actually a bad
> idea, and Python 3 prohibits it:
>
> [steve at ando ~]$ python3.3 -c "print(b'a' + u'b')"
> Traceback (most recent call last):
> File "<string>", line 1, in <module>
> TypeError: can't concat bytes to str
>
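The habit that works on both versions is to convert explicitly at the boundaries: decode bytes to text on the way in, work with text, and encode on the way out. A minimal sketch, assuming UTF-8 data; the sample values are invented for the illustration.

```python
raw = b"caf\xc3\xa9"                 # bytes, e.g. read from a file or socket
text = raw.decode("utf-8")           # bytes -> text, encoding stated explicitly
greeting = u"un " + text             # text + text is fine on 2.x and 3.3+
encoded = greeting.encode("utf-8")   # text -> bytes again for output

try:
    b"a" + u"b"
    mixed_ok = True    # Python 2: implicit ASCII decoding, silently allowed
except TypeError:
    mixed_ok = False   # Python 3: mixing bytes and text is refused
```
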
>
> The other bad thing about Unicode is that, unless you are lucky enough
> to be generating all your own textual data, you'll eventually have to
> deal with cross-platform text issues, and text generated by people who
> didn't understand Unicode and therefore produce rubbish data containing
> mojibake and worse.
>
> But the good thing is, the Unicode model actually isn't hard to
> understand, and once you learn the language of "encodings", "code
> points" etc. it makes great sense.
>
> Unless you're working with binary data, you are much better off learning
> how to use Unicode u"" strings now. Just be aware that unless you are
> careful, Python 2 will try to be helpful, and you don't want that.
I am using lots of char pointers (ctypes.c_char_p), which don't take unicode strings. I wrote a small wrapper function, c_char_py3k, that turns the strings into bytes objects if applicable and needed, then calls c_char_p.
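
A wrapper of that kind could look roughly like the sketch below. This is not the actual c_char_py3k; the name to_c_char_p and the UTF-8 default are assumptions made for the illustration.

```python
import ctypes


def to_c_char_p(value, encoding="utf-8"):
    """Hypothetical helper: accept bytes or text, return a c_char_p."""
    if isinstance(value, bytes):
        # Already bytes (this includes plain str on Python 2).
        data = value
    else:
        # Text (unicode on Python 2, str on Python 3): encode first.
        data = value.encode(encoding)
    return ctypes.c_char_p(data)


ptr_from_text = to_c_char_p(u"abc")
ptr_from_bytes = to_c_char_p(b"xyz")
```

The encoding has to match whatever the C library on the other side expects, so a real wrapper would probably take it from the application's configuration rather than hard-code it.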