On Aug 25, 2012, at 09:16 AM, Nick Coghlan wrote:
A couple of people at PyCon Au mentioned running into this kind of issue with Python 3. It relates to the fact that:
- String formatting is *coercive* by default
- Absolutely everything, including bytes objects can be coerced to a
string, due to the repr() fallback
So it's relatively easy to miss a decode or encode operation, and end up interpolating an unwanted "b" prefix and some quotes.
For existing versions, I think the easiest answer is to craft a regex that matches bytes object repr's and advise people to check that it *doesn’t* match their formatted strings in their unit tests.
For 3.4+ a non-coercive string interpolation format code may be desirable.
Or maybe just one that calls __str__ without a __repr__ fallback?
FWIW, the representation of bytes with the leading b'' does cause problems when trying to write doctests that work in both Python 2 and 3.
It might be a bit nicer to be able to write:
Of course, in the bytes case, its __str__() would have to be rewritten to not call its __repr__() explicitly. It's probably not worth it just to save from writing a small helper function, but it would be useful in eliminating a surprising gotcha.
The other option is of course just to make doctests smarter[*].
[*] Doctest haters need not respond snarkily. :)