[Python-Dev] unicode Exception messages in py2.7

Chris Barker chris.barker at noaa.gov
Fri Nov 15 19:02:07 CET 2013


On Fri, Nov 15, 2013 at 5:24 AM, "Martin v. Löwis" <martin at v.loewis.de> wrote:

> Procedurally, it's really easy. Ultimately it's up to the release
> manager to decide which changes go into a release and which don't, and
> Benjamin has already voiced an opinion.

He did, very early in the conversation, though honestly, probably nothing
compelling has been brought up since...

> In addition, Guido van Rossum has voiced an opinion a while ago that
> he doesn't consider fixing bugs for 2.7 very useful, and would rather
> see maintenance focus on ongoing support for new operating systems,
> compiler, build environments, etc. The rationale is that people who
> have lived with the glitches of 2.x for so long surely have already
> made their work-arounds, so they aren't helped with receiving bug fixes.

If that's the policy, then that's the policy, but ...

> The same is true in your case: you indicated that you *already* work
> around the problem.

Only in one script, and when I just looked, I found I had missed a number
of locations in that script. I've been running that script for years with
very few changes, but last week someone gave me a UTF-8 data file. It was
really easy to read the file as UTF-8 (a one-line change), and bingo!
Everything else just worked.
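
The one-line change isn't shown. A plausible version, assuming the file was previously opened with the builtin open(), is to switch to io.open with an explicit encoding, which returns unicode text on both Python 2.7 and 3.x. The function name below is hypothetical:

```python
import io


def read_lines(path):
    # io.open decodes the file's bytes as UTF-8 and yields unicode lines
    # (plain str on Python 3) instead of raw byte strings.
    with io.open(path, encoding="utf-8") as f:
        return f.readlines()
```

Everything downstream then receives unicode objects, which is exactly how a unicode message can end up inside an Exception.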

Then I hit an exception and banged my head against the wall for a
while, though I guess this is what we always deal with whenever we
introduce unicode into a previously non-unicode-aware application. I'm
still a bit dumbfounded that you can't use a unicode message in an
Exception, though, and I'm still not sure why that restriction is necessary...
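
As far as I can tell, the mechanism is this: in 2.7, str(exc) must return a byte string, so a unicode message gets implicitly encoded with the default codec, which is ASCII, and any non-ASCII character makes that fail. The same failure can be reproduced explicitly on either 2.7 or 3.x (the helper name is made up for illustration):

```python
def ascii_encode_fails(msg):
    # Python 2.7's str() on a unicode object performs this same ASCII
    # encoding implicitly; here it is done explicitly so it can be caught.
    try:
        msg.encode("ascii")
    except UnicodeEncodeError:
        return True
    return False
```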

> It may have been tedious when you had to do it,
> but now it's done - and you might not even change your code even if
> Python 2.7.x gets changed, since you might want to support older 2.7.x
> release for some time.

In this case, no. But really this is more about making it easier to
just dump unicode in somewhere or, in fact, simply giving people more
meaningful errors when they do that...

And I have a lot of code that ignores this problem, and I'm sure it
will come up for me and others over and over again. But yes, it
clearly hasn't been a deal-breaker so far!

On Fri, Nov 15, 2013 at 2:48 AM, Armin Rigo <arigo at tunes.org> wrote:
> FWIW, the pure Python traceback.py module has a slightly different
> (and saner) behavior:
>
>>>> e = Exception(u"xx\u1234yy")
>>>> traceback.print_exception(Exception, e, None)
> Exception: xx\u1234yy
>
> I'd suggest that the behavior of the two should be unified anyway.
> The traceback module uses value.encode("ascii", "backslashreplace")
> for any unicode object.

Nice observation -- so at least someone else agreed with me about what
the "right" thing to do is -- oh well.
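
For reference, the conversion Armin describes is a single encode call. A minimal sketch of formatting the final traceback line that way (the helper is hypothetical and is not the traceback module's actual code); it runs on 2.7 and 3.x:

```python
def format_exception_line(exc):
    # Formatting into a unicode literal gets the message as text on both
    # 2.7 (via BaseException.__unicode__) and 3.x (via str()).
    text = u"%s" % (exc,)
    # Escape non-ASCII characters as \uXXXX instead of failing, the same
    # rule as value.encode("ascii", "backslashreplace") in traceback.py.
    escaped = text.encode("ascii", "backslashreplace").decode("ascii")
    return u"%s: %s" % (type(exc).__name__, escaped)
```

For example, format_exception_line(Exception(u"xx\u1234yy")) gives the same text as Armin's traceback.print_exception output.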

On Thu, Nov 14, 2013 at 9:42 PM, Steven D'Aprano <steve at pearwood.info> wrote:
> I'm not
> convinced that treating Unicode strings as a special case is justified.
> It's been at least four, and possibly six (back to 2.2) point releases
> with this behaviour, and until now apparently nobody has noticed.

Not true. Apparently no one has brought it up on python-dev or filed
an issue, but a little googling confirmed that I understood what was
going on; see, for example:

http://pythonhosted.org/kitchen/unicode-frustrations.html#frustration-5-exceptions

That document was written on 19 March 2011, and at the time the library
worked with Python 2.3 and later.

Anyway, back to two questions:

1) Could it be improved? It seems there is some disagreement on that one.

and

2) Is this a big enough deal to change 2.x?

From what Martin says, no. So we don't need to argue about (1).

I sure hope py3 behavior is solid on this (sorry, no py3 to test on here...)
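
For the record, on Python 3 every str is Unicode, so a non-ASCII message survives both str() and the traceback module unchanged. A quick check (Python 3 only):

```python
import traceback

e = Exception("xx\u1234yy")
# str() returns the message as-is; no implicit ASCII encoding is involved.
message = str(e)
# format_exception_only returns the final "Type: message" line of a traceback.
formatted = "".join(traceback.format_exception_only(type(e), e))
```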

But I can't help myself:

Of all the guidelines for writing good code, the one I come back to
again and again is DRY (Don't Repeat Yourself); it drives almost all of
my code structure decisions.

So, in this case, now I need to think about whether to put in a kludge
every single time I raise an Exception. In the script at hand, I
needed to change 7 instances of raising an Exception, out of 10 total.

Contrast that with changing one line of code in Exception itself.

In fact, what I'll probably do is write a little wrapper that does the
encoding for an arbitrary exception, and use that, something like:

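def my_raise(exp, msg):
    # Encode the (possibly unicode) message to ASCII so that 2.7 can print
    # it; 'replace' turns each non-ASCII character into '?'.
    raise exp(unicode(msg).encode('ascii', 'replace'))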
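
A variant of the same idea, assuming you would rather keep the information than get '?' placeholders, uses 'backslashreplace' as traceback.py does and also runs unchanged on Python 3. The names are hypothetical, and treating byte-string messages as UTF-8 is an assumption:

```python
def ascii_message(msg):
    """Return msg as a native str with non-ASCII characters escaped."""
    if isinstance(msg, bytes):
        # Assumption: byte-string messages are UTF-8 encoded.
        msg = msg.decode("utf-8", "replace")
    else:
        # Coerce anything else (unicode, numbers, ...) to unicode text.
        msg = u"%s" % (msg,)
    escaped = msg.encode("ascii", "backslashreplace")
    # On 2.7 the native str type is bytes; on 3.x it is text.
    return escaped if str is bytes else escaped.decode("ascii")


def raise_ascii(exc_type, msg):
    # Raise exc_type with a message that any Python version can print.
    raise exc_type(ascii_message(msg))
```

Usage would be raise_ascii(ValueError, u"bad value: caf\xe9"), which raises ValueError with the message "bad value: caf\xe9" spelled out as an ASCII escape.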

But does it really make sense for me to write that and use it all over
the place, while everyone else writes their own kludges?

Oh well, I suppose the real lesson is go to Python 3....

-Chris

-- 

Christopher Barker, Ph.D.
Oceanographer

Emergency Response Division
NOAA/NOS/OR&R            (206) 526-6959   voice
7600 Sand Point Way NE   (206) 526-6329   fax
Seattle, WA  98115       (206) 526-6317   main reception

Chris.Barker at noaa.gov

