Mailman 3 Python in Unicode context - Python-Dev


Python in Unicode context

François Pinard

Aug. 3, 2004

2:53 p.m.

Hello, people. I'm switching from ISO-8859-1 to UTF-8 in my locale, knowing it may take a while before everything gets fully adapted. Of course, I am prepared to do whatever it means. On my side at least, the perception of a meaning is an evolving process. :-) So, my goal here is to share some of the difficulties I see with the current setup of Python in Unicode context, under the hypothesis that Python should ideally be designed to alleviate the pain of migration. I hope this is not out of context on the Python development list.

Converting a Python source file from ISO-8859-1 to UTF-8, back and forth at the charset level, is a snap within Vim, and I would like it to be (almost) a snap in the Python code as well. There is some amount of trickery that I could put in to achieve this, but too much trickery does not fit well with usual Python elegance.

As Martin once put it, the ultimate goal is to convert data to Unicode as early as possible in a Python program, and back to the locale as late as possible. While that is very OK with me, we should not lose sight of the fact that people might adopt different approaches.

One thing is that a Python module should have some way to know the encoding used in its source file, maybe some kind of `module.__coding__' next to `module.__file__', saving the coding effectively used while compilation was going on. When a Python module is compiled, per PEP 0263 as I understand it, strings are logically converted to UTF-8 before scanning, and the resulting str-strings (but not unicode-strings) are converted back to the original file coding. When later, at runtime, the string has to be converted back to Unicode, it would help if the programmer did not have to hardwire the encoding in the program, and edit more than the `coding:' cookie at the beginning if s/he ever switches file charset. That same `module.__coding__' could also be used for other things, for example, to decide at run-time whether codecs stream writers should be used or not.
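The `module.__coding__' attribute proposed here does not exist; it is only a proposal. As a rough sketch of the idea, assuming Python 3 rather than the Python 2.3 of this thread, a module can recover its declared charset by re-reading its own source bytes and letting the standard `tokenize.detect_encoding()` parse the PEP 263 cookie. The helper names `source_coding` and `decode_with_source_coding` are hypothetical.

```python
import io
import tokenize


def source_coding(source: bytes) -> str:
    """Return the charset named by a PEP 263 coding cookie in `source`.

    Hypothetical stand-in for the proposed module.__coding__ attribute.
    tokenize.detect_encoding() only inspects the first two lines and
    falls back to "utf-8" (the Python 3 default) when no cookie is found.
    """
    encoding, _consumed_lines = tokenize.detect_encoding(io.BytesIO(source).readline)
    return encoding


def decode_with_source_coding(data: bytes, source: bytes) -> str:
    # Convert a byte string back to Unicode without hardwiring the charset:
    # it is taken from the cookie, so switching the file's charset only
    # means editing the cookie line.
    return data.decode(source_coding(source))


# A tiny module saved in Latin-1, with its cookie on the first line.
latin1_module = "# -*- coding: latin-1 -*-\nname = 'Fran\u00e7ois'\n".encode("latin-1")
```

In a real module, `source` would come from something like `open(__file__, 'rb').read()`. Note that `detect_encoding()` returns a normalized name, so a `latin-1` cookie is reported as `iso-8859-1`.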
Another solution would of course be to edit all strings, or at least those containing non-ASCII characters, to prepend a `u' and turn them into Unicode strings. This is what I intend to do in practice. However, all this editing is cumbersome, especially until it is definitive. I wonder if some other cookie, next to the `coding:' cookie, could not be used to declare that all strings _in this module only_ should be interpreted as Unicode by default, but without the need of resorting to `u' prefix all over. That would be weaker than the `-U' switch on a Python call, but likely much more convenient as well. As a corollary, maybe some `s' prefix could force `str' type in a Unicodized module. Another way of saying it would be that an unadorned string would have `s' or `u' implied, depending on whether the Unicode cookie is missing or given at the start of a module.

I have the intuition, still unverified, but to be confirmed over time and maybe discussions, that the above would alleviate transition to Unicode, back and forth.

P.S. - Should I say and confess, one thing I do not like much about Unicode is how proponents often perceive it, like a religion, and all the fanaticism going with it. Unicode should be seen and implemented as a choice, more than a life commitment :-). Right now, my feeling is that Python asks a bit too much of a programmer, in terms of commitment, if we only consider the editing work required on sources to use it, or not.

-- François Pinard http://www.iro.umontreal.ca/~pinard


"Martin v. Löwis"

August 2004

5:24 p.m.

François Pinard wrote:

...

One thing is that a Python module should have some way to know the encoding used in its source file, maybe some kind of `module.__coding__' next to `module.__file__', saving the coding effectively used while compilation was going on.

That would be possible to implement. Feel free to create a patch.

...

I wonder if some other cookie, next to the `coding:' cookie, could not be used to declare that all strings _in this module only_ should be interpreted as Unicode by default, but without the need of resorting to `u' prefix all over.

This could be a starting point of another syntax debate. For example,

from __future__ import string_literals_are_unicode

would be possible to implement. If PEP 244 had been adopted, I would have proposed

directive unicode_strings

Other syntax forms would also be possible. Again, if you know a syntax which you like, propose a patch. Be prepared to also write a PEP defending that syntax.

...

P.S. - Should I say and confess, one thing I do not like much about Unicode is how proponents often perceive it, like a religion, and all the fanaticism going with it. Unicode should be seen and implemented as a choice, more than a life commitment :-). Right now, my feeling is that Python asks a bit too much of a programmer, in terms of commitment, if we only consider the editing work required on sources to use it, or not.

Not sure what you are referring to here. You do have the choice of source encodings, and, in fact, "Unicode" is not a valid source encoding. "UTF-8" is, and from a Python point of view, there is absolutely no difference between that and, say, "ISO-8859-15" - you can choose whatever source encoding you like, and Python does not favour any of them (strictly speaking, it favours ASCII, then ISO-8859-1, then the rest).

Choice of source encoding is different from the choice of string literals. You can use Unicode strings, or byte strings, or mix them. It really is your choice.

Regards,
Martin
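Martin's point, that the source encoding does not change what a literal means, can be checked directly. The sketch below assumes Python 3 (where an unadorned literal is already Unicode, so it shows the end state rather than 2004 behaviour): it writes the same one-line module under two different coding cookies, compiles the raw bytes, and compares the resulting strings. The name `literal_from_source` is hypothetical.

```python
# The same text, containing one non-ASCII character.
TEXT = "Fran\u00e7ois"


def literal_from_source(encoding: str) -> str:
    """Compile a tiny module saved in `encoding` and return its literal."""
    # First line is a PEP 263 coding cookie naming the file's charset.
    source = f"# -*- coding: {encoding} -*-\nvalue = {TEXT!r}\n"
    raw = source.encode(encoding)  # the bytes actually stored on disk
    namespace = {}
    # compile() honours the coding cookie when it is given bytes.
    exec(compile(raw, "<demo>", "exec"), namespace)
    return namespace["value"]


results = {enc: literal_from_source(enc) for enc in ("utf-8", "iso-8859-15")}
```

The stored bytes differ between the two files, but the compiled literal is the same string either way.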

M.-A. Lemburg

5:35 p.m.

Martin v. Löwis wrote:

...

François Pinard wrote:

...
One thing is that a Python module should have some way to know the encoding used in its source file, maybe some kind of `module.__coding__' next to `module.__file__', saving the coding effectively used while compilation was going on.

That would be possible to implement. Feel free to create a patch.

...

...
I wonder if some other cookie, next to the `coding:' cookie, could not be used to declare that all strings _in this module only_ should be interpreted as Unicode by default, but without the need of resorting to `u' prefix all over.

This could be a starting point of another syntax debate. For example,

from __future__ import string_literals_are_unicode

would be possible to implement. If PEP 244 had been adopted, I would have proposed

directive unicode_strings

Other syntax forms would also be possible. Again, if you know a syntax which you like, propose a patch. Be prepared to also write a PEP defending that syntax.

+1

Things that have been proposed earlier on, extended a bit:

b'xxx' - return a buffer to hold binary data; same as buffer(s'abc')
s'abc' - (forced) 8-bit string literal in source code encoding
u'abc' - (forced) Unicode literal
'abc'  - maps to s'abc' per default, can map to u'abc' based on the command line switch -U or a module switch

--
Marc-Andre Lemburg
eGenix.com Professional Python Services directly from the Source (#1, Aug 03 2004)

...

...
...
Python/Zope Consulting and Support ... http://www.egenix.com/ mxODBC.Zope.Database.Adapter ... http://zope.egenix.com/ mxODBC, mxDateTime, mxTextTools ... http://python.egenix.com/

::: Try mxODBC.Zope.DA for Windows,Linux,Solaris,FreeBSD for free ! ::::
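The literal table in Lemburg's message was only a proposal. To make the effect of the per-module switch easy to see, the sketch below models the proposed rules as a lookup function. Everything here is illustrative: the function name and the `unicode_module` flag are hypothetical, and Python 3 types stand in for the 2004 ones (`bytes` for the 8-bit `str`, `str` for `unicode`, and the read-only `memoryview` for `buffer`).

```python
def literal_type(prefix: str, unicode_module: bool = False) -> type:
    """Hypothetical model of the proposed string-literal prefix rules.

    Python 3 stand-ins for the Python 2 types:
      bytes      ~ str     (8-bit string in the source code encoding)
      str        ~ unicode
      memoryview ~ buffer  (binary data)
    """
    prefix = prefix.lower()
    if prefix == "b":
        return memoryview  # b'xxx': buffer holding binary data
    if prefix == "s":
        return bytes       # s'abc': forced 8-bit string
    if prefix == "u":
        return str         # u'abc': forced Unicode literal
    if prefix == "":
        # Unadorned 'abc': 8-bit by default, Unicode when the module
        # switch (or the -U command line switch) is in effect.
        return str if unicode_module else bytes
    raise ValueError(f"unknown string prefix: {prefix!r}")
```

The design point the model makes explicit is that only unadorned literals change meaning with the switch; explicitly prefixed literals keep their type in every module.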

François Pinard

7:47 p.m.

[M.-A. Lemburg]

...

Martin v. Löwis wrote:

...

Things that have been proposed earlier on, extended a bit:

...

b'xxx' - return a buffer to hold binary data; same as buffer(s'abc')
s'abc' - (forced) 8-bit string literal in source code encoding
u'abc' - (forced) Unicode literal

I currently do not see the need for a fine distinction between `b' and `s' as a prefix. `s' and `u' are the first letter of the type (`str' or `unicode') and that makes them natural.

...

'abc' - maps to s'abc' per default, can map to u'abc' based on the command line switch -U or a module switch

The idea would be, indeed, to create some kind of per-module switch. I'm less sure that `-U' is of any use in practice, as long as all of the library does not become "Unicode-aware", whatever that would imply...

P.S. - Command line switch for command line switch :-), a switch for fully turning on the newer type system would be more productive than `-U', and put some pressure for refreshing the library in this area. Just curious, as I do not intend to volunteer in this area, is there anything other than Exception in the Python internals that relies on old-style classes?

-- François Pinard http://www.iro.umontreal.ca/~pinard

Guido van Rossum

8:14 p.m.

...

P.S. - Command line switch for command line switch :-), a switch for fully turning on the newer type system would be more productive than `-U', and put some pressure for refreshing the library in this area. Just curious, as I do not intend to volunteer in this area, is there anything other than Exception in the Python internals that relies on old-style classes?

Probably not, but making Exception a new-style class won't be easy. (It will also break code that creates a class used as an exception that doesn't derive from Exception, but those should be shot. :-) Until Exception is a new-style class, such a switch wouldn't really work. --Guido van Rossum (home page: http://www.python.org/~guido/)

Michael Hudson

10:38 a.m.

Guido van Rossum <guido@python.org> writes:

...

...
P.S. - Command line switch for command line switch :-), a switch for fully turning on the newer type system would be more productive than `-U', and put some pressure for refreshing the library in this area. Just curious, as I do not intend to volunteer in this area, is there anything other than Exception in the Python internals that relies on old-style classes?

Probably not, but making Exception a new-style class won't be easy.

What makes you say that? I've just been remarking on comp.lang.python how having Exception be new-style in PyPy -- indeed, not having old-style classes and all -- has caused essentially no problems at all. Perhaps I'll work up a patch sometime and see what breaks. Cheers, mwh --

...

Emacs is a fashion statement. No, Gnus is a fashion statement. Emacs is clothing. Everyone else is running around naked. -- Karl Kleinpaste & Jonadab the Unsightly One, gnu.emacs.gnus

Guido van Rossum

2:34 p.m.

...

...
Probably not, but making Exception a new-style class won't be easy.

What makes you say that? I've just been remarking on comp.lang.python how having Exception be new-style in PyPy -- indeed, not having old-style classes and all -- has caused essentially no problems at all.

I believe that -- it's more that the existing infrastructure that creates the standard exception hierarchy isn't easily adapted. I also believe there's a conceptual problem with defining when something is an acceptable argument to 'raise' -- unless we insist that exceptions inherit from a designated base class, *every* object is acceptable, because if it isn't a class, it's an instance of a class, and raise allows either. I don't really think that "raise 42" ought to be acceptable, but I don't know how to prevent it without requiring a specific base class (excluding a whole slew of specific base classes seems wrong). Maybe we can accept old-style classes and instances, strings, and instances of Exception and its subclasses. But then we better be sure that we really want to insist on subclassing from Exception, because that's rather unpythonic.

...

Perhaps I'll work up a patch sometime and see what breaks.

Please do! --Guido van Rossum (home page: http://www.python.org/~guido/)
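For comparison only (this is not something the thread settled): Python 3 eventually answered Guido's question by requiring a designated base class, `BaseException`, so `raise 42`, `raise "spam"`, and raising a class with no exception base all fail with a TypeError at the raise statement itself. A minimal check, assuming a Python 3 interpreter; `try_raise` and `NotAnException` are hypothetical names.

```python
def try_raise(obj):
    """Raise obj and return (class, message) of whatever was actually caught."""
    try:
        raise obj
    except BaseException as caught:
        return type(caught), str(caught)


class NotAnException:
    """A class that does not derive from BaseException."""
```

Raising an exception class (rather than an instance) still works: Python instantiates it with no arguments, which is why `KeyError` comes back with an empty message.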

Phillip J. Eby

4:22 p.m.

New subject: Exception and new-style classes

At 07:34 AM 8/4/04 -0700, Guido van Rossum wrote:

...

...
...
Probably not, but making Exception a new-style class won't be easy.

What makes you say that? I've just been remarking on comp.lang.python how having Exception be new-style in PyPy -- indeed, not having old-style classes and all -- has caused essentially no problems at all.

I believe that -- it's more that the existing infrastructure that creates the standard exception hierarchy isn't easily adapted.

I also believe there's a conceptual problem with defining when something is an acceptable argument to 'raise' -- unless we insist that exceptions inherit from a designated base class, *every* object is acceptable, because if it isn't a class, it's an instance of a class, and raise allows either. I don't really think that "raise 42" ought to be acceptable, but I don't know how to prevent it without requiring a specific base class (excluding a whole slew of specific base classes seems wrong).

Maybe we can accept old-style classes and instances, strings, and instances of Exception and its subclasses. But then we better be sure that we really want to insist on subclassing from Exception, because that's rather unpythonic.

I thought that was already the plan; I seem to recall dire warnings that it was going to be required someday. In any case, I had my eye on "fixing" this for next bug day (there's a SF bug # for it, that I don't recall at the moment). My plan was to allow any object that was an instance of Exception, even if it was new-style. In other words, new-style exceptions would have to include Exception in their base classes. Old-style exceptions wouldn't have to meet that requirement, for backward compatibility purposes. I assumed that the old-style (and string) compatibility would need to remain until 3.0.

Greg Ewing

1:33 a.m.

...

Maybe we can accept old-style classes and instances, strings, and instances of Exception and its subclasses.

Seems to me the point at which we start allowing new-style classes as exceptions should also be the point at which we drop the idea of string exceptions. Would that help?

...

I don't really think that "raise 42" ought to be acceptable, but I don't know how to prevent it

Maybe we need to think more deeply about *why* it shouldn't be acceptable. If we can figure out exactly what the criterion should be, maybe we can think of a reasonable way of testing for it.

Greg Ewing, Computer Science Dept,  +--------------------------------------+
University of Canterbury,           | A citizen of NewZealandCorp, a       |
Christchurch, New Zealand           | wholly-owned subsidiary of USA Inc.  |
greg@cosc.canterbury.ac.nz          +--------------------------------------+

Paul Prescod

2:32 a.m.

Greg Ewing wrote:

...

...

...
I don't really think that "raise 42" ought to be acceptable, but I don't know how to prevent it

Maybe we need to think more deeply about *why* it shouldn't be acceptable. If we can figure out exactly what the criterion should be, maybe we can think of a reasonable way of testing for it.

Exceptions naturally form a hierarchy. At the same time, inheritance of behaviour among exceptions is seldom necessary. Therefore, exceptions inherit from each other in order to build a classification system, not to share code. This is the opposite of the traditional reasons for classes inheriting from other classes in Python. This is why it seems "unpythonic" to require exceptions to be single-rooted.

But having a proper classification system is exactly what is required to allow robust, modular code that catches the right exceptions under the right circumstances and responds in the right way. So it is pythonic after all.

In a few senses the _current model_ is unpythonic. There is no catch-all root so you have to use a "bare" except to catch every exception type. This makes it hard to introspect on the caught object. But introspection is the MOST IMPORTANT THING when you are catching all exceptions (because you should be logging the exception or something).

Paul Prescod
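Prescod's argument, that exception classes inherit from each other to build a classification rather than to share code, can be sketched as follows. This assumes Python 3 syntax; `ConfigError`, `MissingKey`, `BadValue`, `load_port` and `port_or_default` are hypothetical names. The leaf classes add no behaviour at all, yet the caller can handle the whole category by naming only its root, while unrelated errors still propagate.

```python
class ConfigError(Exception):
    """Hypothetical category: anything wrong with a configuration."""


class MissingKey(ConfigError):
    """A required key is absent."""


class BadValue(ConfigError):
    """A key is present but has the wrong type."""


def load_port(settings: dict) -> int:
    # Raises a specific leaf class; no code is shared between the leaves.
    if "port" not in settings:
        raise MissingKey("port")
    if not isinstance(settings["port"], int):
        raise BadValue(f"port={settings['port']!r}")
    return settings["port"]


def port_or_default(settings: dict, default: int = 8080) -> int:
    # The caller handles the whole category by naming only its root.
    try:
        return load_port(settings)
    except ConfigError:
        return default
```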

M.-A. Lemburg

7:45 a.m.

Paul Prescod wrote:

...

Greg Ewing wrote:

...
...

...
I don't really think that "raise 42" ought to be acceptable, but I don't know how to prevent it

Maybe we need to think more deeply about *why* it shouldn't be acceptable. If we can figure out exactly what the criterion should be, maybe we can think of a reasonable way of testing for it.

Exceptions naturally form a hierarchy. At the same time, inheritance of behaviour among exceptions is seldom necessary. Therefore, exceptions inherit from each other in order to build a classification system, not to share code.

I wouldn't say that: exceptions can have error handlers, callbacks, inherited attributes, etc. etc. and you can put these to good use in your application.

...

This is the opposite of the traditional reasons for classes inheriting from other classes in Python. This is why it seems "unpythonic" to require exceptions to be single-rooted.

I don't know what should be "unpythonic" about having a single root for exceptions. Would someone care to explain?

To me ...

try:
    ...
except Exception, errobj: # catches all exceptions
    pass

... is the most natural way of using that single root (and it already works great today).
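Lemburg's catch-all idiom uses the Python 2 spelling `except Exception, errobj`. Assuming Python 3, the same idea is written with `as`, and the sketch below also records the caught object's type, message and traceback, which is the introspection Prescod argues matters most when catching everything. `run_logged` is a hypothetical helper, not a library function.

```python
import traceback


def run_logged(func, *args):
    """Call func(*args); on any Exception, return (None, report) instead of raising.

    Only Exception subclasses are caught, so KeyboardInterrupt and
    SystemExit (which derive from BaseException directly) still propagate.
    """
    try:
        return func(*args), None
    except Exception as errobj:  # Python 2 spelling: except Exception, errobj:
        report = {
            "type": type(errobj).__name__,
            "message": str(errobj),
            "traceback": traceback.format_exc(),
        }
        return None, report
```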

...

But having a proper classification system is exactly what is required to allow robust, modular code that catches the right exceptions under the right circumstances and responds in the right way. So it is pythonic after all.

In a few senses the _current model_ is unpythonic. There is no catch-all root so you have to use a "bare" except to catch every exception type. This makes it hard to introspect on the caught object. But introspection is the MOST IMPORTANT THING when you are catching all exceptions (because you should be logging the exception or something).

-- Marc-Andre Lemburg eGenix.com Professional Python Services directly from the Source (#1, Aug 05 2004)


Michael Hudson

11:18 a.m.

"M.-A. Lemburg" <mal@egenix.com> writes:

...

To me ...

try:
    ...
except Exception, errobj: # catches all exceptions
    pass

... is the most natural way of using that single root (and it already works great today).

Well, uh, it's not totally bullet proof:

...

>>> class C: pass
...
[24618 refs]
>>> try: raise C
... except Exception, err: print err
...
Traceback (most recent call last):
  File "<stdin>", line 1, in ?
__main__.C: <__main__.C instance at 0x403a7814>
[24654 refs]

but this really doesn't seem to happen in the wild. (I have a hacky patch which makes exceptions new-style which I'll post in a moment).

Cheers,
mwh
-- Lisp nearing the age of 50 is the most modern language out there. GC, dynamic, reflective, the best OO model extant including GFs, procedural macros, and the only thing old-fashioned about it is that it is compiled and fast. -- Kenny Tilton, comp.lang.python

M.-A. Lemburg

11:48 a.m.

Michael Hudson wrote:

...

"M.-A. Lemburg" <mal@egenix.com> writes:

...
To me ...

try:
    ...
except Exception, errobj: # catches all exceptions
    pass

... is the most natural way of using that single root (and it already works great today).

Well, uh, it's not totally bullet proof:

I meant that it works for the vast majority of all cases you see in practice. I haven't seen a non-Exception based exception in years.

...

>>> class C: pass
...
[24618 refs]
>>> try: raise C
... except Exception, err: print err
...
Traceback (most recent call last):
  File "<stdin>", line 1, in ?
__main__.C: <__main__.C instance at 0x403a7814>
[24654 refs]

but this really doesn't seem to happen in the wild.

(I have a hacky patch which makes exceptions new-style which I'll post in a moment).

Cheers, mwh

-- Marc-Andre Lemburg eGenix.com Professional Python Services directly from the Source (#1, Aug 05 2004)


Michael Hudson

12:05 p.m.

"M.-A. Lemburg" <mal@egenix.com> writes:

Michael Hudson wrote:

...

...
"M.-A. Lemburg" <mal@egenix.com> writes:

...
To me ...

try:
    ...
except Exception, errobj: # catches all exceptions
    pass

... is the most natural way of using that single root (and it already works great today). Well, uh, it's not totally bullet proof:

I meant that it works for the vast majority of all cases you see in practice.

OK, then we're on the same page.

...

I haven't seen a non-Exception based exception in years.

I hadn't until I looked into test_opcodes last night! Cheers, mwh -- In that case I suggest that to get the correct image you look at the screen from inside the monitor whilst standing on your head. -- James Bonfield, http://www.ioccc.org/2000/rince.hint

Aahz

3:22 p.m.

New subject: Except that! (was Re: Python in Unicode context)

On Thu, Aug 05, 2004, M.-A. Lemburg wrote:

...

I meant that it works for the vast majority of all cases you see in practice. I haven't seen a non-Exception based exception in years.

My current company still has lots of string exceptions. :-( I'm working on changing that. (Yes, we started with Python 1.4.) -- Aahz (aahz@pythoncraft.com) <*> http://www.pythoncraft.com/ "To me vi is Zen. To use vi is to practice zen. Every command is a koan. Profound to the user, unintelligible to the uninitiated. You discover truth everytime you use it." --reddy@lion.austin.ibm.com

Michael Hudson

1:27 p.m.

New subject: New-style exceptions

Michael Hudson <mwh@python.net> writes:

...

(I have a hacky patch which makes exceptions new-style which I'll post in a moment).

Well, it turns out to be a bit big for attaching, so it's here:

http://starship.python.net/crew/mwh/hacks/new-style-exceptions-hacking.diff

This is very much a first cut; no attempt at subtlety. The procedure went roughly "Hack until it compiles, hack until it doesn't dump core immediately, hack until most tests pass."

The good news: all tests but test_pickletools pass (and that's doomed; just look at it).

The bad news: I've entirely broken raising old-style classes (well, not quite:

...

>>> try: raise C
... except types.ClassType, c: print c
...
__main__.C

) so I've had to make sure various classes used in the test suite inherit from Exception.

There was a bunch of shallow breakage -- for example, str(old-style-class) is quite different from str(new-style-class), which broke various output comparison tests (try not to gag when you see how I worked around this) -- but not much that's deep.

You can get a *different* kind of shallow breakage by essentially removing old-style classes (changing the default metatype to type), but then 'types.ClassType is type' and this from copy_reg.py:

def pickle(ob_type, pickle_function, constructor_ob=None):
    if type(ob_type) is _ClassType:
        raise TypeError("copy_reg is not intended for use with classes")

rather fails to do the right thing. I didn't pursue this one very far.

Obviously, a better attempt would be to allow raising any old-style class or instance or any subtype of Exception or any instance of a subtype of Exception -- but that becomes tedious to spell.

I suspect that it would be quite hard -- or at least prohibitively tedious -- to write code that worked with both old- and new-style exceptions, so I'm not sure a -N switch to turn them on would work. At least not without a small battery of helper functions that no one would bother to use. I guess this means making exceptions new-style might have to wait for a Python 3.0-ish flag day of some kind.

Cheers,
mwh
-- Roll on a game of competitive offence-taking. -- Dan Sheppard, ucam.chat

Tim Peters

3:28 p.m.

New subject: New-style exceptions

[Michael Hudson] ...

...

Well, it turns out to be a bit big for attaching, so it's here:

http://starship.python.net/crew/mwh/hacks/new-style-exceptions-hacking.diff

This is very much a first cut; no attempt at subtlety. The procedure went roughly "Hack until it compiles, hack until it doesn't dump core immediately, hack until most tests pass."

The good news: all tests but test_pickletools pass (and that's doomed; just look at it).

Eh? test_pickletools is a three-line test, which just runs the doctests in pickletools.py. The only exceptions mentioned in the latter are the builtin ValueError and OverflowError. What's the problem?

Michael Hudson

3:32 p.m.

New subject: New-style exceptions

Tim Peters <tim.peters@gmail.com> writes:

...

[Michael Hudson] ...

...
Well, it turns out to be a bit big for attaching, so it's here:

http://starship.python.net/crew/mwh/hacks/new-style-exceptions-hacking.diff

This is very much a first cut; no attempt at subtlety. The procedure went roughly "Hack until it compiles, hack until it doesn't dump core immediately, hack until most tests pass."

The good news: all tests but test_pickletools pass (and that's doomed; just look at it).

Eh? test_pickletools is a three-line test, which just runs the doctests in pickletools.py. The only exceptions mentioned in the latter are the builtin ValueError and OverflowError. What's the problem?

Sorry, was too obscure. One of the doctests pickles a couple instances of PicklingError and disassembles the pickle. That's not going to stay the same past an old-style/new-style transition. Cheers, mwh -- If a train station is a place where a train stops, what's a workstation? -- unknown (to me, at least)

Tim Peters

6:23 p.m.

New subject: New-style exceptions

[Michael Hudson]

...

Sorry, was too obscure. One of the doctests pickles a couple instances of PicklingError and disassembles the pickle. That's not going to stay the same past an old-style/new-style transition.

That's no problem. The real point of that part of the test is to exercise the different pickle protocols on an instance of a "foreign" (not defined in the same module) class. Any foreign class would do as well. Creating an instance of pickle.PicklingError was just convenient; there was no intent to pick on an exception class. IOW, if this goes forward, it's no problem to change the test.
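The kind of check Tim describes, running every pickle protocol over an instance of a class defined in another module, might look roughly like the following Python 3 sketch. It reuses `pickle.PicklingError` as the foreign class, as the doctest did, but the round-trip assertions and the helper name `roundtrip_all_protocols` are illustrative assumptions, not the actual test_pickletools code.

```python
import io
import pickle
import pickletools


def roundtrip_all_protocols(obj):
    """Pickle obj with every protocol, disassemble each pickle, and reload it."""
    results = {}
    for proto in range(pickle.HIGHEST_PROTOCOL + 1):
        data = pickle.dumps(obj, protocol=proto)
        listing = io.StringIO()
        pickletools.dis(data, out=listing)  # opcode listing, as the doctest shows
        results[proto] = (pickle.loads(data), listing.getvalue())
    return results


original = pickle.PicklingError("demo")
results = roundtrip_all_protocols(original)
```

The exact opcode listing depends on how the class is implemented (old-style versus new-style in 2004, C versus Python today), which is why such a doctest's expected output has to change when the class does, while the round trip itself should keep working.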

Michael Hudson

6:30 p.m.

New subject: New-style exceptions

Tim Peters <tim.peters@gmail.com> writes:

...

[Michael Hudson]

...
Sorry, was too obscure. One of the doctests pickles a couple instances of PicklingError and disassembles the pickle. That's not going to stay the same past an old-style/new-style transition.

That's no problem. The real point of that part of the test is to exercise the different pickle protocols on an instance of a "foreign" (not defined in the same module) class. Any foreign class would do as well. Creating an instance of pickle.PicklingError was just convenient; there was no intent to pick on an exception class. IOW, if this goes forward, it's no problem to change the test.

Uhh, sure. It was just to tell people what to expect if they were so foolish as to download the patch and try it :-) I probably didn't choose my words very well. Cheers, mwh -- This is the fixed point problem again; since all some implementors do is implement the compiler and libraries for compiler writing, the language becomes good at writing compilers and not much else! -- Brian Rogoff, comp.lang.functional

Michael Hudson

11:16 a.m.

Greg Ewing <greg@cosc.canterbury.ac.nz> writes:

...

...
Maybe we can accept old-style classes and instances, strings, and instances of Exception and its subclasses.

Seems to me the point at which we start allowing new-style classes as exceptions should also be the point at which we drop the idea of string exceptions. Would that help?

It would probably make things a little simpler, but probably not in a major way. Cheers, mwh -- Or if you happen to be resigned to the size of your trouser snake and would rather not be reminded of it, training a shared classifier to reject penis-enlargement spam stops Barry from getting the help he so desperately needs. -- Tim Peters, c.l.python

Greg Ewing

12:25 a.m.

...

...
Seems to me the point at which we start allowing new-style classes as exceptions should also be the point at which we drop the idea of string exceptions. Would that help?

It would probably make things a little simpler, but probably not in a major way.

I was thinking it might avoid the need to enforce a common base class for exceptions, since it would remove the ambiguity of whether 'raise "spam"' is raising a string exception or an instance of class str. But if it's considered a good idea to enforce a common root anyway, I guess it doesn't make much difference.

Michael Hudson

11:15 a.m.

Guido van Rossum <guido@python.org> writes:

...

(It will also break code that creates a class used as an exception that doesn't derive from Exception, but those should be shot. :-)

Would you like to guess how often that happens in the Python test suite? :-) Cheers, mwh -- ARTHUR: Yes. It was on display in the bottom of a locked filing cabinet stuck in a disused lavatory with a sign on the door saying "Beware of the Leopard". -- The Hitch-Hikers Guide to the Galaxy, Episode 1

Holger Krekel

11:29 a.m.

Guido van Rossum wrote:

...

(It will also break code that creates a class used as an exception that doesn't derive from Exception, but those should be shot. :-)

Then I guess that searching down into a recursive structure and just raising an "I found it" result object up doesn't count as a use case in your book, right? It can avoid boilerplate code like return-if-not-None checks and I have used it for e.g. finding patterns in an AST-Tree.

cheers,
Holger
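Holger's pattern, raising a result object straight out of a recursive search instead of threading return-if-not-None checks back up every level, might look like this under Python 3. Because Python 3 only accepts raised objects that derive from BaseException, the sketch wraps the result in an Exception subclass, which is also the compromise Aahz suggests in his reply. `Found`, `find_first` and the nested-list tree are hypothetical.

```python
class Found(Exception):
    """Carries a search result out of arbitrarily deep recursion."""

    def __init__(self, node):
        super().__init__(node)
        self.node = node


def _walk(node, predicate):
    # No return values to check at each level: a hit unwinds everything at once.
    if predicate(node):
        raise Found(node)
    if isinstance(node, (list, tuple)):
        for child in node:
            _walk(child, predicate)


def find_first(tree, predicate):
    """Return the first node (depth-first, pre-order) matching predicate, or None."""
    try:
        _walk(tree, predicate)
    except Found as hit:
        return hit.node
    return None


# A toy "AST": nested lists whose first element names the node kind.
tree = ["module", ["def", "f", ["return", 42]], ["def", "g", ["pass"]]]
```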

Aahz

3:26 p.m.

New subject: Exceptional inheritance patterns (was Re: Python in Unicode context)

On Thu, Aug 05, 2004, Holger Krekel wrote:

...

Guido van Rossum wrote:

...
(It will also break code that creates a class used as an exception that doesn't derive from Exception, but those should be shot. :-)

Then I guess that searching down into a recursive structure and just raising an "I found it" result object up doesn't count as a use case in your book, right? It can avoid boilerplate code like return-if-not-None checks and I have used it for e.g. finding patterns in an AST-Tree.

In cases where I've done this, I've always inherited from Exception or a subclass. Is there any reason not to?

Holger Krekel

3:34 p.m.

New subject: Exceptional inheritance patterns (was Re: Python in Unicode context)

Aahz wrote:

...

On Thu, Aug 05, 2004, Holger Krekel wrote:

...
Guido van Rossum wrote:

...
(It will also break code that creates a class used as an exception that doesn't derive from Exception, but those should be shot. :-)

Then I guess that searching down into a recursive structure and just raising an "I found it" result object up doesn't count as a use case in your book, right? It can avoid boilerplate code like return-if-not-None checks and I have used it for e.g. finding patterns in an AST-Tree.

In cases where I've done this, I've always inherited from Exception or a subclass. Is there any reason not to?

Sure, I can probably wrap the result object into some class which inherits from Exception. My point is that I like to regard try/except as a mechanism for "out-of-band" objects. Guido's "should be shot" seems to indicate he sees try/except as only useful/applicable to exception handling.

Holger

P.S.: thanks for changing the subject line, should have done that earlier.

Aahz

4:04 p.m.

New subject: Exceptional inheritance patterns (was Re: Python in Unicode context)

On Thu, Aug 05, 2004, Holger Krekel wrote:

...

Aahz wrote:

...
On Thu, Aug 05, 2004, Holger Krekel wrote:

...
Guido van Rossum wrote:

...
(It will also break code that creates a class used as an exception that doesn't derive from Exception, but those should be shot. :-)

Then I guess that searching down into a recursive structure and just raising an "I found it" result object up doesn't count as a use case in your book, right? It can avoid boilerplate code like return-if-not-None checks, and I have used it for, e.g., finding patterns in an AST.

In cases where I've done this, I've always inherited from Exception or a subclass. Is there any reason not to?

Sure, I can probably wrap the result object in some class which inherits from Exception. My point is that I like to regard try/except as a mechanism for "out-of-band" objects. Guido's "should be shot" seems to indicate he sees try/except as useful/applicable only to exception handling.

Nope, he meant exactly what he said: an exception that doesn't derive from Exception. After all, the iterator protocol relies on similar out-of-band exceptions, and the ``for`` loop has done the same with IndexError for years. Further discussion about Pythonic exception handling should probably get moved to comp.lang.python.
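Both out-of-band signals Aahz mentions are still visible in today's Python 3. The sketch below (class names are hypothetical) shows a `for`-style consumer stopping on IndexError from the old `__getitem__` sequence protocol, and on StopIteration from the iterator protocol; both are ordinary subclasses of Exception.

```python
class Squares:
    """Old-style sequence: iteration stops when __getitem__ raises IndexError."""

    def __init__(self, n):
        self.n = n

    def __getitem__(self, i):
        if i >= self.n:
            raise IndexError(i)   # out-of-band "no more items" signal
        return i * i


class Countdown:
    """Iterator protocol: __next__ raises StopIteration when exhausted."""

    def __init__(self, start):
        self.current = start

    def __iter__(self):
        return self

    def __next__(self):
        if self.current <= 0:
            raise StopIteration
        self.current -= 1
        return self.current + 1


print(list(Squares(4)))      # [0, 1, 4, 9]
print(list(Countdown(3)))    # [3, 2, 1]
print(issubclass(StopIteration, Exception), issubclass(IndexError, Exception))  # True True
```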

Greg Ewing

12:39 a.m.

New subject: Exceptional inheritance patterns (was Re: Python in Unicode context)

...

My point is that I like to regard try/except as a mechanism for "out-of-band" objects. Guido's "should be shot" seems to indicate he sees try/except as useful/applicable only to exception handling.

If the root exception class were called something else, such as 'Raisable', would that make you feel better?

Greg Ewing, Computer Science Dept, University of Canterbury, Christchurch, New Zealand
greg@cosc.canterbury.ac.nz
"A citizen of NewZealandCorp, a wholly-owned subsidiary of USA Inc."

Holger Krekel

7:45 a.m.

New subject: Exceptional inheritance patterns (was Re: Python in Unicode context)

Greg Ewing wrote:

...

...
My point is that I like to regard try/except as a mechanism for "out-of-band" objects. Guido's "should be shot" seems to indicate he sees try/except as useful/applicable only to exception handling.

If the root exception class were called something else, such as 'Raisable', would that make you feel better?

Yes, I certainly wouldn't object. I guess this would mean 'Exception' would derive from Raisable, because Exception itself should probably not go away. Hey, strings could inherit from Raisable, too! Just kidding :-) Then again, I think Python has a tradition of not requiring inheritance but just behaviour. And doesn't this whole issue only exist because "raise X", with X being a class that gets auto-instantiated, is allowed? Well, anyway, let's not add too much to the current python-dev traffic with this issue. I think it has been brought up a couple of times already. Hey, I have an idea: why not create a python-syntax mailing list (or python-hell :-)?

cheers, Holger
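For comparison, later Python versions did grow a root above Exception (PEP 352 introduced it in Python 2.5), named BaseException rather than 'Raisable'. A small Python 3 check, with a hypothetical helper `try_raise`, shows both the auto-instantiation Holger mentions and that raising anything outside the hierarchy is itself an error.

```python
def try_raise(obj):
    """Raise `obj` and report the name and object of what actually propagated."""
    try:
        raise obj
    except BaseException as exc:
        return type(exc).__name__, exc


# "raise X" with X a class: Python instantiates the class automatically.
name, exc = try_raise(ValueError)
print(name, isinstance(exc, ValueError))   # ValueError True

# Raising a non-exception (here a str) fails with TypeError instead.
name, exc = try_raise("found it")
print(name)                                # TypeError

# Exception sits directly below BaseException.
print(Exception.__mro__ == (Exception, BaseException, object))   # True
```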

Michael Hudson

10:16 a.m.

New subject: Exceptional inheritance patterns

Holger Krekel <pyth@devel.trillke.net> writes:

...

Greg Ewing wrote:

...
...
My point is that I like to regard try/except as a mechanism for "out-of-band" objects. Guido's "should be shot" seems to indicate he sees try/except as useful/applicable only to exception handling.

If the root exception class were called something else, such as 'Raisable', would that make you feel better?

Yes, I certainly wouldn't object. I guess this would mean 'Exception' would derive from Raisable because Exception itself should probably not go away. Hey, strings could inherit from Raisable, too! Just kidding :-)

I would like an exception class that almost all exceptions, except KeyboardInterrupt, SystemExit and, maybe, RuntimeError and MemoryError, inherited from:

    except ExceptionsButThoseNastyOnesIDontWantToCatch:
        pass

? <wink>

...

Then again, I think Python has a tradition of not requiring inheritance but just behaviour. And doesn't this whole issue only exist because "raise X", with X being a class that gets auto-instantiated, is allowed?

I would say that it's more because it's useful to organize exceptional conditions in a tree-like hierarchy, and inheritance is a usable way to do this. The fact that arranging things into tree-like hierarchies *isn't* what inheritance is usually used for in Python (unlike many other languages) is what creates the dissonance, IMHO.
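The classification use of inheritance Michael describes can be sketched as follows. The `StorageError` family is hypothetical; the point is that one `except` clause matches a whole subtree, just as the built-in `LookupError` covers both KeyError and IndexError.

```python
class StorageError(Exception):
    """Root of a hypothetical application's storage-related errors."""

class NotFound(StorageError):
    pass

class PermissionDenied(StorageError):
    pass

class QuotaExceeded(PermissionDenied):
    pass


def classify(exc):
    """Raise `exc` and report which branch of the hierarchy caught it."""
    try:
        raise exc
    except PermissionDenied:       # catches PermissionDenied and QuotaExceeded
        return "permission problem"
    except StorageError:           # catches every other StorageError
        return "other storage problem"


print(classify(QuotaExceeded()))   # permission problem
print(classify(NotFound()))        # other storage problem
print(issubclass(KeyError, LookupError), issubclass(IndexError, LookupError))  # True True
```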

...

Well, anyway, let's not add too much to the current python-dev traffic with this issue. I think it has been brought up a couple of times already.

Indeed, I think someone said what I just said above in this thread already :-)

...

Hey, I have an idea: why not create a python-syntax mailing list (or python-hell :-)?

I think I've suggested that before :-) Today is the first day I remember where there are more new messages waiting for me in python-dev than comp.lang.python! Cheers, mwh -- "The future" has arrived but they forgot to update the docs. -- R. David Murray, 9 May 2000

Skip Montanaro

12:55 p.m.

New subject: Exceptional inheritance patterns

Michael> I would like an exception class that almost all exceptions except
Michael> KeyboardInterrupt, SystemExit and, maybe, RuntimeError and
Michael> MemoryError inherited from.
Michael> except ExceptionsButThoseNastyOnesIDontWantToCatch:
Michael>     pass
Michael> ? <wink>

I proposed a change to the exceptions hierarchy a few years ago that would allow this. It was obviously never implemented, but I no longer remember why.

Skip

Guido van Rossum

2:52 p.m.

New subject: Exceptional inheritance patterns

...

Michael> I would like an exception class that almost all exceptions except
Michael> KeyboardInterrupt, SystemExit and, maybe, RuntimeError and
Michael> MemoryError inherited from.

Michael> except ExceptionsButThoseNastyOnesIDontWantToCatch:
Michael>     pass

Michael> ? <wink>

I proposed a change to the exceptions hierarchy a few years ago that would allow this. It was obviously never implemented, but I no longer remember why.

Skip

Probably inertia and compatibility issues (2.2 must've been brand new when you proposed that). I've become swayed by Paul Prescod's recent argument that exceptions use subclassing for classification reasons and that there's nothing wrong with *requiring* them to subclass some common base class. --Guido van Rossum (home page: http://www.python.org/~guido/)
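Something close to Michael's wish is what Python 3 ended up with: in today's hierarchy (which grew out of PEP 352), KeyboardInterrupt, SystemExit and GeneratorExit derive from BaseException but not from Exception, so a plain `except Exception` already lets them through. RuntimeError and MemoryError, his "maybe" cases, remain under Exception. The helper below is a hypothetical Python 3 sketch.

```python
import sys


def swallow_ordinary_errors(func):
    """Run func(), suppressing only exceptions derived from Exception."""
    try:
        return func()
    except Exception as exc:
        return f"swallowed {type(exc).__name__}"


print(swallow_ordinary_errors(lambda: 1 / 0))   # swallowed ZeroDivisionError

try:
    swallow_ordinary_errors(lambda: sys.exit(3))
except SystemExit as exc:
    # SystemExit is not an Exception subclass, so it was not swallowed.
    print("SystemExit propagated, code", exc.code)   # SystemExit propagated, code 3

for cls in (KeyboardInterrupt, SystemExit, GeneratorExit, MemoryError, RuntimeError):
    print(cls.__name__, issubclass(cls, Exception))
```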

Greg Ewing

12:32 a.m.

...

Then I guess that searching down into a recursive structure and just raising an "I found it" result object up doesn't count as a use case in your book, right?

You can always wrap the object you want to return in a suitable subclass of Exception. I would prefer to write it that way anyway, otherwise I'd need some sort of "catch anything which *isn't* a subclass of Exception" statement, which is rather awkward to spell. If this is considered a common enough requirement, a special built-in Exception subclass could even be provided for it, to save people the bother of defining their own...

François Pinard

7:35 p.m.

[Martin von Löwis]

...

François Pinard wrote:

...

...
maybe some kind of `module.__coding__' next to `module.__file__', saving the coding effectively used while compilation was going on.

...

That would be possible to implement. Feel free to create a patch.

I might try, and it would be my first Python patch. But please, please tell me if the idea is not welcome, as my free time is rather short and I already have a lot of things waiting for me! :-).
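No `module.__coding__` attribute was ever added, but a similar effect can be approximated in Python 3, where UTF-8 is the default source encoding and `tokenize.detect_encoding()` exposes the PEP 263 cookie logic. In this sketch, `coding_of_source` and `module_coding` are hypothetical helpers; `module_coding` simply re-reads the module's source file.

```python
import io
import tokenize


def coding_of_source(data: bytes) -> str:
    """Return the encoding Python would use to decode `data` as source.

    tokenize.detect_encoding() honours a `coding:` cookie on the first
    two lines (or a UTF-8 BOM) and falls back to UTF-8.
    """
    encoding, _consumed_lines = tokenize.detect_encoding(io.BytesIO(data).readline)
    return encoding


def module_coding(module) -> str:
    """Hypothetical stand-in for the proposed `module.__coding__`."""
    with open(module.__file__, "rb") as f:
        encoding, _consumed_lines = tokenize.detect_encoding(f.readline)
    return encoding


print(coding_of_source(b"# -*- coding: latin-1 -*-\nname = 'Fran\xe7ois'\n"))  # iso-8859-1
print(coding_of_source(b"name = 'ascii only'\n"))                              # utf-8
```

Note that the cookie name is normalized ("latin-1" is reported as "iso-8859-1").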

...

...
I wonder if some other cookie, next to the `coding:' cookie, could be used to declare that all strings _in this module only_ should be interpreted as Unicode by default, without the need to resort to the `u' prefix all over.

...

[...] if you know a syntax which you like, propose a patch. Be prepared to also write a PEP defending that syntax.

There is surely no particular syntax that I like enough to defend it. Anything reasonable would do as far as I am concerned, so I might propose a reasonable patch without embarking on a crusade. Yet I may try to assemble and edit together the ideas of others, if it serves a purpose.

...

...
Right now, my feeling is that Python asks a bit too much of a programmer, in terms of commitment, if we only consider the editing work required on sources to use it, or not.

...

Not sure what you are referring to here.

There is currently a lot of effort in Python to make Unicode strings and usual strings inter-operate correctly and automatically, while hiding from the unwilling user, as much as reasonable, whether characters are wide or narrow: s/he uses about the same code no matter what. The way Python does this is rather lovely, in fact. :-) I'm going to transform a flurry of Latin-1 Python scripts to UTF-8, but not all of them, as I'm not going to impose Unicode on our team where it is not wanted. For French, German and many other languages, we have been lucky enough to have one code point per character in Unicode, so we can hope that programs assuming that S[N] addresses the N'th (0-based) character of string S will work the same way regardless of whether strings are narrow or wide. However, and I shall have the honesty to state it, this is *not* respectful of the general Unicode spirit: the Python implementation allows for independently addressable surrogate halves, combining zero-width diacritics, normal _and_ decomposed forms, directional marks, linguistic marks and various other such complexities. But in our case, where applications already work in Latin-1 by abusing our Unicode luck, UTF-8 may _not_ be used as is: we ought to use Unicode (that is, wide) strings as well, to preserve S[N] addressability. So changing source encodings may be intimately tied to going Unicode whenever UTF-8 (or any other variable-length encoding) gets into the picture.
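The S[N] addressability issue can be shown in Python 3 terms, where François's "narrow" strings correspond roughly to `bytes` and wide strings to `str`. This is an illustrative sketch with a made-up sample string: Latin-1 keeps one byte per French character, while UTF-8 does not.

```python
text = "Ça déjà"

latin1 = text.encode("latin-1")   # one byte per character for this French text
utf8 = text.encode("utf-8")       # Ç, é and à each take two bytes

print(len(text), len(latin1), len(utf8))   # 7 7 10
print(text[1], chr(latin1[1]))             # a a
print(utf8[1])                             # 135: second byte of "Ç", not "a"
```

Indexing the decoded `str` gives the same answer whatever the source encoding was; indexing the encoded bytes does not.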

...

You do have the choice of source encodings, and, in fact, "Unicode" is not a valid source encoding. "UTF-8" is [...]

Guess that I know! :-) :-)

...

[...] from a Python point of view, there is absolutely no difference between [UTF-8] and, say, "ISO-8859-15". Choice of source encoding is different from the choice of string literals. You can use Unicode strings, or byte strings, or mix them. It really is your choice.

I hope that my explanation above helps show that source encoding and choice of string literals are not as independent as one may think. A choice that I surely do _not_ have is to see bugs appear in programs merely because I changed the source encoding. Going from ISO 8859-1 to ISO 8859-15 for a Python source is probably fairly safe, because there is no need for switching the narrowness of strings. Going from ISO 8859-1 to UTF-8 is very unsafe, and editing all literal strings from narrow to wide, using `u' prefixes, becomes almost unavoidable. There ought to be a way to maintain a single Python source that would work dependably through re-encoding of the source, but not uselessly relying on wide strings when there is no need for them. That is, without marking all literal strings as being Unicode. Changing encoding from ISO 8859-1 to UTF-8 should not be a one-way, no-return ticket. Of course, it is very normal that sources may have to be adapted for the possibility of a Unicode context. There should be some good style and habits for writing re-encodable programs. Hence this exchange of thoughts. -- François Pinard http://www.iro.umontreal.ca/~pinard

"Martin v. Löwis"

7:21 a.m.

François Pinard wrote:

...

However, and I shall have the honesty to state it, this is *not* respectful of the general Unicode spirit: the Python implementation allows for independently addressable surrogate halves

This is only a problem if you have data which require surrogates (which I claim are rather uncommon at the moment), and you don't have a UCS-4 build of Python (in which surrogates don't exist). As more users demand convenient support for non-BMP characters, you'll find that more builds of Python become UCS-4. In fact, you might find that the build you are using already has sys.maxunicode > 65535.
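The narrow/wide build distinction Martin describes was later removed: since Python 3.3 (PEP 393) every build behaves like a UCS-4 build, with `sys.maxunicode == 0x10FFFF`. A short check, using U+1D11E MUSICAL SYMBOL G CLEF as an arbitrary non-BMP example, shows a single code point in Python while UTF-16 still needs a surrogate pair.

```python
import sys

print(hex(sys.maxunicode))    # 0x10ffff on every Python >= 3.3

clef = "\U0001D11E"           # MUSICAL SYMBOL G CLEF, outside the BMP
print(len(clef))              # 1: one code point, no surrogate halves

utf16 = clef.encode("utf-16-le")
print(len(utf16) // 2)        # 2: a surrogate pair of UTF-16 code units
print(utf16.hex())            # 34d81edd
```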

...

combining zero-width diacritics

Indeed. However, it is not clear to me how this problem could be addressed, and I'm not aware of any API (in any language) that addresses it. Typically, people need things like this:

- in a fixed-width terminal, what characters occupy what column. Notice that this involves East-Asian wide characters, where a single Unicode character (a "wide" character) occupies two columns. OTOH, with combining characters, a sequence of characters might be associated with a single column. Furthermore, some code points might not be associated with a column at all.
- for a given font, how many points does a string occupy, horizontally and vertically.
- where is the next word break.

I don't know what your application is, but I somewhat doubt it is as simple as "give me a thing describing the nth character, including combining diacritics". However, it is certainly possible to implement libraries on top of the existing code, and if there is a real need for that, somebody will contribute it.
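The first need in Martin's list can be roughly approximated with the standard `unicodedata` module. This is a simplified sketch under stated assumptions, not a full wcwidth implementation: combining marks count as 0 columns, East Asian Wide/Fullwidth characters as 2, and everything else (including control characters and zero-width joiners, which it does not special-case) as 1. The name `approx_columns` is hypothetical.

```python
import unicodedata


def approx_columns(text: str) -> int:
    """Rough fixed-width terminal width of `text` (see assumptions above)."""
    width = 0
    for ch in text:
        if unicodedata.combining(ch):
            continue                      # combining mark: shares the previous column
        if unicodedata.east_asian_width(ch) in ("W", "F"):
            width += 2                    # wide or fullwidth character
        else:
            width += 1
    return width


decomposed = "Franc\u0327ois"             # "c" followed by COMBINING CEDILLA
print(len(decomposed), approx_columns(decomposed))   # 9 8
print(len("日本語"), approx_columns("日本語"))         # 3 6
```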

...

normal _and_ decomposed forms,

Terminology alert: there are multiple normal forms in Unicode, and some of them are decomposed (e.g. NFD, NFKD). I fail to see a problem with that. There are applications for all normal forms, and many applications don't need the overhead of normalization. It might be that the code for your languages becomes simpler when always assuming NFC, but this hardly holds for all languages, or all applications.
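The composed/decomposed distinction is easy to see with `unicodedata.normalize()` from the standard library. This Python 3 sketch shows that the two spellings of "é" compare unequal and differ in length until both are brought to the same normal form, and that the compatibility forms (NFKC/NFKD) go further, for example splitting the "ﬁ" ligature.

```python
import unicodedata

composed = "\u00e9"       # é as a single code point (NFC)
decomposed = "e\u0301"    # e + COMBINING ACUTE ACCENT (NFD)

print(composed == decomposed)                                 # False
print(len(composed), len(decomposed))                         # 1 2
print(unicodedata.normalize("NFC", decomposed) == composed)   # True
print(unicodedata.normalize("NFD", composed) == decomposed)   # True
print(unicodedata.normalize("NFKD", "\ufb01"))                # fi (ligature split)
```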

...

directional marks, linguistic marks and various other such complexities.

Same comment as above: if this becomes a real problem, people will contribute code to deal with it.

...

But in our case, where applications already work in Latin-1 by abusing our Unicode luck, UTF-8 may _not_ be used as is: we ought to use Unicode (that is, wide) strings as well, to preserve S[N] addressability. So changing source encodings may be intimately tied to going Unicode whenever UTF-8 (or any other variable-length encoding) gets into the picture.

Yes. There is not much Python can do about this. UTF-8 is very nice for transfer of character data, but it does have most of the problems of a multi-byte encoding. I still prefer it over UTF-16 or UTF-32 for transfer, though.

...

I hope that my explanation above helps show that source encoding and choice of string literals are not as independent as one may think.

It really depends on your processing needs. But yes, my advice still stands: convert to Unicode objects as early as possible in the processing. For source code involving non-ASCII characters, this means you really should use Unicode literals. Of course, my other advice also applies: if you have a program that deals with multiple languages, use only ASCII in the source, and use gettext for the messages.

...

There ought to be a way to maintain a single Python source that would work dependably through re-encoding of the source, but not uselessly relying on wide strings when there is no need for them. That is, without marking all literal strings as being Unicode. Changing encoding from ISO 8859-1 to UTF-8 should not be a one-way, no-return ticket.

But it is not: as you say, you have to add u prefixes when going to UTF-8, yes. But then you can go back to Latin-1, with *no* change other than recoding, and changing the encoding declaration. The string literals can all stay as Unicode literals - the conversion to Latin-1 then really has *no* effect on the runtime semantics.
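Martin's point can be checked directly in Python 3, where every plain string literal is what he calls a Unicode literal. The hypothetical helper below encodes the same one-line module in several source encodings, each with a matching `coding:` cookie, and compiles it from bytes (`compile()` honours the cookie when given bytes): the resulting value is identical.

```python
def literal_after_recoding(encoding: str) -> str:
    """Encode a tiny module in `encoding` and return its literal's runtime value."""
    text = f"# -*- coding: {encoding} -*-\ngreeting = 'Déjà vu, François'\n"
    source = text.encode(encoding)
    namespace = {}
    # compile() decodes bytes according to the coding cookie.
    exec(compile(source, f"<{encoding} source>", "exec"), namespace)
    return namespace["greeting"]


as_latin1 = literal_after_recoding("latin-1")
as_utf8 = literal_after_recoding("utf-8")
print(as_latin1 == as_utf8)   # True
print(as_utf8)                # Déjà vu, François
```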

...

Of course, it is very normal that sources may have to be adapted for the possibility of a Unicode context. There should be some good style and habits for writing re-encodable programs. Hence this exchange of thoughts.

If that is the goal, you really need Unicode literals - everything else *will* break under re-encoding. Regards, Martin

M.-A. Lemburg

9 a.m.

[Source code encoding and string literals]

...

I hope that my explanation above helps show that source encoding and choice of string literals are not as independent as one may think. A choice that I surely do _not_ have is to see bugs appear in programs merely because I changed the source encoding. Going from ISO 8859-1 to ISO 8859-15 for a Python source is probably fairly safe, because there is no need for switching the narrowness of strings. Going from ISO 8859-1 to UTF-8 is very unsafe, and editing all literal strings from narrow to wide, using `u' prefixes, becomes almost unavoidable.

Indeed. As always: explicit is better than implicit :-) The small "u" in front of the literal will tell all readers: this is Unicode text. We might introduce more ways to switch string literal interpretation depending on module or interpreter process scope. However, the small u is here to stay and it's available now, so why not use it? Your concerns about programs breaking because of changes to the source encoding are valid, but not something that Python can address. You have the same problem with normal text documents: a spell checker might catch words misspelled as a result of a wrong encoding, but it is not foolproof, and things get worse if you have multiple languages embedded in your program code. As general advice for writing i18n-compliant programs, I can only suggest writing programs in a single source encoding and a language that appeals to the programmer, and placing *all* string literals under the control of gettext or a similar tool. I usually write programs in ASCII or Latin-1 and use English for the string literals, which then get mapped to user languages as necessary by means of gettext or custom translation logic. -- Marc-Andre Lemburg eGenix.com Professional Python Services directly from the Source (#1, Aug 05 2004)

...

...
...
Python/Zope Consulting and Support ... http://www.egenix.com/ mxODBC.Zope.Database.Adapter ... http://zope.egenix.com/ mxODBC, mxDateTime, mxTextTools ... http://python.egenix.com/

::: Try mxODBC.Zope.DA for Windows,Linux,Solaris,FreeBSD for free ! ::::
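On Marc-Andre's point that "the small u is here to stay": in Python 3 every plain literal is already Unicode text, and since Python 3.3 (PEP 414) the `u` prefix is accepted again as a no-op, so explicitly marked literals written for Python 2 keep working. A minimal check:

```python
legacy = u"Déjà vu"   # u prefix: accepted again since Python 3.3 (PEP 414)
modern = "Déjà vu"    # plain literal: already Unicode text in Python 3

print(type(legacy) is str, legacy == modern)   # True True
```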
