[Python-Dev] Relaxing Unicode error handling

M.-A. Lemburg mal at egenix.com
Tue Jan 6 18:49:53 EST 2004

Martin v. Loewis wrote:
> M.-A. Lemburg wrote:
>> So you are only talking about the case where the application
>> uses the standard default encoding (ASCII) and does not
>> make use of any other codecs ?
> Yes, I'd like to change the error handling in the case of
> an implicit conversion - in particular for the unicode-to-ascii
> case, but also for the (non-)ascii to unicode case.
> Application authors *think* they got rid of all non-trivial
> instances of such conversions, only to find out that their
> customers can produce endless series of application crashes
> by entering funny characters in all imaginable places.

Hmm, I would assume that application authors do have the
proper permissions to use the work-around I mentioned
below, that is, add sitecustomize.py to their application
as a top-level module and change the default encoding to
use their custom codec instead. Even better: that codec
could also write a warning to a log, together with a traceback,
to make debugging easier for the application developers.
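
A minimal sketch of such a logging codec follows. It assumes a current
Python 3 interpreter, where codecs can still be registered but
sys.setdefaultencoding() no longer exists, so the step of making the
codec the default from sitecustomize.py is not shown. The logger name
and the helper names (_relaxed, encode, decode, search) are
hypothetical; the codec name is borrowed from the 'ascii-debug'
suggestion further down.

```python
import codecs
import logging
import traceback

log = logging.getLogger("ascii_debug")  # hypothetical logger name


def _relaxed(func, data, error_type):
    """Try a strict ASCII conversion; on failure, log and retry with 'replace'."""
    try:
        return func(data, "strict")
    except error_type as exc:
        # Record where the conversion was triggered to help debugging.
        stack = "".join(traceback.format_stack())
        log.warning("non-ASCII data in conversion: %s\n%s", exc, stack)
        return func(data, "replace")


def encode(text, errors="strict"):
    # The errors argument is deliberately ignored: this codec never raises.
    return _relaxed(codecs.ascii_encode, text, UnicodeEncodeError)


def decode(data, errors="strict"):
    return _relaxed(codecs.ascii_decode, data, UnicodeDecodeError)


def search(name):
    # Codec names are normalized before lookup; accept both spellings.
    if name in ("ascii-debug", "ascii_debug"):
        return codecs.CodecInfo(encode, decode, name="ascii-debug")
    return None


codecs.register(search)

relaxed_bytes = "h\xe9llo".encode("ascii-debug")
```

With this in place the conversion yields b"h?llo" and logs a warning
with the call stack instead of raising UnicodeEncodeError.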

>>> Sure, but they would also not be the default codecs.
>> Why not ? What about Asian users who set the default encoding
>> to one of the encodings supported by e.g. the JapaneseCodecs
>> package ?
> If that was a proper Python feature, I'm sure the cjkcodecs
> would support it instantly. However, perhaps people also take
> our advice and avoid changing the default encoding (because that
> *doesn't* work); so even in applications where cjkcodecs are
> heavily used, I would hope that the system encoding remains
> at us-ascii. But if it doesn't, cjkcodecs should also implement
> the change I'm proposing. This is a red herring.
>> Oh, sorry, that's the term I use for PyArg_ParseTuple() format
>> arguments.
> Ah, right: They should make use of the default error handling
> as well.

They currently hard-code "strict" for error handling, and
I don't want to change that (because a codec might not default
to "strict" if the codec's own default error handling
is chosen).

Then again, if you only change the ASCII codec, you wouldn't
have to change these hard coded "strict" values.

>> Given the scenario you mention above, wouldn't that also be
>> possible by providing a customized codec for "ascii" under
>> a new name "all-things-ascii" and then setting the default
>> encoding to "all-things-ascii" ?
> No. Changing the system default encoding is not possible for
> applications - it is the system administrator that needs
> to make this change (in site.py). I'm proposing a change that
> applications can make at run-time.

See above: All you have to do is let Python pick up a sitecustomize
module included in the application's Python path. You don't need
to be root to accomplish that and I would assume that an
application developer knows how to implement this work-around.

We could even provide such a codec as a standard Python encoding,
if that's too much hassle for the application developer, something
like 'ascii-debug'.

>> Just think of the issues this could cause in multi-user systems
>> such as Zope that are not prepared for these changes:
>> a script could easily change the settings to have
>> the server execute code under different user ids if threads
>> executing their requests generate codec errors (the errors
>> parameter can be set to a callback now that we have the new
>> logic in place...).
> This is also a red herring. Zope can give very controlled
> access to builtins, and could just dis-allow scripts to change
> the setting - Zope applications would need to find a different
> way.

Zope was just an example of such an application. The problem
is that you would be creating a possible vulnerability, and
every such application would first have to be taught about
it before it could protect itself.
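
To illustrate why a callback-valued errors setting matters here, the
following sketch registers an error handler with
codecs.register_error(). The handler name "recording-replace" and the
seen list are hypothetical. The point is only that the handler is
arbitrary Python code, registered process-wide, that runs whenever a
conversion using that errors value fails.

```python
import codecs

seen = []  # hypothetical record of every failure the handler observed


def recording_handler(exc):
    # The handler receives the offending object and the failing range.
    seen.append((exc.object, exc.start, exc.end))
    if isinstance(exc, UnicodeEncodeError):
        # Substitute one "?" per unencodable character, then resume.
        return ("?" * (exc.end - exc.start), exc.end)
    raise exc


# Registration is global to the interpreter, not per thread or per user.
codecs.register_error("recording-replace", recording_handler)

result = "na\xefve".encode("ascii", "recording-replace")
```

If such a handler could be made the implicit default, any code able to
change that default would get its callback run inside whichever thread
next hits a conversion error.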

Of course, you could restrict the global error handling
default for the ASCII codec to only accept strings as

> Remember, it is a work-around - so it clearly has limitations.
> I'm proposing it anyway, and I'm fully aware of the limitations.

And I'm trying to convince you that there are other ways
to achieve the same thing without introducing yet more
of these application scope globals :-)

Boiling down what we've found, I think you only need to
add a switch that makes the ASCII codec treat the errors
arguments "strict" and NULL as if they were "replace".

I'd be +0 on that :-)
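
A rough Python-level sketch of that switch, under the assumption that
it is a single process-wide flag (in CPython the real change would live
in the C implementation of the ASCII codec). The names set_relaxed and
ascii_encode are hypothetical, and None stands in for a NULL errors
argument.

```python
import codecs

_relaxed = False  # hypothetical process-wide switch, off by default


def set_relaxed(flag):
    """Turn the relaxed behaviour on or off."""
    global _relaxed
    _relaxed = bool(flag)


def ascii_encode(text, errors=None):
    # Only the implicit default (None) and "strict" are remapped;
    # any other explicitly requested handler is left alone.
    if _relaxed and errors in (None, "strict"):
        errors = "replace"
    return codecs.ascii_encode(text, errors or "strict")[0]
```

With the switch off, ascii_encode("caf\xe9") raises UnicodeEncodeError
as usual; after set_relaxed(True) the same call returns b"caf?", while
an explicit "ignore" still returns b"caf".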

Marc-Andre Lemburg
