[I18n-sig] Re: validity of lone surrogates
Walter Dörwald
walter@livinglogic.de
Wed, 27 Jun 2001 18:56:00 +0200
Guido van Rossum wrote:
>
> [Gaute]
> > My take on this is that the various UTF codecs should follow the specs
> > to the letter and reject anything else in default mode. There should
> > also be a "lenient" or "forgiving" mode in which the codec does its
> > best to interpret and repair broken, nonsensical or irregular data.
> > Of course, if an application uses this mode then it will have to be
> > aware of the dangers involved, including the security aspects.
>
> Python's codec mechanism has a nice API gimmick: you can pass an error
> handling option. Currently, this can be 'strict', 'ignore', or
> 'replace'. I wonder if we could add a fourth mode, 'lenient', that
> tries its best to encode anything passed in?
How would this work together with the proposed encode error handling
callback feature (see patch #432401)? Does this patch have any chance of
getting into Python (when it's finished)?
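
To make the question concrete, here is a minimal sketch of how a
callback-based handler could sit next to the built-in 'strict', 'ignore'
and 'replace' modes. It assumes the codecs.register_error interface found
in current Python 3, which may differ from what the patch proposes. The
handler name 'lenient-sketch' and the function lenient_handler are
hypothetical, chosen only for illustration.

```python
import codecs


def lenient_handler(exc):
    """Hypothetical 'lenient' handler: escape whatever cannot be encoded."""
    if isinstance(exc, UnicodeEncodeError):
        # The offending slice of the input, e.g. a lone surrogate.
        bad = exc.object[exc.start:exc.end]
        # Substitute an ASCII escape so no data is silently dropped.
        replacement = "".join("\\u%04x" % ord(ch) for ch in bad)
        # Return the replacement and the position at which to resume.
        return replacement, exc.end
    # This sketch only covers encoding; re-raise anything else.
    raise exc


# Register under an illustrative name (an assumption, not a built-in mode).
codecs.register_error("lenient-sketch", lenient_handler)

# A string containing a lone high surrogate.
lone = "a\ud800b"


def try_strict(text):
    """Return True if strict UTF-8 encoding rejects the text."""
    try:
        text.encode("utf-8", "strict")
    except UnicodeEncodeError:
        return True
    return False


strict_rejects = try_strict(lone)
ignored = lone.encode("utf-8", "ignore")
replaced = lone.encode("utf-8", "replace")
lenient = lone.encode("utf-8", "lenient-sketch")
```

With a callback like this, a separate 'lenient' mode would just be one
more registered handler, so the default can stay strict and applications
opt in by name.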
Bye,
Walter Dörwald