[Python-Dev] PEP 383 update: utf8b is now the error handler

MRAB google at mrabarnett.plus.com
Wed May 6 12:08:45 CEST 2009


M.-A. Lemburg wrote:
> Martin v. Löwis wrote:
>>> The name "utf8b" suggested in the PEP is not in line with the codec
>>> design
>> Where is that design documented, and how exactly does the name
>> violate the design (chapter and verse, please)?
> 
> Martin, I designed the whole Python codec machinery, so even if
> this is not explicitly written down somewhere, you can take my
> word for it.
> 
> I don't want users to be confused by such an error handler
> name, so please change it!
> 
> Here's a list of the currently available error handlers (taken from
> codecs.py):
> 
>         The .encode()/.decode() methods may use different error
>         handling schemes by providing the errors argument. These
>         string values are predefined:
> 
>          'strict' - raise a ValueError error (or a subclass)
>          'ignore' - ignore the character and continue with the next
>          'replace' - replace with a suitable replacement character;
>                     Python will use the official U+FFFD REPLACEMENT
>                     CHARACTER for the builtin Unicode codecs on
>                     decoding and '?' on encoding.
>          'xmlcharrefreplace' - Replace with the appropriate XML
>                                character reference (only for encoding).
>          'backslashreplace'  - Replace with backslashed escape sequences
>                                (only for encoding).
> 
>         The set of allowed values can be extended via register_error.
> 
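
For reference, here is a minimal sketch of how the predefined handlers
quoted above behave. It assumes Python 3 str/bytes semantics; the sample
string and variable names are only illustrative.

```python
# Sample text containing one character (U+00E9) that ASCII cannot encode.
text = "caf\u00e9"

# 'strict' raises UnicodeEncodeError, which is a subclass of ValueError.
try:
    text.encode("ascii", "strict")
    strict_raised = False
except ValueError:
    strict_raised = True

# 'ignore' drops the offending character.
ignored = text.encode("ascii", "ignore")

# 'replace' substitutes '?' on encoding.
replaced = text.encode("ascii", "replace")

# 'xmlcharrefreplace' emits an XML character reference (encoding only).
xml = text.encode("ascii", "xmlcharrefreplace")

# 'backslashreplace' emits a backslashed escape sequence.
backslash = text.encode("ascii", "backslashreplace")

# On decoding, 'replace' substitutes U+FFFD for the undecodable byte.
decoded = b"caf\xe9".decode("ascii", "replace")

print(strict_raised, ignored, replaced, xml, backslash, repr(decoded))
```
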
>>> Error handlers and codecs are two different things, so the namespaces
>>> need to be clearly separate.
>> They *are* separate namespaces; that's guaranteed by the implementation.
> 
> In the implementation, yes, but not in the head of a typical user:
> 'utf8b' looks more like a codec name than an error handler
> name.
> 
Judging by the existing names, I think that 'surrogate' would be
reasonable. It already contains the meaning of substitute, it's not too
long, and the codes which act as replacements are already called
surrogates.
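
To make the namespace point concrete, here is a rough sketch using
codecs.register_error. The handler name 'demo_question' and the helper
functions are made up for illustration; this is not the PEP 383 handler,
just a stand-in that substitutes '?' when encoding.

```python
import codecs


def demo_question(exc):
    """Hypothetical handler: replace each unencodable character with '?'."""
    if isinstance(exc, UnicodeEncodeError):
        # Return the replacement text and the position to resume encoding at.
        return ("?" * (exc.end - exc.start), exc.end)
    raise exc


# Register the handler under an illustrative name.
codecs.register_error("demo_question", demo_question)

result = "caf\u00e9".encode("ascii", "demo_question")


def is_codec(name):
    """True if `name` resolves in the codec registry."""
    try:
        codecs.lookup(name)
        return True
    except LookupError:
        return False


def is_error_handler(name):
    """True if `name` resolves in the error handler registry."""
    try:
        codecs.lookup_error(name)
        return True
    except LookupError:
        return False


# Each name resolves in exactly one of the two registries.
for name in ("utf-8", "strict", "demo_question"):
    print(name, is_codec(name), is_error_handler(name))
```

So the registries really are separate, as Martin says; the only question
is whether the name reads like a codec to someone skimming the code.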

> I want to avoid any such confusion with Python codecs and don't
> understand why you are making a problem out of this.
> 