Customizing character set conversions with an error handler
Serge.Orlov at gmail.com
Sun Mar 12 22:33:49 CET 2006
Jukka Aho wrote:
> When converting Unicode strings to legacy character encodings, it is
> possible to register a custom error handler that will catch and process
> all code points that do not have a direct equivalent in the target
> encoding (as described in PEP 293).
> The thing to note here is that the error handler itself is required to
> return the substitutions as Unicode strings - not as the target encoding
> bytestrings. Some lower-level gadgetry will silently convert these
> strings to the target encoding.
> That is, if the substitution _itself_ doesn't contain illegal code
> points for the target encoding.
> Which brings us to the point: if my error handler for some reason
> returns illegal substitutions (from the viewpoint of the target
> encoding), how can I catch _these_ errors and make things good again?
> I thought it would work automatically, by calling the error handler as
> many times as necessary, and letting it work out the situation, but it
> apparently doesn't. Sample code follows:
> # So the question becomes: how can I make this work
> # in a graceful manner?
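To illustrate the behaviour described above, here is a minimal sketch. It assumes Python 3 and the ASCII codec, the handler name bad_handler is hypothetical, and it is not the sample code from the quoted message. The handler returns a substitution that the target encoding cannot represent either, and the encoder then raises UnicodeEncodeError instead of calling the handler a second time.

```python
import codecs


def bad_handler(error):
    """Hypothetical handler whose substitution is itself unencodable in ASCII."""
    if not isinstance(error, UnicodeEncodeError):
        raise error
    # U+00E9 (e with acute accent) has no ASCII equivalent, so the encoder
    # cannot convert this substitution to the target encoding.
    return ("\u00e9", error.end)


codecs.register_error("bad_handler", bad_handler)

try:
    "caf\u00e9 \u2026".encode("ascii", "bad_handler")
    outcome = "encoded"
except UnicodeEncodeError:
    # The handler is not given a second chance to repair its own output.
    outcome = "UnicodeEncodeError"

print(outcome)
```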
Change the return statement to this code:
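A minimal sketch of one possible approach, assuming Python 3; it is not necessarily the code from the original reply, and the SUBSTITUTIONS table and the handler name safe_substitute are hypothetical. The idea is to round-trip the substitution through the target encoding (available as error.encoding) with the built-in "replace" handler before returning it, so that whatever the handler hands back should already be encodable.

```python
import codecs

# Hypothetical substitution table, for illustration only.
SUBSTITUTIONS = {
    "\u2026": "...",        # horizontal ellipsis -> three ASCII dots
    "\u20ac": "\u00e9UR",   # deliberately bad for ASCII: contains U+00E9
}


def safe_substitute(error):
    """Return a substitution that is checked against the target encoding."""
    if not isinstance(error, UnicodeEncodeError):
        raise error
    bad = error.object[error.start:error.end]
    substitution = "".join(SUBSTITUTIONS.get(ch, "?") for ch in bad)
    # Round-trip through the target encoding; anything the target cannot
    # represent is turned into "?" by the built-in "replace" handler.
    safe = substitution.encode(error.encoding, "replace").decode(error.encoding)
    return (safe, error.end)


codecs.register_error("safe_substitute", safe_substitute)

result = "price: 5 \u20ac\u2026".encode("ascii", "safe_substitute")
print(result)
```

Here the illegal U+00E9 in the substitution is degraded to "?" rather than raising, so the call yields b'price: 5 ?UR...'. A stricter handler could catch the UnicodeEncodeError from a plain substitution.encode(error.encoding) and pick a different fallback instead.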