[Python-Dev] Python-3.0, unicode, and os.environ
Adam Olsen
rhamph at gmail.com
Mon Dec 8 22:32:00 CET 2008
On Mon, Dec 8, 2008 at 2:01 PM, M.-A. Lemburg <mal at egenix.com> wrote:
> On 2008-12-08 21:45, Antoine Pitrou wrote:
>> M.-A. Lemburg <mal <at> egenix.com> writes:
>>> Such application specific error handlers could then also apply
>>> whatever fancy round-trip safe encoding of non-decodable bytes
>>> to Unicode escapes, private code points, etc. as seen fit by the
>>> application.
>>
>> I'd argue that such a fancy round-trip safe error handler should be provided by
>> Python. It's not reasonable to expect application coders to come up with their
>> own codec variation based on subtle details of the unicode spec.
>
> Fair enough. We could add some, e.g.:
>
> * a round-trip safe escape error handler that uses a Unicode private
> code point area which we officially reserve for the Python
> interpreter
This would of course alter the behaviour of those private code points,
preventing them from round-tripping properly.
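
To make the collision concrete, here is a minimal sketch. The handler name
"pua_escape" and the U+E000 base are assumptions made up for illustration,
not anything actually proposed in this thread:

```python
import codecs

PUA_BASE = 0xE000  # assumed start of the "reserved" private-use range


def pua_escape(exc):
    """Hypothetical handler: map each undecodable byte to PUA_BASE + byte."""
    if not isinstance(exc, UnicodeDecodeError):
        raise exc
    bad = exc.object[exc.start:exc.end]
    return "".join(chr(PUA_BASE + b) for b in bad), exc.end


codecs.register_error("pua_escape", pua_escape)

undecodable = b"\xe9"               # a lone Latin-1 byte, invalid as UTF-8
genuine = "\ue0e9".encode("utf-8")  # a real private-use character, valid UTF-8

a = undecodable.decode("utf-8", "pua_escape")
b = genuine.decode("utf-8", "pua_escape")

# Two different byte strings now decode to the same text, so encoding
# cannot tell which one to restore.
assert a == b
assert undecodable != genuine
```
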
I don't think round-tripping can be done from an error handler. You
need a full codec to do it. A simple option is 8859-1. Or, ya know,
bytes. This has long since gotten repetitive.
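
For what it's worth, the 8859-1 option is easy to check. A rough sketch
(the file name is just an example value):

```python
# ISO 8859-1 maps every byte 0x00-0xFF to the code point with the same
# number, so decoding never fails and encoding restores the original bytes.
all_bytes = bytes(range(256))
as_text = all_bytes.decode("iso-8859-1")
assert as_text.encode("iso-8859-1") == all_bytes

name = b"Andr\xe9"                 # example directory name, Latin-1 encoded
decoded = name.decode("iso-8859-1")
restored = decoded.encode("iso-8859-1")
assert restored == name
```

The price is that the text is only meaningful if the bytes really were
Latin-1; anything else round-trips fine but displays as mojibake.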
> * a human readable escape error handler that encodes the problem
> bytes to say hex escapes, e.g. gives Andr\xe9 for a Latin-1
> encoded directory name instead of failing
Similar to 'ö'.encode('ascii', 'backslashreplace')? I'm +1 on making that work.
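
Roughly what I have in mind, as a sketch only. The encode half uses the
existing 'backslashreplace' handler; the decode half uses a made-up handler
name, "hexescape", registered through codecs.register_error:

```python
import codecs


def hexescape(exc):
    """Hypothetical decode handler: show undecodable bytes as \\xNN escapes."""
    if not isinstance(exc, UnicodeDecodeError):
        raise exc
    bad = exc.object[exc.start:exc.end]
    return "".join("\\x%02x" % b for b in bad), exc.end


codecs.register_error("hexescape", hexescape)

# Encoding direction: this already works with the built-in handler.
encoded = "ö".encode("ascii", "backslashreplace")

# Decoding direction: a Latin-1 encoded name read as UTF-8.
shown = b"Andr\xe9".decode("utf-8", "hexescape")
print(shown)
```

Note this is for display only. A literal backslash sequence in a valid name
looks the same as an escaped byte, so the result does not round-trip.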
> * a warning error handler that replaces the problem cases with
> a question mark and issues a warning through the warning
> framework
I dub thee errors='warnreplace'.
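
A sketch of how that might look. The name 'warnreplace' is only my
suggestion above, and the choice of UnicodeWarning as the category is an
assumption:

```python
import codecs
import warnings


def warnreplace(exc):
    """Hypothetical handler: replace each problem unit with '?' and warn."""
    if not isinstance(exc, (UnicodeDecodeError, UnicodeEncodeError)):
        raise exc
    warnings.warn(
        "%s: replaced %r at position %d"
        % (exc.encoding, exc.object[exc.start:exc.end], exc.start),
        UnicodeWarning,
    )
    return "?" * (exc.end - exc.start), exc.end


codecs.register_error("warnreplace", warnreplace)

with warnings.catch_warnings(record=True) as caught:
    warnings.simplefilter("always")
    result = b"Andr\xe9".decode("utf-8", "warnreplace")

print(result, len(caught))
```
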
--
Adam Olsen, aka Rhamphoryncus