[Python-ideas] Support WHATWG versions of legacy encodings

Random832 random832 at fastmail.com
Thu Jan 11 11:42:43 EST 2018


On Thu, Jan 11, 2018, at 04:55, Serhiy Storchaka wrote:
> The way of solving this issue in Python is using an error handler. The 
> "surrogateescape" error handler is specially designed for lossless 
> reversible decoding. It maps every unassigned byte in the range 
> 0x80-0xff to a single character in the range U+dc80-U+dcff. This allows 
> you to distinguish correctly decoded characters from the escaped bytes, 
> perform character by character processing of the decoded text, and 
> encode the result back with the same encoding.
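
For illustration, here is a minimal sketch of the round trip described above. It assumes cp1252 as the legacy encoding, picked only as an example; byte 0x81 is unassigned in Python's cp1252 codec, and the sample bytes are made up.

```python
# Sample bytes are hypothetical: 0xe9 and 0x93/0x94 are assigned in cp1252,
# while 0x81 is unassigned in Python's cp1252 codec.
raw = b"caf\xe9 \x81 \x93quoted\x94"

# Unassigned bytes are escaped to lone surrogates (0x81 -> U+DC81).
text = raw.decode("cp1252", errors="surrogateescape")

# Escaped bytes can be told apart from correctly decoded characters.
escaped = [ch for ch in text if "\udc80" <= ch <= "\udcff"]

# Encoding with the same handler restores the original bytes.
roundtrip = text.encode("cp1252", errors="surrogateescape")
```

Here `escaped` holds only the one character that came from the unassigned byte, and `roundtrip` is equal to `raw`.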

Maybe we need a new error handler that maps each unassigned byte in the range 0x80-0x9f to the character with the same code point in the range U+0080-U+009F. Do any of the encodings being discussed differ from the "normal" version of the encoding in any way other than what I just described?
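
A rough sketch of what such a handler could look like, built on codecs.register_error. The handler name "c1passthrough" and the function name are hypothetical, not an existing or proposed API, and cp1252 is again only an example encoding.

```python
import codecs


def c1_passthrough(exc):
    """Hypothetical handler: pass 0x80-0x9f through as U+0080-U+009F."""
    if isinstance(exc, UnicodeDecodeError):
        bad = exc.object[exc.start:exc.end]  # the undecodable bytes
        if all(0x80 <= b <= 0x9F for b in bad):
            return "".join(chr(b) for b in bad), exc.end
    elif isinstance(exc, UnicodeEncodeError):
        bad = exc.object[exc.start:exc.end]  # the unencodable characters
        if all(0x80 <= ord(c) <= 0x9F for c in bad):
            # Encoding error handlers may return bytes directly.
            return bytes(ord(c) for c in bad), exc.end
    # Anything outside the C1 range is still an error.
    raise exc


codecs.register_error("c1passthrough", c1_passthrough)

raw = b"\x93hi\x94 \x81\x8d"
text = raw.decode("cp1252", errors="c1passthrough")
back = text.encode("cp1252", errors="c1passthrough")
```

Unlike surrogateescape, the result contains no lone surrogates, but the passed-through bytes can no longer be distinguished from genuine C1 control characters in the decoded text.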

