[Python-ideas] Support WHATWG versions of legacy encodings
Random832
random832 at fastmail.com
Thu Jan 11 11:42:43 EST 2018
On Thu, Jan 11, 2018, at 04:55, Serhiy Storchaka wrote:
> The way of solving this issue in Python is using an error handler. The
> "surrogateescape" error handler is specially designed for lossless
> reversible decoding. It maps every unassigned byte in the range
> 0x80-0xff to a single character in the range U+dc80-U+dcff. This allows
> you to distinguish correctly decoded characters from the escaped bytes,
> perform character by character processing of the decoded text, and
> encode the result back with the same encoding.
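For reference, the round trip described in the quoted text can be sketched as follows. This is only an illustration, assuming Python's built-in cp1252 codec, where byte 0x81 is unassigned; the sample bytes are made up.

```python
# Illustration only: cp1252 leaves byte 0x81 unassigned in Python's codec.
raw = b"caf\xe9 \x81"

# "surrogateescape" turns the undecodable byte 0x81 into the lone
# surrogate U+DC81 instead of raising UnicodeDecodeError.
text = raw.decode("cp1252", "surrogateescape")

# Escaped bytes are distinguishable from real characters because they
# always fall in U+DC80-U+DCFF.
escaped = [ch for ch in text if 0xDC80 <= ord(ch) <= 0xDCFF]

# Encoding with the same codec and handler restores the original bytes.
round_tripped = text.encode("cp1252", "surrogateescape")
```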
Maybe we need a new error handler that maps unassigned bytes in the range 0x80-0x9f to a single character in the range U+0080-U+009F. Do any of the encodings being discussed have behavior other than the "normal" version of the encoding plus what I just described?
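A rough sketch of what such a handler could look like, using the existing codecs.register_error() hook. The handler name "c1passthrough" and the function name are hypothetical, chosen only for this example; nothing like this exists in the standard library today.

```python
import codecs


def c1_passthrough(exc):
    """Hypothetical handler: map bytes 0x80-0x9f to U+0080-U+009F and back."""
    if isinstance(exc, UnicodeDecodeError):
        byte = exc.object[exc.start]
        if 0x80 <= byte <= 0x9F:
            # Replace one undecodable byte with the C1 control character
            # of the same value, then resume right after it.
            return chr(byte), exc.start + 1
    elif isinstance(exc, UnicodeEncodeError):
        code = ord(exc.object[exc.start])
        if 0x80 <= code <= 0x9F:
            # Reverse direction: emit the raw byte for a C1 character
            # that the codec itself cannot encode.
            return bytes([code]), exc.start + 1
    # Anything else is still an error.
    raise exc


# The name "c1passthrough" is an assumption made for this example.
codecs.register_error("c1passthrough", c1_passthrough)

decoded = b"\x80\x81".decode("cp1252", "c1passthrough")
encoded = decoded.encode("cp1252", "c1passthrough")
```

Here 0x80 is assigned in cp1252 and decodes to the euro sign as usual, so the handler is only invoked for 0x81. Unlike surrogateescape, the result cannot be told apart from text that genuinely contained U+0081.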