Re: [Python-ideas] Support WHATWG versions of legacy encodings

Jan. 11, 2018


      On Thu, Jan 11, 2018, at 04:55, Serhiy Storchaka wrote:
...
The way of solving this issue in Python is using an error handler. The 
"surrogateescape" error handler is specially designed for lossless 
reversible decoding. It maps every unassigned byte in the range 
0x80-0xff to a single character in the range U+dc80-U+dcff. This allows 
you to distinguish correctly decoded characters from the escaped bytes, 
perform character by character processing of the decoded text, and 
encode the result back with the same encoding.
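
For illustration, here is a minimal sketch of that round trip. It assumes
cp1252, where byte 0x81 is unassigned in Python's codec; the sample bytes
are made up for the example.

```python
# Byte 0x81 has no mapping in Python's cp1252 codec, so a strict decode fails.
raw = b"caf\xe9 \x81"

# surrogateescape maps the undecodable byte 0x81 to the lone surrogate U+DC81,
# while 0xE9 decodes normally to U+00E9.
text = raw.decode("cp1252", errors="surrogateescape")

# Escaped bytes can be told apart from real characters by their code point.
escaped = [ch for ch in text if 0xDC80 <= ord(ch) <= 0xDCFF]

# Encoding with the same codec and handler restores the original bytes.
roundtrip = text.encode("cp1252", errors="surrogateescape")
```

The escaped characters are lone surrogates, so the decoded string cannot be
encoded as strict UTF-8 until they are encoded back or replaced.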

Maybe we need a new error handler that maps unassigned bytes in the range
0x80-0x9f to a single character in the range U+0080-U+009F. Do any of the
encodings being discussed have behavior other than the "normal" version of
the encoding plus what I just described?
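
A rough sketch of what such a handler could look like, using
codecs.register_error. The handler name "c1passthrough" and the function
name are hypothetical, made up for this example; nothing like it exists in
the standard library, and this is only one possible reading of the idea.

```python
import codecs

def c1_passthrough(exc):
    """Hypothetical handler: map bytes 0x80-0x9f to U+0080-U+009F and back."""
    bad = exc.object[exc.start:exc.end]
    if isinstance(exc, UnicodeDecodeError):
        # bad is a bytes slice; iterating it yields integers.
        if all(0x80 <= b <= 0x9F for b in bad):
            return "".join(chr(b) for b in bad), exc.end
    elif isinstance(exc, UnicodeEncodeError):
        # bad is a str slice; only C1 control characters are passed through.
        if all(0x80 <= ord(ch) <= 0x9F for ch in bad):
            return bytes(ord(ch) for ch in bad), exc.end
    # Anything else is still an error.
    raise exc

# Register under a made-up name so it can be passed as errors=...
codecs.register_error("c1passthrough", c1_passthrough)

# In Python's cp1252, 0x81 is unassigned but 0x80 is the euro sign.
decoded = b"\x81\x80".decode("cp1252", errors="c1passthrough")
encoded = decoded.encode("cp1252", errors="c1passthrough")
```

Unlike surrogateescape, the result contains only ordinary code points, so
a passed-through byte can no longer be distinguished from a C1 control
character that was in the text to begin with.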

Random832