[Python-ideas] Support WHATWG versions of legacy encodings

Rob Speer rspeer at luminoso.com
Thu Jan 11 14:55:07 EST 2018

On Thu, 11 Jan 2018 at 11:43 Random832 <random832 at fastmail.com> wrote:

> Maybe we need a new error handler that maps unassigned bytes in the range
> 0x80-0x9f to a single character in the range U+0080-U+009F. Do any of the
> encodings being discussed have behavior other than the "normal" version of
> the encoding plus what I just described?

(accidentally replied individually instead of replying to all)

There is one more difference I have found between Python's encodings and
WHATWG's. In Python's codepage 1255, b'\xca' is undefined. In WHATWG's, it
maps to U+05BA HEBREW POINT HOLAM HASER FOR VAV. I haven't tracked down
what the Unicode Consortium has to say about this.
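
A quick way to check how a given Python build treats these bytes is sketched below. The helper name `probe` is only for illustration, and the result for b'\xca' may depend on the Python version, so no output is claimed here.

```python
def probe(byte, encoding="cp1255"):
    """Return the decoded character, or None if the codec leaves the byte undefined."""
    try:
        return byte.decode(encoding)
    except UnicodeDecodeError:
        return None


# b'\xca' is the byte in question; b'\xe0' (HEBREW LETTER ALEF) is a defined
# byte included for comparison.
for b in (b"\xca", b"\xe0"):
    print(b, repr(probe(b)))
```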

Other than that, every difference is WHATWG adding fall-throughs that map
otherwise-unassigned bytes in 0x80-0x9F to U+0080 through U+009F. Bytes
that are undefined outside that range stay undefined: for example, in
windows-1255, the byte b'\xff' is undefined, and it remains undefined in
WHATWG's mapping.
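
The error handler Random832 describes could look roughly like the sketch below. The handler name 'c1-fallthrough' is made up for this example and is not an existing Python error handler. It only covers decoding, and it would not reproduce the b'\xca' difference in windows-1255.

```python
import codecs


def c1_fallthrough(exc):
    """Map undecodable bytes 0x80-0x9F to U+0080-U+009F; re-raise anything else."""
    if isinstance(exc, UnicodeDecodeError):
        bad = exc.object[exc.start:exc.end]
        if bad and all(0x80 <= b <= 0x9F for b in bad):
            # Each byte falls through to the code point with the same number.
            return "".join(chr(b) for b in bad), exc.end
    raise exc


# 'c1-fallthrough' is a hypothetical name chosen for this sketch.
codecs.register_error("c1-fallthrough", c1_fallthrough)

# 0x81 is unassigned in Python's cp1252, so it falls through to U+0081.
text = b"caf\xe9 \x81".decode("cp1252", errors="c1-fallthrough")
```

Bytes that are undefined outside 0x80-0x9F, such as b'\xff' in cp1255, still raise UnicodeDecodeError with this handler, which matches the WHATWG behavior described above.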