On Thu, 11 Jan 2018 at 11:43 Random832 <random832@fastmail.com> wrote:
Maybe we need a new error handler that maps unassigned bytes in the range 0x80-0x9f to a single character in the range U+0080-U+009F. Do any of the encodings being discussed have behavior other than the "normal" version of the encoding plus what I just described?

(accidentally replied individually instead of replying to all)

There is one more difference I have found between Python's encodings and WHATWG's. In Python's codepage 1255, b'\xca' is undefined. In WHATWG's, it maps to U+05BA HEBREW POINT HOLAM HASER FOR VAV. I haven't tracked down what the Unicode Consortium has to say about this.
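A quick way to check which bytes Python's codec treats as undefined is to try decoding each byte on its own. This is only a sketch: the helper name is_defined is made up for illustration, and the result depends on the codec tables shipped with your Python version.

```python
def is_defined(byte_value, encoding="cp1255"):
    """Return True if the single byte decodes under Python's codec.

    Hypothetical helper, not part of the standard library.
    """
    try:
        bytes([byte_value]).decode(encoding)
    except UnicodeDecodeError:
        return False
    return True


# List every byte that Python's cp1255 codec refuses to decode.
undefined = [hex(b) for b in range(0x100) if not is_defined(b)]
print(undefined)
```

With the codec behaving as described above, 0xca and 0xff should both appear in that list, alongside the unassigned bytes in 0x80-0x9f.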

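Other than that, every difference is an added fall-through for a byte in the range 0x80 to 0x9F, which WHATWG maps to the same-numbered character in U+0080 to U+009F. Undefined bytes outside that range are not touched: for example, in windows-1255 the byte b'\xff' is undefined in Python, and it remains undefined in WHATWG's mapping.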
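For what it's worth, the error handler Random832 describes can be sketched with codecs.register_error. The handler name "c1-fallthrough" and the function name are my own inventions, this covers decoding only, and it does nothing about the b'\xca' difference in 1255:

```python
import codecs


def c1_fallthrough(exc):
    """Map undecodable bytes 0x80-0x9F to U+0080-U+009F; re-raise otherwise.

    Hypothetical decode-only handler, not an existing Python error handler.
    """
    if not isinstance(exc, UnicodeDecodeError):
        raise exc
    bad = exc.object[exc.start:exc.end]
    if all(0x80 <= b <= 0x9F for b in bad):
        # Each byte becomes the code point with the same number.
        return "".join(chr(b) for b in bad), exc.end
    # Anything else (such as b'\xff' in cp1255) stays an error.
    raise exc


codecs.register_error("c1-fallthrough", c1_fallthrough)

text = b"\x81\xe0".decode("cp1255", errors="c1-fallthrough")
```

Here b'\x81' is unassigned in Python's cp1255, so it falls through to U+0081, while b'\xe0' decodes normally. Decoding b'\xff' with the same handler still raises UnicodeDecodeError, which matches what WHATWG does.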