[issue24117] Wrong range checking in GB18030 decoder.

Ma Lin report at bugs.python.org
Tue May 19 08:28:24 CEST 2015


Ma Lin added the comment:

>> If you could provide links to the relevant pages/section we can verify that the codecs are indeed incorrect. 

Here is the CP950 table; 0xC6A1 is not in it.
https://msdn.microsoft.com/zh-cn/goglobal/cc305155

I can provide one link, but there are many variants of the BIG5 conversion table on the Internet, so a single link is not very persuasive.

This page lists many variants of BIG5 tables: https://moztw.org/docs/big5/
I found 0xC6A1<->U+30FE in the table named "Unicode 1.1". The page describes it as "a terrible table, many errors exist, sadly many foreigners are using it", but IIRC Python's BIG5 codec is not exactly the same as that table.
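
A quick way to see what a given Python build does with this byte pair is sketched below. This is only an illustration: the helper names try_decode and describe are made up for this example, the choice of codecs to compare is an assumption, and the result may differ between Python versions, so no expected output is shown.

```python
def try_decode(raw: bytes, codec: str):
    """Return the decoded text, or None if the codec rejects the bytes."""
    try:
        return raw.decode(codec)
    except UnicodeDecodeError:
        return None


def describe(raw: bytes, codecs=("big5", "cp950", "big5hkscs")):
    """Map each codec name to the decoded code points, or None if unmapped."""
    results = {}
    for codec in codecs:
        text = try_decode(raw, codec)
        if text is None:
            results[codec] = None
        else:
            # Show code points rather than glyphs so the terminal font
            # does not matter.
            results[codec] = " ".join(f"U+{ord(ch):04X}" for ch in text)
    return results


if __name__ == "__main__":
    # 0xC6A1 is the byte pair discussed in this comment.
    for codec, result in describe(b"\xc6\xa1").items():
        print(f"{codec:10} 0xC6A1 -> {result or 'not mapped'}")
```

Comparing the three codecs side by side shows whether the installed big5 codec maps 0xC6A1 at all, and whether it agrees with cp950 on that point.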

IMO, the most reliable way is to read a lot of material and verify the key points and conflicts against authoritative sources, but this is very hard for foreigners.

Anyway, let's wait for Taiwanese users to give their opinion on whether this should be fixed.

----------

_______________________________________
Python tracker <report at bugs.python.org>
<http://bugs.python.org/issue24117>
_______________________________________