[issue41330] Inefficient error-handle for CJK encodings

Ma Lin report at bugs.python.org
Sat Jul 18 04:31:13 EDT 2020


Ma Lin <malincns at 163.com> added the comment:

> But how many new Python web application use CJK codec instead of UTF-8?

A CJK character usually takes 2 bytes in CJK encodings but 3 bytes in UTF-8.
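To check the per-character sizes directly, here is a minimal Python sketch that encodes a few common Chinese characters with both codecs. The sample string is an assumption chosen for illustration. Characters outside GB2312/GBK, or characters beyond the Basic Multilingual Plane in UTF-8, can take more bytes, which is why the claim above says "usually".

```python
# Assumed sample: four common Chinese characters, all in GBK and in the BMP.
sample = "中文字符"

for codec in ("gbk", "utf-8"):
    data = sample.encode(codec)  # raises UnicodeEncodeError if a char is unmappable
    per_char = len(data) / len(sample)
    print(f"{codec}: {len(data)} bytes, {per_char:.0f} bytes per character")
# gbk: 8 bytes, 2 bytes per character
# utf-8: 12 bytes, 3 bytes per character
```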

I measured the size of a Chinese book in both encodings:
in GBK:     853,025 bytes
in UTF-8: 1,267,523 bytes
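The figures above put the UTF-8 version at roughly 1.49 times the size of the GBK version. A comparison like this can be reproduced on any text with a small sketch such as the following; the helper names `encoded_sizes` and `size_ratio` are hypothetical, not part of any library. ASCII characters take 1 byte in both encodings, so mixed text lands somewhere between a ratio of 1.0 and 1.5.

```python
def encoded_sizes(text, codecs=("gbk", "utf-8")):
    """Return {codec: number of bytes} for `text` encoded with each codec.

    str.encode raises UnicodeEncodeError if a character has no mapping
    in one of the codecs (e.g. a character GBK cannot represent).
    """
    return {codec: len(text.encode(codec)) for codec in codecs}


def size_ratio(text, smaller="gbk", larger="utf-8"):
    """Return how many times larger `text` is in `larger` than in `smaller`."""
    sizes = encoded_sizes(text, (smaller, larger))
    return sizes[larger] / sizes[smaller]


# Mixed text: "Python " is 7 ASCII bytes in both codecs;
# the two Chinese characters take 2 bytes each in GBK and 3 each in UTF-8.
print(encoded_sizes("Python 中文"))  # {'gbk': 11, 'utf-8': 13}
print(size_ratio("中文"))            # 1.5
```

For a whole book, the same helper could be called on the decoded file contents, e.g. `encoded_sizes(open("book.txt", encoding="utf-8").read())`, where `book.txt` is a hypothetical file name.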

For CJK content UTF-8 is wasteful, so CJK encodings will probably not be phased out.

----------

_______________________________________
Python tracker <report at bugs.python.org>
<https://bugs.python.org/issue41330>
_______________________________________