[New-bugs-announce] [issue30003] Remove hz codec

Ma Lin report at bugs.python.org
Wed Apr 5 23:42:17 EDT 2017


New submission from Ma Lin:

hz is a Simplified Chinese codec that has been available in Python since around 2004.

However, the hz encoder has a serious bug: it does not escape ~ (RFC 1843 requires a literal ~ to be written as ~~):
>>> 'hi~'.encode('hz')
b'hi~'    # the correct output should be b'hi~~'

As a result, the encoded bytes cannot be decoded back, so the roundtrip fails:
>>> b'hi~'.decode('hz')
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
UnicodeDecodeError: 'hz' codec can't decode byte 0x7e in position 2: incomplete multibyte
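A quick way to check this kind of breakage for any codec is to encode a sample string, decode it again and compare. The helper below is a minimal sketch; `roundtrip_ok` is a hypothetical name, not part of the codecs module, and the result it reports for 'hz' will depend on the Python version it runs on.

```python
def roundtrip_ok(text: str, encoding: str) -> bool:
    """Return True if `text` survives encode -> decode unchanged (hypothetical helper)."""
    try:
        return text.encode(encoding).decode(encoding) == text
    except UnicodeError:
        # Either direction failed, e.g. the UnicodeDecodeError shown above.
        return False


for enc in ("utf-8", "gb2312", "gbk", "gb18030", "hz"):
    # The 'hz' result depends on the interpreter version being tested.
    print(enc, roundtrip_ok("hi~", enc))
```

Catching UnicodeError covers both UnicodeEncodeError and UnicodeDecodeError, so a codec that produces undecodable output is reported as a failed roundtrip instead of crashing the loop.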

In all these years no one has reported this bug, so I think it is fairly safe to remove the hz codec.

FYI:
HZ is a 7-bit encoding wrapper for GB2312 that was formerly common in email and USENET postings. It was designed in 1989 by Fung Fung Lee and later codified in 1995 as RFC 1843.

It was popular on USENET, which in the late 1980s and early 1990s generally did not allow transmission of 8-bit characters or escape characters.

https://en.wikipedia.org/wiki/HZ_(character_encoding)
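To make the escaping rules concrete, here is a minimal pure-Python sketch of HZ as described in RFC 1843, assuming Python's built-in 'gb2312' codec for the double-byte part. It is an independent illustration, not CPython's hz implementation; `hz_encode` and `hz_decode` are hypothetical names. ASCII passes through except that ~ is doubled, GB2312 characters are wrapped in ~{ ... ~} with the high bit of each byte cleared, and a lone trailing ~ is rejected, just like the traceback above.

```python
# Hypothetical, illustrative HZ (RFC 1843) codec in pure Python.
# This is NOT CPython's 'hz' codec; hz_encode/hz_decode are made-up names.

def hz_encode(text: str) -> bytes:
    """ASCII passes through, '~' becomes '~~', GB2312 characters are
    wrapped in '~{' ... '~}' with each byte's high bit cleared (7-bit)."""
    out = bytearray()
    in_gb = False
    for ch in text:
        if ord(ch) < 0x80:
            if in_gb:
                out += b"~}"            # switch back to ASCII mode first
                in_gb = False
            if ch == "~":
                out += b"~~"            # escape a literal tilde
            else:
                out.append(ord(ch))
        else:
            gb = ch.encode("gb2312")    # raises UnicodeEncodeError if not in GB2312
            if not in_gb:
                out += b"~{"            # switch to GB mode
                in_gb = True
            out += bytes(b & 0x7F for b in gb)
    if in_gb:
        out += b"~}"                    # always finish in ASCII mode
    return bytes(out)


def hz_decode(data: bytes) -> str:
    """Decode HZ bytes; raises ValueError on a dangling or unknown '~' escape."""
    out = []
    in_gb = False
    i = 0
    while i < len(data):
        b = data[i]
        if b == 0x7E:                   # '~' starts an escape sequence
            if i + 1 >= len(data):
                raise ValueError(f"incomplete escape at position {i}")
            nxt = data[i + 1]
            if not in_gb and nxt == 0x7E:      # '~~' -> '~'
                out.append("~")
            elif not in_gb and nxt == 0x7B:    # '~{' -> enter GB mode
                in_gb = True
            elif in_gb and nxt == 0x7D:        # '~}' -> back to ASCII mode
                in_gb = False
            elif not in_gb and nxt == 0x0A:    # '~\n' -> line continuation
                pass
            else:
                raise ValueError(f"invalid escape at position {i}")
            i += 2
        elif in_gb:
            if i + 1 >= len(data):
                raise ValueError(f"truncated GB2312 pair at position {i}")
            # Restore the high bits and decode the pair as GB2312.
            out.append(bytes([b | 0x80, data[i + 1] | 0x80]).decode("gb2312"))
            i += 2
        elif b < 0x80:
            out.append(chr(b))
            i += 1
        else:
            raise ValueError(f"8-bit byte 0x{b:02x} at position {i}")
    return "".join(out)


print(hz_encode("hi~"))              # b'hi~~'
print(hz_decode(hz_encode("hi~")))   # hi~
print(hz_encode("中"))               # b'~{VP~}'
```

Because the encoder always leaves GB mode before any ASCII character, a newline never appears inside a ~{ ... ~} span, which keeps each line independently decodable.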

Do other languages have an hz codec?
Java 8: no [1]
.NET: yes [2]
PHP: yes [3]
Perl: yes [4]

[1] http://docs.oracle.com/javase/8/docs/technotes/guides/intl/encoding.doc.html
[2] https://msdn.microsoft.com/en-us/library/system.text.encoding(v=vs.110).aspx
[3] http://php.net/manual/en/mbstring.supported-encodings.php
[4] http://perldoc.perl.org/Encode/CN.html

----------
components: Unicode
messages: 291207
nosy: Ma Lin, ezio.melotti, haypo, xiang.zhang
priority: normal
severity: normal
status: open
title: Remove hz codec
type: behavior
versions: Python 3.7

_______________________________________
Python tracker <report at bugs.python.org>
<http://bugs.python.org/issue30003>
_______________________________________

