[New-bugs-announce] [issue4757] reject unicode in zlib
STINNER Victor
report at bugs.python.org
Sat Dec 27 13:58:18 CET 2008
New submission from STINNER Victor <victor.stinner at haypocalc.com>:
Python 2.x allows compressing any byte string (str) and any ASCII-only
unicode string (unicode):
$ python
Python 2.5.1 (r251:54863, Jul 31 2008, 23:17:40)
>>> import zlib
>>> zlib.compress('abc')
"x\x9cKLJ\x06\x00\x02M\x01'"
>>> zlib.compress(u'abc')
"x\x9cKLJ\x06\x00\x02M\x01'"
>>> zlib.compress(u'abc\xe9')
...
UnicodeEncodeError: 'ascii' codec can't encode character u'\xe9' ...
I'm not sure that this behaviour was really intended, because the
decompress operation is not symmetric (the result type is always a byte
string):
$ python
Python 2.5.1 (r251:54863, Jul 31 2008, 23:17:40)
>>> import zlib
>>> zlib.decompress("x\x9cKLJ\x06\x00\x02M\x01'")
'abc'
---
Python 3.0 accepts any string, bytes or characters, but decompress
always produces a byte string:
$ ./python
Python 3.1a0 (py3k:67926M, Dec 26 2008, 23:59:07)
>>> import zlib
>>> zlib.compress(b'abc')
b"x\x9cKLJ\x06\x00\x02M\x01'"
>>> zlib.compress('abc')
b"x\x9cKLJ\x06\x00\x02M\x01'"
>>> zlib.compress('abc\xe9')
b'x\x9cKLJ>\xbc\x12\x00\x06\xca\x02\x93'
>>> zlib.compress('abc\xe9'.encode('utf-8'))
b'x\x9cKLJ>\xbc\x12\x00\x06\xca\x02\x93'
>>> zlib.decompress(b'x\x9cKLJ>\xbc\x12\x00\x06\xca\x02\x93')
b'abc\xc3\xa9'
The strangest operation is the decompression of a unicode string:
$ ./python
>>> zlib.decompress('x\x9cKLJ>\xbc\x12\x00\x06\xca\x02\x93')
...
zlib.error: Error -3 while decompressing data: incorrect header check
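A likely explanation, assuming decompress() implicitly encodes a str
argument to UTF-8 the same way compress() does above: every character
above U+007F becomes two bytes, so the second byte of the zlib header
changes and the header check fails. The sketch below reproduces this by
hand with explicit bytes; the variable names are only illustrative.

```python
import zlib

# Compressed bytes taken from the session above.
packed = b'x\x9cKLJ>\xbc\x12\x00\x06\xca\x02\x93'

# Same code points as the str literal passed to decompress() above.
as_text = packed.decode('latin-1')

# Assumption: the str argument is implicitly encoded as UTF-8.
reencoded = as_text.encode('utf-8')

# '\x9c' becomes b'\xc2\x9c', so the 2-byte zlib header no longer matches.
header_intact = reencoded[:2] == packed[:2]

try:
    zlib.decompress(reencoded)
    error_message = None
except zlib.error as exc:
    error_message = str(exc)

print(header_intact)
print(error_message)
```

Decoding with 'latin-1' and re-encoding with 'latin-1' would round-trip
the bytes unchanged; it is the UTF-8 step that corrupts the stream.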
---
I propose changing the zlib API to reject unicode strings and require
explicit conversion to/from bytes. Functions/methods affected:
- compress(bytes, ...)
- decompress(bytes, ...)
- <compress object>.compress(bytes, ...)
- <decompress object>.decompress(bytes, ...)
- crc32(bytes, value=0)
- adler32(bytes, value=1)
Note: binascii.crc32() already rejects unicode strings.
The current behaviour may be kept in Python 3.0.x and only changed in
Python 3.1.
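With such a change, callers would do the conversion themselves. A
minimal sketch of what that could look like; compress_text() and
decompress_text() are hypothetical helpers, not part of zlib or of the
attached patch, and UTF-8 is only an assumed default encoding:

```python
import zlib

def compress_text(text, encoding='utf-8'):
    """Hypothetical helper: encode explicitly, then compress the bytes."""
    return zlib.compress(text.encode(encoding))

def decompress_text(data, encoding='utf-8'):
    """Hypothetical helper: decompress to bytes, then decode explicitly."""
    return zlib.decompress(data).decode(encoding)

text = 'abc\xe9'
packed = compress_text(text)
restored = decompress_text(packed)

# The checksum functions get the same explicit bytes.
payload = text.encode('utf-8')
crc = zlib.crc32(payload)
adler = zlib.adler32(payload)

print(restored == text)
```

The encoding is then visible at the call site, and the round trip is
symmetric: str in, str out, with bytes only in between.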
----------
components: Extension Modules
files: zlib_bytes.patch
keywords: patch
messages: 78356
nosy: haypo
severity: normal
status: open
title: reject unicode in zlib
type: behavior
versions: Python 3.0, Python 3.1
Added file: http://bugs.python.org/file12472/zlib_bytes.patch
_______________________________________
Python tracker <report at bugs.python.org>
<http://bugs.python.org/issue4757>
_______________________________________