[New-bugs-announce] [issue4757] reject unicode in zlib

STINNER Victor report at bugs.python.org
Sat Dec 27 13:58:18 CET 2008


New submission from STINNER Victor <victor.stinner at haypocalc.com>:

Python 2.x allows compressing any byte string (str) and any ASCII-only 
unicode string (unicode):

$ python
Python 2.5.1 (r251:54863, Jul 31 2008, 23:17:40)
>>> import zlib
>>> zlib.compress('abc')
"x\x9cKLJ\x06\x00\x02M\x01'"
>>> zlib.compress(u'abc')
"x\x9cKLJ\x06\x00\x02M\x01'"
>>> zlib.compress(u'abc\xe9')
...
UnicodeEncodeError: 'ascii' codec can't encode character u'\xe9' ...

I'm not sure that this behaviour was really intended, because the 
decompress operation is not symmetric (the result type is always a byte 
string):

$ python
Python 2.5.1 (r251:54863, Jul 31 2008, 23:17:40)
>>> import zlib
>>> zlib.decompress("x\x9cKLJ\x06\x00\x02M\x01'")
'abc'

---

Python 3.0 accepts any string for compression: bytes or characters. But 
decompress always produces a byte string:

$ ./python
Python 3.1a0 (py3k:67926M, Dec 26 2008, 23:59:07)
>>> import zlib
>>> zlib.compress(b'abc')
b"x\x9cKLJ\x06\x00\x02M\x01'"
>>> zlib.compress('abc')
b"x\x9cKLJ\x06\x00\x02M\x01'"
>>> zlib.compress('abc\xe9')
b'x\x9cKLJ>\xbc\x12\x00\x06\xca\x02\x93'
>>> zlib.compress('abc\xe9'.encode('utf-8'))
b'x\x9cKLJ>\xbc\x12\x00\x06\xca\x02\x93'
>>> zlib.decompress(b'x\x9cKLJ>\xbc\x12\x00\x06\xca\x02\x93')
b'abc\xc3\xa9'

The strangest operation is the decompression of a unicode string:

$ ./python
>>> zlib.decompress('x\x9cKLJ>\xbc\x12\x00\x06\xca\x02\x93')
...
zlib.error: Error -3 while decompressing data: incorrect header check
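
A minimal sketch of why this probably fails, assuming the unicode string is 
implicitly encoded as UTF-8 before decompression (which matches the 
compress('abc\xe9') output shown earlier). Every code point >= 0x80 then 
becomes two bytes, so the zlib header no longer checks out. The variable 
names are illustrative only:

```python
import zlib

# Compressed form of the UTF-8 bytes of 'abc\xe9'.
compressed = zlib.compress('abc\xe9'.encode('utf-8'))

# View the compressed bytes as text, one code point per byte.
as_text = compressed.decode('latin-1')

# Assumed implicit conversion: UTF-8 expands each code point >= 0x80
# into two bytes, which corrupts the compressed stream.
mangled = as_text.encode('utf-8')

try:
    zlib.decompress(mangled)
    failed = False
except zlib.error:
    failed = True

# A byte-preserving conversion (latin-1) restores the original stream.
restored = zlib.decompress(as_text.encode('latin-1'))
```

Only a byte-preserving conversion round-trips here, which is one more 
reason to make the caller perform the conversion explicitly.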

---

I propose changing the zlib API to reject unicode strings and to require 
explicit conversion to/from bytes. Affected functions/methods:
 - compress(bytes, ...)
 - decompress(bytes, ...)
 - <compress object>.compress(bytes, ...)
 - <decompress object>.decompress(bytes, ...)
 - crc32(bytes, value=0)
 - adler32(bytes, value=1)

Note: binascii.crc32() already rejects unicode strings.

The current behaviour may be kept in Python 3.0.x and changed only in 
Python 3.1.
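
As an illustration of the explicit conversion the proposal asks for, here 
is a short sketch of caller code. It assumes an interpreter in which zlib 
already rejects str (current CPython 3 releases raise TypeError); the 
choice of UTF-8 is the caller's, not something zlib imposes:

```python
import zlib

text = 'abc\xe9'

# The caller chooses the encoding explicitly.
data = text.encode('utf-8')
compressed = zlib.compress(data)

# decompress() returns bytes; decoding is again the caller's job.
roundtrip = zlib.decompress(compressed).decode('utf-8')

# The checksum functions take bytes as well.
crc = zlib.crc32(data)
adler = zlib.adler32(data)

# Passing a character string directly is rejected.
try:
    zlib.compress(text)
    rejected = False
except TypeError:
    rejected = True
```

With both conversions written out, compress and decompress become 
symmetric: bytes in, bytes out.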

----------
components: Extension Modules
files: zlib_bytes.patch
keywords: patch
messages: 78356
nosy: haypo
severity: normal
status: open
title: reject unicode in zlib
type: behavior
versions: Python 3.0, Python 3.1
Added file: http://bugs.python.org/file12472/zlib_bytes.patch

_______________________________________
Python tracker <report at bugs.python.org>
<http://bugs.python.org/issue4757>
_______________________________________

