[New-bugs-announce] [issue20329] zipfile.extractall fails in Posix shell with utf-8 filename

Laurent Mazuel report at bugs.python.org
Tue Jan 21 16:05:53 CET 2014

New submission from Laurent Mazuel:


Given a zip file that contains UTF-8 filenames (such as the attached zip file), the following code fails when run from a shell using the POSIX locale.

>>> import zipfile
>>> with zipfile.ZipFile("test_ut8.zip") as fd:
...     fd.extractall()
Traceback (most recent call last):
  File "<stdin>", line 2, in <module>
  File "/opt/python/3.3/lib/python3.3/zipfile.py", line 1225, in extractall
    self.extract(zipinfo, path, pwd)
  File "/opt/python/3.3/lib/python3.3/zipfile.py", line 1213, in extract
    return self._extract_member(member, path, pwd)
  File "/opt/python/3.3/lib/python3.3/zipfile.py", line 1276, in _extract_member
    open(targetpath, "wb") as target:
UnicodeEncodeError: 'ascii' codec can't encode characters in position 10-14: ordinal not in range(128)

With shell:
$ locale

But the filesystem is not encoding-dependent. On a Unix system, filenames are just bytes, so there is no reason to refuse to extract a zip file (in fact, the "unzip" command-line tool extracts this file without error in a Posix shell).
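
As a minimal sketch of that point, assuming a POSIX filesystem: the helper below (the name write_with_bytes_name and the sample filename are hypothetical, chosen only for illustration) passes an explicitly encoded bytes path to "open", so the locale's filesystem encoding is never consulted for the non-ASCII part of the name.

```python
import os
import tempfile


def write_with_bytes_name(directory: str, name: str, data: bytes) -> bytes:
    """Create a file whose name is handed to the OS as UTF-8 bytes.

    Hypothetical helper for illustration; not part of zipfile.
    """
    # Encode the name explicitly instead of relying on the locale.
    target = os.path.join(os.fsencode(directory), name.encode("utf-8"))
    with open(target, "wb") as fh:
        fh.write(data)
    return target


with tempfile.TemporaryDirectory() as tmp:
    path = write_with_bytes_name(tmp, "caf\u00e9.txt", b"hello")
    # Passing bytes to os.listdir returns the names as raw bytes.
    names = os.listdir(os.fsencode(tmp))
    print(names)
```

Whether a given platform accepts arbitrary byte sequences as names is a separate question; the sample name here is valid UTF-8.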

Since "open" accepts a "bytes" filename, changing line 1276 from
> open(targetpath)
to
> open(targetpath.encode("utf-8"))

fixes the problem.

zipfile should not care about the encoding of the filename; it should use the byte sequence of the filename exactly as it is stored in the zip file. Having "ZipInfo.filename" as a string (and not bytes) is great for an API, but is not needed to open or write a file on disk. ZipInfo should therefore store the raw bytes of the filename in a "bytes_filename" field and use it in the "open" call of "extract".
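
Until something like that exists, the same effect can be approximated from outside zipfile. The following is only a sketch under stated assumptions: extract_all_bytes is a hypothetical helper (not a zipfile API), it assumes a POSIX filesystem, it always encodes names as UTF-8, and its path sanitising is deliberately simple (it drops empty, "." and ".." components).

```python
import io
import os
import tempfile
import zipfile


def extract_all_bytes(zf: zipfile.ZipFile, dest: str) -> list:
    """Extract every member using bytes paths only (hypothetical helper)."""
    dest_b = os.fsencode(os.path.abspath(dest))
    written = []
    for info in zf.infolist():
        # Drop empty, "." and ".." components so members stay under dest.
        parts = [p for p in info.filename.split("/") if p not in ("", ".", "..")]
        if not parts:
            continue
        # Assumption: names are written to disk as UTF-8 bytes.
        target = os.path.join(dest_b, *(p.encode("utf-8") for p in parts))
        if info.filename.endswith("/"):
            # Directory entry: create it and move on.
            os.makedirs(target, exist_ok=True)
            continue
        os.makedirs(os.path.dirname(target), exist_ok=True)
        with zf.open(info) as src, open(target, "wb") as out:
            out.write(src.read())
        written.append(target)
    return written


# Usage: build a small archive in memory and extract it.
buf = io.BytesIO()
with zipfile.ZipFile(buf, "w") as zf:
    zf.writestr("dossier/r\u00e9sum\u00e9.txt", b"contenu")
buf.seek(0)
with tempfile.TemporaryDirectory() as tmp, zipfile.ZipFile(buf) as zf:
    for path in extract_all_bytes(zf, tmp):
        print(path)
```

Reading each member fully into memory keeps the sketch short; a streaming copy would suit large members better.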

In addition, given the patch for issue 10614, the right fix could use the new "ZipInfo.encoding" field:
> open(targetpath.encode(member.encoding))
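
Without relying on that patch, the stored bytes can usually be recovered from information zipfile already exposes. This is a sketch, not the patch's approach: it assumes zipfile decoded the name as UTF-8 when general purpose flag bit 11 (0x800) is set and as cp437 otherwise, which is its default reading behaviour, and raw_member_name is a hypothetical helper.

```python
import io
import zipfile

UTF8_FLAG = 0x800  # general purpose bit 11: filename is stored as UTF-8


def raw_member_name(info: zipfile.ZipInfo) -> bytes:
    """Re-encode ZipInfo.filename with the codec used to decode it.

    Hypothetical helper; assumes the default cp437 fallback was used.
    """
    codec = "utf-8" if info.flag_bits & UTF8_FLAG else "cp437"
    return info.filename.encode(codec)


# Usage: write one ASCII and one non-ASCII name, then read them back.
buf = io.BytesIO()
with zipfile.ZipFile(buf, "w") as zf:
    zf.writestr("plain.txt", b"a")
    zf.writestr("r\u00e9sum\u00e9.txt", b"b")
buf.seek(0)
with zipfile.ZipFile(buf) as zf:
    raw_names = [raw_member_name(info) for info in zf.infolist()]
print(raw_names)
```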

components: Extension Modules
files: test_ut8.zip
messages: 208648
nosy: Laurent.Mazuel
priority: normal
severity: normal
status: open
title: zipfile.extractall fails in Posix shell with utf-8 filename
type: behavior
versions: Python 3.3
Added file: http://bugs.python.org/file33589/test_ut8.zip

Python tracker <report at bugs.python.org>
