[issue24110] zipfile.ZipFile.write() does not accept bytes arcname
New submission from July Tikhonov:

In the documentation of zipfile.ZipFile.write() there is the following notice: "There is no official file name encoding for ZIP files. If you have unicode file names, you must convert them to byte strings in your desired encoding before passing them to write()." I understand this to mean that the 'arcname' argument to write() should be bytes rather than str. But it is str that works, and bytes that does not:

    $ ./python
    Python 3.5.0a4+ (default:6f6e78931875, May 1 2015, 23:18:40)
    [GCC 4.8.4] on linux
    Type "help", "copyright", "credits" or "license" for more information.
    >>> import zipfile
    >>> zf = zipfile.ZipFile('foo.zip', 'w')
    >>> zf.write('python', 'a')
    >>> zf.write('python', b'b')
    Traceback (most recent call last):
      File "<stdin>", line 1, in <module>
      File "/home/july/source/python/Lib/zipfile.py", line 1442, in write
        zinfo = ZipInfo(arcname, date_time)
      File "/home/july/source/python/Lib/zipfile.py", line 322, in __init__
        null_byte = filename.find(chr(0))
    TypeError: a bytes-like object is required, not 'str'
(ZipInfo apparently tries to find a zero byte in the filename, but it searches for the Unicode character chr(0) instead. There are several other places in the ZipInfo class that assume the filename is str rather than bytes.)

I consider this a documentation issue: the notice is misleading. That said, someone may want to change ZipInfo so that it accepts bytes filenames.

----------
assignee: docs@python
components: Documentation
messages: 242355
nosy: docs@python, july
priority: normal
severity: normal
status: open
title: zipfile.ZipFile.write() does not accept bytes arcname
type: behavior
versions: Python 3.4, Python 3.5, Python 3.6

_______________________________________
Python tracker <report@bugs.python.org>
<http://bugs.python.org/issue24110>
_______________________________________
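The report can be reproduced without a file named `python` on disk. The sketch below swaps in `ZipFile.writestr()` for `write()`, on the assumption that both build a `ZipInfo` from the archive name the same way, and keeps the archive in an in-memory `io.BytesIO` buffer. `try_arcname` is a hypothetical helper name used only for illustration.

```python
import io
import zipfile


def try_arcname(arcname):
    """Store a small entry under `arcname` in an in-memory ZIP archive.

    Returns the list of names read back from the archive, or the name of
    the exception type if zipfile refuses the archive name.
    """
    buf = io.BytesIO()
    with zipfile.ZipFile(buf, "w") as zf:
        try:
            zf.writestr(arcname, b"data")
        except TypeError as exc:
            # A bytes arcname fails inside ZipInfo, which treats the name as str.
            return type(exc).__name__
    with zipfile.ZipFile(buf) as zf:
        return zf.namelist()


print(try_arcname("a"))   # a str name is accepted
print(try_arcname(b"b"))  # a bytes name is rejected
```

The exact TypeError message differs between Python versions, so the helper reports only the exception type.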
Changes by July Tikhonov <july.tikh@gmail.com>:

----------
components: +Library (Lib)
Stéphane Wirtel added the comment:

This documentation is correct for Python 2 but maybe not for Python 3. This needs to be checked.

----------
nosy: +matrixise
R. David Murray added the comment:

We should either make it work with byte filenames, or allow control of the filename encoding. See also issue 20329. Unfortunately that part is probably a new feature. In the meantime the docs should be fixed: I believe we automatically encode the filename using the default zip filename codec (but someone should check).

----------
nosy: +r.david.murray
Serhiy Storchaka added the comment:

Indeed, the note is outdated and incorrect. First, arbitrary Unicode filenames are allowed. They are encoded with UTF-8 internally. Second, there is currently no way to create an entry without encoding the filename to UTF-8 (if it is not ASCII-only). So you can't create a ZIP file with an arbitrary encoding (e.g. cp866) for old DOS/Windows unzippers. Adding support for bytes filenames is a different issue (issue10757).

----------
nosy: +serhiy.storchaka
stage: -> needs patch
versions: -Python 3.6
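The encoding behaviour described above can be checked with the public `zipfile` API alone. The sketch below stores an ASCII name and a Cyrillic name in an in-memory archive and reports, per entry, whether general-purpose flag bit 11 (0x800, the ZIP flag marking a UTF-8 filename) is set; it also returns the raw archive bytes so you can confirm that the non-ASCII name appears as UTF-8 and not as cp866. The helper name `inspect_names` and the sample filenames are illustrative assumptions.

```python
import io
import zipfile

UTF8_FLAG = 0x800  # ZIP general-purpose bit 11: filename is UTF-8 encoded


def inspect_names(names):
    """Write one entry per name; return ({name: utf8_flag_set}, raw_archive_bytes)."""
    buf = io.BytesIO()
    with zipfile.ZipFile(buf, "w") as zf:
        for name in names:
            zf.writestr(name, b"data")
    raw = buf.getvalue()
    flags = {}
    with zipfile.ZipFile(buf) as zf:
        for info in zf.infolist():
            # flag_bits is read back from the archive's central directory.
            flags[info.filename] = bool(info.flag_bits & UTF8_FLAG)
    return flags, raw


NAMES = ["readme.txt", "файл.txt"]  # "файл" is Russian for "file"
flags, raw = inspect_names(NAMES)
print([flags[n] for n in NAMES])          # UTF-8 flag per entry
print("файл.txt".encode("utf-8") in raw)  # stored as UTF-8 bytes
print("файл.txt".encode("cp866") in raw)  # no cp866 form in the archive
```

ASCII-only names are written without the flag, since their ASCII and UTF-8 encodings coincide.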
R. David Murray added the comment:

Ah, I *thought* there was an issue for that, but I didn't find it when I searched. So this issue is now just about fixing the docs to reflect current behavior.
Patrik Dufresne added the comment:

I'm porting my project to Python 3 and running into problems with zipfile's filename encoding. It looks like it only supports Unicode paths. This is a serious problem, since paths are, by definition, bytes (on POSIX systems): a filename containing bytes that are invalid in the current encoding can be stored on the filesystem without any problem. So arcname should accept bytes. Like tar, the ZIP file format doesn't define a specific encoding; filenames may be stored as bytes.

----------
nosy: +Patrik Dufresne
R. David Murray added the comment:

As noted, adding that support is the subject of issue 10757.
Patrik Dufresne added the comment:

I managed to work around this issue by using surrogateescape for arcname and filename. It's no longer an issue for me.
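The comment does not show the workaround itself, so the following is only a hedged illustration of the mechanism it names. Decoding raw filename bytes with the `surrogateescape` error handler (what `os.fsdecode()` does on POSIX systems) maps each undecodable byte to a lone surrogate, so the exact bytes can be recovered later. As far as current zipfile versions go, a non-ASCII str arcname is encoded with strict UTF-8, so an arcname that still contains lone surrogates raises UnicodeEncodeError; how the commenter handled that case is not stated. The hypothetical helper `to_arcname` falls back to a lossy replacement in that case, which is a policy choice, not something the thread prescribes.

```python
import io
import zipfile


def to_arcname(raw: bytes) -> str:
    """Hypothetical helper: turn a raw byte filename into a str arcname zipfile can store."""
    # surrogateescape maps each undecodable byte 0xXY to the lone surrogate U+DCXY,
    # so the original bytes come back via .encode("utf-8", "surrogateescape").
    name = raw.decode("utf-8", "surrogateescape")
    try:
        # zipfile stores non-ASCII names as strict UTF-8; lone surrogates fail here.
        name.encode("utf-8")
    except UnicodeEncodeError:
        # Lossy fallback (an assumption, not zipfile behaviour): U+FFFD for bad bytes.
        return raw.decode("utf-8", "replace")
    return name


raw_name = b"caf\xe9.txt"  # Latin-1 "café.txt": not valid UTF-8
escaped = raw_name.decode("utf-8", "surrogateescape")
assert escaped.encode("utf-8", "surrogateescape") == raw_name  # lossless round trip

buf = io.BytesIO()
with zipfile.ZipFile(buf, "w") as zf:
    zf.writestr(to_arcname(raw_name), b"data")
with zipfile.ZipFile(buf) as zf:
    print(zf.namelist() == ["caf\ufffd.txt"])
```

Valid UTF-8 names pass through unchanged; only names that cannot be stored as UTF-8 lose information.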
Irit Katriel <iritkatriel@yahoo.com> added the comment:

That part of the documentation was updated here by Serhiy: https://github.com/python/cpython/pull/10592

----------
nosy: +iritkatriel
participants (6)
- Irit Katriel
- July Tikhonov
- Patrik Dufresne
- R. David Murray
- Serhiy Storchaka
- Stéphane Wirtel