[Python-Dev] non-US zip archives support in zipfile.py

Toshio Kuratomi a.badger at gmail.com
Thu Oct 17 23:46:26 CEST 2013


On Tue, Oct 15, 2013 at 03:46:15PM +0200, "Martin v. Löwis" wrote:
> On 15.10.13 14:49, Daniel Holth wrote:
> > It is part of the ZIP specification. CP437 or UTF-8 are the two
> > official choices, but other encodings happen on Russian, Japanese
> > systems.
> 
> Indeed. Formally, the other encodings are not supported by the
> ZIP specification, and are thus formally misuse of the format.
> 
<nod>  But tools in the wild do misuse the format this way.
Every byte value is a valid CP437 character, so zip and unzip on Linux, for
instance, take the raw bytes of the filename on the filesystem and store them
in the zip file without setting the UTF-8 flag.  When the files are extracted,
the same byte sequence is used as the filename for the new files.
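A minimal sketch of that round trip using Python's zipfile module. The Russian file name and the cp866 encoding are hypothetical examples. Because zipfile's writer sets the UTF-8 flag for any non-ASCII str name, the sketch writes an ASCII placeholder of the same byte length and patches the raw bytes in afterwards, to mimic an archive from a tool that stores raw bytes without the flag.

```python
import io
import zipfile

# Hypothetical example: a Russian file name stored as raw cp866 bytes,
# with the UTF-8 flag (general purpose bit 11, 0x800) left unset.
original_name = "отчёт.txt"
raw_name = original_name.encode("cp866")

# zipfile sets the UTF-8 flag for non-ASCII str names, so write an ASCII
# placeholder of the same byte length and swap in the raw bytes afterwards.
placeholder = b"Q" * len(raw_name)
buf = io.BytesIO()
with zipfile.ZipFile(buf, "w", zipfile.ZIP_STORED) as zf:
    zf.writestr(placeholder.decode("ascii"), b"hello")
data = buf.getvalue()

# The name appears twice: in the local file header and the central directory.
assert data.count(placeholder) == 2
data = data.replace(placeholder, raw_name)

with zipfile.ZipFile(io.BytesIO(data)) as zf:
    info = zf.infolist()[0]
    # Without the UTF-8 flag, zipfile decodes the stored name as CP437.
    as_cp437 = info.filename
    # CP437 maps every byte value, so re-encoding yields the original bytes,
    # which can then be decoded with the encoding the creator actually used.
    recovered = as_cp437.encode("cp437").decode("cp866")
    content = zf.read(info)

print(info.flag_bits & 0x800)  # 0: UTF-8 flag not set
print(recovered)
```

The CP437 step loses nothing, which is what keeps such archives usable: a reader that knows (or guesses) the creator's encoding can recover the real name. Python 3.11 and later also accept a `metadata_encoding` argument to `zipfile.ZipFile` for reading archives like this directly.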

> I believe (without having proof) that early versions of the
> specification failed to discuss the file name encoding at all,
>
These might be helpful:

No mention of file name encodings in this version of the spec:
http://www.pkware.com/documents/APPNOTE/APPNOTE-6.2.2.TXT

Appendix D, Language Encoding, shows up here:
http://www.pkware.com/documents/APPNOTE/APPNOTE-6.3.0.TXT

(The most recent version is 6.3.2.)

> making people believe that it is unspecified and always the
> system encoding (which is useless, of course, as you create
> zip files to move them across systems).
>
Not always.  Backups are another use.  Also, it's not useless.  If the files
are being moved within an organization, or within a region that has in
practice standardized on one encoding, the same system encoding could very
well be in use on the machines where the files end up.

-Toshio