[Distutils] The Wheel specification and Unicode filenames

Daniel Holth dholth at gmail.com
Thu Feb 21 16:27:12 CET 2013

On Thu, Feb 21, 2013 at 10:22 AM, Daniel Holth <dholth at gmail.com> wrote:

> On Thu, Feb 21, 2013 at 10:13 AM, Vinay Sajip <vinay_sajip at yahoo.co.uk> wrote:
>> The Wheel specification talks about supporting Unicode in the filename of
>> wheel files, but is mute on the subject of the names of the entries in the
>> archive. It would be good to have clarity on this point. The Python docs
>> for 2.x and 3.x tell us:
>>
>>     There is no official file name encoding for ZIP files. If you have
>>     unicode file names, you must convert them to byte strings in your
>>     desired encoding before passing them to write(). WinZip interprets
>>     all file names as encoded in CP437, also known as DOS Latin.
>>
>> The "your desired encoding" is, I think, too loose for wheel files, as we
>> want interoperability between implementations. We should mandate CP437
>> encoding if we want the files to be examinable on Windows in e.g. WinZip
>> or 7-Zip. On Linux, file-roller seems to be unable to display Unicode,
>> whether you use CP437 for the filenames or whether you use utf-8.
> I feign ignorance of any coding that is not utf-8.
> http://hg.python.org/cpython/file/d49685548a7a/Lib/zipfile.py#l404
> http://hg.python.org/cpython/file/d49685548a7a/Lib/zipfile.py#l1000

I will clarify the spec to name utf-8 as the encoding for filenames inside
the archive. The zip format allows this (by setting general purpose bit 11),
but a lot of programs do not understand it. Python's zipfile module supports
utf-8 filenames in zip archives.
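
For anyone who wants to check the bit 11 behaviour, here is a minimal sketch,
assuming Python 3's zipfile. The helper name utf8_flags and the sample entry
names are made up for illustration. It writes a few empty entries to an
in-memory archive and reports, per entry, whether general purpose bit 11
(0x800) ended up set:

```python
import io
import zipfile

UTF8_FLAG = 0x800  # general purpose bit 11: entry name is utf-8


def utf8_flags(names):
    """Hypothetical helper: map each entry name to whether bit 11 is set."""
    buf = io.BytesIO()
    with zipfile.ZipFile(buf, "w") as zf:
        for name in names:
            zf.writestr(name, b"")  # empty entry, only the name matters here
    buf.seek(0)
    with zipfile.ZipFile(buf) as zf:
        return {
            info.filename: bool(info.flag_bits & UTF8_FLAG)
            for info in zf.infolist()
        }


flags = utf8_flags(["pkg/ascii.py", "pkg/caf\u00e9.py"])
for name, is_utf8 in flags.items():
    print(name, is_utf8)
```

As far as I can tell, zipfile only sets the flag when a name cannot be encoded
as ASCII, so pure-ASCII entries are written without it.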