[Distutils] The Wheel specification and Unicode filenames

Vinay Sajip vinay_sajip at yahoo.co.uk
Thu Feb 21 16:13:05 CET 2013


The Wheel specification talks about supporting Unicode in the filenames of wheel
files, but is silent on the names of the entries inside the archive.

It would be good to have clarity on this point. The Python docs for 2.x and 3.x
tell us:

    There is no official file name encoding for ZIP files. If you have unicode
    file names, you must convert them to byte strings in your desired encoding
    before passing them to write(). WinZip interprets all file names as encoded
    in CP437, also known as DOS Latin.

The phrase "your desired encoding" is, I think, too loose for wheel files, as we
want interoperability between implementations. We should mandate CP437 encoding
if we want the files to be examinable on Windows in e.g. WinZip or 7-Zip. On
Linux, file-roller seems to be unable to display Unicode entry names, whether
they are encoded as CP437 or as UTF-8.
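
To make the difference concrete, here is a minimal sketch, assuming Python 3's
zipfile module (which accepts str names directly, unlike the 2.x behaviour
quoted above). The entry name is a hypothetical example, not taken from any real
wheel. It shows that the same name produces different bytes under CP437 and
UTF-8, and that, as far as I can tell, Python 3 stores a non-ASCII name as UTF-8
and sets general-purpose flag bit 11 (0x800) to say so.

```python
import io
import zipfile

# Hypothetical archive entry name containing one non-ASCII character.
name = "caf\u00e9/__init__.py"

# The same name encodes to different byte strings under the two encodings.
cp437_bytes = name.encode("cp437")
utf8_bytes = name.encode("utf-8")

# Write the entry into an in-memory archive using Python 3's zipfile.
buf = io.BytesIO()
with zipfile.ZipFile(buf, "w") as zf:
    zf.writestr(name, b"")

# Read it back and inspect how the name was recorded.
with zipfile.ZipFile(buf) as zf:
    info = zf.infolist()[0]
    # Bit 11 (0x800) of the general-purpose flags marks a UTF-8 name.
    uses_utf8_flag = bool(info.flag_bits & 0x800)
    stored_name = info.filename

print(cp437_bytes)
print(utf8_bytes)
print(uses_utf8_flag, stored_name == name)
```

A tool that ignores the flag and assumes CP437 would show a different name for
this entry, which is the interoperability problem a mandated encoding would
avoid.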

Regards,

Vinay Sajip


