[Python-Dev] zipfile and unicode filenames

"Martin v. Löwis" martin at v.loewis.de
Sun Jun 10 21:43:19 CEST 2007

> So the general idea is that at least the directory filename follows
> a convention of using the OEM (DOS, console) encoding on Windows,
> cp866 in my case. Header filenames have different encodings, and seem
> to be ignored.

Ok, then this is what the zipfile module should implement.

>> That would be incorrect, as it relies on the system encoding,
>> which shouldn't be relied upon.
> Well, as I've seen in numerous examples above, the system (or actually
> DOS) encoding is what is used by at least three major programs:
> 7-zip, pkzip25 and Explorer, at least on Windows.

Please don't confuse Python's "system encoding" with the system's
(or user's) standard encoding - they are not related at all. Using
the OEM code page if everybody else does it is fine. Using the
encoding that somebody hand-coded into the Python installation
is not.
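
To make the distinction concrete, here is a minimal sketch that prints
the three encodings side by side. It assumes a current Python 3
interpreter; in the Python 2 of this thread, sys.getdefaultencoding()
was normally 'ascii' unless changed in site.py. The helper name
oem_code_page is hypothetical; GetOEMCP is the Win32 call it wraps, so
it returns None on other platforms.

```python
import locale
import sys


def oem_code_page():
    """Return the Windows OEM code page as a codec name, or None elsewhere.

    Hypothetical helper for illustration; on Windows it asks the Win32
    API (GetOEMCP) and formats the result as e.g. 'cp866' or 'cp437'.
    """
    if sys.platform != "win32":
        return None
    import ctypes
    return "cp%d" % ctypes.windll.kernel32.GetOEMCP()


# Python's own default encoding: a property of the interpreter.
print("Python default encoding:  ", sys.getdefaultencoding())
# The user's preferred encoding: taken from the locale settings.
print("User's preferred encoding:", locale.getpreferredencoding(False))
# The OEM (DOS, console) code page: what the zip tools above appear to use.
print("OEM code page:            ", oem_code_page())
```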

>> Plus, it would allow arbitrary
>> non-string things as filenames.
> Hmm... why is that bad?

Errors should never pass silently.

>> What it should do instead
>> (IMO) is to encode in CP437. Bonus points if it falls back
>> to the UTF-8 feature of zip files if encoding as CP437 fails.
> And encoding to cp437 would be incorrect, as no currently existing
> program would work correctly on non-English Windows OSes. I think that
> letting the user decide on the encoding is the right way to go here,
> as you can't know what the user actually wants these days; it's all
> too hazy to me.

Asking "the user" is not practical. If "the user" was aware of the
problem, you would not have run into the problem in the first place -
you would have known to encode all file names before passing them
into the zipfile module.

The automatic mode should follow the standard or the conventions;
"the user" (in quotes, because the end user is rarely bothered
with that detail) can still override that explicitly.

