[Python-Dev] zipfile and unicode filenames

"Martin v. Löwis" martin at v.loewis.de
Sun Jun 10 05:36:39 CEST 2007


> Today I've stumbled upon a bug in my program that wasn't very
> straightforward to understand. 

Unfortunately, it isn't straightforward to understand your
description of it, either.

> The problem is that I was passing
> unicode filenames to zipfile.ZipFile.write and I had
> sys.setdefaultencoding() in effect

What do you mean here? How can sys.setdefaultencoding()
be "in effect"? There is always a default encoding; did
you mean you changed the default?

> which resulted in a situation
> where most of the bytes generated in zipfile.ZipInfo.FileHeader would
> pass thru, except for a few, which caused codec error on another
> machine (where filenames got infectiously upgraded to unicode).

Was the problem that most of the bytes would pass thru, or was
the problem that a few did not pass thru? Why did filenames in
the FileHeader get infectiously upgraded to unicode on the other
machine, but not on the first machine?

> The
> problem here is that it was absolutely unclear at first that I get
> unicode filenames passed to write, and it incorrectly accepted them
> silently. Is it worth to submit a bug report on this?

Let me try to rephrase what I understood so far:

"I changed the default system encoding from ASCII to some other
value, and that caused zipfile.py to generate an incorrect
zipfile. Is that a bug in zipfile?"

To that, the answer is a clear "no". If you change the default
encoding, you are on your own. Don't do that.
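
A minimal sketch of the alternative, assuming Python 3 syntax: check
the name explicitly before it reaches the archive, rather than relying
on whatever the default encoding happens to be. The helper name
require_ascii_name is hypothetical and not part of zipfile.

```python
import io
import zipfile


def require_ascii_name(name):
    """Return name unchanged if it is pure ASCII, else raise ValueError.

    Hypothetical helper for illustration; it is not part of zipfile.
    """
    try:
        name.encode("ascii")
    except UnicodeEncodeError as exc:
        raise ValueError("non-ASCII archive name: %r" % (name,)) from exc
    return name


# Usage: validate the name before it reaches the archive.
buf = io.BytesIO()
with zipfile.ZipFile(buf, "w") as zf:
    zf.writestr(require_ascii_name("readme.txt"), b"hello")
```

With this, a non-ASCII name fails loudly on the machine that creates
the archive instead of surfacing as a codec error somewhere else.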

> So, should I submit a bug report, and which behavior would be actually correct?

The issue of non-ASCII file names in zipfiles is fairly well understood.
The ZIP format historically did not support them well. I believe this
has recently been improved, but that format change has not propagated
into the zipfile module yet. However, everybody is aware of the
situation, so there is no need to report a bug.

Regards,
Martin

