[ python-Bugs-878120 ] Zipfile archive name can't be unicode

SourceForge.net noreply at sourceforge.net
Thu Apr 6 12:53:48 CEST 2006


Bugs item #878120, was opened at 2004-01-16 09:32
Message generated for change (Comment added) made by gbrandl
You can respond by visiting: 
https://sourceforge.net/tracker/?func=detail&atid=105470&aid=878120&group_id=5470

Category: Extension Modules
Group: Python 2.3
Status: Deleted
Resolution: None
Priority: 5
Submitted By: Simon Harrison (ssmmhh)
Assigned to: Nobody/Anonymous (nobody)
Summary: Zipfile archive name can't be unicode

Initial Comment:
In Python 2.3.2, the following code:

import zipfile
z = zipfile.ZipFile( "file.zip", "w" )
z.write( "file.txt", u"file.txt" )
z.close()

Results in this exception:

Traceback (most recent call last):
  File "E:\dev\ziptest.py", line 8, in ?
    z.write( "file.txt", u"file.txt" )
  File "D:\Python23\lib\zipfile.py", line 412, in write
    self.fp.write(zinfo.FileHeader())
  File "D:\Python23\lib\zipfile.py", line 166, in FileHeader
    return header + self.filename + self.extra
UnicodeDecodeError: 'ascii' codec can't decode byte 0xd5 in position 10: ordinal not in range(128)

The code could be fixed in zipfile.py.

Something along the lines of:
return header + self.filename.encode("utf-8") + self.extra

On Windows ideally the code should figure out the 
current locale's codepage and use that to encode the 
filename into the correct multibyte sequence.

The example above is pretty easy to spot, but if the 
arcname is coming from a COM property (my case) it 
takes a while to figure out why zipfile is crashing!

This is bug 705295 resubmitted:

https://sourceforge.net/tracker/?func=detail&atid=105470&aid=705295&group_id=5470





----------------------------------------------------------------------

>Comment By: Georg Brandl (gbrandl)
Date: 2006-04-06 10:53

Message:
Logged In: YES 
user_id=849994

There are no specs that say which encoding the file names
should have. That WinZip uses cp437 is its own choice. Other
sources say "Zip programs by default assume the filenames
are encoded using the code page of the machine", so it's
better to leave the encoding to the user of zipfile.
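
A minimal sketch of what "leaving the encoding to the user" could look like
on the caller's side. The helper name encode_arcname and its cp437 default
are assumptions made for illustration; they are not part of zipfile.

```python
def encode_arcname(arcname, encoding="cp437"):
    """Hypothetical helper: turn a text archive name into bytes.

    The caller picks the encoding explicitly. Byte strings are passed
    through unchanged; text that the chosen encoding cannot represent
    raises UnicodeEncodeError instead of failing later inside zipfile.
    """
    if isinstance(arcname, bytes):
        return arcname
    return arcname.encode(encoding)


# Encode the name before handing it to z.write(..., arcname).
encoded = encode_arcname(u"caf\xe9.txt")
```

A caller who prefers the machine's code page could pass the result of
locale.getpreferredencoding() as the encoding argument instead.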

----------------------------------------------------------------------

Comment By: Jens Diemer (pylucid)
Date: 2006-04-06 10:39

Message:
Logged In: YES 
user_id=1330780

Hm! Something that just occurred to me:

Shouldn't Python make the conversion automatically?

If the arcname is of type unicode, Python should convert it to
cp437.

----------------------------------------------------------------------

Comment By: Martin v. Löwis (loewis)
Date: 2004-01-18 14:54

Message:
Logged In: YES 
user_id=21627

I don't know who the info-zip walker is, but at least WinZip
will interpret all file names as CP437. This becomes obvious
when you try to unzip a file name with non-ASCII characters
on Windows NT: the resulting Unicode file names are
generated as if the encoding used is code page 437. I
believe this is the same in pkware.
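
The effect described above can be sketched without any zip tool. The
example assumes a reader that decodes every stored name as CP437; the file
name is made up for illustration.

```python
# A name containing one non-ASCII character (e with acute accent).
name = u"caf\xe9.txt"

# Bytes as they would be stored if the writer had chosen UTF-8.
stored = name.encode("utf-8")

# What a CP437-only reader would make of those bytes: the two UTF-8
# bytes of the accented letter turn into two unrelated characters.
misread = stored.decode("cp437")

# If the writer had used CP437 as well, the name would survive intact.
roundtrip = name.encode("cp437").decode("cp437")
```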

----------------------------------------------------------------------

Comment By: Simon Harrison (ssmmhh)
Date: 2004-01-18 11:42

Message:
Logged In: YES 
user_id=775521


I would be happy to just see an exception indicating that
the supplied filename mustn't be unicode, to save people
time figuring this one out in the future.  I can supply a
patch but I thought this was too trivial for that.
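
A rough sketch of such a check, for illustration only. The function name
require_byte_arcname and the wording of the message are assumptions; this
is not an actual patch to zipfile.py.

```python
def require_byte_arcname(arcname):
    """Hypothetical guard: reject text (unicode) archive names up front.

    Failing here, with a message that names the problem, avoids the
    UnicodeDecodeError that otherwise appears while the header is built.
    """
    if not isinstance(arcname, bytes):
        raise TypeError(
            "arcname must be an encoded byte string, got %s; "
            "encode it first, e.g. arcname.encode('cp437')"
            % type(arcname).__name__
        )
    return arcname
```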

You wrote:
>Names in zip files are stored in code page 437

Correct me if I'm wrong, but won't the info-zip directory
walker just stick whatever it enumerates into the name
field?  I don't quite understand what you mean by 'no support'.


----------------------------------------------------------------------

Comment By: Martin v. Löwis (loewis)
Date: 2004-01-17 23:56

Message:
Logged In: YES 
user_id=21627

Using UTF-8 is incorrect. Names in zip files are stored in
code page 437. There is no support for file names outside
this character set.

----------------------------------------------------------------------
