Unicode and Zipfile problems
gerson.kurz at t-online.de
Wed Nov 5 12:02:51 CET 2003
AAAAAAAARG I hate the way Python handles unicode. Here is a nice
problem for y'all to enjoy: say you have a variable that's unicode
directory = u"c:\\temp"
It's unicode not because you want it to be, but because it is, for
example, read from _winreg, which returns unicode.
You do an os.listdir(directory). Note that all filenames returned are
now unicode. (Change introduced I believe in 2.3).
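A small sketch of that behaviour, written for a current Python rather than 2.3: os.listdir() hands back names of the same type as the path it was given. The scratch directory and the file name trace.log are made up for illustration.

```python
import os
import tempfile

# Hypothetical scratch directory containing a single file.
with tempfile.TemporaryDirectory() as scratch:
    open(os.path.join(scratch, "trace.log"), "w").close()

    # Text path in, text names out.
    text_names = os.listdir(scratch)

    # Byte path in, byte names out.
    byte_names = os.listdir(os.fsencode(scratch))

print(text_names, byte_names)
```

So the type of "directory" quietly decides the type of every filename that follows from it.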
You add the filenames to a zipfile.ZipFile object. Sometimes, you will
get this exception:
Traceback (most recent call last):
  File "collect_trace_info.py", line 65, in CollectTraceInfo
  File "C:\Python23\lib\zipfile.py", line 416, in write
  File "C:\Python23\lib\zipfile.py", line 170, in FileHeader
    return header + self.filename + self.extra
UnicodeDecodeError: 'ascii' codec can't decode byte 0x88 in position
ordinal not in range(128)
After you have regained your composure, you find the reason: "header"
is a struct.pack() generated byte string. self.filename is however a
unicode string, because it is returned by os.listdir as unicode.
Concatenating the two makes Python decode "header" as ASCII first. If
"header" contains any byte above 0x7F - which can but need not
happen, depending on the type of file - you have an exception waiting
for you. Sometimes. Great. (The same will probably occur if
filename contains chars > 0x7F). The problem does not occur if you
have "str" type filenames, because then no back-and-forth conversion is
needed.
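A minimal sketch of the failure, written for a current Python where bytes and text no longer mix implicitly. The implicit_concat helper and the header layout are made up for illustration. The helper only imitates the ASCII decode that Python 2 performed when a byte string was added to a unicode string. It is not zipfile's actual code.

```python
import struct

# Hypothetical stand-in for the packed header: a byte string that
# happens to contain one byte above 0x7F (0x88, as in the traceback).
header = struct.pack("<4sH", b"PK\x03\x04", 0x88)

def implicit_concat(byte_part, text_part):
    # Roughly what Python 2 did for str + unicode: decode the byte
    # string with the default 'ascii' codec, then concatenate.
    return byte_part.decode("ascii") + text_part

try:
    implicit_concat(header, u"trace.log")
    failed = False
    message = ""
except UnicodeDecodeError as exc:
    failed = True
    message = str(exc)

print(failed, message)
```

With a header made only of bytes below 0x80 the same call goes through, which is why the error shows up for some files and not for others.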
There is a simple fix: byte-encode the filename before calling
z.write(). Here is some sample code:
import os, zipfile, win32api
for filename in os.listdir(directory):
if __name__ == "__main__":
Note: It might work on your system, depending on the types of files.
To fix it, use
But to my thinking, this is a bug in zipfile.py, really.
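One way the byte-encoding workaround described above might look, as a sketch for a current Python. The helper name to_byte_name, its encoding parameter and the "replace" error policy are assumptions for illustration, not part of zipfile or of the original script.

```python
import sys

def to_byte_name(name, encoding=None):
    """Return name as a byte string; bytes pass through untouched."""
    if isinstance(name, bytes):
        return name
    if encoding is None:
        # Assumption: the filesystem encoding is a sensible default.
        encoding = sys.getfilesystemencoding()
    # "replace" turns unencodable characters into "?" instead of raising.
    return name.encode(encoding, "replace")

print(to_byte_name(u"caf\xe9", "latin-1"))
```

On Python 2.3 the equivalent step would presumably be a filename.encode(...) call on each name before it reaches z.write(), so that only byte strings are ever added to the packed header.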
Now, could anybody please just write a
"i-don't-care-if-my-app-can-display-klingon-characters" raw byte
encoding which doesn't throw any exceptions and doesn't care whether
or not the characters are in the 0x7F range? It's OK if I cannot port
my batch scripts to Swahili, really.
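The existing latin-1 codec may come close to what is being asked for here, at least in the bytes-to-unicode direction: it maps byte value N to code point N, so no byte value can fail to decode. A short sketch for a current Python:

```python
# Every possible byte value, 0x00 through 0xFF.
all_bytes = bytes(range(256))

# latin-1 maps byte N to code point N, so this decode cannot fail.
as_text = all_bytes.decode("latin-1")

# Encoding the result gives back exactly the original bytes.
round_tripped = as_text.encode("latin-1")

print(round_tripped == all_bytes)
```

The other direction is not unconditional: unicode text containing code points above 0xFF still cannot be encoded as latin-1 without an error handler.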