[Python-Dev] zipfile and unicode filenames

"Martin v. Löwis" martin at v.loewis.de
Sun Jun 10 22:47:54 CEST 2007


> But this is only on Windows! I have no clue what the common
> situation is on other OSes, and I don't even know how to sanely get
> the OEM codepage on Windows (the obvious way, calling
> ctypes.windll.kernel32.GetOEMCP(), doesn't seem good to me).
> 
> So I guess that's a bad idea anyway; maybe conforming to the language
> bit is better (ASCII will stay ASCII anyway).
> 
> What about this?

I haven't checked (*) whether you got the right value for flag_bits;
assuming you did, this looks good.

For compatibility, I would propose to use UTF-8 only if the file
name is not ASCII. Even though the OEM code pages vary, they
are (mostly) ASCII supersets. So if the string can be encoded
in ASCII, there is no need to set the UTF-8 flag bit.
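
A minimal sketch of that rule, assuming Python 3 str file names. The
helper name encode_zip_filename and the constant UTF8_FLAG are
hypothetical, chosen for illustration; they are not part of zipfile:

```python
# Hypothetical helper, not zipfile API: pick an encoding for a file name.
UTF8_FLAG = 0x800  # general purpose bit 11 (value discussed in this thread)


def encode_zip_filename(name):
    """Return (raw_bytes, flag_bits) for a file name.

    ASCII-only names are stored unflagged; anything else is stored as
    UTF-8 with the UTF-8 flag bit set.
    """
    try:
        return name.encode("ascii"), 0
    except UnicodeEncodeError:
        return name.encode("utf-8"), UTF8_FLAG
```

With this, "readme.txt" is written exactly as before, and only a name
such as "L\u00f6wis.txt" gets the flag.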

OTOH, I now wonder whether it would *hurt* to always set the flag bit:
if old zip software does not choke when the flag is set, then it
can just as well be set unconditionally, since ASCII strings encode
to the same bytes in UTF-8.
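
The byte-level claim can be checked quickly. This is only a sketch
(the function name is made up for illustration), and it says nothing
about how old zip tools react to the flag itself:

```python
def same_bytes_everywhere(name):
    """True if an ASCII-only name encodes identically in all three encodings.

    Assumes `name` is pure ASCII; a non-ASCII name raises
    UnicodeEncodeError from the "ascii" codec.
    """
    encodings = ("ascii", "utf-8", "cp437")
    return len({name.encode(enc) for enc in encodings}) == 1
```

For an ASCII name such as "docs/readme.txt" this returns True, so a
reader that ignores the flag sees the same name either way.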

Regards,
Martin

(*) I just now read

http://www.pkware.com/documents/casestudies/APPNOTE.TXT

and 0x800 (bit 11 of the general purpose bit flag) does indeed seem
to be the right value. Note that Appendix D of the specification
says that the historical encoding of file names is code page 437.
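
On the reading side, those two facts suggest something like the
following. It is a sketch under the assumption that unflagged names
really are code page 437; the function name decode_zip_filename is
hypothetical and not part of zipfile:

```python
# Hypothetical helper, not zipfile API: decode a stored file name.
UTF8_FLAG = 1 << 11  # bit 11 of the general purpose bit flag, i.e. 0x800


def decode_zip_filename(raw, flag_bits):
    """Decode stored name bytes according to the UTF-8 flag bit."""
    if flag_bits & UTF8_FLAG:
        return raw.decode("utf-8")
    # Assumption: fall back to the historical encoding named in Appendix D.
    return raw.decode("cp437")
```

Archives written by tools that used some other OEM code page would
still be decoded incorrectly by this fallback.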

