[issue41928] ZipFile does not support Unicode Path Extra Field (0x7075) zip header field

Ivan Sorokin report at bugs.python.org
Sun Oct 4 11:24:54 EDT 2020


Ivan Sorokin <ivan.sorokin.tech at gmail.com> added the comment:

A grand unified algorithm for reading filenames from zip files correctly:

1. Does the zip entry have a «Unicode Path Extra Field» (0x7075)? If so, use it for the file name.
2. Is the Unicode flag (0x800) set in the «Flags» field of the zip entry? If so, assume the «Filename» field is in UTF-8.
3. Does the «HostOS» field of the zip entry have the value 0 (FAT) or 11 (NTFS)? If so, assume the «Filename» field is in the OEM code page corresponding to the system locale.
4. Otherwise, assume the «Filename» field is in UTF-8.
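The four steps can be sketched in Python as a function over the raw header fields of one entry. This is a minimal sketch, not zipfile's API: `decode_zip_filename` and `unicode_path_from_extra` are hypothetical names, the OEM code page is passed in by the caller (for example `cp866` for a Russian locale) instead of being derived from the system locale, and the «HostOS» values {0, 11} are taken from the list as-is. Step 1 also checks the record's version byte and the CRC-32 of the original «Filename» bytes stored inside the 0x7075 field; PKWARE's APPNOTE says the field should be ignored when that CRC does not match, since a tool may have renamed the entry without updating it.

```python
import struct
import zlib
from typing import Optional

UNICODE_PATH_EXTRA_ID = 0x7075  # Info-ZIP Unicode Path Extra Field
UTF8_FLAG = 0x800               # «Flags» bit 11: «Filename» is UTF-8
OEM_HOST_SYSTEMS = {0, 11}      # «HostOS» values named in the list: FAT, NTFS


def unicode_path_from_extra(raw_name: bytes, extra: bytes) -> Optional[str]:
    """Return the name stored in a usable 0x7075 record, or None."""
    pos = 0
    while pos + 4 <= len(extra):
        header_id, size = struct.unpack_from("<HH", extra, pos)
        data = extra[pos + 4 : pos + 4 + size]
        if len(data) < size:
            break  # truncated extra field: stop parsing
        pos += 4 + size
        if header_id != UNICODE_PATH_EXTRA_ID or size <= 5:
            continue  # another record type, or a 0x7075 record with no name bytes
        # Record layout: version (1 byte), CRC-32 of «Filename» (4 bytes), UTF-8 name.
        version, name_crc = struct.unpack_from("<BI", data)
        if version == 1 and name_crc == zlib.crc32(raw_name):
            try:
                return data[5:].decode("utf-8")
            except UnicodeDecodeError:
                return None
    return None


def decode_zip_filename(raw_name: bytes, flag_bits: int, host_os: int,
                        extra: bytes, oem_encoding: str) -> str:
    """Apply the four steps in order; oem_encoding is supplied by the caller."""
    # 1. A Unicode Path Extra Field whose CRC matches «Filename» wins.
    name = unicode_path_from_extra(raw_name, extra)
    if name is not None:
        return name
    # 2. Unicode flag set: «Filename» is UTF-8.
    if flag_bits & UTF8_FLAG:
        return raw_name.decode("utf-8")
    # 3. Created on FAT/NTFS: «Filename» is in the OEM code page.
    if host_os in OEM_HOST_SYSTEMS:
        return raw_name.decode(oem_encoding)
    # 4. Anything else: assume UTF-8.
    return raw_name.decode("utf-8")


# A name as a Windows packer with a Russian locale might store it (no flag, no 0x7075).
raw = "привет.txt".encode("cp866")
print(decode_zip_filename(raw, flag_bits=0, host_os=0, extra=b"", oem_encoding="cp866"))
```

Passing the code page explicitly keeps the function deterministic and testable. On Windows, Python's Windows-only `oem` codec refers to the current OEM code page and could be passed as `oem_encoding`.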

p7zip with the oemcp patch (https://github.com/unxed/oemcp/) uses exactly this method and correctly processes every zip file in my test set, which contains archives produced by different packers on Windows, macOS and Linux, as well as by online services. Any zip unpacker that wants to handle non-Latin filenames as gracefully as possible should use the same algorithm.
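Python's `ZipFile` decodes each entry name itself before exposing it: as UTF-8 when flag 0x800 is set, and as cp437 otherwise (unless `metadata_encoding`, available since Python 3.11, is given). To apply the four steps listed above to `ZipFile` entries, the raw «Filename» bytes, «Flags», «HostOS» and extra field have to be recovered from each `ZipInfo`. The sketch below assumes the archive was opened without `metadata_encoding`; `RawEntryFields` and `raw_entry_fields` are hypothetical names, and it relies on `ZipInfo.orig_filename`, an undocumented CPython attribute holding the name as zipfile first decoded it.

```python
import io
import zipfile
from typing import NamedTuple


class RawEntryFields(NamedTuple):
    raw_name: bytes  # «Filename» bytes as stored in the central directory
    flag_bits: int   # «Flags» (general-purpose bit flags)
    host_os: int     # «HostOS», i.e. the upper byte of "version made by"
    extra: bytes     # extra field, which may contain a 0x7075 record


def raw_entry_fields(info: zipfile.ZipInfo) -> RawEntryFields:
    """Recover the header fields that zipfile has already interpreted.

    Assumes the ZipFile was opened without metadata_encoding, so zipfile
    decoded the name as UTF-8 when flag 0x800 is set and as cp437 otherwise.
    """
    encoding = "utf-8" if info.flag_bits & 0x800 else "cp437"
    # cp437 maps every byte value 0-255, so encoding back restores the original bytes.
    return RawEntryFields(
        raw_name=info.orig_filename.encode(encoding),  # undocumented attribute
        flag_bits=info.flag_bits,
        host_os=info.create_system,
        extra=info.extra,
    )


# Demo: build a small archive in memory and inspect its entries.
buf = io.BytesIO()
with zipfile.ZipFile(buf, "w") as zf:
    zf.writestr("readme.txt", b"x")  # ASCII name: zipfile leaves flag 0x800 unset
    zf.writestr("привет.txt", b"y")  # non-ASCII name: zipfile writes UTF-8 and sets 0x800

with zipfile.ZipFile(buf) as zf:
    fields = {info.filename: raw_entry_fields(info) for info in zf.infolist()}

for name, f in fields.items():
    print(name, f.raw_name, hex(f.flag_bits), f.host_os)
```

The recovered tuple can be fed to any implementation of the four steps. Note that the «HostOS» value zipfile itself writes depends on the platform (0 on Windows, 3 elsewhere), so archives created by zipfile on Windows without a non-ASCII name would fall under step 3.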

----------

_______________________________________
Python tracker <report at bugs.python.org>
<https://bugs.python.org/issue41928>
_______________________________________

