[New-bugs-announce] [issue10694] zipfile.py end of central directory detection not robust

Kevin Hendricks report at bugs.python.org
Mon Dec 13 19:57:51 CET 2010

New submission from Kevin Hendricks <kevin.hendricks at sympatico.ca>:

The current version of zipfile.py is not robust to slight errors at the end of zip archives.  Many file servers **improperly** append a new line to the end of files that do not have a new line when they are uploaded from a browser.  This bug ends up adding 0x0d 0xa to the end of the zip archive.  This in turn makes zipfile.py eventually throw a "Not a zip file" exception when no other zip tools seem to have trouble with them.  Even unzip -t passes these "problem" zip archives with flying colours.

I hate to have to extract and create my own zipfile.py script just to be robust to zip archives that are commonly found on the net and that are handled more robustly by other software.

So please consider changing this code from _EndRecData below to simply ignore any trailing data after the proper stringEndArchive and structEndArchive are found instead of looking for the comment and verifying if the comment is properly formatted and throwing an exception if not correct.  Ignoring the "comment" seems to be more robust in this case as everything needed to unpack the zip archive has been found.

    # Either this is not a ZIP file, or it is a ZIP file with an archive
    # comment.  Search the end of the file for the "end of central directory"
    # record signature. The comment is the last item in the ZIP file and may be
    # up to 64K long.  It is assumed that the "end of central directory" magic
    # number does not appear in the comment.
    maxCommentStart = max(filesize - (1 << 16) - sizeEndCentDir, 0)
    fpin.seek(maxCommentStart, 0)
    data = fpin.read()
    start = data.rfind(stringEndArchive)
    if start >= 0:
        # found the magic number; attempt to unpack and interpret
        recData = data[start:start+sizeEndCentDir]
        endrec = list(struct.unpack(structEndArchive, recData))
        comment = data[start+sizeEndCentDir:]
        # check that comment length is correct
        if endrec[_ECD_COMMENT_SIZE] == len(comment):
            # Append the archive comment and start offset
            endrec.append(maxCommentStart + start)
            if endrec[_ECD_OFFSET] == 0xffffffff:
                # There is apparently a "Zip64 end of central directory"
                # structure present, so go look for it
                return _EndRecData64(fpin, start - filesize, endrec)
            return endrec

This will in turn make the Python implementation of zipfile.py more robust to data improperly appended when some zip archives are uploaded or downloaded (similar to how other zip tools handle this issue).

Thank you for your time and consideration.

messages: 123891
nosy: KevinH
priority: normal
severity: normal
status: open
title: zipfile.py end of central directory detection not robust
type: behavior
versions: Python 2.7

Python tracker <report at bugs.python.org>

More information about the New-bugs-announce mailing list