BadZipfile "file is not a zip file"

John Machin sjmachin at lexicon.net
Fri Jan 9 17:21:55 EST 2009


On Jan 10, 2:22 am, webcomm <rya... at gmail.com> wrote:
> On Jan 9, 5:42 am, John Machin <sjmac... at lexicon.net> wrote:
>
> > And here's a little gadget that might help the diagnostic effort; it
> > shows the archive size and the position of all the "magic" PKnn
> > markers. In a "normal" uncommented archive, EndArchive_pos + 22 ==
> > archive_size.
>
> I ran the diagnostic gadget...
>
> archive size is 69888
> FileHeader at 0
> CentralDir at 43796
> EndArchive at 43846

Thanks. Would you mind spending a few minutes more on this so that we
can see if it's a problem that can be fixed easily, like the one that
Chris Mellon reported?

The above output says that there are 43868 (43846 + 22) bytes of
useable data. That leaves 69888 - 43868 = 26020 bytes of "comment" ...
rather large for a comment. Have you run a virus scanner over this
file?
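
The arithmetic above can be checked with a few lines of Python 3. This is only a sketch: the helper names (trailing_bytes, make_plain_zip) and the sample archive are made up for illustration, and 22 is the size of the fixed part of the end-of-archive record.

```python
import io
import zipfile

EOCD_MAGIC = b"PK\x05\x06"
EOCD_SIZE = 22  # fixed part of the end-of-archive record


def trailing_bytes(archive_size, end_archive_pos):
    """Bytes left over after the fixed 22-byte end-of-archive record."""
    return archive_size - (end_archive_pos + EOCD_SIZE)


def make_plain_zip():
    """Build a small uncommented archive in memory (illustrative data)."""
    buf = io.BytesIO()
    with zipfile.ZipFile(buf, "w") as zf:
        zf.writestr("hello.txt", "hello")
    return buf.getvalue()


# Figures taken from the reported output.
leftover = trailing_bytes(69888, 43846)

# A freshly written, uncommented archive should have nothing left over.
plain = make_plain_zip()
plain_leftover = trailing_bytes(len(plain), plain.find(EOCD_MAGIC))

print("reported file:", leftover)
print("plain archive:", plain_leftover)
```

Anything other than zero for an archive that is supposed to have no comment means extra bytes have been appended after the end-of-archive record.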

At the end is an updated version of the diagnostic gadget. It explores
the "EndArchive" structure and the comment at the end, with a special
check for a comment that is all '\0' bytes (as per Chris's bug report)
and another for one that is all spaces. Please run it over your file
and show us the results. Note: you may want to suppress the display of
the first 100 bytes of the comment if it turns out to be private data.

Cheers,
John

8<---
# zip_susser_v2.py
import sys
grimoire = [
    ("FileHeader",          "PK\003\004"), # magic number for file header
    ("DataDescriptor",      "PK\x07\x08"), # see PKZIP APPNOTE (V) (C)
    ("CentralDir",          "PK\001\002"), # magic number for central directory
    ("EndArchive",          "PK\005\006"), # magic number for end of archive record
    ("EndArchive64",        "PK\x06\x06"), # magic token for Zip64 header
    ("EndArchive64Locator", "PK\x06\x07"), # magic token for locator header
    ("ArchiveExtraData",    "PK\x06\x08"), # APPNOTE (V) (E)
    ("DigitalSignature",    "PK\x05\x05"), # APPNOTE (V) (F)
    ]
f = open(sys.argv[1], 'rb')
buff = f.read()
f.close()
blen = len(buff)
print "archive size is", blen
for magic_name, magic in grimoire:
    pos = 0
    while pos < blen:
        pos = buff.find(magic, pos)
        if pos < 0:
            break
        print "%s at %d" % (magic_name, pos)
        pos += 4
#
# find what is in the EndArchive struct
#
structEndArchive = "<4s4H2LH"     # 9 [sic] items, end of archive, 22 bytes
import struct
posEndArchive = buff.find("PK\005\006")
print "using posEndArchive =", posEndArchive
assert 0 <= posEndArchive < blen  # 0 is legal: an empty archive is just this record
endArchive = struct.unpack(
    structEndArchive, buff[posEndArchive:posEndArchive+22])
print "endArchive:", repr(endArchive)
endArchiveFieldNames = """
    signature
    this_disk_num
    central_dir_disk_num
    central_dir_this_disk_num_entries
    central_dir_overall_num_entries
    central_dir_size
    central_dir_offset
    comment_size
    """.split()
for name, value in zip(endArchiveFieldNames, endArchive):
    print "%33s : %r" % (name, value)
#
# inspect the comment
#
actual_comment_size = blen - 22 - posEndArchive
expected_comment_size = endArchive[7]
comment = buff[posEndArchive + 22:]
print
print "expected_comment_size:", expected_comment_size
print "actual_comment_size:", actual_comment_size
print "comment is all spaces:", comment == ' ' * actual_comment_size
print "comment is all '\\0':", comment == '\0' * actual_comment_size
print "comment (first 100 bytes):", repr(comment[:100])
8<---
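
For anyone on Python 3, the comment checks in the script above could be sketched as a function along these lines. The function name inspect_comment, the dictionary keys and the padded sample archive are invented for illustration. Like the script, it takes the first "PK\005\006" marker it finds. Unlike the script, it treats an empty comment as neither all NULs nor all spaces.

```python
import io
import struct
import zipfile

EOCD_MAGIC = b"PK\x05\x06"
EOCD_FORMAT = "<4s4H2LH"                  # same layout the script unpacks
EOCD_SIZE = struct.calcsize(EOCD_FORMAT)  # 22 bytes


def inspect_comment(data):
    """Compare the declared comment size with what actually follows the record."""
    pos = data.find(EOCD_MAGIC)
    if pos < 0:
        raise ValueError("no end-of-archive record found")
    fields = struct.unpack(EOCD_FORMAT, data[pos:pos + EOCD_SIZE])
    comment = data[pos + EOCD_SIZE:]
    return {
        "expected_comment_size": fields[7],   # comment_size field
        "actual_comment_size": len(comment),
        # An empty comment is reported as neither all NULs nor all spaces.
        "all_nul": bool(comment) and comment == b"\0" * len(comment),
        "all_spaces": bool(comment) and comment == b" " * len(comment),
        "first_100": comment[:100],
    }


# Illustrative data: a tiny archive with ten NUL bytes appended to it.
buf = io.BytesIO()
with zipfile.ZipFile(buf, "w") as zf:
    zf.writestr("hello.txt", "hello")
padded = buf.getvalue() + b"\0" * 10

info = inspect_comment(padded)
for key in sorted(info):
    print("%22s : %r" % (key, info[key]))
```

A mismatch between expected_comment_size and actual_comment_size is what to look for; the all_nul and all_spaces flags only say what kind of padding it is.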


