BadZipfile "file is not a zip file"

John Machin sjmachin at lexicon.net
Fri Jan 9 05:42:32 EST 2009


On Jan 9, 7:46 pm, Carl Banks <pavlovevide... at gmail.com> wrote:
> On Jan 9, 2:16 am, Steven D'Aprano <st... at REMOVE-THIS-cybersource.com.au> wrote:
> > On Thu, 08 Jan 2009 16:47:39 -0800, webcomm wrote:
> > > The error...
> > ...
> > > BadZipfile: File is not a zip file
>
> > > When I look at data.zip in Windows, it appears to be a valid zip file.
> > > I am able to uncompress it in Windows XP, and can also uncompress it
> > > with 7-Zip.  It looks like zipfile is not able to read a "table of
> > > contents" in the zip file.  That's not a concept I'm familiar with.
>
> > No, ZipFile can read table of contents:
>
> >     Help on method printdir in module zipfile:
>
> >     printdir(self) unbound zipfile.ZipFile method
> >         Print a table of contents for the zip file.
>
> > In my experience, zip files originating from Windows sometimes have
> > garbage at the end of the file. WinZip just ignores the garbage, but
> > other tools sometimes don't -- if I recall correctly, Linux unzip
> > successfully unzips the file but then complains that the file was
> > corrupt. It's possible that you're running into a similar problem.
>
> The zipfile format is kind of brain dead, you can't tell where the end
> of the file is supposed to be by looking at the header.  If the end of
> file hasn't yet been reached there could be more data.  To make
> matters worse, somehow zip files came to have text comments simply
> appended to the end of them.  (Probably this was for the benefit of
> people who would cat them to the terminal.)
>
> Anyway, if you see something that doesn't adhere to the zipfile
> format, you don't have any foolproof way to know if it's because the
> file is corrupted or if it's just an appended comment.
>
> Most zipfile readers use a heuristic to distinguish.  Python's zipfile
> module just assumes it's corrupted.
>
> The following post from a while back gives a solution that tries to
> snip the comment off so that zipfile module can handle it.  It might
> help you out.
>
> http://groups.google.com/group/comp.lang.python/msg/c2008e48368c6543
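
For reference, a rough Python 3 sketch of that "snip off the tail" idea
might look like the following. It is not the code from the linked post.
The helper names trim_after_end_record and open_trimmed are made up for
illustration, and the sketch assumes that the last PK\005\006 marker in
the file really is the end-of-archive record (a comment or trailing junk
that happens to contain those four bytes would fool it).

```python
import io
import struct
import zipfile

EOCD_MAGIC = b"PK\x05\x06"   # end-of-archive record signature
EOCD_SIZE = 22               # size of the record without any comment


def trim_after_end_record(data):
    """Return data cut off right after the last end-of-archive record.

    Hypothetical helper: drops the archive comment and anything else that
    follows the record, and sets the record's comment-length field to 0.
    """
    pos = data.rfind(EOCD_MAGIC)
    if pos < 0 or pos + EOCD_SIZE > len(data):
        raise ValueError("no complete end-of-archive record found")
    # The last 2 bytes of the 22-byte record hold the comment length.
    return data[:pos + EOCD_SIZE - 2] + struct.pack("<H", 0)


def open_trimmed(path):
    """Read the file at path, trim it in memory, and hand it to zipfile."""
    with open(path, "rb") as f:
        data = f.read()
    return zipfile.ZipFile(io.BytesIO(trim_after_end_record(data)))
```

Something like open_trimmed("data.zip").namelist() would then show
whether the trimmed copy is readable. The file on disk is not modified.
If the junk sits in front of the archive rather than behind it, this
approach will not help.
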

And here's a little gadget that might help the diagnostic effort; it
shows the archive size and the position of all the "magic" PKnn
markers. In a "normal" uncommented archive, EndArchive_pos + 22 ==
archive_size.
8<---
# usage: python zip_susser.py name_of_archive.zip
import sys
grimoire = [
    ("FileHeader",          "PK\003\004"), # magic number for file header
    ("CentralDir",          "PK\001\002"), # magic number for central directory
    ("EndArchive",          "PK\005\006"), # magic number for end of archive record
    ("EndArchive64",        "PK\x06\x06"), # magic token for Zip64 header
    ("EndArchive64Locator", "PK\x06\x07"), # magic token for locator header
    ]
f = open(sys.argv[1], 'rb')
buff = f.read()
f.close()
blen = len(buff)
print "archive size is", blen
for magic_name, magic in grimoire:
    pos = 0
    while pos < blen:
        pos = buff.find(magic, pos)
        if pos < 0:
            break
        print "%s at %d" % (magic_name, pos)
        pos += 4
8<---
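
The "EndArchive_pos + 22 == archive_size" check can also be done in
code. The script above uses Python 2 print statements. The sketch below
is a Python 3 variant of just that check, with a made-up function name
(end_record_report). It reads the 2-byte comment length stored at offset
20 of the end-of-archive record and works out how many bytes are left
over after the record and its declared comment. It assumes the last
PK\005\006 marker found is the real record.

```python
import struct

EOCD_MAGIC = b"PK\x05\x06"   # end-of-archive record signature
EOCD_SIZE = 22               # size of the record without any comment


def end_record_report(buff):
    """Return (pos, comment_len, extra) for the last end-of-archive record.

    extra is the number of bytes found after the record plus its declared
    comment. A negative value means the declared comment is longer than
    what is actually in the file. Returns None if no complete record
    is present.
    """
    pos = buff.rfind(EOCD_MAGIC)
    if pos < 0 or pos + EOCD_SIZE > len(buff):
        return None
    # Comment length is a little-endian unsigned short at offset 20.
    (comment_len,) = struct.unpack_from("<H", buff, pos + 20)
    extra = len(buff) - (pos + EOCD_SIZE + comment_len)
    return pos, comment_len, extra
```

For a "normal" uncommented archive this should give comment_len == 0
and extra == 0. A nonzero extra suggests trailing bytes that are not
accounted for by the archive comment.
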

HTH,
John


