BadZipfile "file is not a zip file"
John Machin
sjmachin at lexicon.net
Fri Jan 9 05:42:32 EST 2009
On Jan 9, 7:46 pm, Carl Banks <pavlovevide... at gmail.com> wrote:
> On Jan 9, 2:16 am, Steven D'Aprano <st... at REMOVE-THIS-cybersource.com.au> wrote:
> > On Thu, 08 Jan 2009 16:47:39 -0800, webcomm wrote:
> > > The error...
> > ...
> > > BadZipfile: File is not a zip file
>
> > > When I look at data.zip in Windows, it appears to be a valid zip file.
> > > I am able to uncompress it in Windows XP, and can also uncompress it
> > > with 7-Zip. It looks like zipfile is not able to read a "table of
> > > contents" in the zip file. That's not a concept I'm familiar with.
>
> > No, ZipFile can read table of contents:
>
> > Help on method printdir in module zipfile:
>
> > printdir(self) unbound zipfile.ZipFile method
> > Print a table of contents for the zip file.
>
> > In my experience, zip files originating from Windows sometimes have
> > garbage at the end of the file. WinZip just ignores the garbage, but
> > other tools sometimes don't -- if I recall correctly, Linux unzip
> > successfully unzips the file but then complains that the file was
> > corrupt. It's possible that you're running into a similar problem.
>
> The zipfile format is kind of brain dead, you can't tell where the end
> of the file is supposed to be by looking at the header. If the end of
> file hasn't yet been reached there could be more data. To make
> matters worse, somehow zip files came to have text comments simply
> appended to the end of them. (Probably this was for the benefit of
> people who would cat them to the terminal.)
>
> Anyway, if you see something that doesn't adhere to the zipfile
> format, you don't have any foolproof way to know if it's because the
> file is corrupted or if it's just an appended comment.
>
> Most zipfile readers use a heuristic to distinguish. Python's zipfile
> module just assumes it's corrupted.
>
> The following post from a while back gives a solution that tries to
> snip the comment off so that zipfile module can handle it. It might
> help you out.
>
> http://groups.google.com/group/comp.lang.python/msg/c2008e48368c6543
And here's a little gadget that might help the diagnostic effort; it
shows the archive size and the position of all the "magic" PKnn
markers. In a "normal" uncommented archive, EndArchive_pos + 22 ==
archive_size.
8<---
# usage: python zip_susser.py name_of_archive.zip
import sys
grimoire = [
    ("FileHeader", "PK\003\004"), # magic number for file header
    ("CentralDir", "PK\001\002"), # magic number for central directory
    ("EndArchive", "PK\005\006"), # magic number for end of archive record
    ("EndArchive64", "PK\x06\x06"), # magic token for Zip64 header
    ("EndArchive64Locator", "PK\x06\x07"), # magic token for locator header
    ]
f = open(sys.argv[1], 'rb')
buff = f.read()
f.close()
blen = len(buff)
print "archive size is", blen
for magic_name, magic in grimoire:
    pos = 0
    while pos < blen:
        pos = buff.find(magic, pos)
        if pos < 0:
            break
        print "%s at %d" % (magic_name, pos)
        pos += 4
8<---
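If the gadget shows that EndArchive_pos + 22 != archive_size, something
like the following rough sketch (Python 3 syntax, unlike the script
above) might help work out how many stray bytes follow the end-of-archive
record and snip them off. The function names trailing_bytes and
snip_trailing are made up for illustration. It assumes the last
"PK\005\006" in the file is the real end-of-archive record, which may not
hold if the trailing junk or the archive comment happens to contain those
four bytes.

```python
import io
import struct
import zipfile

EOCD_MAGIC = b"PK\x05\x06"              # end of archive record signature
EOCD_FMT = "<4s4H2LH"                   # fixed part of the record
EOCD_SIZE = struct.calcsize(EOCD_FMT)   # 22 bytes


def trailing_bytes(data):
    """Count the bytes that follow the end-of-archive record and its
    declared comment. Returns None if no complete record is found.
    Assumption: the LAST occurrence of the signature is the real one."""
    pos = data.rfind(EOCD_MAGIC)
    if pos < 0 or pos + EOCD_SIZE > len(data):
        return None
    # The last field of the record is the archive comment length.
    comment_len = struct.unpack_from(EOCD_FMT, data, pos)[-1]
    return len(data) - (pos + EOCD_SIZE + comment_len)


def snip_trailing(data):
    """Return data with any bytes past the declared end removed.
    The input is returned unchanged if nothing can be trimmed."""
    extra = trailing_bytes(data)
    if extra is None or extra <= 0:
        return data
    return data[:len(data) - extra]
```

The snipped bytes can then be handed to zipfile through
io.BytesIO(snip_trailing(buff)). A negative result from trailing_bytes
would mean the record claims a longer comment than the file holds, which
points to truncation rather than appended junk.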
HTH,
John