BadZipfile "file is not a zip file"

MRAB google at mrabarnett.plus.com
Fri Jan 9 18:52:02 EST 2009


MRAB wrote:
> webcomm wrote:
>> On Jan 8, 8:39 pm, "James Mills" <prolo... at shortcircuit.net.au> wrote:
>>> Send us a sample of this file in question...
>>
>> Here's a sample with some dummy data from the web service:
>> http://webcomm.webfactional.com/htdocs/data.zip
>>
>> That's the zip created in this line of my code...
>> f = open('data.zip', 'wb')
>>
>> If I open the file it contains as unicode in my text editor (EditPlus)
>> on Windows XP, there is ostensibly nothing wrong with it.  It looks
>> like valid XML.  But if I return it to my browser with python+django,
>> there are bad characters every other character.
>>
>> If I unzip it like this...
>> popen("unzip data.zip")
>> ...then the bad characters are 'FFFD' characters as described and
>> pictured here...
>> http://groups.google.com/group/comp.lang.python/browse_thread/thread/...
>>
>> If I unzip it like this...
>> getzip('data.zip', ignoreable=30000)
>> ...using Scott's function at...
>> http://groups.google.com/group/comp.lang.python/msg/c2008e48368c6543
>> ...then the bad characters are \x00 characters.
>>
> I can unzip it in Windows XP. The file within it (called "data") is XML 
> encoded as UTF-16LE (2 bytes per character, low byte first), but without 
> the initial byte order mark. Python's zipfile module says "BadZipfile: 
> File is not a zip file".
> 
If I strip trailing zero bytes until at most 4 remain, the zipfile module
can open it:

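import base64
import zipfile

# datum: the base64 text received from the web service
decoded = base64.b64decode(datum)
five_zeros = b"\x00" * 5
# Drop trailing NUL padding one byte at a time, leaving at most 4 zeros
while decoded.endswith(five_zeros):
    decoded = decoded[:-1]
with open('data.zip', 'wb') as f:
    f.write(decoded)
x = zipfile.ZipFile('data.zip', 'r')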
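Counting zeros works for this file, but a valid archive can itself end in five or more zero bytes (for example, when the central directory starts within the first 256 bytes and there is no archive comment), and the loop would then also cut into the end-of-central-directory record. A different technique is to find that record (signature PK\x05\x06), read its comment length, and drop everything after it. This is a minimal sketch, assuming Python 3, that the padding comes after the record, and that neither the archive comment nor the padding contains the signature. The helpers trim_after_eocd and open_padded_zip are hypothetical names, and the sketch opens the bytes in memory via io.BytesIO instead of writing data.zip first.

```python
import io
import struct
import zipfile

EOCD_SIGNATURE = b"PK\x05\x06"  # end-of-central-directory record signature
EOCD_SIZE = 22                  # fixed-size part of the record (comment excluded)


def trim_after_eocd(data):
    """Return data cut off just after the end-of-central-directory record.

    Assumes the archive comment, if any, does not contain EOCD_SIGNATURE,
    and that the junk after the record does not contain it either.
    """
    start = data.rfind(EOCD_SIGNATURE)
    if start < 0 or start + EOCD_SIZE > len(data):
        raise zipfile.BadZipFile("no end-of-central-directory record found")
    # Bytes 20-21 of the record hold the comment length (little-endian).
    (comment_len,) = struct.unpack_from("<H", data, start + 20)
    return data[:start + EOCD_SIZE + comment_len]


def open_padded_zip(data):
    """Open in-memory ZIP bytes that may be followed by padding."""
    return zipfile.ZipFile(io.BytesIO(trim_after_eocd(data)))


if __name__ == "__main__":
    # Build a small archive, then pad it with zeros as an assumed
    # stand-in for the file returned by the web service.
    buf = io.BytesIO()
    with zipfile.ZipFile(buf, "w") as zf:
        zf.writestr("data", "<root/>".encode("utf-16-le"))
    padded = buf.getvalue() + b"\x00" * 30000
    with open_padded_zip(padded) as zf:
        print(zf.namelist())  # ['data']
```

The small archive built in the demo is exactly the case where the zero-counting loop goes wrong: its central directory starts near the beginning of the file, so the archive legitimately ends in five zero bytes, yet trim_after_eocd returns it unchanged.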

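Once the archive opens, the member called "data" still has to be decoded as UTF-16LE. Because it has no byte order mark, the byte order must be named explicitly. Read one byte per character, ASCII text in UTF-16LE shows a \x00 after every letter (the zero high byte), which matches the \x00 characters reported above. A minimal sketch, assuming Python 3; read_utf16le_member is a hypothetical helper name:

```python
import io
import zipfile


def read_utf16le_member(archive, name="data"):
    """Return the text of one archive member, decoded as UTF-16LE.

    'utf-16-le' fixes the byte order explicitly, so it does not need
    (or look for) a byte order mark.
    """
    return archive.read(name).decode("utf-16-le")


if __name__ == "__main__":
    xml_bytes = "<root>ok</root>".encode("utf-16-le")
    # Viewed one byte at a time, every other byte is the zero high byte.
    print(xml_bytes[:6])  # b'<\x00r\x00o\x00'
    buf = io.BytesIO()
    with zipfile.ZipFile(buf, "w") as zf:
        zf.writestr("data", xml_bytes)
    with zipfile.ZipFile(buf) as zf:
        print(read_utf16le_member(zf))  # <root>ok</root>
```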