distinction between unzipping bytes and unzipping a file

Steve Holden steve at holdenweb.com
Fri Jan 9 21:15:29 CET 2009

webcomm wrote:
> Hi,
> In python, is there a distinction between unzipping bytes and
> unzipping a binary file to which those bytes have been written?
> The following code is, I think, an example of writing bytes to a file
> and then unzipping...
> decoded = base64.b64decode(datum)
> # datum is a base64-encoded string of data downloaded from a web service
> f = open('data.zip', 'wb')
> f.write(decoded)
> f.close()
> x = zipfile.ZipFile('data.zip', 'r')
> After looking at the preceding code, the provider of the web service
> gave me this advice...
> "Instead of trying to create a file, take the unzipped bytes and get a
> Unicode string of text from it."
Not terribly useful advice, but one presumes he, she, or it was trying to
be helpful.
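
For what it's worth, here is one reading of that advice as a minimal
sketch. It assumes Python 3, that datum really is a base64-encoded zip
archive, and that the members are UTF-8 text; none of that is confirmed
in this thread. The helper name unzip_bytes and the sample member
"report.txt" are made up for illustration.

```python
import base64
import io
import zipfile


def unzip_bytes(zipped, encoding="utf-8"):
    """Return {member name: decoded text} for a zip archive held in memory."""
    # io.BytesIO wraps the bytes in a file-like object, so ZipFile can read
    # them without anything being written to disk.
    with zipfile.ZipFile(io.BytesIO(zipped)) as archive:
        return {
            name: archive.read(name).decode(encoding)
            for name in archive.namelist()
        }


# Build a stand-in for the web service payload (hypothetical data).
buffer = io.BytesIO()
with zipfile.ZipFile(buffer, "w") as archive:
    archive.writestr("report.txt", "caf\u00e9".encode("utf-8"))
datum = base64.b64encode(buffer.getvalue())

# Decode the base64 text, unzip the bytes in memory, decode to Unicode.
texts = unzip_bytes(base64.b64decode(datum))
```

ZipFile reads the same bytes whether they come from a file on disk or
from a BytesIO object, so on this reading the two approaches should give
the same archive contents.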

> If so, I'm not sure how to do what he's suggesting, or if it's really
> different from what I've done.
Well, what you have done appears pretty wrong to me, but let's take a
look. What's datum? You appear to be treating it as base64-encoded data;
is that correct? Have you examined it?
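
One way to examine it, as a rough sketch assuming Python 3: check
whether datum is valid base64 and whether the decoded bytes look like a
zip archive. The function name describe_payload is hypothetical, and the
strict validation would reject a payload containing line breaks, which
would need stripping first.

```python
import base64
import binascii
import io
import zipfile


def describe_payload(datum):
    """Report whether datum is valid base64 and whether it holds a zip archive."""
    try:
        # validate=True rejects characters outside the base64 alphabet
        # instead of silently discarding them.
        decoded = base64.b64decode(datum, validate=True)
    except (binascii.Error, ValueError):
        return "not valid base64"
    # is_zipfile accepts a file-like object as well as a filename.
    if zipfile.is_zipfile(io.BytesIO(decoded)):
        return "zip archive"
    return "base64, but not a zip archive"


# Hypothetical sample: a small zip archive built in memory, then encoded.
sample = io.BytesIO()
with zipfile.ZipFile(sample, "w") as archive:
    archive.writestr("somefile.dat", b"payload")
encoded_zip = base64.b64encode(sample.getvalue())
```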

f = open('data.zip', 'wb')

opens the file data.zip for writing in binary mode. Not as a zip file, you
understand, just as a regular file. I suspect here you really needed

f = zipfile.ZipFile('data.zip', 'w')

Now, of course, you need to remember what zipfiles contain, which is
other files. So the data you *write* to the zipfile has to be associated
with a filename in the archive. Of course you don't have the data in a
file, you have it in a string, so you would use

f.writestr("somefile.dat", decoded)

You have now written a zip file containing a single "somefile.dat" file
with the decoded base64 data in it. Open it with Winzip or one of its
buddies and see if anyone barfs.
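
Put together, the steps above might look like the following sketch. It
assumes Python 3; the value of decoded is a stand-in, and the temporary
directory is only there so the example cleans up after itself.

```python
import os
import tempfile
import zipfile

# Stand-in for the decoded base64 data (hypothetical value).
decoded = b"stand-in for the decoded base64 data"

with tempfile.TemporaryDirectory() as workdir:
    path = os.path.join(workdir, "data.zip")

    # Create a new archive and store the bytes under a member name.
    with zipfile.ZipFile(path, "w") as f:
        f.writestr("somefile.dat", decoded)

    # Reopen the archive for reading to confirm what it contains.
    with zipfile.ZipFile(path, "r") as f:
        names = f.namelist()
        restored = f.read("somefile.dat")
```

Note that this creates a new archive around the data; it does not unpack
an archive that the data might already be.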

> I find that I am able to unzip the resulting data.zip using the unix
> unzip command, but the file inside contains some FFFD characters, as
> described in this thread...
> http://groups.google.com/group/comp.lang.python/browse_thread/thread/4f57abea978cc0bf?hl=en#
> I don't know if the unwanted characters might be the result of my
> trying to write and unzip a file, rather than unzipping the bytes.
> The file does contain a semblance of what I ultimately want -- it's
> not all garbage.
But it's certainly not a zip file.
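
As for the FFFD characters: U+FFFD is the Unicode replacement character,
which a decoder substitutes for bytes it cannot interpret when asked to
replace errors rather than raise. Whether that is what happened here is
a guess, since the thread does not say how the text was decoded. A small
sketch, assuming Python 3 and using Latin-1 versus UTF-8 purely as an
example pair of encodings:

```python
# "caf\u00e9" encoded as Latin-1 gives b"caf\xe9"; the byte 0xE9 on its
# own is not a complete UTF-8 sequence.
raw = "caf\u00e9".encode("latin-1")

# Decoding with the wrong codec and errors="replace" substitutes U+FFFD.
wrong = raw.decode("utf-8", errors="replace")

# Decoding with the codec the bytes were written in recovers the text.
right = raw.decode("latin-1")

# True when the decoded text contains the replacement character.
has_replacement = "\ufffd" in wrong
```

If the extracted file shows U+FFFD, it may be worth checking which
encoding the text was written in before suspecting the unzip step.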

> Apologies if it's not appropriate to start a new thread for this.  It
> just seems like a different topic than how to deal with the resulting
> FFFD characters.
Don't worry about it.

Steve Holden        +1 571 484 6266   +1 800 494 3119
Holden Web LLC              http://www.holdenweb.com/
