distinction between unzipping bytes and unzipping a file
steve at holdenweb.com
Fri Jan 9 21:15:29 CET 2009
> In python, is there a distinction between unzipping bytes and
> unzipping a binary file to which those bytes have been written?
> The following code is, I think, an example of writing bytes to a file
> and then unzipping...
> decoded = base64.b64decode(datum)
> #datum is a base64 encoded string of data downloaded from a web service
> f = open('data.zip', 'wb')
> f.write(decoded)
> f.close()
> x = zipfile.ZipFile('data.zip', 'r')
> After looking at the preceding code, the provider of the web service
> gave me this advice...
> "Instead of trying to create a file, take the unzipped bytes and get a
> Unicode string of text from it."
Not terribly useful advice, but one presumes he, she or it was trying to
> If so, I'm not sure how to do what he's suggesting, or if it's really
> different from what I've done.
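For reference, one reading of that suggestion is to skip the file on
disk and open the archive straight from the decoded bytes. This is only
a sketch in Python 3 syntax: the helper name text_from_zipped_base64,
the assumption that the first archive member is the one you want, and
the UTF-8 default are my guesses, not anything the provider confirmed.

```python
import base64
import io
import zipfile


def text_from_zipped_base64(datum, encoding="utf-8"):
    """Decode base64 data, unzip it in memory, return the first member as text.

    Hypothetical helper: assumes the first archive member is the wanted one
    and that it holds text in `encoding` (UTF-8 is a guess).
    """
    decoded = base64.b64decode(datum)
    # BytesIO wraps the bytes in a file-like object, so no data.zip is needed.
    with zipfile.ZipFile(io.BytesIO(decoded), "r") as archive:
        first_member = archive.namelist()[0]
        raw = archive.read(first_member)
    # Decoding with the wrong encoding is a likely source of stray characters,
    # so the encoding should be confirmed with the provider.
    return raw.decode(encoding)
```

If the guessed encoding is wrong, decode() raises UnicodeDecodeError
rather than silently substituting replacement characters.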
Well, what you have done appears pretty wrong to me, but let's take a
look. What's datum? You appear to be treating it as base64-encoded data;
is that correct? Have you examined it?
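One way to examine it, as a rough sketch: decode the data and ask the
zipfile module whether the result looks like an archive. The function
name inspect_datum is made up for illustration, and the code is written
in Python 3 syntax.

```python
import base64
import io
import zipfile


def inspect_datum(datum):
    """Return (first four decoded bytes, whether the bytes look like a zip).

    Hypothetical helper for a quick sanity check, nothing more.
    """
    decoded = base64.b64decode(datum)
    # is_zipfile accepts a file-like object as well as a filename.
    is_zip = zipfile.is_zipfile(io.BytesIO(decoded))
    # A zip archive with at least one member normally starts with b"PK\x03\x04".
    return decoded[:4], is_zip
```

If the second value comes back True, the decoded bytes are already a
zip archive.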
f = open('data.zip', 'wb')
opens the file data.zip for writing in binary. Not as a zip file, you
understand, just as a regular file. I suspect here you really needed
f = zipfile.ZipFile('data.zip', 'w')
Now, of course, you need to remember what zipfiles contain. Which is
other files. So the data you *write* to the zipfile has to be associated
with a filename in the archive. Of course you don't have the data in a
file, you have it in a string, so you would use
f.writestr("somefile.dat", decoded)
f.close()
You have now written a zip file containing a single "somefile.dat" file
with the decoded base64 data in it. Open it with Winzip or one of its
buddies and see if anyone barfs.
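Put together, the approach just described might look like the sketch
below (Python 3 syntax). The function name write_decoded_to_zip and its
default arguments are invented for illustration; ZipFile.writestr is
the real method for adding in-memory data under a member name.

```python
import base64
import zipfile


def write_decoded_to_zip(datum, zip_path="data.zip", member="somefile.dat"):
    """Store the base64-decoded data as one member of a new zip archive.

    Hypothetical helper: the path and member name are placeholders.
    """
    decoded = base64.b64decode(datum)
    archive = zipfile.ZipFile(zip_path, "w")
    try:
        # writestr takes the member name and the data to store under it.
        archive.writestr(member, decoded)
    finally:
        # Closing writes the central directory; without it the file is unusable.
        archive.close()
    return zip_path
```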
> I find that I am able to unzip the resulting data.zip using the unix
> unzip command, but the file inside contains some FFFD characters, as
> described in this thread...
> I don't know if the unwanted characters might be the result of my
> trying to write and unzip a file, rather than unzipping the bytes.
> The file does contain a semblance of what I ultimately want -- it's
> not all garbage.
But it's certainly not a zip file.
> Apologies if it's not appropriate to start a new thread for this. It
> just seems like a different topic than how to deal with the resulting
> FFFD characters.
Don't worry about it.
Steve Holden +1 571 484 6266 +1 800 494 3119
Holden Web LLC http://www.holdenweb.com/