[Python-Dev] Request for Pronouncement: PEP 441 - Improving Python ZIP Application Support

Thomas Wouters thomas at python.org
Mon Feb 23 20:41:35 CET 2015


On Mon, Feb 23, 2015 at 8:22 PM, Ethan Furman <ethan at stoneleaf.us> wrote:

> On 02/23/2015 11:01 AM, Daniel Holth wrote:
> > On Mon, Feb 23, 2015 at 1:49 PM, Paul Moore wrote:
> >> On 23 February 2015 at 18:40, Brett Cannon wrote:
> >>>
> >>> Couldn't you just keep it in memory as bytes and then write directly over
> >>> the file? I realize that's a bit wasteful memory-wise but it is possible.
> >>> The docs could mention the memory cost is something to watch out for when
> >>> doing an in-place replacement. Heck the code could even make it an
> >>> io.BytesIO instance so the rest of the code doesn't have to care about this
> >>> special case.
> >>
> >> I did consider this option, and I still quite like it. In fact,
> >> originally I wrote the API to *only* be in-place, until I realised
> >> that wouldn't work for things bigger than memory (but who has a Python
> >> app that's bigger than RAM?)
> >>
> >> I'm happy to modify the API along these lines (details to be thrashed
> >> out) if people think it's worthwhile.
> >
> > Sounds reasonable. It could be done by just reading the entire file
> > contents after the shebang and re-writing them with the necessary
> > offset all in RAM, truncating the file if necessary, without involving
> > the zipfile module very much; the shebang could have some amount of
> > padding by default; the file could just be re-compressed in memory
> > depending on your appetite for complexity.
>
> This could be a completely stupid question, but how does the zip file know
> where the individual files are? More to the point, does the index work via
> relative or absolute offset? If absolute, wouldn't the index have to be
> rewritten if the zip portion of the file moves?
>

Yes and no. The ZIP format uses a 'central directory', which holds a record
for each file in the archive. The offsets in it are relative, although the
specification is a little vague on what they're relative *to* when using a
single .zip file: the wording talks about disk numbers, ZIP being from the
era of floppy disks. You find the central directory by searching backwards
from the end of the file for the end-of-central-directory record (or by
reading a specific spot at the end, if you don't support archive comments;
zipimport, for example, doesn't). It turns out you can locate the central
directory from just that information, and as far as I know, all tools do.
However, there are still some offsets that would change if you add stuff to
the front of the ZIP file (or remove it), and some zip tools will complain
(usually just in verbose mode, though).
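
As a rough illustration of the above, here is a minimal sketch that finds
the end-of-central-directory record by searching from the end, then compares
the central directory offset recorded there with where the directory really
sits once a shebang line is prepended. The helper name
find_central_directory and the sample shebang are made up for this example.
It also assumes a small, non-ZIP64 archive with no archive comment
containing the signature bytes; it is not how zipimport or any particular
tool does it.

```python
import io
import struct
import zipfile

# End-of-central-directory record: signature, four 16-bit counters,
# central directory size, central directory offset, comment length.
EOCD_SIG = b"PK\x05\x06"
EOCD_FMT = "<4s4H2IH"
EOCD_SIZE = struct.calcsize(EOCD_FMT)


def find_central_directory(data):
    """Return (recorded_offset, actual_offset, size) for the central directory.

    Hypothetical helper: searches backwards for the EOCD signature and
    assumes the central directory ends right where the EOCD record starts.
    """
    eocd_pos = data.rfind(EOCD_SIG)
    if eocd_pos < 0:
        raise ValueError("no end-of-central-directory record found")
    fields = struct.unpack_from(EOCD_FMT, data, eocd_pos)
    cd_size, cd_offset = fields[5], fields[6]
    return cd_offset, eocd_pos - cd_size, cd_size


# Build a tiny archive in memory, then prepend a shebang line to it.
buf = io.BytesIO()
with zipfile.ZipFile(buf, "w") as zf:
    zf.writestr("__main__.py", "print('hello')\n")
plain = buf.getvalue()
shebang = b"#!/usr/bin/env python3\n"
prefixed = shebang + plain

recorded, actual, size = find_central_directory(plain)
recorded_p, actual_p, size_p = find_central_directory(prefixed)

# In the plain archive the recorded offset matches the real position;
# in the prefixed one it is off by exactly the length of the prefix.
print(recorded == actual)
print(actual_p - recorded_p == len(shebang))

# The zipfile module compensates for the shift and still reads the archive.
print(zipfile.ZipFile(io.BytesIO(prefixed)).namelist())
```

The recorded offset is left stale by the prepended bytes, which is the kind
of mismatch the stricter zip tools complain about, while readers that work
backwards from the end can still recover the real position.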

-- 
Thomas Wouters <thomas at python.org>

Hi! I'm an email virus! Think twice before sending your email to help me
spread!