Request for Pronouncement: PEP 441 - Improving Python ZIP Application Support
The discussion on PEP 441 (started in thread https://mail.python.org/pipermail/python-dev/2015-February/138277.html and on the issue tracker at http://bugs.python.org/issue23491) seems to have mostly settled down. I don't think there are any outstanding tasks, so I'm happy that the PEP is ready for pronouncement.

I've included the latest copy of the PEP inline below, for reference.

(Can I also add a gentle reminder that PEP 486 is ready for pronouncement? It'd be nice to get these into 3.5 alpha 2, which is due at the end of next week...)

Paul

PEP: 441
Title: Improving Python ZIP Application Support
Version: $Revision$
Last-Modified: $Date$
Author: Daniel Holth <dholth@gmail.com>, Paul Moore <p.f.moore@gmail.com>
Discussions-To: https://mail.python.org/pipermail/python-dev/2015-February/138277.html
Status: Draft
Type: Standards Track
Content-Type: text/x-rst
Created: 30 March 2013
Post-History: 30 March 2013, 1 April 2013, 16 February 2015

Improving Python ZIP Application Support
========================================

Python has had the ability to execute directories or ZIP-format archives as scripts since version 2.6 [1]_. When invoked with a zip file or directory as its first argument, the interpreter adds that directory to sys.path and executes the ``__main__`` module. These archives provide a great way to publish software that needs to be distributed as a single file script but is complex enough to need to be written as a collection of modules.

This feature is not as popular as it should be mainly because it was not promoted as part of Python 2.6 [2]_, so that it is relatively unknown, but also because the Windows installer does not register a file extension (other than ``.py``) for this format of file, to associate with the launcher.
This PEP proposes to fix these problems by re-publicising the feature, defining the ``.pyz`` and ``.pyzw`` extensions as "Python ZIP Applications" and "Windowed Python ZIP Applications", and providing some simple tooling to manage the format.

A New Python ZIP Application Extension
======================================

The terminology "Python Zip Application" will be the formal term used for a zip-format archive that contains Python code in a form that can be directly executed by Python (specifically, it must have a ``__main__.py`` file in the root directory of the archive). The extension ``.pyz`` will be formally associated with such files.

The Python 3.5 installer will associate ``.pyz`` and ``.pyzw`` "Python Zip Applications" with the platform launcher so they can be executed. A ``.pyz`` archive is a console application and a ``.pyzw`` archive is a windowed application, indicating whether the console should appear when running the app.

On Unix, it would be ideal if the ``.pyz`` extension and the name "Python Zip Application" were registered (in the mime types database?). However, such an association is out of scope for this PEP.

Python Zip applications can be prefixed with a ``#!`` line pointing to the correct Python interpreter and an optional explanation::

    #!/usr/bin/env python3
    #  Python application packed with zipapp module
    (binary contents of archive)

On Unix, this allows the OS to run the file with the correct interpreter, via the standard "shebang" support. On Windows, the Python launcher implements shebang support.

However, it is always possible to execute a ``.pyz`` application by supplying the filename to the Python interpreter directly.

As background, ZIP archives are defined with a footer containing relative offsets from the end of the file. They remain valid when concatenated to the end of any other file. This feature is completely standard and is how self-extracting ZIP archives and the bdist_wininst installer format work.
Minimal Tooling: The zipapp Module
==================================

This PEP also proposes including a module for working with these archives. The module will contain functions for working with Python zip application archives, and a command line interface (via ``python -m zipapp``) for their creation and manipulation.

More complete tools for managing Python Zip Applications are encouraged as 3rd party applications on PyPI. Currently, pyzzer [5]_ and pex [6]_ are two tools known to exist.

Module Interface
----------------

The zipapp module will provide the following functions:

``pack(target, directory, interpreter=None, main=None)``
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^

Writes an application archive called *target*, containing the contents of *directory*. If *interpreter* is specified, it will be written to the start of the archive as a shebang line and the file will be made executable (if no interpreter is specified, the shebang line will be omitted). If the directory contains no ``__main__.py`` file, the function will construct a ``__main__.py`` which calls the function specified in the *main* argument (which should be in the form ``'pkg.mod:fn'``).

It is an error to specify *main* if the directory contains a ``__main__.py``, or to omit *main* when there is no ``__main__.py`` (as that will result in an archive which has no main function and so cannot be executed).

``get_interpreter(archive)``
^^^^^^^^^^^^^^^^^^^^^^^^^^^^

Returns the interpreter specified in the shebang line of the *archive*. If there is no shebang, the function returns ``None``.

``set_interpreter(archive, new_archive, interpreter=None)``
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^

Modifies the *archive*'s shebang line to contain the specified interpreter, and writes the updated archive to *new_archive*. If the *interpreter* is ``None``, removes the shebang line.

Command Line Usage
------------------

The zipapp module can be run with the python ``-m`` flag.
The command line interface is as follows::

    python -m zipapp [options] directory

        Create an archive from the contents of the given directory.
        By default, an archive will be created with the same name as
        the source directory, with a .pyz extension.

        The following options can be specified:

        -o archive / --output archive

            The destination archive will have the specified name.

        -p interpreter / --python interpreter

            The given interpreter will be written to the shebang line
            of the archive. If this option is not given, the archive
            will have no shebang line.

        -m pkg.mod:fn / --main pkg.mod:fn

            The source directory must not have a __main__.py file. The
            archiver will write a __main__.py file into the target
            which calls fn from the module pkg.mod.

The behaviour of the command line interface matches that of ``zipapp.pack()``.

As noted, the archives are standard zip files, and so can be unpacked using any standard ZIP utility or Python's zipfile module.

FAQ
---

Are you sure a standard ZIP utility can handle ``#!`` at the beginning?
    Absolutely. The zipfile specification allows for arbitrary data to be prepended to a zipfile. This feature is commonly used by "self-extracting zip" programs. If your archive program can't handle this, it is a bug in your archive program.

Isn't zipapp just a very thin wrapper over the zipfile module?
    Yes. If you prefer to build your own Python zip application archives using other tools, they will work just as well. The zipapp module is a convenience, nothing more.

Why not just use a ``.zip`` or ``.py`` extension?
    Users expect a ``.zip`` file to be opened with an archive tool, and expect a ``.py`` file to contain readable text. Both would be confusing for this use case.

How does this compete with existing package formats?
    The sdist, bdist and wheel formats are designed for packaging of modules to be installed into an existing Python installation. They are not intended to be used without installing. The executable zip format is specifically designed for standalone use, without needing to be installed. It is in effect a multi-file version of a standalone Python script.

Rejected Proposals
==================

Convenience Values for Shebang Lines
------------------------------------

Is it worth having "convenience" forms for any of the common interpreter values? For example, ``-p 3`` meaning the same as ``-p "/usr/bin/env python3"``. It would save a lot of typing for the common cases, as well as giving cross-platform options for people who don't want or need to understand the intricacies of shebang handling on "other" platforms.

Downsides are that it's not obvious how to translate the abbreviations. For example, should "3" mean "/usr/bin/env python3", "/usr/bin/python3", "python3", or something else? Also, there is no obvious short form for the key case of "/usr/bin/env python" (any available version of Python), which could easily result in scripts being written with overly-restrictive shebang lines.

Overall, this seems like there are more problems than benefits, and as a result has been dropped from consideration.

Registering ``.pyz`` as a Media Type
------------------------------------

It was suggested [3]_ that the ``.pyz`` extension should be registered in the Unix database of extensions. While it makes sense to do this as an equivalent of the Windows installer registering the extension, the ``.py`` extension is not listed in the media types database [4]_. It doesn't seem reasonable to register ``.pyz`` without ``.py``, so this idea has been omitted from this PEP. An interested party could arrange for *both* ``.py`` and ``.pyz`` to be registered at a future date.

Default Interpreter
-------------------

The initial draft of this PEP proposed using ``/usr/bin/env python`` as the default interpreter.
Unix users have problems with this behaviour, as the default for the python command on many distributions is Python 2, and it is felt that this PEP should prefer Python 3 by default. However, using a command of ``python3`` can result in unexpected behaviour for Windows users, where the default behaviour of the launcher for the command ``python`` is commonly customised by users, but the behaviour of ``python3`` may not be modified to match.

As a result, the principle "in the face of ambiguity, refuse to guess" has been invoked, and archives have no shebang line unless explicitly requested. On Windows, the archives will still be run (with the default Python) by the launcher, and on Unix, the archives can be run by explicitly invoking the desired Python interpreter.

Command Line Tool to Manage Shebang Lines
-----------------------------------------

It is conceivable that users would want to modify the shebang line for an existing archive, or even just display the current shebang line. This is tricky to do with existing tools (zip programs typically ignore prepended data totally, and text editors can have trouble editing files containing binary data).

The zipapp module provides functions to handle the shebang line, but does not include a command line interface to that functionality. This is because it is not clear how to provide one without the resulting interface being over-complex and potentially confusing. Changing the shebang line is expected to be an uncommon requirement.

Reference Implementation
========================

A reference implementation is at http://bugs.python.org/issue23491.

References
==========

.. [1] Allow interpreter to execute a zip file
   (http://bugs.python.org/issue1739468)

.. [2] Feature is not documented
   (http://bugs.python.org/issue17359)

.. [3] Discussion of adding a .pyz mime type on python-dev
   (https://mail.python.org/pipermail/python-dev/2015-February/138338.html)

.. [4] Register of media types
   (http://www.iana.org/assignments/media-types/media-types.xhtml)

.. [5] pyzzer - A tool for creating Python-executable archives
   (https://pypi.python.org/pypi/pyzzer)

.. [6] pex - The PEX packaging toolchain
   (https://pypi.python.org/pypi/pex)

The discussion of this PEP took place on the python-dev mailing list, in the thread starting at https://mail.python.org/pipermail/python-dev/2015-February/138277.html

Copyright
=========

This document has been placed into the public domain.

..
   Local Variables:
   mode: indented-text
   indent-tabs-mode: nil
   sentence-end-double-space: t
   fill-column: 70
   coding: utf-8
   End:
Overall I like this and don't see any reason not to accept it, so +1. I do have a couple comments/questions on the module API, though. On Mon Feb 23 2015 at 12:45:28 PM Paul Moore <p.f.moore@gmail.com> wrote:
<SNIP>
``set_interpreter(archive, new_archive, interpreter=None)`` ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
Modifies the *archive*'s shebang line to contain the specified interpreter, and writes the updated archive to *new_archive*. If the *interpreter* is ``None``, removes the shebang line.
Should new_archive default to None to allow for in-place editing? -Brett
On Mon, Feb 23, 2015 at 1:16 PM, Brett Cannon <brett@python.org> wrote:
Overall I like this and don't see any reason not to accept it, so +1. I do have a couple comments/questions on the module API, though.
On Mon Feb 23 2015 at 12:45:28 PM Paul Moore <p.f.moore@gmail.com> wrote:
<SNIP>
``set_interpreter(archive, new_archive, interpreter=None)`` ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
Modifies the *archive*'s shebang line to contain the specified interpreter, and writes the updated archive to *new_archive*. If the *interpreter* is ``None``, removes the shebang line.
Should new_archive default to None to allow for in-place editing?
-Brett
That would be cool but more work. Unless the length of the new shebang is <= the old one, the zip file contents have to be moved out of the way.
On Mon Feb 23 2015 at 1:34:03 PM Daniel Holth <dholth@gmail.com> wrote:
On Mon, Feb 23, 2015 at 1:16 PM, Brett Cannon <brett@python.org> wrote:
Overall I like this and don't see any reason not to accept it, so +1. I do have a couple comments/questions on the module API, though.
On Mon Feb 23 2015 at 12:45:28 PM Paul Moore <p.f.moore@gmail.com> wrote:
<SNIP>
``set_interpreter(archive, new_archive, interpreter=None)`` ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
Modifies the *archive*'s shebang line to contain the specified interpreter, and writes the updated archive to *new_archive*. If the *interpreter* is ``None``, removes the shebang line.
Should new_archive default to None to allow for in-place editing?
-Brett
That would be cool but more work. Unless the length of the new shebang is <= the old one, the zip file contents have to be moved out of the way.
Couldn't you just keep it in memory as bytes and then write directly over the file? I realize that's a bit wasteful memory-wise but it is possible. The docs could mention the memory cost is something to watch out for when doing an in-place replacement. Heck the code could even make it an io.BytesIO instance so the rest of the code doesn't have to care about this special case.
On 23 February 2015 at 18:40, Brett Cannon <brett@python.org> wrote:
Couldn't you just keep it in memory as bytes and then write directly over the file? I realize that's a bit wasteful memory-wise but it is possible. The docs could mention the memory cost is something to watch out for when doing an in-place replacement. Heck the code could even make it an io.BytesIO instance so the rest of the code doesn't have to care about this special case.
I did consider this option, and I still quite like it. In fact, originally I wrote the API to *only* be in-place, until I realised that wouldn't work for things bigger than memory (but who has a Python app that's bigger than RAM?) I'm happy to modify the API along these lines (details to be thrashed out) if people think it's worthwhile. Paul
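The in-memory approach discussed here could look roughly like the sketch below. It is only an illustration: ``replace_shebang_in_place`` is a hypothetical name, not part of the proposed module, and the whole archive is assumed to fit in memory. The members are copied through the zipfile module into a fresh archive written after the new prefix, and the original file is overwritten only at the end.

```python
import io
import os
import tempfile
import zipfile


def replace_shebang_in_place(path, interpreter=None):
    """Hypothetical helper: rebuild the archive in memory, then overwrite it."""
    with open(path, "rb") as f:
        original = f.read()  # the whole archive is held in memory

    rebuilt = io.BytesIO()
    if interpreter is not None:
        rebuilt.write(b"#!" + interpreter.encode("utf-8") + b"\n")

    # Copy every member into a fresh archive written after the new prefix.
    with zipfile.ZipFile(io.BytesIO(original)) as src, \
            zipfile.ZipFile(rebuilt, "w") as dst:
        for info in src.infolist():
            dst.writestr(info, src.read(info))

    # Only now is the original file overwritten.
    with open(path, "wb") as f:
        f.write(rebuilt.getvalue())


# Usage: create a throwaway archive, then swap its shebang line.
with tempfile.TemporaryDirectory() as tmp:
    app = os.path.join(tmp, "app.pyz")
    with open(app, "wb") as f:
        f.write(b"#!/usr/bin/env python\n")
        with zipfile.ZipFile(f, "w") as zf:
            zf.writestr("__main__.py", "print('hi')\n")

    replace_shebang_in_place(app, "/usr/bin/env python3")

    with open(app, "rb") as f:
        first_line = f.readline()
    with zipfile.ZipFile(app) as zf:
        members = zf.namelist()
```

A failure during the final write would still lose the original file, which is the risk raised later in the thread.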
Sounds reasonable. It could be done by just reading the entire file contents after the shebang and re-writing them with the necessary offset all in RAM, truncating the file if necessary, without involving the zipfile module very much; the shebang could have some amount of padding by default; the file could just be re-compressed in memory depending on your appetite for complexity. On Mon, Feb 23, 2015 at 1:49 PM, Paul Moore <p.f.moore@gmail.com> wrote:
On 23 February 2015 at 18:40, Brett Cannon <brett@python.org> wrote:
Couldn't you just keep it in memory as bytes and then write directly over the file? I realize that's a bit wasteful memory-wise but it is possible. The docs could mention the memory cost is something to watch out for when doing an in-place replacement. Heck the code could even make it an io.BytesIO instance so the rest of the code doesn't have to care about this special case.
I did consider this option, and I still quite like it. In fact, originally I wrote the API to *only* be in-place, until I realised that wouldn't work for things bigger than memory (but who has a Python app that's bigger than RAM?)
I'm happy to modify the API along these lines (details to be thrashed out) if people think it's worthwhile. Paul
On 02/23/2015 11:01 AM, Daniel Holth wrote:
On Mon, Feb 23, 2015 at 1:49 PM, Paul Moore wrote:
On 23 February 2015 at 18:40, Brett Cannon wrote:
Couldn't you just keep it in memory as bytes and then write directly over the file? I realize that's a bit wasteful memory-wise but it is possible. The docs could mention the memory cost is something to watch out for when doing an in-place replacement. Heck the code could even make it an io.BytesIO instance so the rest of the code doesn't have to care about this special case.
I did consider this option, and I still quite like it. In fact, originally I wrote the API to *only* be in-place, until I realised that wouldn't work for things bigger than memory (but who has a Python app that's bigger than RAM?)
I'm happy to modify the API along these lines (details to be thrashed out) if people think it's worthwhile.
Sounds reasonable. It could be done by just reading the entire file contents after the shebang and re-writing them with the necessary offset all in RAM, truncating the file if necessary, without involving the zipfile module very much; the shebang could have some amount of padding by default; the file could just be re-compressed in memory depending on your appetite for complexity.
This could be a completely stupid question, but how does the zip file know where the individual files are? More to the point, does the index work via relative or absolute offset? If absolute, wouldn't the index have to be rewritten if the zip portion of the file moves? -- ~Ethan~
On 23 February 2015 at 19:22, Ethan Furman <ethan@stoneleaf.us> wrote:
This could be a completely stupid question, but how does the zip file know where the individual files are? More to the point, does the index work via relative or absolute offset? If absolute, wouldn't the index have to be rewritten if the zip portion of the file moves?
Essentially the index is stored at the *end* of the file, and contains relative offsets to all the files. Gory details at https://pkware.cachefly.net/webdocs/casestudies/APPNOTE.TXT Paul
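A small check of this with Python's own zipfile module is sketched below. It builds an archive in memory, prepends a shebang line, and confirms the result still opens. The member name and the shebang value are illustrative assumptions, not taken from the reference implementation.

```python
import io
import zipfile

# Build a small zip archive in memory containing a __main__.py.
payload = io.BytesIO()
with zipfile.ZipFile(payload, "w") as zf:
    zf.writestr("__main__.py", "print('hello from a zip application')\n")

# Prepend a shebang line; the bytes of the archive itself are unchanged.
shebang = b"#!/usr/bin/env python3\n"
prefixed = shebang + payload.getvalue()

# zipfile locates the central directory from the end of the data,
# so the prefixed bytes still open as a valid archive.
with zipfile.ZipFile(io.BytesIO(prefixed)) as zf:
    names = zf.namelist()
    source = zf.read("__main__.py").decode("utf-8")
```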
On Mon, Feb 23, 2015 at 8:22 PM, Ethan Furman <ethan@stoneleaf.us> wrote:
On 02/23/2015 11:01 AM, Daniel Holth wrote:
On Mon, Feb 23, 2015 at 1:49 PM, Paul Moore wrote:
On 23 February 2015 at 18:40, Brett Cannon wrote:
Couldn't you just keep it in memory as bytes and then write directly over the file? I realize that's a bit wasteful memory-wise but it is possible. The docs could mention the memory cost is something to watch out for when doing an in-place replacement. Heck the code could even make it an io.BytesIO instance so the rest of the code doesn't have to care about this special case.
I did consider this option, and I still quite like it. In fact, originally I wrote the API to *only* be in-place, until I realised that wouldn't work for things bigger than memory (but who has a Python app that's bigger than RAM?)
I'm happy to modify the API along these lines (details to be thrashed out) if people think it's worthwhile.
Sounds reasonable. It could be done by just reading the entire file contents after the shebang and re-writing them with the necessary offset all in RAM, truncating the file if necessary, without involving the zipfile module very much; the shebang could have some amount of padding by default; the file could just be re-compressed in memory depending on your appetite for complexity.
This could be a completely stupid question, but how does the zip file know where the individual files are? More to the point, does the index work via relative or absolute offset? If absolute, wouldn't the index have to be rewritten if the zip portion of the file moves?
Yes and no. The ZIP format uses a 'central directory' which is a record of each file in the archive. The offsets are relative (although the specification is a little vague on what they're relative *to* when using a .zip file. The wording talks about disk numbers, ZIP being from the era of floppy disks.) You find the central directory by searching from the end (or reading a specific spot at the end, if you don't support archive comments. zipimport, for example, doesn't support archive comments) and it turns out you can find the central directory from just that information (and as far as I know, all tools do.) However, there are still some offsets that would change if you add stuff to the front of the ZIP file (or remove it), and some zip tools will complain (usually just in verbose mode, though.) -- Thomas Wouters <thomas@python.org> Hi! I'm an email virus! Think twice before sending your email to help me spread!
So is the PEP ready for pronouncement or should there be more discussion? Also, do you have a BDFL-delegate or do you want me to review it? On Mon, Feb 23, 2015 at 11:41 AM, Thomas Wouters <thomas@python.org> wrote:
On Mon, Feb 23, 2015 at 8:22 PM, Ethan Furman <ethan@stoneleaf.us> wrote:
On 02/23/2015 11:01 AM, Daniel Holth wrote:
On Mon, Feb 23, 2015 at 1:49 PM, Paul Moore wrote:
On 23 February 2015 at 18:40, Brett Cannon wrote:
Couldn't you just keep it in memory as bytes and then write directly over the file? I realize that's a bit wasteful memory-wise but it is possible. The docs could mention the memory cost is something to watch out for when doing an in-place replacement. Heck the code could even make it an io.BytesIO instance so the rest of the code doesn't have to care about this special case.
I did consider this option, and I still quite like it. In fact, originally I wrote the API to *only* be in-place, until I realised that wouldn't work for things bigger than memory (but who has a Python app that's bigger than RAM?)
I'm happy to modify the API along these lines (details to be thrashed out) if people think it's worthwhile.
Sounds reasonable. It could be done by just reading the entire file contents after the shebang and re-writing them with the necessary offset all in RAM, truncating the file if necessary, without involving the zipfile module very much; the shebang could have some amount of padding by default; the file could just be re-compressed in memory depending on your appetite for complexity.
This could be a completely stupid question, but how does the zip file know where the individual files are? More to the point, does the index work via relative or absolute offset? If absolute, wouldn't the index have to be rewritten if the zip portion of the file moves?
Yes and no. The ZIP format uses a 'central directory' which is a record of each file in the archive. The offsets are relative (although the specification is a little vague on what they're relative *to* when using a .zip file. The wording talks about disk numbers, ZIP being from the era of floppy disks.) You find the central directory by searching from the end (or reading a specific spot at the end, if you don't support archive comments. zipimport, for example, doesn't support archive comments) and it turns out you can find the central directory from just that information (and as far as I know, all tools do.) However, there are still some offsets that would change if you add stuff to the front of the ZIP file (or remove it), and some zip tools will complain (usually just in verbose mode, though.)
-- Thomas Wouters <thomas@python.org>
Hi! I'm an email virus! Think twice before sending your email to help me spread!
_______________________________________________ Python-Dev mailing list Python-Dev@python.org https://mail.python.org/mailman/listinfo/python-dev Unsubscribe: https://mail.python.org/mailman/options/python-dev/guido%40python.org
-- --Guido van Rossum (python.org/~guido)
On 23 February 2015 at 19:47, Guido van Rossum <guido@python.org> wrote:
So is the PEP ready for pronouncement or should there be more discussion?
I think Brett's idea is worth incorporating, so let's thrash that out first.
Also, do you have a BDFL-delegate or do you want me to review it?
No-one has stepped up as BDFL-delegate, and there's no obvious candidate, so I think you're it, sorry :-) Paul
On 24 February 2015 at 06:32, Paul Moore <p.f.moore@gmail.com> wrote:
On 23 February 2015 at 19:47, Guido van Rossum <guido@python.org> wrote:
So is the PEP ready for pronouncement or should there be more discussion?
I think Brett's idea is worth incorporating, so let's thrash that out first.
Also, do you have a BDFL-delegate or do you want me to review it?
No-one has stepped up as BDFL-delegate, and there's no obvious candidate, so I think you're it, sorry :-)
If Guido isn't keen, I'd be willing to cover it as the current runpy module maintainer and the one that updated this part of the interpreter startup sequence to handle the switch to importlib in 3.3. The PEP itself doesn't actually touch any of that internal machinery though, it just makes the capability a bit more discoverable and user-friendly. Regards, Nick. -- Nick Coghlan | ncoghlan@gmail.com | Brisbane, Australia
I can do it but I don't want to be reviewing and accepting a PEP that's still under discussion, and I don't have the bandwidth to follow the discussion here -- I can only read the PEP. I will start that now. On Tue, Feb 24, 2015 at 1:03 AM, Nick Coghlan <ncoghlan@gmail.com> wrote:
On 24 February 2015 at 06:32, Paul Moore <p.f.moore@gmail.com> wrote:
On 23 February 2015 at 19:47, Guido van Rossum <guido@python.org> wrote:
So is the PEP ready for pronouncement or should there be more discussion?
I think Brett's idea is worth incorporating, so let's thrash that out first.
Also, do you have a BDFL-delegate or do you want me to review it?
No-one has stepped up as BDFL-delegate, and there's no obvious candidate, so I think you're it, sorry :-)
If Guido isn't keen, I'd be willing to cover it as the current runpy module maintainer and the one that updated this part of the interpreter startup sequence to handle the switch to importlib in 3.3.
The PEP itself doesn't actually touch any of that internal machinery though, it just makes the capability a bit more discoverable and user-friendly.
Regards, Nick.
-- Nick Coghlan | ncoghlan@gmail.com | Brisbane, Australia
-- --Guido van Rossum (python.org/~guido)
On 24 February 2015 at 17:46, Guido van Rossum <guido@python.org> wrote:
I can do it but I don't want to be reviewing and accepting a PEP that's still under discussion, and I don't have the bandwidth to follow the discussion here -- I can only read the PEP. I will start that now.
I'm just about to push an update based on Brett's comments, then it should be done. Can you hold off for a short while? Thanks. Paul. (Sorry, always the way - final comments appear as soon as you say "it's done" :-))
On 23.02.15 21:22, Ethan Furman wrote:
This could be a completely stupid question, but how does the zip file know where the individual files are? More to the point, does the index work via relative or absolute offset? If absolute, wouldn't the index have to be rewritten if the zip portion of the file moves?
Absolute.
On 23.02.15 22:23, Serhiy Storchaka wrote:
On 23.02.15 21:22, Ethan Furman wrote:
This could be a completely stupid question, but how does the zip file know where the individual files are? More to the point, does the index work via relative or absolute offset? If absolute, wouldn't the index have to be rewritten if the zip portion of the file moves?
Absolute.
Oh, I was wrong. The specification is not very clear about this.
On 23 February 2015 at 19:01, Daniel Holth <dholth@gmail.com> wrote:
Sounds reasonable. It could be done by just reading the entire file contents after the shebang and re-writing them with the necessary offset all in RAM, truncating the file if necessary, without involving the zipfile module very much; the shebang could have some amount of padding by default; the file could just be re-compressed in memory depending on your appetite for complexity.
The biggest problem with that is finding the end of the prefix data. Frankly it's easier just to write a new prefix then use the zipfile module to rewrite all of the content. That's what the current code does writing to a new file. Paul
On Mon, Feb 23, 2015 at 8:24 PM, Paul Moore <p.f.moore@gmail.com> wrote:
On 23 February 2015 at 19:01, Daniel Holth <dholth@gmail.com> wrote:
Sounds reasonable. It could be done by just reading the entire file contents after the shebang and re-writing them with the necessary offset all in RAM, truncating the file if necessary, without involving the zipfile module very much; the shebang could have some amount of padding by default; the file could just be re-compressed in memory depending on your appetite for complexity.
The biggest problem with that is finding the end of the prefix data. Frankly it's easier just to write a new prefix then use the zipfile module to rewrite all of the content. That's what the current code does writing to a new file.
I don't think you need to rewrite all of the contents, if you don't mind poking into zipfile internals:

    endrec = zipfile._EndRecData(f)
    prefix_length = endrec[zipfile._ECD_LOCATION] - endrec[zipfile._ECD_SIZE] - endrec[zipfile._ECD_OFFSET]

I do something similar to get at the prefix, although I need the zipfile opened anyway, so I use:

    endrec = zipfile._EndRecData(f)  # pylint: disable=protected-access
    zf = zipfile.ZipFile(f)
    # endrec is None if reading it failed, but then ZipFile should have
    # raised an exception...
    assert endrec
    prefix_len = zf.start_dir - endrec[zipfile._ECD_OFFSET]  # pylint: disable=protected-access
-- Thomas Wouters <thomas@python.org> Hi! I'm an email virus! Think twice before sending your email to help me spread!
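The same calculation can be sketched without the private zipfile names by parsing the end-of-central-directory record directly. This is a rough illustration under several assumptions: ``prefix_length`` is a hypothetical helper, the archive fits in memory, it is not a ZIP64 archive, and the last occurrence of the record signature is the real record. It also assumes the stored offsets are relative to the start of the zip data; an archive whose offsets already count the prefix would likely report 0.

```python
import io
import struct
import zipfile

EOCD_SIGNATURE = b"PK\x05\x06"
EOCD_STRUCT = struct.Struct("<4s4H2LH")  # fixed 22-byte part of the record


def prefix_length(data):
    """Return how many bytes precede the zip data in `data` (hypothetical)."""
    eocd_pos = data.rfind(EOCD_SIGNATURE)
    if eocd_pos < 0:
        raise ValueError("no end-of-central-directory record found")
    fields = EOCD_STRUCT.unpack_from(data, eocd_pos)
    cd_size, cd_offset = fields[5], fields[6]
    # The central directory ends where the end record starts, so anything
    # not accounted for by its size and stored offset is prepended data.
    return eocd_pos - cd_size - cd_offset


# Usage: compare a plain archive with the same bytes behind a shebang line.
buf = io.BytesIO()
with zipfile.ZipFile(buf, "w") as zf:
    zf.writestr("__main__.py", "print('hi')\n")
plain = buf.getvalue()

shebang = b"#!/usr/bin/env python3\n"
plain_prefix = prefix_length(plain)
shebang_prefix = prefix_length(shebang + plain)
```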
On 02/23/2015 10:49 AM, Paul Moore wrote:
I did consider this option, and I still quite like it. In fact, originally I wrote the API to *only* be in-place, until I realised that wouldn't work for things bigger than memory (but who has a Python app that's bigger than RAM?)
Depends -- how many image files are in that app? ;) -- ~Ethan~
On 23 February 2015 at 18:40, Brett Cannon <brett@python.org> wrote:
Couldn't you just keep it in memory as bytes and then write directly over the file? I realize that's a bit wasteful memory-wise but it is possible. The docs could mention the memory cost is something to watch out for when doing an in-place replacement. Heck the code could even make it an io.BytesIO instance so the rest of the code doesn't have to care about this special case.
The real problem with overwriting is if there's a failure during the overwrite you lose the original file. My original API had overwrite as the default, but I think the risk makes that a bad idea.

One option would be to allow outputs (TARGET in pack() and NEW_ARCHIVE in set_interpreter()) to be open files (open for write in bytes mode) as well as filenames [1]. Then the caller has the choice of how to manage the output. The docs could include an example of overwriting via a BytesIO object, and point out the risk.

BTW, while I was looking at the API, I realised I don't like the order of arguments in pack(). I'm tempted to make it pack(directory, target=None, interpreter=None, main=None) where a target of None means "use the name of the source directory with .pyz tacked on", exactly as for the command line API.

What do you think? The change would be no more than a few minutes' work if it's acceptable.

Paul

[1] What's the standard practice for such dual-mode arguments? ZipFile tests if the argument is a str instance and assumes a file if not. I'd be inclined to follow that practice here.
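The dual-mode output argument could be sketched as follows, following the "str means filename, anything else is a file object" test mentioned in the footnote. ``write_archive`` and ``_write`` are hypothetical names used only for illustration, and the members are passed as a plain dict to keep the example short.

```python
import io
import zipfile


def _write(members, fd, interpreter):
    """Write an optional shebang line, then the zip data, to an open file."""
    if interpreter is not None:
        fd.write(b"#!" + interpreter.encode("utf-8") + b"\n")
    with zipfile.ZipFile(fd, "w") as zf:
        for name, data in members.items():
            zf.writestr(name, data)


def write_archive(members, target, interpreter=None):
    """Hypothetical helper: target is a filename or a binary file object."""
    if isinstance(target, str):
        # A filename: open and close the file on the caller's behalf.
        with open(target, "wb") as fd:
            _write(members, fd, interpreter)
    else:
        # Anything else is assumed to be open for writing in bytes mode;
        # the caller stays responsible for closing it.
        _write(members, target, interpreter)


# Usage: the caller manages the output, here an in-memory buffer.
buffer = io.BytesIO()
write_archive({"__main__.py": b"print('hi')\n"}, buffer, "/usr/bin/env python3")
result = buffer.getvalue()
with zipfile.ZipFile(io.BytesIO(result)) as zf:
    listed = zf.namelist()
```

With the output held in a buffer like this, the caller can decide when (and whether) to overwrite an existing file.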
On Mon Feb 23 2015 at 3:51:18 PM Paul Moore <p.f.moore@gmail.com> wrote:
On 23 February 2015 at 18:40, Brett Cannon <brett@python.org> wrote:
Couldn't you just keep it in memory as bytes and then write directly over the file? I realize that's a bit wasteful memory-wise but it is possible. The docs could mention the memory cost is something to watch out for when doing an in-place replacement. Heck the code could even make it an io.BytesIO instance so the rest of the code doesn't have to care about this special case.
The real problem with overwriting is if there's a failure during the overwrite you lose the original file. My original API had overwrite as the default, but I think the risk makes that a bad idea.
Couldn't you catch the exception, write the original file back out, and then re-raise the exception?
One option would be to allow outputs (TARGET in pack() and NEW_ARCHIVE in set_interpreter()) to be open files (open for write in bytes mode) as well as filenames[1]. Then the caller has the choice of how to manage the output. The docs could include an example of overwriting via a BytesIO object, and point out the risk.
That sounds like a good idea. No reason to do the file opening on someone's behalf when opening files is so easy and keeps layering abstractions at a good level. Would this extend also to the archive being read to be consistent? I should mention I originally thought of extending this to pack() for 'main', but realized that passing in the function to set would require tools to import the code they are simply trying to pack and that was the wrong thing to do.
BTW, while I was looking at the API, I realised I don't like the order of arguments in pack(). I'm tempted to make it pack(directory, target=None, interpreter=None, main=None) where a target of None means "use the name of the source directory with .pyz tacked on", exactly as for the command line API.
What do you think? The change would be no more than a few minutes' work if it's acceptable.
+1 from me. -Brett
Paul
[1] What's the standard practice for such dual-mode arguments? ZipFile tests if the argument is a str instance and assumes a file if not. I'd be inclined to follow that practice here.
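The dual-mode convention in the footnote above (treat a `str` as a filename, anything else as an already-open file) might look roughly like this. The helper names `open_target` and `write_payload` are hypothetical and are not part of any proposed API; the payload is a placeholder rather than real archive data.

```python
import contextlib
import io
import os
import tempfile


@contextlib.contextmanager
def open_target(target):
    """Hypothetical helper: yield a writable binary file for `target`."""
    if isinstance(target, str):
        # A filename: open it here and close it when the block exits.
        with open(target, "wb") as f:
            yield f
    else:
        # Assumed to be an open binary file; the caller keeps ownership,
        # so it is deliberately not closed here.
        yield target


def write_payload(target, payload):
    """Write `payload` (bytes) to a filename or an open file object."""
    with open_target(target) as f:
        f.write(payload)


# Usage with an in-memory file object.
buffer = io.BytesIO()
write_payload(buffer, b"placeholder bytes")

# Usage with a filename.
path = os.path.join(tempfile.mkdtemp(), "out.bin")
write_payload(path, b"placeholder bytes")
with open(path, "rb") as f:
    on_disk = f.read()
```

Leaving caller-supplied file objects open is the design choice that lets a caller inspect a `BytesIO` afterwards.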
On 02/23/2015 01:02 PM, Brett Cannon wrote:
On Mon Feb 23 2015 at 3:51:18 PM Paul Moore wrote:
The real problem with overwriting is if there's a failure during the overwrite you lose the original file. My original API had overwrite as the default, but I think the risk makes that a bad idea.
Couldn't you catch the exception, write the original file back out, and then re-raise the exception?
This seems to be getting pretty complex for a nice-to-have.
One option would be to allow outputs (TARGET in pack() and NEW_ARCHIVE in set_interpreter()) to be open files (open for write in bytes mode) as well as filenames[1].
+1 for this.
BTW, while I was looking at the API, I realised I don't like the order of arguments in pack(). I'm tempted to make it pack(directory, target=None, interpreter=None, main=None) where a target of None means "use the name of the source directory with .pyz tacked on", exactly as for the command line API.
What do you think? The change would be no more than a few minutes' work if it's acceptable.
+1 from me.
+1 from me as well. -- ~Ethan~
On 23 February 2015 at 21:02, Brett Cannon <brett@python.org> wrote:
The real problem with overwriting is if there's a failure during the overwrite you lose the original file. My original API had overwrite as the default, but I think the risk makes that a bad idea.
Couldn't you catch the exception, write the original file back out, and then re-raise the exception?
But you don't *have* the original file. You read the source archive entry-by-entry, not all at once. Apart from the implementation difficulty, this is getting too complex, and I think it's better to just give the user the tools to add whatever robustness or exception handling they want on top. Paul
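One way a caller could layer robustness on top, sketched under assumptions: write the new archive to a temporary file in the same directory, then swap it into place with `os.replace()`. The function name `set_interpreter_via_tempfile` is hypothetical, and `zipapp.create_archive()` from the released Python 3.5 module stands in for the `set_interpreter()` discussed in this thread.

```python
import os
import tempfile
import zipapp


def set_interpreter_via_tempfile(archive, interpreter):
    """Hypothetical helper: change the shebang without risking the original."""
    directory = os.path.dirname(os.path.abspath(archive))
    # Same directory as the archive, so the final rename stays on one filesystem.
    fd, tmp_name = tempfile.mkstemp(dir=directory, suffix=".tmp")
    try:
        with os.fdopen(fd, "wb") as tmp:
            zipapp.create_archive(archive, tmp, interpreter=interpreter)
        # The original is only replaced once the new copy is fully written.
        os.replace(tmp_name, archive)
    except BaseException:
        # Clean up the partial copy; the original archive is untouched.
        if os.path.exists(tmp_name):
            os.unlink(tmp_name)
        raise
    # Note: mkstemp creates the file without executable bits, so a caller
    # that needs them would have to chmod the result.


# Build a small archive to demonstrate on.
workdir = tempfile.mkdtemp()
appdir = os.path.join(workdir, "myapp")
os.mkdir(appdir)
with open(os.path.join(appdir, "__main__.py"), "w") as f:
    f.write("print('hello')\n")
archive = os.path.join(workdir, "myapp.pyz")
zipapp.create_archive(appdir, archive, interpreter="/usr/bin/env python3")

set_interpreter_via_tempfile(archive, "/usr/bin/env python3.5")
result = zipapp.get_interpreter(archive)
```

This keeps the library function simple and leaves the failure-handling policy with the caller, in line with the point made above.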
On 23.02.15 22:51, Paul Moore wrote:
BTW, while I was looking at the API, I realised I don't like the order of arguments in pack(). I'm tempted to make it pack(directory, target=None, interpreter=None, main=None) where a target of None means "use the name of the source directory with .pyz tacked on", exactly as for the command line API.
If the order of arguments is not obvious, make them keyword-only.
On 23 February 2015 at 21:18, Serhiy Storchaka <storchaka@gmail.com> wrote:
On 23.02.15 22:51, Paul Moore wrote:
BTW, while I was looking at the API, I realised I don't like the order of arguments in pack(). I'm tempted to make it pack(directory, target=None, interpreter=None, main=None) where a target of None means "use the name of the source directory with .pyz tacked on", exactly as for the command line API.
If the order of arguments is not obvious, make them keyword-only.
To be honest, I don't think it *is* that non-obvious. I just think I got it wrong initially. With the new API, you have

    pack('myappdir')
    pack('myappdir', 'named_target.pyz')

Seems obvious enough to me. It matches the "source, destination" order of similar functions like shutil.copy. And you can use a named argument if you're not sure. But I don't think it's worth forcing it.

Paul
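For comparison, a short sketch of the same "source, destination" calls against the `zipapp` module as released in Python 3.5, where the function is named `create_archive()` rather than `pack()`. The directory and target names follow the example above; the temporary working directory is an assumption added to keep the sketch self-contained.

```python
import os
import tempfile
import zipapp

# Create a minimal application directory to pack.
workdir = tempfile.mkdtemp()
appdir = os.path.join(workdir, "myappdir")
os.mkdir(appdir)
with open(os.path.join(appdir, "__main__.py"), "w") as f:
    f.write("print('hello')\n")

# Source only: the target defaults to the source name plus ".pyz".
zipapp.create_archive(appdir)
default_target = appdir + ".pyz"

# Source, then destination, in the same order as shutil.copy.
named_target = os.path.join(workdir, "named_target.pyz")
zipapp.create_archive(appdir, named_target)

# Keyword arguments remain available for anyone unsure of the order.
keyword_target = os.path.join(workdir, "keyword_target.pyz")
zipapp.create_archive(appdir, target=keyword_target,
                      interpreter="/usr/bin/env python3")
```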
participants (8): Brett Cannon, Daniel Holth, Ethan Furman, Guido van Rossum, Nick Coghlan, Paul Moore, Serhiy Storchaka, Thomas Wouters