
Note: draft simplified

Abstract
========

This extract aims at proposing enhancements to the generated zipapp executable.

Rationale
=========

One area where there remains some difficulty in Python is packaging for end-user consumption. To that effect, either the code is distributed in pure Python form with installers [1] or native executables are built for each target OS [2]. Currently, by default, Python does not provide such utilities. This proposal aims at finalising a Python-specific archive as the default VM executable built on zipapp.

In simple terms, it proposes to enhance zipapp from a plain archive to an app-level archive.

Advantages of archives
======================

Archives provide a great way to publish software that needs to be distributed as a single-file script but is complex enough to need to be written as a collection of modules [3].

You can use archives for tasks such as lossless data compression, archiving, decompression, and archive unpacking [4]. Added capabilities like digital signing can be used to verify integrity and authenticity.

Zip archives as apps
====================

If we are to treat zip archives as apps, here are some recommended features:

- [x] Main entry point
  A main entry point specifies which file to launch. Zipapp already solves this problem, either by having a __main__.py [5] or by specifying the entry point at the command line as ENTRYPOINT_MODULE:ENTRYPOINT_FUNCTION [6].

- [ ] App info file
  An info file can hold information such as author name, archiving date, company name, etc.

- [ ] Signing mechanism
  Mechanisms can be added to check the integrity of the app. An app hash can be used to check whether the app has been modified, and per-file hashes can be used to detect which part has been modified. This can be further enhanced if needed.

- [ ] Protecting metadata
  Metadata is not protected by basic signing. There are existing ways to protect metadata and beyond [7].

- [x] Pure-Python 3rd party package bundling
  In Python, as long as the 3rd party packages are pure-Python packages, we can bundle and use them [6]. The user can maybe just include a requirements.txt.

- [ ] C-based 3rd party packages
  Zipapp by default was not meant to include packages at all.

  << The executable zip format is specifically designed for standalone use, without needing to be installed. They are in effect a multi-file version of a standalone Python script >>

  The previous point shows, though, that this can be done. There remains the issue of C-based packages. Distributing wheels might be the answer [8]. A zip archive is supposed to be standalone. A possible solution might be to include wheels and install them in an app-specific site-packages folder. When running such an app, the interpreter first checks whether the app-specific site-packages folder is empty and, if it is, installs the wheels. This provides package-freezing ability. The only downside is a longer first-run time.

  Only specifying the packages to be installed is not an option: if you really want stand-alone apps, relying on the internet defeats the purpose.

FAQ
===

Why not a package manager?
--------------------------

The zipapp PEP was introduced for a reason: to ease the running of archives. Maybe the package manager idea came from listening to people talking about packaging and pex, comparing it with package managers like Homebrew, and concluding that pex, and hence zipapp, is not worth it and that people would be better off not complicating their lives with some zip utility.

This proposal is not solving any problem at all
-----------------------------------------------

This proposal aims at enhancing zipapp. Zipapp solved the problem. Zipapp had an aim. This proposal aims at helping zipapp better accomplish its aim.

References

[1] https://pynsist.readthedocs.io/en/latest/
[2] https://www.pyinstaller.org
[3] https://www.python.org/dev/peps/pep-0441/
[4] https://docs.oracle.com/javase/tutorial/deployment/jar/basicsindex.html
[5] https://docs.python.org/3/library/zipapp.html
[6] https://gist.github.com/lukassup/cf289fdd39124d5394513a169206631c
[7] https://source.android.com/security/apksigning
[8] https://pythonwheels.com

Yours,
Abdur-Rahmaan Janhangeer
pythonmembers.club <http://www.pythonmembers.club/> | github <https://github.com/Abdur-rahmaanJ>
Mauritius
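For readers unfamiliar with the "main entry point" feature the draft marks as already solved, here is a minimal sketch using the standard library's `zipapp.create_archive`. The module name `app`, the function `main`, and the archive name `demo.pyz` are illustrative assumptions, not part of the proposal.

```python
import pathlib
import subprocess
import sys
import tempfile
import zipapp


def build_archive(workdir):
    """Build demo.pyz from a one-module source tree and return its path."""
    source = pathlib.Path(workdir) / "src"
    source.mkdir()
    # Hypothetical application module; the names are illustrative only.
    (source / "app.py").write_text(
        "def main():\n    print('hello from the archive')\n"
    )
    target = pathlib.Path(workdir) / "demo.pyz"
    # main="module:function" asks zipapp to generate the __main__.py itself.
    zipapp.create_archive(source, target, main="app:main")
    return target


def run_archive(archive):
    """Run the archive with the current interpreter and return its stdout."""
    result = subprocess.run(
        [sys.executable, str(archive)],
        capture_output=True,
        text=True,
        check=True,
    )
    return result.stdout.strip()
```

Calling `build_archive` inside a `tempfile.TemporaryDirectory()` and passing the result to `run_archive` exercises the whole round trip. The info file, signing and wheel handling discussed in the draft are not covered by this sketch.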

On 6 Jan 2020, at 19:34, Abdur-Rahmaan Janhangeer <arj.python@gmail.com> wrote:
Note: draft simplified
Please cover the pros and cons of the alternatives that have been raised as comments on this idea, as is usually done for a PEP-style document. Also beware that the zip file format does not include the encoding of the files that are in the zip file. This means that for practical purposes only ASCII filenames are portable across systems. Is this limitation a problem for this proposal?
Barry

On Tue, 7 Jan 2020, 01:57 Barry Scott, <barry@barrys-emacs.org> wrote:
Thanks, I don't have much experience writing PEPs, and if I'm not bugging you, may I ask what "alternatives" refers to?
Also beware that zip file format does not include the encoding of the files that are in the zip file.
For the encoding of the contents: since we are packaging Python code files, their handling will be the same as outside the zip file, which is how zipapp already handles things.
This means that for practical purposes only ASCII filenames are portable across systems. Is this limitation a problem for this proposal?
If we are talking about filenames, then I guess ASCII filenames are the way to go, as you'd unnecessarily break things otherwise.

I’m a bit unclear on how far this goes: is it just a bit more specific with more meta-data standards? Or are you aiming for something that will run without a Python install?
Other issues: Are you aiming for a bundle that can run on multiple platforms? If so, then it’ll have to have a way to bundle multiple compiled extensions and select the right ones at runtime.
If this is essentially just zipapp with the ability to bundle dependencies, then you could probably just do some sys.path hackery.
In any case, this seems like something you could implement, and then see if people find it useful.
BTW, I’m pretty sure we could simply specify that filenames are utf-8 and we’d be good to go.
-CHB
On Mon, Jan 6, 2020 at 5:50 PM Abdur-Rahmaan Janhangeer <arj.python@gmail.com> wrote:
-- Christopher Barker, PhD Python Language Consulting - Teaching - Scientific Software Development - Desktop GUI and Web Development - wxPython, numpy, scipy, Cython

Yours, Abdur-Rahmaan Janhangeer pythonmembers.club <http://www.pythonmembers.club/> | github <https://github.com/Abdur-rahmaanJ> Mauritius On Tue, Jan 7, 2020 at 6:40 AM Christopher Barker <pythonchb@gmail.com> wrote:
I’m a bit unclear on how far this goes: is it just a bit more specific with more meta-data standards?
- More metadata
- Integrity check with hashing
- Protecting the metadata
- Bundling 3rd party packages
Or are you aiming for something that will run without a Python install?
Aie aie Mr. Christopher, zipapp requires a Python install.
Other issues:
According to the discussion on the "Python, Be Bold" thread, it became clear that it would be a pain to generate and would have an unnecessary size, though it is surely the most stable idea. I suggest including wheels instead: the wheels are installed, and the interpreter looks for packages in that app-specific folder.
If this is essentially just zipapp with the ability to bundle dependencies, then you could probably just do some sys.path hackery.
Could you please explain more? Thanks.
In any case, this seems like something you could implement, and then see if people find it useful.
It's a nice idea to have a working demo. I'm not a security expert but I'll try! Anyone interested in this thread can view this tool <https://github.com/linkedin/shiv> built by LinkedIn, which attempts dependency bundling.

On Mon, Jan 6, 2020 at 10:50 PM Abdur-Rahmaan Janhangeer < arj.python@gmail.com> wrote:
- More metadata
good idea, and simple.
- Integrity check with hashing - Protecting the meta data
This could be a big challenge -- and I'm not expert, so have no idea what the issues are.
- Bundling 3rd party packages
Well, as you state below, that could make it big. but it also could make it useful -- folks want to use environments of various sorts to keep dependencies separate, so bundling them all up in an app would be nice. But a thought on that -- you may be able to accomplish something similar with conda, "conda constructor", and "conda run". -- or a new tool built from those. The idea is that the first time you ran your "app", it would install its dependencies, and then use them in an isolated environment. But if the multiple apps had the same dependencies, they would share them, so you wouldn't get major bloat on the host machine.
but a wheel is just as big as the installed package (at least a zipped version) -- it's essentially the package compressed into a tarball. If this Is essentially just zipapp with the ability to bundle dependencies,
then you could probably just do some sys.path hackery.
Could you please explain more. Thanks?
sure -- in your zip file, you have a "dependencies" directory. The dependencies get installed there. Then that dir gets added to sys.path at startup. I'm not so sure how to do that inside a zipfile, but it could be done *somehow*. In any case, this seems like something you could implement, and then see if
well, you'll need a consult on the security issues -- which you would want well reviewed anyway ;-)
There you go -- you've got half the job done already :-) But: "Unlike “conventional” zipapps, shiv packs a site-packages style directory of your tool’s dependencies into the resulting binary, and then at bootstrap time extracts it into a ~/.shiv cache directory." which is how they get around the "how to add a dir in a zip file to sys.path" -- but I'll bet someone could hack that to not be necessary....
-CHB
-- Christopher Barker, PhD Python Language Consulting - Teaching - Scientific Software Development - Desktop GUI and Web Development - wxPython, numpy, scipy, Cython
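The "sys.path hackery" described above can be sketched with the standard library alone, because the import system accepts a path that points at a subdirectory inside a zip file. The archive name, the `dependencies` directory and the `bundled_dep` module below are illustrative assumptions. This only works for pure-Python code: compiled extension modules cannot be imported from inside a zip, which is presumably why shiv extracts to a cache directory.

```python
import importlib
import os
import sys
import zipfile


def build_demo_archive(directory):
    """Create a tiny archive with a pure-Python module under dependencies/."""
    archive = os.path.join(directory, "demo_app.pyz")
    with zipfile.ZipFile(archive, "w") as zf:
        # Hypothetical bundled dependency, for illustration only.
        zf.writestr(
            "dependencies/bundled_dep.py",
            "def greet():\n    return 'hello from inside the zip'\n",
        )
    return archive


def add_bundled_dependencies(archive, subdir="dependencies"):
    """Put <archive>/<subdir> at the front of sys.path and return the entry."""
    entry = os.path.join(archive, subdir)
    if entry not in sys.path:
        sys.path.insert(0, entry)
    # Make sure the import system notices the newly added location.
    importlib.invalidate_caches()
    return entry
```

Inside a real zipapp, the `__main__.py` could derive the archive path from `sys.path[0]`, since the interpreter puts the archive there when it runs it. After `add_bundled_dependencies(archive)`, a plain `import bundled_dep` resolves from inside the zip.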

On Wed, 8 Jan 2020, 11:09 Christopher Barker, <pythonchb@gmail.com> wrote:
I guess it's time to dig more into anaconda, been putting it off, will do. but a wheel is just as big as the installed package (at least a zipped
version) -- it's essentially the package compressed into a tarball.
I really hope C extensions become redundant someday in Python, which would make Python development real Python dev. The proposal at hand is maybe the best solution to a hard nut to crack that most if not all solutions preferred to avoid.
But: "Unlike “conventional” zipapps, shiv packs a site-packages style
directory of your tool’s dependencies into the resulting binary, and then at bootstrap time extracts it into a ~/.shiv cache directory."
Maybe we can have a PYZ directory where the packages for each app are extracted then it's not a global dump but more specific Why not that route? It would be nice to comment on what is wrong with Shiv's mode of execution

On Wed, Jan 8, 2020 at 1:24 AM Abdur-Rahmaan Janhangeer < arj.python@gmail.com> wrote:
to be clear -- you want to look at "conda", not "Anaconda" -- conda is a package manager, Anaconda is a distribution created with the conda package manager.
That's not going to completely happen. Which does not mean that a solution that doesn't support them isn't still useful for a lot. But it would be interesting to see how many commonly used packages on PyPi rely on C extensions (other than the SciPy Stack).
I'm not sure how that differs from a .shiv directory, which is not global. But a way to share packages in the "central place for packages" would be nice. -- maybe how conda does it with hard links? -CHB -- Christopher Barker, PhD Python Language Consulting - Teaching - Scientific Software Development - Desktop GUI and Web Development - wxPython, numpy, scipy, Cython

Yours, Abdur-Rahmaan Janhangeer pythonmembers.club <http://www.pythonmembers.club/> | github <https://github.com/Abdur-rahmaanJ> Mauritius On Wed, Jan 8, 2020 at 8:08 PM Christopher Barker <pythonchb@gmail.com> wrote:
Just a quick note on that. A global directory has the side effect mentioned in Shiv's readme:
If you create many utilities with shiv, you may want to occasionally clean this directory.
With many packages mixed in, you can also have unwanted side effects. Being more specific might look like:
Shiv/
    app1/
        package1
    app2/

Have a look at this write up about the horror that is zip file name handling. https://marcosc.com/2008/12/zip-files-and-encoding-i-hate-you/ This has been a pain point at work. Barry

Barry writes:
Have a look at this write up about the horror that is zip file name handling.
As I implied, I don't need to "read write-ups", *I live the horror.* Not daily, but always when I really don't want to spend the minutes.
This has been a pain point at work.
I know your pain. But this PEP is not about your work environment (does it include Japanese bureaucrats? ;-/), it's about a file format that we control. (More specifically, Mr. Janhangeer does.) The question (in any normal case) is simply how does the system default file name encoding interact with the PEP, and my guess is "if we take a bit of care, it doesn't." Steve

On Wed, Jan 8, 2020 at 1:49 AM Abdur-Rahmaan Janhangeer < arj.python@gmail.com> wrote:
I'm pretty sure this is a non-issue for this use-case. If you need to open zip files created by arbitrary other systems, or create zip files that can be opened by arbitrary other systems, then it's a big mess. But that isn't the case here.
-CHB
-- Christopher Barker, PhD Python Language Consulting - Teaching - Scientific Software Development - Desktop GUI and Web Development - wxPython, numpy, scipy, Cython

You are offering up a competitor against python wheels, Py2app, py2exe etc packagers. Explain the benefits and weaknesses compared to the existing alternatives. You might want to look at pex that is mentioned in the PEP you refer to. The other mentioned app has seen no update since 2013.
I replied separately about this problem.
Barry

Yours, Abdur-Rahmaan Janhangeer pythonmembers.club <http://www.pythonmembers.club/> | github <https://github.com/Abdur-rahmaanJ> Mauritius On Wed, Jan 8, 2020 at 2:20 AM Barry <barry@barrys-emacs.org> wrote:
You are offering up a competitor against python wheels
This proposal proposes to include Python wheels in the archive.
Py2app, py2exe etc packagers.
Native executables are off the plate. This one deals with archive files. But i get the idea, thanks! Maybe you wanted to allude to projects like Shiv <https://github.com/linkedin/shiv/> by LinkedIn
Explain the benefits and weaknesses compared to the existing alternatives.
There are some projects similar to Shiv, will write a comparison.

Barry Scott writes:
Also beware that zip file format does not include the encoding of the files that are in the zip file.
The most recent zipfile format, which is now a decade or so old, does specify the encoding, for values of 0 = ASCII, 1 = UTF-8.[1]
As far as I know, with the exception of a few Japanese bureaucrats, everybody uses zip implementations that handle non-ASCII properly. InfoZip is one such that is portable, although I don't recall how it handles filesystems with non-Unicode file name encodings. From the point of view of this proposal, just require that filename encodings be properly specified, and provide an option to use the appropriate codec. This isn't too hard. The main thing to rule out is multiple encodings in one file system (yes, I've seen it, but not recently, thank the powers). This could even be handled (on POSIX filesystems) with an auxiliary utility that converts whatever-encoded filenames to UTF-8 (could be a symlink tree). Then you can just require a UTF-8 filesystem throughout the zipapp handling system. Only remaining question in my mind would be backward compatibility with any existing zipapp specs (which I have no idea about, but if I were participating in implementation I'd be sure to check). Footnotes: [1] Or maybe it's 0 = ISO-8859-1, 1 = UTF-8. Sorry, don't have a copy of the spec handy.
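The encoding flag mentioned above can be inspected from Python. In the zip specification it is general purpose bit 11 (0x800): when set, the member name is UTF-8; when unset, the name is in a legacy encoding (traditionally IBM code page 437, as far as I recall). The sketch below relies on the behaviour of the standard `zipfile` module, which sets the bit only for names that are not pure ASCII. The member names are made up for the demo.

```python
import io
import zipfile

UTF8_FLAG = 0x800  # general purpose bit 11: "filename is UTF-8"


def make_demo_zip():
    """Return the bytes of a zip holding one ASCII and one non-ASCII name."""
    buffer = io.BytesIO()
    with zipfile.ZipFile(buffer, "w") as zf:
        zf.writestr("plain.py", "")
        zf.writestr("caf\u00e9.py", "")  # non-ASCII member name
    return buffer.getvalue()


def filename_encodings(zip_bytes):
    """Map each member name to 'utf-8' or 'legacy' based on bit 11."""
    with zipfile.ZipFile(io.BytesIO(zip_bytes)) as zf:
        return {
            info.filename: "utf-8" if info.flag_bits & UTF8_FLAG else "legacy"
            for info in zf.infolist()
        }
```

A zipapp tool that wanted to "just require UTF-8" could reject any non-ASCII member whose flag is unset. That policy is an assumption of this sketch, not something the thread settled.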

Thanks for the ideas, Abdur-Rahmaan! Some feedback below. On Mon, Jan 6, 2020 at 11:35 AM Abdur-Rahmaan Janhangeer < arj.python@gmail.com> wrote:
This would be a packaging detail so not something to be specified in the stdlib.
This can be tricky because people want signing in specific ways that vary from OS to OS, case by case. So unless there's a built-in signing mechanism the flexibility required here is huge.
Install the wheels where? You can't do that globally. And you also have to worry about the security of doing the install implicitly. And now the user suddenly has stuff on their file system they may not have asked for as a side-effect which may upset some people who are tight on disk space (remember that Python runs on some low-powered machines). -Brett

Yours, Abdur-Rahmaan Janhangeer pythonmembers.club | github Mauritius On Wed, Jan 8, 2020 at 1:32 AM Brett Cannon <brett@python.org> wrote:
This would be a packaging detail so not something to be specified in the stdlib.
Yes, the module opening the zip will look for it.
protect the flexibility required here is huge.
Let's say we have a simple project:

folder/
    file.py
    __main__.py

The first step is to include the file names and hashes in the info file:

file.py: 5f22669f6f0ea1cc7e5af7c59712115bcf312e1ceaf7b2b005af259b610cf2da
__main__.py: e3b0c44298fc1c149afbf4c8996fb92427ae41e4649b934ca495991b7852b855

Then, by reading the info file, hashing the actual files and comparing, we can see which file was modified, if any. But a malicious program might modify a file and then update its hash in the info file as well. One way to protect even the metadata is to hash the entire content:

folder/
    file.py  # we can add those in a folder if needed
    __main__.py
    infofile

After zipping it, we hash the zip file and append the hash to the zip binary:

[zipfile binary][hash value]

We could instead ship the zip file together with a second file stating the hash value, but to maintain a single-file structure the layout above is best. When opening the zip file, we read up to the hash value. The hash value becomes the checking signature of the zip file. This forms a base on which more signing mechanisms, such as author keys, can be added. Since zip files are the same across OSes, this kind of approach should not pose a problem.
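The per-file hash check described above can be sketched as follows. It assumes SHA-256, which the 64-hex-digit values suggest (the digest listed for `__main__.py` is the SHA-256 of empty input). The function names and the in-memory manifest are illustrative; the info file format itself is not specified in the thread.

```python
import hashlib
import io
import zipfile


def make_zip(files):
    """Build an in-memory zip from a {name: bytes} mapping."""
    buffer = io.BytesIO()
    with zipfile.ZipFile(buffer, "w") as zf:
        for name, data in files.items():
            zf.writestr(name, data)
    return buffer.getvalue()


def build_manifest(zip_bytes):
    """Return {member name: SHA-256 hex digest} for every member."""
    with zipfile.ZipFile(io.BytesIO(zip_bytes)) as zf:
        return {
            name: hashlib.sha256(zf.read(name)).hexdigest()
            for name in zf.namelist()
        }


def modified_files(zip_bytes, manifest):
    """List members whose current digest differs from the recorded one."""
    current = build_manifest(zip_bytes)
    return sorted(
        name for name, digest in manifest.items()
        if current.get(name) != digest
    )
```

This detects which file changed, but as the message itself notes, it does nothing against someone who rewrites the manifest too.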
Yes, global folders also defeat the spirit. Using the wheel-included zip (A), we can generate another zip file (B) with the packages installed. That generated zip file is then executed. Zip format A solves the cross-platform problem. Typical solutions up to now use approach B, where you can't share your zips across OSes. As for space, it's much the same as with venvs: zip format B is the equivalent of packages installed in a venv. Venv usage can be a hint as to when to use it.

On Jan 8, 2020, at 01:09, Abdur-Rahmaan Janhangeer <arj.python@gmail.com> wrote:
How does this solve the problem? A malicious program that could modify the hash inside the info file could even more easily modify the hash at the end of the zip. Existing systems deal with this by recognizing that you can’t prevent anyone from hashing anything they want, so you either have to store the hashes in a trusted central repo, or (more commonly–there are multiple advantages) sign them with a trustable key. If a malicious app modified the program and modified the hash, it’s going to be a valid hash; there’s nothing you can do about that. But it won’t be the hash in the repo, or it’ll be signed by the untrusted author of the malicious program rather than the trusted author of the app, and that’s why you don’t let it run. And this works just as well for hashes embedded inside an info file inside the zip as for hashes appended to the zip. And there are advantages to putting the hash inside. For example, if you want to allow downstream packagers or automated systems to add distribution info (this is important if you want to be able to pass a second code signing requirement, e.g., Apple’s, as well as the zipapp one), you just have a list of escape patterns that say which files are allowed to be unhashed. Anything that appears in the info file must match its hash or the archive is invalid. Anything that doesn’t appear in the info file but does match the escape patterns is fine, but if it doesn’t match the escape patterns, the archive is invalid. So now downstream distributors can add extra files that match the escape patterns. (The escape patterns can be configurable—you just need them to be specified by something inside the hash. But you definitely want a default that works 99% of the time, because if developers and packagers have to think it through in every case instead of only in exceptional cases, they’re going to get it wrong, and nobody will have any idea who to trust to get it right.)
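The validation rule with escape patterns can be sketched in a few lines. Verifying the signature over the info file, which is what actually establishes trust, is out of scope here. The default pattern `DIST-INFO/*` is a made-up placeholder, and the manifest is passed in as a plain dict rather than read from any particular file format.

```python
import fnmatch
import hashlib

# Hypothetical default: files a downstream packager may add without a hash.
DEFAULT_ESCAPE_PATTERNS = ("DIST-INFO/*",)


def validate(files, manifest, escape_patterns=DEFAULT_ESCAPE_PATTERNS):
    """Check {name: bytes} against {name: sha256 hex}; return a problem list.

    Listed files must match their hash. Unlisted files are accepted only
    when their name matches one of the escape patterns.
    """
    problems = []
    for name, expected in manifest.items():
        if name not in files:
            problems.append(f"missing: {name}")
        elif hashlib.sha256(files[name]).hexdigest() != expected:
            problems.append(f"hash mismatch: {name}")
    for name in files:
        if name in manifest:
            continue
        if not any(fnmatch.fnmatchcase(name, p) for p in escape_patterns):
            problems.append(f"unexpected file: {name}")
    return problems
```

An empty list means the archive is acceptable under this rule. A real implementation would also need the patterns themselves to be covered by the signed data, as the message points out.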

On Wed, 8 Jan 2020, 21:29 Andrew Barnert, <abarnert@yahoo.com> wrote:
You are right, that's why i said: The hash value becomes the checking signature of the zipfile.
Meaning the hash value at the end of the zipfile becomes the hash by which we identify the file and against which we check. That is for checking the integrity of the app.
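A minimal sketch of the "[zipfile binary][hash value]" layout follows, working on raw bytes and assuming a SHA-256 digest at the end. As Andrew points out, this only catches accidental corruption: anyone who changes the payload can recompute the trailing digest, which the last lines of the usage note illustrate.

```python
import hashlib

DIGEST_SIZE = hashlib.sha256().digest_size  # 32 bytes for SHA-256


def append_digest(payload):
    """Return payload followed by its SHA-256 digest."""
    return payload + hashlib.sha256(payload).digest()


def split_and_check(blob):
    """Split blob into (payload, ok) where ok says the digest matches."""
    if len(blob) < DIGEST_SIZE:
        raise ValueError("blob is too short to contain a digest")
    payload, digest = blob[:-DIGEST_SIZE], blob[-DIGEST_SIZE:]
    return payload, hashlib.sha256(payload).digest() == digest
```

For example, `split_and_check(append_digest(b"zip bytes"))` reports a match, and flipping a byte in the payload reports a mismatch. But `split_and_check(append_digest(b"tampered bytes"))` also reports a match, which is why the digest needs to be signed or compared against a trusted copy.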

On Jan 8, 2020, at 01:09, Abdur-Rahmaan Janhangeer <arj.python@gmail.com> wrote:
Using the wheel-included zip (A), we can generate another zip file (B) with the packages installed. That generated zip file is then executed.
But that generated zip B doesn’t have a trustable hash on it, so how can you execute it? If you keep this all hidden inside the zipapp system, where malicious programs can’t find and modify the generated zips, then I suppose that’s fine. But at that point, why not just install the wheels inside zip A into an auto-generated only-for-zip-A venv cache directory or something, and then just run zip A as-is against that venv?
You can still only share zips across OSs if you bundle in a wheel for each extension library for every possible platform. For in-house deployments where you only care about two platforms (your dev boxes and your deployment cluster boxes), that’s fine, but for a publicly released app that’s supposed to work “everywhere”, you pretty much have to download and redistribute every wheel on PyPI for every dependency, which could make your app pretty big, and require pretty frequent updates, and it still only lets you run on systems that have wheels for all your dependencies.
If you’re already doing an effective “install” step in building zip B out of zip A, why not make that step just use a requirements file and download the dependencies from PyPI? You could still run zip B without being online, just not zip A. Maybe you could optionally include wheels and they’d serve as a micro-repo sitting in front of PyPI, so when your dependencies are small you can distribute a version that works for 95% of your potential users without needing to do anything fancy but it still works for the other 5% if they can reach PyPI.
(But maybe it would be simpler to just use the zip B as a cache in the first place. If I download Spam.zipapp for Win64 3.9, that’s a popular enough platform that you probably have a zip B version ready to go and just ship me that, so it works immediately. Now, if I copy that file to my Mac instead of downloading it fresh, oops, wrong wheels, so it downloads the right ones off PyPI and builds a new zipapp for my platform, and it still runs; it just takes a bit longer the first time. I’m not sure this is a good idea, but I’m not sure trying to include every wheel for every platform is a good idea either…)
But there’s a bigger problem than just distribution. Some extension modules are only extension modules for speed, like numpy. But many are there to interface with C libraries.
If my app depends on PortAudio, distributing the extension module as wheels is easy, but it doesn’t do any good unless you have the C library installed and configured on your system. Which you probably don’t if you’re on Windows or Mac. A package manager like Homebrew or Choco can take care of that by just making my app’s package depend on the PortAudio package (and maybe even conda can?), but I don’t see how zipapps with wheels in, or anything else self-contained, can. And if most such packages eventually migrate to binding from Python (using cffi or ctypes) rather than from C (using an extension module), that actually makes your problem harder rather than easier, because now you can’t even tell from outside the code that there are external dependencies; you can distribute a single zipapp that works everywhere, but only in the sense that it starts running and quickly fails with an exception for most users.
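One partial mitigation for the "starts running and quickly fails" scenario is to check for the required C libraries up front and report them clearly. This is a sketch using `ctypes.util.find_library` from the standard library; it does not install anything, and the library names an app would pass in (for instance `"portaudio"`) are the app author's responsibility.

```python
import ctypes.util


def missing_native_libraries(names):
    """Return the names for which no shared library can be located."""
    return [name for name in names if ctypes.util.find_library(name) is None]


def require_native_libraries(names):
    """Raise a readable error early instead of failing deep inside the app."""
    missing = missing_native_libraries(names)
    if missing:
        raise RuntimeError(
            "missing system libraries: " + ", ".join(missing)
        )
```

Calling `require_native_libraries(["portaudio"])` at the top of `__main__.py` would turn a late import failure into an explicit message, though the user still has to install the library by other means.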

On 08/01/2020 18:08, many people wrote lots of stuff... Folks, could we pick one list and have the discussion there, rather than on both python-list and python-ideas? Getting *four* copies of Andrew's emails is a tad distracting :-) -- Rhodri James *-* Kynesim Ltd

On Wed, 8 Jan 2020, 22:08 Andrew Barnert, <abarnert@yahoo.com> wrote:
But that generated zip B doesn’t have a trustable hash on it, so how can you execute it?
The issue of trust is solved by keys. I did not propose something concrete as I'm still looking into a viable scheme.
If you keep this all hidden inside the zipapp system, where malicious
The venv idea is to be retained; the thread was asking where the cache directory would be located.
require pretty frequent updates,
Well, this proposal goes for dependency freezing. When an app is shipped, the packages are not expected to be updated. The author can ship another version with updated libs, but the end user does not worry about package updates.
and it still only lets you run on systems that have wheels for all your
If you can reach PyPI that's just cool, but the idea of using archives tends towards self-contained apps.
(But maybe it would be simpler to just use the zip B as a cache in the
More ideas. I did not consider being online, but if we do, it's a very nice thing.
I’m not sure this is a good idea, but I’m not sure trying to include every wheel for every platform is a good idea either…)
Maybe, as Mr. Christopher says, I must bring in some demos.
But there’s a bigger problem than just distribution. Some extension modules
Oh that's a user problem; it's the same as Twisted requiring some C++ redistributables on Windows. I got the impression that Twisted was really well named, as I found the library twisted to install. We were in the midst of our user group web scraping presentation when the demo at hand required installing Twisted. Some nasty C++ redistributable error showed up, which slowed down the whole session. But that was a user-side requirement, not a lib-side one.

On Jan 8, 2020, at 12:04, Abdur-Rahmaan Janhangeer <arj.python@gmail.com> wrote:
OK, but I don’t see how any scheme that looks like any of the usual ones could be adapted to work. The whole point of code signing is that I know that you signed the app with a key that nobody else has access to, and nobody has changed the app since then (plus additional stuff, but this is the relevant part). If that new zip B is built on the fly on my machine by normal user software, it can only be signed with a key that’s available to normal user software on my machine. Which includes malicious software that wants to modify and re-sign the zip. (I’m assuming you can’t rely on being online at this point.)
Why is that a problem? Most platforms have a standard location for putting cache directories. Those that don’t, you just have to use something hardcoded. More importantly, how does your solution make anything easier? Bundling the cache back up into another zipfile and then trying to figure out where to put that zipfile is the same problem as just trying to figure out where to put the venv would have been. It seems like you’re just adding complexity without any benefit. This is why I assumed you might want the platformized “B zip” to be itself redistributable; then you do get some benefit. But maybe there’s some other benefit I’m not seeing?
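As a rough illustration of "a standard location for putting cache directories", the sketch below picks a per-user cache root by platform and derives a per-app directory from the archive contents, so each app gets its own folder rather than a shared dump. The `pyz` subdirectory name and the naming scheme are assumptions made for this example; the platform locations are the conventional ones as far as I know.

```python
import hashlib
import os
import sys
from pathlib import Path


def cache_root(platform=sys.platform, env=os.environ, home=None):
    """Return the conventional per-user cache directory for the platform."""
    home = Path(home) if home else Path.home()
    if platform.startswith("win"):
        base = env.get("LOCALAPPDATA")
        return Path(base) if base else home / "AppData" / "Local"
    if platform == "darwin":
        return home / "Library" / "Caches"
    # Other POSIX systems: follow XDG_CACHE_HOME, else fall back to ~/.cache.
    base = env.get("XDG_CACHE_HOME")
    return Path(base) if base else home / ".cache"


def app_cache_dir(app_name, archive_bytes, **kwargs):
    """Return a cache directory unique to this app and archive content."""
    digest = hashlib.sha256(archive_bytes).hexdigest()[:16]
    return cache_root(**kwargs) / "pyz" / f"{app_name}-{digest}"
```

Keying the directory on a digest of the archive means a new release of the same app unpacks into a fresh directory instead of mixing with the old one. Cleaning up stale directories is left open here.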
If you bundled an app with all the wheels that existed yesterday, and newer wheels are needed to work with the new version of macOS with Enhanced Super Duper Gatekeeper, or the manylinux2021-armv8 that’s recently become a popular platform, or the AIX platform that only 20 people care about so some of the wheels didn’t exist but now they do because 1 of those 20 people wants those libraries, the wheels those people need are on PyPI, but they’re not in your bundle. That’s a solved problem with the current ecosystem, but you’re throwing that solution away, and therefore need to solve it again. Or maybe it’s fine to not solve it. Mac-specific apps often have to be updated when a new macOS comes out, so if platform-agnostic apps also often have to be updated when a new anything comes out, maybe that’s no big deal?
But there’s a bigger problem than just distribution. Some extension modules are only extension modules for speed, like numpy. But many are there to interface with C libraries. If my app depends on PortAudio, distributing the extension module as wheels is easy, but it doesn’t do any good unless you have the C library installed and configured on your system.
Oh that's a user problem,
OK, but it seems like if you’re not solving it, you don’t really have portable apps. An app that can run out of the box on every machine except most Windows systems, or an audio app that runs on every machine but usually only plays audio on Linux, etc., doesn’t seem very portable. Conda, py2exe, py2app, platforms’ package managers, etc. all do solve this problem. Of course most of them don’t do so in a platform-agnostic way, which makes it a lot easier… But still, why would I want to download the zipapp instead of brew install or downloading a Mac-specific py2app app or something else that will definitely work instead of only maybe working and otherwise punting on it as a user problem that I have to figure out how to solve myself? The fact that I can copy that same zipapp to a Windows box and then figure out how to solve the same user problem on a different platform doesn’t seem like a huge win.

Yours, Abdur-Rahmaan Janhangeer pythonmembers.club | github Mauritius On Thu, Jan 9, 2020 at 9:10 AM Andrew Barnert <abarnert@yahoo.com> wrote:
Being online for checking is normally how you do it. Machine-based schemes have the problems you stated. Now you'd be asking why dependencies have to be offline while signing is online. Well, pulling dependencies from pip is like a normal Python project; the zip advantage would then just be a smaller code base. The app-like idea is to just run a file, without worrying about dependencies.
Just a question. Not saying it's a problem.
More importantly, how does your solution make anything easier? Bundling the cache back up into another zipfile and then trying to figure out where that zipfile is
Was proposing the generated zipfile is in the same folder as the original zipfile Another idea is to have a cross-platform code-base only zip. In the info file we can have target os. We need to specify this only in the case of c-based libs. It will then generate the required zips bundled with libs for that os. main zip -> zip for win, zip for mac, zip for linux
Or maybe it’s fine to not solve it. Mac-specific apps often have to be updated when a new macOS comes out, so if platform-agnostic apps also often have to be updated when a new anything comes out, maybe that’s no big deal?
It's on the software author to ship a new release.
What I'm saying is that while it's true that a lib may exist, for example, to interface with a C library, it's beyond Python to make sure that the C library is actually present on your machine. This is a zipapp enhancement, and zipapp is a bundled format. Native executables, on the other hand, include lots of OS-specific stuff that has no relation whatsoever with Python.
At this point I need to:
- See conda
- Come up with a viable online signing scheme. In my view machine-based signing is just not worth it.
- As Mr. Barry Scott suggested, cover the pros and cons of existing zipapp-based solutions
- As Mr. Christopher suggested, come up with demos. I'll code the demos:
  .. Of a wheels-included zip
  .. Of a zip that generates OS-specific zips
  .. Of Mr. Andrew's PyPI-based zips

On Thu, Jan 9, 2020 at 7:10 PM Abdur-Rahmaan Janhangeer <arj.python@gmail.com> wrote:
So you're offering no real benefits (since you have to be online to verify the app), and you pay the price of bundling everything. Great.
But what's the point? Why not just use pip the way we already can? What is the actual benefit?
Or maybe it’s fine to not solve it. Mac-specific apps often have to be updated when a new macOS comes out, so if platform-agnostic apps also often have to be updated when a new anything comes out, maybe that’s no big deal?
It's on the software author to ship a new release.
Brilliant. So now every software author has to continually maintain the app and monitor all OSes for new releases. What happens if the author isn't on it instantly? What if it takes him/her a couple of months, or even a year or two, to get around to releasing an update?
So far, I have seen zero benefits to this zipapp enhancement. It's not bundling anything new and useful. Instead, you force software authors to create monolithic distribution archives for every combination of Python version and OS flavour (including things like 32-bit vs 64-bit etc) that they want to support - and all for what?
- and figure out what problem you're actually solving here. ChrisA

On Thu, 9 Jan 2020, 12:38 Chris Angelico, <rosuav@gmail.com> wrote:
So you're offering no real benefits (since you have to be online to verify the app), and you pay the price of bundling everything. Great.
If you've read the thread, i'm saying i did not propose a concrete signing solution since i'm still looking into it. Those were some ideas that came with Mr. Andrew's discussion.
But what's the point? Why not just use pip the way we already can? What is the actual benefit?
Those concerns should be addressed to the author of PEP 441.
Brilliant. So now every software author has to continually maintain
If we go that route yes, same as an executable that won't work on a new Mac update.
So far, I have seen zero benefits to this zipapp enhancement.
From reading 3 threads, i get the idea that you don't see the benefits of bundling dependencies in a zipapp.
- and figure out what problem you're actually solving here.
To quote the FAQ:

This proposal is not solving any problem at all
--------------------------------------------------------------
This proposal aims at enhancing zipapp. Zipapp solved the problem. Zipapp had an aim. This proposal aims at helping zipapp better accomplish its aim.

This proposal explores the next level of zipapps. The enhancements are twofold:
- Adding meta details
- Bundling dependencies

But i chose to go even further by attempting to explore security features and exploring the option of cross-platforming. That's why there is much discussion over it. I could've played the safe route and just proposed adding metadata and bundling dependencies, producing OS-specific zips. Nobody has objections to the two above, and there are prototypes with the above features which work. Before i forget about the hard questions completely and just propose the safe part, i wanted to push it as far as i can.

On Thu, 9 Jan 2020 at 11:00, Abdur-Rahmaan Janhangeer <arj.python@gmail.com> wrote:
But you haven't explained what problem adding metadata would solve.
- Bundling dependencies
You can already bundle (pure Python) dependencies, just use pip install --target to place them in a directory alongside your application, add some code in your app to set sys.path, and bundle the whole lot in a zipapp. Many people do this already. So if what you're proposing is to make that process easier, then great, but you're not explaining things very well, as nothing you've described so far sounds easier than the current process :-(
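The process described above can be sketched with the stdlib alone. This is a minimal illustration, not a proposed tool: the names `build_app`, `run_app`, the `lib` directory and the stand-in `greeting` module are all hypothetical, and the stand-in module is written by hand so the sketch runs offline. In a real project `lib/` would be filled by `pip install --target`.

```python
import subprocess
import sys
import zipapp
from pathlib import Path

# Hypothetical entry point: it puts the archive's own "lib" directory on
# sys.path before importing the bundled dependency.
MAIN_PY = '''\
import os
import sys

sys.path.insert(0, os.path.join(os.path.dirname(__file__), "lib"))

import greeting  # bundled, pure-Python dependency

print(greeting.hello())
'''


def build_app(staging: Path, target: Path) -> Path:
    """Stage a tiny app plus one bundled module, then zip it with zipapp."""
    lib = staging / "lib"
    lib.mkdir(parents=True)
    # A real project would populate lib/ with something like:
    #   python -m pip install --target staging/lib -r requirements.txt
    # Here a stand-in module is written by hand instead.
    (lib / "greeting.py").write_text("def hello():\n    return 'hello from lib'\n")
    (staging / "__main__.py").write_text(MAIN_PY)
    zipapp.create_archive(staging, target)
    return target


def run_app(archive: Path) -> str:
    """Run the archive with the current interpreter and return its stdout."""
    result = subprocess.run(
        [sys.executable, str(archive)],
        capture_output=True, text=True, check=True,
    )
    return result.stdout.strip()
```

Inside the archive, `__file__` for `__main__.py` points into the `.pyz`, so the `lib` entry added to `sys.path` is a directory inside the zip and is served by zipimport. That only works for pure-Python code, which matches the restriction stated above.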
And yet again, you haven't explained how these additional features will solve problems that users are actually encountering. Sure, it's easy to say "security will avoid problems with malicious code" - but what specific attacks are people finding to be an issue, and how will your proposed solution address them? (You say you're still investigating signing - I'd suggest dropping that part of your proposal for now if you don't know how it will work yet).
That's why there is much discussion over it
There's discussion because no-one can work out what problem you're trying to solve, not because your proposal includes a number of aspects.
You'd still have people asking what problem this will solve...
Possibly. I don't see the point of extra metadata, but I'm not going to object strenuously if someone wants to make it an optional extra that people can include in their zipapps if they want. And if you had a concrete proposal for a tool that made bundling pure-python dependencies easier, I'd be very happy. But such a tool can easily be written as a standalone tool - it doesn't need any change to Python (even the zipapp module in the stdlib could have been released on PyPI and kept independent). I don't see the point of insisting it be added to Python (and indeed, I see some significant downsides to doing so, such as it not being available in older versions of Python...)
Before i forget about the hard questions completely and just propose the safe part, i wanted to push it as far as i can.
Maybe that was a mistake :-) Start small, and then build on your success once the first part is done. Paul

On Thu, Jan 9, 2020 at 3:29 PM Paul Moore <p.f.moore@gmail.com> wrote:
Thanks Mr. Paul Moore, co-author of PEP441 for contributing to the discussion. Enchanté, as you say in French 🎉
But you haven't explained what problem adding metadata would solve.
Writing here at the same time for more points below asking for what problem adding metadata solves. Well to begin with, the Python community still views zip archives as mere zip archives. In the Python Be Bold - Draft thread on the Python list i listed different ways in which zip archives are being used in ways that are more than just archives. I have taken Java as an example (you can refer to the draft here <https://mail.python.org/pipermail/python-list/2020-January/895056.html>) as Python shares some similarities in having a VM, having bytecodes and being labelled as a cross-platform language. The draft shows different ways in which we can improve a mere Zip archive to the level where more ambitious projects might be built. I have also described the signing mechanism of .jars etc Having metadata in zip archives is one baby step on using archives as apps. The current thread being a spinoff of this <https://mail.python.org/pipermail/python-list/2020-January/894987.html> and that <https://mail.python.org/pipermail/python-list/2020-January/895056.html> thread, it is recommended that before coming to this thread, people go through these threads, see the conclusions reached on some aspects. Reading this draft by itself raises many whys which i'll just copy paste to answer
<<Many people do this already>> That's precisely it. Many people do it which shows that there's a need, many tools have been built but this proposal proposes to make dependencies bundling 'official', enabling python to ease the process. As i said earlier: <<there are prototypes with the above features which work.>>
Referring to your below part of "that's your mistake" i think yes it's a good idea
The discussion has been over signing and cross-platforming
Maybe that was a mistake :-) Start small, and then build on your success once the first part is done.
Ok will do!

On Thu, 9 Jan 2020 at 18:15, Abdur-Rahmaan Janhangeer <arj.python@gmail.com> wrote:
Maybe I'm missing something, but the draft which you link to has a lot of discussion of how jar files and apk files work, and their features, but almost nothing about what problems people using Python currently have, that the proposal could solve. And if I ignore the parts about signing and bundling cross-platform dependencies (which I've already said I think you should drop until the basics are sorted out) there seems to me to be almost nothing left in the way of a concrete proposal. I'd strongly suggest that you re-formulate your proposal as a series of one or more sets of: * Description of a problem that Python users currently face * Review of currently available solutions, and how they fall short * Details of what you propose, and how that improves on the current options At the moment, it's very hard to tell what you're actually suggesting, and what benefits you think will be gained. For what it's worth, I agree that the current options for bundling Python applications are not ideal. But to move forward, I want to see specifics, not just a generalised "things aren't great, we should throw more technology at the problem and it will help (somehow)" proposal with no actual plan of action. Sorry if this sounds negative - I don't mean it to. You sound like you have some ideas here and I'm hoping you can find a way to explain them better so they don't get lost in the confusion. Paul

Yours, Abdur-Rahmaan Janhangeer pythonmembers.club <http://www.pythonmembers.club/> | github <https://github.com/Abdur-rahmaanJ> Mauritius On Thu, Jan 9, 2020 at 10:53 PM Paul Moore <p.f.moore@gmail.com> wrote:
On Thu, 9 Jan 2020 at 18:15, Abdur-Rahmaan Janhangeer < arj.python@gmail.com> wrote:
Maybe I'm missing something, but the draft which you link to has a lot of
Not negative at all, i'm going to do it. Further up in the thread Mr. Barry Scott proposed reviewing the available solutions, and further up i put it in the todo list. I'm doing it, just that there's no way of putting a Python thread in "stop mode - author addressing issues". As people talk, i think it's best to reply rather than keep them waiting ^^_

On Wed, Jan 8, 2020 at 1:09 AM Abdur-Rahmaan Janhangeer < arj.python@gmail.com> wrote:
That's under-specified. What hash algorithm was used? How are you going to specify it?
Then by reading the info file and hashing the actual file and comparing, we can see which file was modified if any.
But then I can modify the signatures of any of these files by regenerating them. Please trust me, this isn't simple to get right, especially if you are shipping the hashes with the file if you're trying to protect tampering versus just verifying a blip in a download.
That actually doesn't work. You cannot load an extension module from memory; it *must* be from disk so this doesn't solve the extension module problem. -Brett

On Wed, 8 Jan 2020, 23:04 Brett Cannon, <brett@python.org> wrote:
That's under-specified. What hash algorithm was used? How are you going to specify it?
That was a sha256 demo.
But then I can modify the signatures of any of these files by regenerating
Well, i mentioned that: The hash value becomes the checking signature of the zipfile. Meaning that it's just a structure to easily verify the integrity of a file in depth. The end hash becomes the verifying signature, but since we have the individual hashes as well, we can verify which file changed. I did not elaborate on signing as i'm still looking into it.
That actually doesn't work. You cannot load an extension module from memory; it *must* be from disk so this doesn't solve the extension module problem.
Oh, i mean physically generating another zip on disk (zip B) then executing it.

On 6 Jan 2020, at 19:34, Abdur-Rahmaan Janhangeer <arj.python@gmail.com> wrote:
Note: draft simplified
Please cover the pros and cons of the alternatives that have been raised as comments on this idea, as is usually done for a PEP style document. Also beware that zip file format does not include the encoding of the files that are in the zip file. This means that for practical purposes only ASCII filenames are portable across systems. Is this limitation a problem for this proposal? Barry

On Tue, 7 Jan 2020, 01:57 Barry Scott, <barry@barrys-emacs.org> wrote:
Thanks, i don't have much experience writing PEPs and, if i don't bug you, may i ask what "alternatives" refers to?
Also beware that zip file format does not include the encoding of the files that are in the zip file.
For the encoding of the contents, well, since we are packaging Python code files, its handling will be the same as handling outside the zip file. Its handling is the same as how zipapp handles things.
This means that for practical purposes only ASCII filenames are portable across systems. Is this limitation a problem for this proposal?
If we are talking about filenames, then i guess ASCII filenames are the way to go as you'd unnecessarily break things otherwise.

I’m a bit unclear on how far this goes: is it just a bit more specific with more meta-data standards? Or are you aiming for something that will run without a Python install?

Other issues: Are you aiming for a bundle that can run on multiple platforms? If so, then it’ll have to have a way to bundle multiple compiled extensions and select the right ones at runtime. If this is essentially just zipapp with the ability to bundle dependencies, then you could probably just do some sys.path hackery.

In any case, this seems like something you could implement, and then see if people find it useful.

BTW- I’m pretty sure we could simply specify that filenames are utf-8 and we’d be good to go. -CHB

On Mon, Jan 6, 2020 at 5:50 PM Abdur-Rahmaan Janhangeer < arj.python@gmail.com> wrote:
-- Christopher Barker, PhD Python Language Consulting - Teaching - Scientific Software Development - Desktop GUI and Web Development - wxPython, numpy, scipy, Cython

Yours, Abdur-Rahmaan Janhangeer pythonmembers.club <http://www.pythonmembers.club/> | github <https://github.com/Abdur-rahmaanJ> Mauritius On Tue, Jan 7, 2020 at 6:40 AM Christopher Barker <pythonchb@gmail.com> wrote:
I’m a bit unclear on how far this goes: is it just a bit more specific with more meta-data standards?
- More metadata
- Integrity check with hashing
- Protecting the meta data
- Bundling 3rd party packages
Or are you aiming for something that will run without a Python install?
Aie aie Mr. Christopher, zipapp requires a Python install.
Other issues:
According to the discussion on the Python, Be Bold thread, it became clear that it will be a pain to generate and will have an unnecessary size, but sure, this is a most stable idea. Suggesting instead to include wheels. The wheels are installed. The interpreter looks for packages in that app-specific folder.
If this is essentially just zipapp with the ability to bundle dependencies, then you could probably just do some sys.path hackery.
Could you please explain more? Thanks.
In any case, this seems like something you could implement, and then see if people find it useful.
That's a nice idea to have a working demo. I'm not a security expert but i'll try! Anyone interested in this thread can view this tool <https://github.com/linkedin/shiv> built by LinkedIn which attempts dependency bundling.

On Mon, Jan 6, 2020 at 10:50 PM Abdur-Rahmaan Janhangeer < arj.python@gmail.com> wrote:
- More metadata
good idea, and simple.
- Integrity check with hashing - Protecting the meta data
This could be a big challenge -- and I'm not expert, so have no idea what the issues are.
- Bundling 3rd party packages
Well, as you state below, that could make it big. but it also could make it useful -- folks want to use environments of various sorts to keep dependencies separate, so bundling them all up in an app would be nice. But a thought on that -- you may be able to accomplish something similar with conda, "conda constructor", and "conda run". -- or a new tool built from those. The idea is that the first time you ran your "app", it would install its dependencies, and then use them in an isolated environment. But if the multiple apps had the same dependencies, they would share them, so you wouldn't get major bloat on the host machine.
but a wheel is just as big as the installed package (at least a zipped version) -- it's essentially the package compressed into a tarball.
If this is essentially just zipapp with the ability to bundle dependencies, then you could probably just do some sys.path hackery.
Could you please explain more. Thanks?
sure -- in your zip file, you have a "dependencies" directory. the dependencies get installed there. Then that dir gets added to sys.path at startup. I'm not so sure how to do that inside a zipfile, but it could be done *somehow*
In any case, this seems like something you could implement, and then see if
well, you'll need a consult on the security issues -- which you would want well reviewed anyway ;-)
There you go -- you've got half the job done already :-) But: "Unlike “conventional” zipapps, shiv packs a site-packages style directory of your tool’s dependencies into the resulting binary, and then at bootstrap time extracts it into a ~/.shiv cache directory." which is how they get around the "how to add a dir in a zip file to sys.path" -- but I'll bet someone could hack that to not be necessary.... -CHB

On Wed, 8 Jan 2020, 11:09 Christopher Barker, <pythonchb@gmail.com> wrote:
I guess it's time to dig more into anaconda, been putting it off, will do.
but a wheel is just as big as the installed package (at least a zipped version) -- it's essentially the package compressed into a tarball.
I really hope C extensions would become redundant someday in Python, which would make Python development real Python dev. The proposal at hand is maybe the best solution to a hard nut case that most if not all solutions preferred to avoid.
But: "Unlike “conventional” zipapps, shiv packs a site-packages style directory of your tool’s dependencies into the resulting binary, and then at bootstrap time extracts it into a ~/.shiv cache directory."
Maybe we can have a PYZ directory where the packages for each app are extracted; then it's not a global dump but more specific. Why not that route? It would be nice to comment on what is wrong with Shiv's mode of execution.

On Wed, Jan 8, 2020 at 1:24 AM Abdur-Rahmaan Janhangeer < arj.python@gmail.com> wrote:
to be clear -- you want to look at "conda", not "Anaconda" -- conda is a package manager, Anaconda is a distribution created with the conda package manager.
That's not going to completely happen. Which does not mean that a solution that doesn't support them isn't still useful for a lot. But it would be interesting to see how many commonly used packages on PyPi rely on C extensions (other than the SciPy Stack).
I'm not sure how that differs from a .shiv directory, which is not global. But a way to share packages in the "central place for packages" would be nice. -- maybe how conda does it with hard links? -CHB

Yours, Abdur-Rahmaan Janhangeer pythonmembers.club <http://www.pythonmembers.club/> | github <https://github.com/Abdur-rahmaanJ> Mauritius On Wed, Jan 8, 2020 at 8:08 PM Christopher Barker <pythonchb@gmail.com> wrote:
Just a quick note on that. A global directory has the side effect mentioned in Shiv's readme:
If you create many utilities with shiv, you may want to occasionally clean this directory.
As with many packages mixed in, you can have unwanted side-effects. Being more specific might be:

Shiv/
    app1/
        package1
    app2/
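A per-app layout like the one above could be sketched as follows. This is only an illustration under stated assumptions: the function names, the `site-packages/` prefix inside the archive, and the choice to key each directory on the archive's name plus a content hash are all hypothetical, and this is not how shiv itself is implemented.

```python
import hashlib
import zipfile
from pathlib import Path


def app_cache_dir(archive: Path, cache_root: Path) -> Path:
    """Return a cache directory unique to this archive's name and content."""
    digest = hashlib.sha256(archive.read_bytes()).hexdigest()[:16]
    return cache_root / f"{archive.stem}-{digest}"


def extract_site_packages(archive: Path, cache_root: Path,
                          prefix: str = "site-packages/") -> Path:
    """Extract members under `prefix` into the app's own cache directory.

    Returns the directory that would be added to sys.path. Extraction is
    skipped when that directory already exists, so only the first run
    pays the cost.
    """
    site = app_cache_dir(archive, cache_root) / prefix.rstrip("/")
    if not site.exists():
        with zipfile.ZipFile(archive) as zf:
            members = [n for n in zf.namelist() if n.startswith(prefix)]
            zf.extractall(site.parent, members=members)
    return site
```

Because the directory name changes whenever the archive's bytes change, a new release of an app never sees packages left behind by an older one. Stale directories still accumulate, so the cleanup caveat from Shiv's readme would apply here too.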

Have a look at this write up about the horror that is zip file name handling. https://marcosc.com/2008/12/zip-files-and-encoding-i-hate-you/ This has been a pain point at work. Barry

Barry writes:
Have a look at this write up about the horror that is zip file name handling.
As I implied, I don't need to "read write-ups", *I live the horror.* Not daily, but always when I really don't want to spend the minutes.
This has been a pain point at work.
I know your pain. But this PEP is not about your work environment (does it include Japanese bureaucrats? ;-/), it's about a file format that we control. (More specifically, Mr. Janhangeer does.) The question (in any normal case) is simply how does the system default file name encoding interact with the PEP, and my guess is "if we take a bit of care, it doesn't." Steve

On Wed, Jan 8, 2020 at 1:49 AM Abdur-Rahmaan Janhangeer < arj.python@gmail.com> wrote:
I'm pretty sure this is a non-issue for this use-case. If you need to open zip files created by arbitrary other systems, or create zip files that can be opened by arbitrary other systems, then it's a big mess. But that isn't the case here. -CHB

You are offering up a competitor against python wheels, Py2app, py2exe etc packagers. Explain the benefits and weaknesses compared to the existing alternatives. You might want to look at pex that is mentioned in the pep you refer to. The other mentioned app has seen no update since 2013.
I replied separately about this problem.
Barry

Yours, Abdur-Rahmaan Janhangeer pythonmembers.club <http://www.pythonmembers.club/> | github <https://github.com/Abdur-rahmaanJ> Mauritius On Wed, Jan 8, 2020 at 2:20 AM Barry <barry@barrys-emacs.org> wrote:
You are offering up a competitor against python wheels
This proposal proposes to include python wheels in the archive
Py2app, py2exe etc packagers.
Native executables are off the plate. This one deals with archive files. But i get the idea, thanks! Maybe you wanted to allude to projects like Shiv <https://github.com/linkedin/shiv/> by LinkedIn
Explain the benefits and weaknesses compared to the existing alternatives.
There are some projects similar to Shiv, will write a comparison.

Barry Scott writes:
Also beware that zip file format does not include the encoding of the files that are in the zip file.
The most recent zipfile format, which is now a decade or so old, does specify the encoding, for values of 0 = ASCII, 1 = UTF-8.[1]
As far as I know, with the exception of a few Japanese bureaucrats, everybody uses zip implementations that handle non-ASCII properly. InfoZip is one such that is portable, although I don't recall how it handles filesystems with non-Unicode file name encodings. From the point of view of this proposal, just require that filename encodings be properly specified, and provide an option to use the appropriate codec. This isn't too hard. The main thing to rule out is multiple encodings in one file system (yes, I've seen it, but not recently, thank the powers). This could even be handled (on POSIX filesystems) with an auxiliary utility that converts whatever-encoded filenames to UTF-8 (could be a symlink tree). Then you can just require a UTF-8 filesystem throughout the zipapp handling system. Only remaining question in my mind would be backward compatibility with any existing zipapp specs (which I have no idea about, but if I were participating in implementation I'd be sure to check). Footnotes: [1] Or maybe it's 0 = ISO-8859-1, 1 = UTF-8. Sorry, don't have a copy of the spec handy.
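The flag mentioned above can be observed from Python's own `zipfile` module. In the ZIP specification it is general purpose bit 11 (mask `0x800`), and `zipfile` sets it on its own when a member name is not pure ASCII. The helper name `name_flags` is made up for this sketch.

```python
import io
import zipfile

UTF8_FLAG = 0x800  # general purpose bit 11: member name is UTF-8 encoded


def name_flags(names):
    """Write empty members with the given names, then report bit 11 for each."""
    buf = io.BytesIO()
    with zipfile.ZipFile(buf, "w") as zf:
        for name in names:
            zf.writestr(name, b"")
    # Re-open the archive so the flags are read back from the file itself.
    with zipfile.ZipFile(buf) as zf:
        return {info.filename: bool(info.flag_bits & UTF8_FLAG)
                for info in zf.infolist()}
```

Calling `name_flags(["main.py", "donn\u00e9es.py"])` reports the flag only for the second name. When the bit is not set, `zipfile` decodes the stored name as CP437 rather than ASCII or ISO-8859-1. Archives written by `zipapp` go through the same module, so non-ASCII names are marked as UTF-8 without extra work.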

Thanks for the ideas, Abdur-Rahmaan! Some feedback below. On Mon, Jan 6, 2020 at 11:35 AM Abdur-Rahmaan Janhangeer < arj.python@gmail.com> wrote:
This would be a packaging detail so not something to be specified in the stdlib.
This can be tricky because people want signing in specific ways that vary from OS to OS, case by case. So unless there's a built-in signing mechanism the flexibility required here is huge.
Install the wheels where? You can't do that globally. And you also have to worry about the security of doing the install implicitly. And now the user suddenly has stuff on their file system they may not have asked for as a side-effect which may upset some people who are tight on disk space (remember that Python runs on some low-powered machines). -Brett

Yours, Abdur-Rahmaan Janhangeer pythonmembers.club | github Mauritius On Wed, Jan 8, 2020 at 1:32 AM Brett Cannon <brett@python.org> wrote:
This would be a packaging detail so not something to be specified in the
stdlib. Yes, the module opening the zip will look for it protect the flexibility required here is huge.

Let's say we have a simple project:

folder/
    file.py
    __main__.py

The first step is to include in the info file the file names and hashes:

file.py: 5f22669f6f0ea1cc7e5af7c59712115bcf312e1ceaf7b2b005af259b610cf2da
__main__.py: e3b0c44298fc1c149afbf4c8996fb92427ae41e4649b934ca495991b7852b855

Then by reading the info file and hashing the actual file and comparing, we can see which file was modified if any.

But now, a malicious program might try to modify the info file and modify the hash. One way to protect even the metadata is to hash the entire content:

folder/
    file.py # we can add those in a folder if needed
    __main__.py
    infofile

Then after zipping it, we hash the zipfile, then append the hash to the zip binary:

[zipfile binary][hash value]

We can have a zip file and yet another file stating the hash value, but to maintain a single file structure, the one described above is best. Then when opening the zip file, we start reading up to the hash value. The hash value becomes the checking signature of the zipfile. This forms a base on which more signing mechanisms can be added, like author keys. Since zipfiles are the same across OSes, this kind of approach supposedly doesn't pose a problem.
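The per-file part of this scheme can be sketched with `hashlib`. The function names and the in-memory dict standing in for the info file are assumptions for illustration; the thread does not fix an info-file format. As an aside, the `__main__.py` digest quoted above is the SHA-256 of empty input, which suggests that file was empty in the demo.

```python
import hashlib
from pathlib import Path


def build_manifest(root: Path) -> dict:
    """Map each file's POSIX-style relative path to its SHA-256 hex digest."""
    return {
        path.relative_to(root).as_posix(): hashlib.sha256(path.read_bytes()).hexdigest()
        for path in sorted(root.rglob("*"))
        if path.is_file()
    }


def changed_files(root: Path, manifest: dict) -> list:
    """Return the sorted names whose current digest differs from the manifest.

    Files that were added or removed since the manifest was built are
    reported as well.
    """
    current = build_manifest(root)
    names = set(manifest) | set(current)
    return sorted(name for name in names if manifest.get(name) != current.get(name))
```

This detects which file changed, but it does nothing against someone who can also rewrite the manifest. That gap is the subject of the replies that follow.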
Yes, global folders also defeat the spirit. Using the wheel-included zip (A), we can generate another zip file (B) with the packages installed. That generated zip file is then executed. Zip format A solves the problem of cross-platforming. Existing solutions up to now use format B, where you can't share your zips across OSes. As for space, it's a bit the same as with venvs. Zip format B is the equivalent of packages installed in a venv. Venv usage can be a hint as to when to use it.

On Jan 8, 2020, at 01:09, Abdur-Rahmaan Janhangeer <arj.python@gmail.com> wrote:
How does this solve the problem? A malicious program that could modify the hash inside the info file could even more easily modify the hash at the end of the zip. Existing systems deal with this by recognizing that you can’t prevent anyone from hashing anything they want, so you either have to store the hashes in a trusted central repo, or (more commonly–there are multiple advantages) sign them with a trustable key. If a malicious app modified the program and modified the hash, it’s going to be a valid hash; there’s nothing you can do about that. But it won’t be the hash in the repo, or it’ll be signed by the untrusted author of the malicious program rather than the trusted author of the app, and that’s why you don’t let it run. And this works just as well for hashes embedded inside an info file inside the zip as for hashes appended to the zip. And there are advantages to putting the hash inside. For example, if you want to allow downstream packagers or automated systems to add distribution info (this is important if you want to be able to pass a second code signing requirement, e.g., Apple’s, as well as the zipapp one), you just have a list of escape patterns that say which files are allowed to be unhashed. Anything that appears in the info file must match its hash or the archive is invalid. Anything that doesn’t appear in the info file but does match the escape patterns is fine, but if it doesn’t match the escape patterns, the archive is invalid. So now downstream distributors can add extra files that match the escape patterns. (The escape patterns can be configurable—you just need them to be specified by something inside the hash. But you definitely want a default that works 99% of the time, because if developers and packagers have to think it through in every case instead of only in exceptional cases, they’re going to get it wrong, and nobody will have any idea who to trust to get it right.)
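The escape-pattern rule described above can be written down as a small check. Everything named here is hypothetical: the default patterns, the `validate` function, and the decision to treat a listed-but-missing file as invalid are assumptions, and the trusted signature over the info file, which is what makes the manifest trustworthy in the first place, is not implemented.

```python
import fnmatch
import hashlib

# Hypothetical defaults: files a downstream packager may add without hashing.
DEFAULT_ESCAPES = ("_CodeSignature/*", "*.dist-extra")


def validate(members: dict, manifest: dict, escapes=DEFAULT_ESCAPES) -> list:
    """Return a list of problems; an empty list means the archive is valid.

    members:  archive file name -> content bytes
    manifest: file name -> expected SHA-256 hex digest (from the info file)
    """
    problems = []
    # Anything listed in the info file must be present and match its hash.
    for name, expected in manifest.items():
        if name not in members:
            problems.append(f"missing: {name}")
        elif hashlib.sha256(members[name]).hexdigest() != expected:
            problems.append(f"hash mismatch: {name}")
    # Anything not listed must match an escape pattern.
    for name in members:
        if name in manifest:
            continue
        if not any(fnmatch.fnmatchcase(name, pattern) for pattern in escapes):
            problems.append(f"unlisted file: {name}")
    return problems
```

With these defaults a packager could add `_CodeSignature/CodeResources` to a valid archive without invalidating it, while an extra `evil.py` would be rejected.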

On Wed, 8 Jan 2020, 21:29 Andrew Barnert, <abarnert@yahoo.com> wrote:
You are right, that's why i said: The hash value becomes the checking signature of the zipfile.
Meaning the hash value at the end of the zipfile becomes the hash by which we identify the file and against which we check. That is for checking the integrity of the app.
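Both sides of this exchange can be seen in a few lines. The sketch below follows the `[zipfile binary][hash value]` layout with SHA-256; `seal` and `check` are illustrative names, and the payload is treated as opaque bytes rather than being opened as a zip.

```python
import hashlib

DIGEST_SIZE = hashlib.sha256().digest_size  # 32 bytes


def seal(payload: bytes) -> bytes:
    """Append the SHA-256 digest of `payload`: [zipfile binary][hash value]."""
    return payload + hashlib.sha256(payload).digest()


def check(sealed: bytes) -> bool:
    """True when the trailing digest matches the bytes before it."""
    payload, digest = sealed[:-DIGEST_SIZE], sealed[-DIGEST_SIZE:]
    return hashlib.sha256(payload).digest() == digest
```

A flipped byte makes `check` return False, so accidental corruption is caught. But `check(seal(b"malicious bytes"))` is True as well: anyone who can rewrite the file can recompute the trailing digest, so the hash identifies the file only when it is compared against a value obtained from somewhere trusted.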

On Jan 8, 2020, at 01:09, Abdur-Rahmaan Janhangeer <arj.python@gmail.com> wrote:
Using the wheel-included zip (A), we can generate another zip file (B) with the packages installed. That generated zip file is then executed.
But that generated zip B doesn’t have a trustable hash on it, so how can you execute it? If you keep this all hidden inside the zipapp system, where malicious programs can’t find and modify the generated zips, then I suppose that’s fine. But at that point, why not just install the wheels inside zip A into an auto-generated only-for-zip-A venv cache directory or something, and then just run zip A as-is against that venv?
You can still only share zips across OSs if you bundle in a wheel for each extension library for every possible platform. For in-house deployments where you only care about two platforms (your dev boxes and your deployment cluster boxes), that’s fine, but for a publicly released app that’s supposed to work “everywhere”, you pretty much have to download and redistribute every wheel on PyPI for every dependency, which could make your app pretty big, and require pretty frequent updates, and it still only lets you run on systems that have wheels for all your dependencies.

If you’re already doing an effective “install” step in building zip B out of zip A, why not make that step just use a requirements file and download the dependencies from PyPI? You could still run zip B without being online, just not zip A. Maybe you could optionally include wheels and they’d serve as a micro-repo sitting in front of PyPI, so when your dependencies are small you can distribute a version that works for 95% of your potential users without needing to do anything fancy but it still works for the other 5% if they can reach PyPI.

(But maybe it would be simpler to just use the zip B as a cache in the first place. If I download Spam.zipapp for Win64 3.9, that’s a popular enough platform that you probably have a zip B version ready to go and just ship me that, so it works immediately. Now, if I copy that file to my Mac instead of downloading it fresh, oops, wrong wheels, so it downloads the right ones off PyPI and builds a new zipapp for my platform, and it still runs, it just takes a bit longer the first time. I’m not sure this is a good idea, but I’m not sure trying to include every wheel for every platform is a good idea either…)

But there’s a bigger problem than just distribution. Some extension modules are only extension modules for speed, like numpy. But many are there to interface with C libraries. If my app depends on PortAudio, distributing the extension module as wheels is easy, but it doesn’t do any good unless you have the C library installed and configured on your system. Which you probably don’t if you’re on Windows or Mac. A package manager like Homebrew or Choco can take care of that by just making my app’s package depend on the PortAudio package (and maybe even conda can?), but I don’t see how zipapps with wheels in, or anything else self-contained, can. And if most such packages eventually migrate to binding from Python (using cffi or ctypes) rather than from C (using an extension module), that actually makes your problem harder rather than easier, because now you can’t even tell from outside the code that there are external dependencies; you can distribute a single zipapp that works everywhere, but only in the sense that it starts running and quickly fails with an exception for most users.
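A zipapp cannot supply a missing C library, but it could at least report the gap up front instead of failing partway through. The sketch below uses `ctypes.util.find_library`; the function name and the idea of declaring the required library names somewhere (for example in the info file) are assumptions, not part of the proposal.

```python
import ctypes.util


def missing_libraries(names):
    """Return the subset of `names` that ctypes cannot locate on this system."""
    return [name for name in names if ctypes.util.find_library(name) is None]
```

An app depending on PortAudio might call `missing_libraries(["portaudio"])` at startup and print a clear message if the list is non-empty. The lookup is best-effort: it follows the platform's usual search rules, so a library installed in an unusual location can be reported as missing even though loading it by full path would work.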

On 08/01/2020 18:08, many people wrote lots of stuff... Folks, could we pick one list and have the discussion there, rather than on both python-list and python-ideas? Getting *four* copies of Andrew's emails is a tad distracting :-) -- Rhodri James *-* Kynesim Ltd

On Wed, 8 Jan 2020, 22:08 Andrew Barnert, <abarnert@yahoo.com> wrote:
But that generated zip B doesn’t have a trustable hash on it, so how can you execute it?
The issue of trust is solved by keys. I did not propose something concrete as I'm still looking into a viable scheme.
If you keep this all hidden inside the zipapp system, where malicious
The env idea is to be retained; the thread was asking where the cache directory would be located.
require pretty frequent updates,
Well, this proposal goes for dependency freezing. When an app is shipped, the packages are not expected to be updated. The author can ship another version with updated libs, but the end user does not worry about package updates.
and it still only lets you run on systems that have wheels for all your
If you can have PyPI that's just cool, but the idea of using archives trends towards self-contained apps.
(But maybe it would be simpler to just use the zip B as a cache in the
More ideas. I did not consider being online, but if we do, it's a very nice thing.
I’m not sure this is a good idea, but I’m not sure trying to include every wheel for every platform is a good idea either…)
Maybe, as Mr. Christopher says, I must bring in some demos.
But there’s a bigger problem than just distribution. Some extension modules
Oh that's a user problem, it's the same as Twisted requiring some C++ redistributables on Windows. I got the impression that Twisted was really well named, as I found the library to be twisted to install. We were in the midst of our user group web-scraping presentation when the demo at hand required installing Twisted. Some nasty C++ redistributable error showed up, which slowed down the whole session. But that was a user-side requirement, not a lib-side one.

On Jan 8, 2020, at 12:04, Abdur-Rahmaan Janhangeer <arj.python@gmail.com> wrote:
OK, but I don’t see how any scheme that looks like any of the usual ones could be adapted to work. The whole point of code signing is that I know that you signed the app with a key that nobody else has access to, and nobody has changed the app since then (plus additional stuff, but this is the relevant part). If that new zip B is built on the fly on my machine by normal user software, it can only be signed with a key that’s available to normal user software on my machine. Which includes malicious software that wants to modify and re-sign the zip. (I’m assuming you can’t rely on being online at this point.)
Why is that a problem? Most platforms have a standard location for putting cache directories. Those that don’t, you just have to use something hardcoded. More importantly, how does your solution make anything easier? Bundling the cache back up into another zipfile and then trying to figure out where that zipfile goes is the same problem as just trying to figure out where to put the venv would have been. It seems like you’re just adding complexity without any benefit. This is why I assumed you might want the platformized “B zip” to be itself redistributable; then you do get some benefit. But maybe there’s some other benefit I’m not seeing?
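As a rough illustration of the "standard location" point, the sketch below picks a per-user cache directory using the usual conventions: `LOCALAPPDATA` on Windows, `~/Library/Caches` on macOS, and `XDG_CACHE_HOME` or `~/.cache` elsewhere. The function name `cache_dir` and its parameters are hypothetical, chosen only so the logic can be exercised without touching the real environment.

```python
import os
import sys
from pathlib import Path


def cache_dir(app_name, platform=sys.platform, env=None, home=None) -> Path:
    """Return a conventional per-user cache directory for app_name."""
    env = os.environ if env is None else env
    home = Path.home() if home is None else Path(home)
    if platform.startswith("win"):
        # Windows: per-user, non-roaming application data.
        base = Path(env.get("LOCALAPPDATA") or home / "AppData" / "Local")
    elif platform == "darwin":
        # macOS: the per-user Caches folder.
        base = home / "Library" / "Caches"
    else:
        # Linux and other Unix: XDG base directory convention.
        base = Path(env.get("XDG_CACHE_HOME") or home / ".cache")
    return base / app_name
```

Whether the thing stored there is an unpacked venv or a rebuilt zip, the lookup is the same, which is the point being made above.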
If you bundled an app with all the wheels that existed yesterday, and newer wheels are needed to work with the new version of macOS with Enhanced Super Duper Gatekeeper, or the manylinux2021-armv8 that’s recently become a popular platform, or the AIX platform that only 20 people care about so some of the wheels didn’t exist but now they do because 1 of those 20 people wants those libraries, the wheels those people need are on PyPI, but they’re not in your bundle. That’s a solved problem with the current ecosystem, but you’re throwing that solution away, and therefore need to solve it again. Or maybe it’s fine to not solve it. Mac-specific apps often have to be updated when a new macOS comes out, so if platform-agnostic apps also often have to be updated when a new anything comes out, maybe that’s no big deal?
But there’s a bigger problem than just distribution. Some extension modules are only extension modules for speed, like numpy. But many are there to interface with C libraries. If my app depends on PortAudio, distributing the extension module as wheels is easy, but it doesn’t do any good unless you have the C library installed and configured on your system.
Oh that's a user problem,
OK, but it seems like if you’re not solving it, you don’t really have portable apps. An app that can run out of the box on every machine except most Windows systems, or an audio app that runs on every machine but usually only plays audio on Linux, etc., doesn’t seem very portable. Conda, py2exe, py2app, platforms’ package managers, etc. all do solve this problem. Of course most of them don’t do so in a platform-agnostic way, which makes it a lot easier… But still, why would I want to download the zipapp instead of brew install or downloading a Mac-specific py2app app or something else that will definitely work instead of only maybe working and otherwise punting on it as a user problem that I have to figure out how to solve myself? The fact that I can copy that same zipapp to a Windows box and then figure out how to solve the same user problem on a different platform doesn’t seem like a huge win.

Yours, Abdur-Rahmaan Janhangeer
pythonmembers.club | github
Mauritius
On Thu, Jan 9, 2020 at 9:10 AM Andrew Barnert <abarnert@yahoo.com> wrote:
Being online for checking is normally how you do it. Machine-based signing has the problems you stated. Now you'd be asking why dependencies have to be offline while signing is online. Well, pulling dependencies from pip is like a normal Python project. The zip advantage would just be a smaller code base. The app-like idea is to just run a file, not worrying about dependencies.
Just a question. Not saying it's a problem.
More importantly, how does your solution make anything easier? Bundling the cache back up into another zipfile and then trying to figure out where that zipfile is
I was proposing that the generated zipfile be in the same folder as the original zipfile.
Another idea is to have a cross-platform, code-base-only zip. In the info file we can have the target OS. We need to specify this only in the case of C-based libs. It will then generate the required zips bundled with libs for that OS: main zip -> zip for win, zip for mac, zip for linux
Or maybe it’s fine to not solve it. Mac-specific apps often have to be updated when a new macOS comes out, so if platform-agnostic apps also often have to be updated when a new anything comes out, maybe that’s no big deal?
It's on the software author to ship a new release.
What I'm saying is that while it's true that, for example, a lib is for interfacing with a C library, it's beyond Python to make sure that the C library is actually present on your machine. This is a zipapp enhancement, which is a bundled format. Native executables, on the other hand, include lots of OS-specific stuff that has no relation whatsoever with Python.
At this point I need to:
- See conda
- Come up with a viable online signing scheme. According to me, machine-based signing is just not worth it.
- As Mr. Barry Scott suggested, cover the pros and cons of existing zipapp-based solutions
- As Mr. Christopher suggested, come up with demos. I'll code the demos:
  .. Of a wheels-included zip
  .. Of a zip that generates OS-specific zips
  .. Of Mr. Andrew's PyPI-based zips

On Thu, Jan 9, 2020 at 7:10 PM Abdur-Rahmaan Janhangeer <arj.python@gmail.com> wrote:
So you're offering no real benefits (since you have to be online to verify the app), and you pay the price of bundling everything. Great.
But what's the point? Why not just use pip the way we already can? What is the actual benefit?
Or maybe it’s fine to not solve it. Mac-specific apps often have to be updated when a new macOS comes out, so if platform-agnostic apps also often have to be updated when a new anything comes out, maybe that’s no big deal?
It's on the software author to ship a new release.
Brilliant. So now every software author has to continually maintain the app and monitor all OSes for new releases. What happens if the author isn't on it instantly? What if it takes him/her a couple of months, or even a year or two, to get around to releasing an update?
So far, I have seen zero benefits to this zipapp enhancement. It's not bundling anything new and useful. Instead, you force software authors to create monolithic distribution archives for every combination of Python version and OS flavour (including things like 32-bit vs 64-bit etc) that they want to support - and all for what?
- and figure out what problem you're actually solving here. ChrisA

On Thu, 9 Jan 2020, 12:38 Chris Angelico, <rosuav@gmail.com> wrote:
So you're offering no real benefits (since you have to be online to verify the app), and you pay the price of bundling everything. Great.
If you've read the thread, I'm saying I did not propose a concrete signing solution since I'm still looking into it. Those were some ideas that came with Mr. Andrew's discussion.
But what's the point? Why not just use pip the way we already can? What is the actual benefit?
Those concerns should be addressed to the author of PEP 441.
Brilliant. So now every software author has to continually maintain
If we go that route, yes, same as an executable that won't work on a new Mac update.
So far, I have seen zero benefits to this zipapp enhancement.
From reading 3 threads, I get the idea that you don't see the benefits of bundling dependencies in a zipapp.
- and figure out what problem you're actually solving here.
To quote the FAQ:
This proposal is not solving any problem at all
--------------------------------------------------------------
This proposal aims at enhancing zipapp. Zipapp solved the problem. Zipapp had an aim. This proposal aims at helping zipapp better accomplish its aim.
This proposal explores the next level of zipapps. The enhancements are twofold:
- Adding meta details
- Bundling dependencies
But I chose to go even further by attempting to explore security features and exploring the option of cross-platforming. That's why there is much discussion over it. I could've played the safe route and just proposed adding metadata and bundling dependencies, producing OS-specific zips. Nobody has objections to the two above; there are prototypes with the above features which work. Before I forget about the hard questions completely and just propose the safe part, I wanted to push it as far as I can.

On Thu, 9 Jan 2020 at 11:00, Abdur-Rahmaan Janhangeer <arj.python@gmail.com> wrote:
But you haven't explained what problem adding metadata would solve.
- Bundling dependencies
You can already bundle (pure Python) dependencies, just use pip install --target to place them in a directory alongside your application, add some code in your app to set sys.path, and bundle the whole lot in a zipapp. Many people do this already. So if what you're proposing is to make that process easier, then great, but you're not explaining things very well, as nothing you've described so far sounds easier than the current process :-(
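For readers who have not seen that process, here is a minimal sketch of it using only the standard library. It is an illustration under assumptions, not a recommended layout: the `vendor` directory name, the `greeting` module (a hand-written stand-in for a real pure-Python dependency, so the example runs offline) and the helper names `build_demo` and `run_demo` are all hypothetical. In a real project the `vendor` directory would be filled with `python -m pip install --target app/vendor -r requirements.txt`.

```python
import subprocess
import sys
import tempfile
import zipapp
from pathlib import Path

# Entry point stored in the archive. It adds the bundled "vendor"
# directory (inside the zip) to sys.path before importing from it.
MAIN_PY = '''\
import os
import sys

sys.path.insert(0, os.path.join(os.path.dirname(__file__), "vendor"))

import greeting  # stand-in for a real pure-Python dependency

print(greeting.hello("zipapp"))
'''


def build_demo(workdir: Path) -> Path:
    """Lay out app/ with a vendored dependency and pack it as app.pyz."""
    src = workdir / "app"
    vendor = src / "vendor"
    vendor.mkdir(parents=True)
    # Stand-in for what "pip install --target app/vendor ..." would place here.
    (vendor / "greeting.py").write_text(
        "def hello(name):\n    return f'hello from {name}'\n"
    )
    (src / "__main__.py").write_text(MAIN_PY)
    target = workdir / "app.pyz"
    zipapp.create_archive(src, target)
    return target


def run_demo() -> str:
    """Build the archive in a temporary directory, run it, return its output."""
    with tempfile.TemporaryDirectory() as tmp:
        archive = build_demo(Path(tmp))
        result = subprocess.run(
            [sys.executable, str(archive)],
            capture_output=True, text=True, check=True,
        )
        return result.stdout.strip()
```

If the dependencies are installed at the top level of the source directory instead of in a subdirectory, the `sys.path` line is not needed, because the archive itself is already on `sys.path` when it is run.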
And yet again, you haven't explained how these additional features will solve problems that users are actually encountering. Sure, it's easy to say "security will avoid problems with malicious code" - but what specific attacks are people finding to be an issue, and how will your proposed solution address them? (You say you're still investigating signing - I'd suggest dropping that part of your proposal for now if you don't know how it will work yet).
That's why there is much discussion over it
There's discussion because no-one can work out what problem you're trying to solve, not because your proposal includes a number of aspects.
You'd still have people asking what problem this will solve...
Possibly. I don't see the point of extra metadata, but I'm not going to object strenuously if someone wants to make it an optional extra that people can include in their zipapps if they want. And if you had a concrete proposal for a tool that made bundling pure-python dependencies easier, I'd be very happy. But such a tool can easily be written as a standalone tool - it doesn't need any change to Python (even the zipapp module in the stdlib could have been released on PyPI and kept independent). I don't see the point of insisting it be added to Python (and indeed, I see some significant downsides to doing so, such as it not being available in older versions of Python...)
Before I forget about the hard questions completely and just propose the safe part, I wanted to push it as far as I can.
Maybe that was a mistake :-) Start small, and then build on your success once the first part is done. Paul

On Thu, Jan 9, 2020 at 3:29 PM Paul Moore <p.f.moore@gmail.com> wrote:
Thanks Mr. Paul Moore, co-author of PEP 441, for contributing to the discussion. Enchanté, as you say in French 🎉
But you haven't explained what problem adding metadata would solve.
Answering here also for the points below that ask what problem adding metadata solves. Well, to begin with, the Python community still views zip archives as mere zip archives. In the Python Be Bold - Draft thread on the Python list I listed different ways in which zip archives are being used as more than just archives. I have taken Java as an example (you can refer to the draft here <https://mail.python.org/pipermail/python-list/2020-January/895056.html>), as Python shares some similarities in having a VM, having bytecode and being labelled as a cross-platform language. The draft shows different ways in which we can improve a mere zip archive to the level where more ambitious projects might be built. I have also described the signing mechanism of .jars etc. Having metadata in zip archives is one baby step towards using archives as apps. The current thread being a spinoff of this <https://mail.python.org/pipermail/python-list/2020-January/894987.html> and that <https://mail.python.org/pipermail/python-list/2020-January/895056.html> thread, it is recommended that before coming to this thread, people go through these threads and see the conclusions reached on some aspects. Reading this draft by itself raises many whys, which I'll just copy and paste to answer.
<<Many people do this already>> That's precisely it. Many people do it, which shows that there's a need. Many tools have been built, but this proposal proposes to make dependency bundling 'official', enabling Python to ease the process. As I said earlier: <<there are prototypes with the above features which work.>>
Referring to your remark below ("Maybe that was a mistake"), I think yes, it's a good idea.
The discussion has been over signing and cross-platforming
Maybe that was a mistake :-) Start small, and then build on your success once the first part is done.
Ok will do!

On Thu, 9 Jan 2020 at 18:15, Abdur-Rahmaan Janhangeer <arj.python@gmail.com> wrote:
Maybe I'm missing something, but the draft which you link to has a lot of discussion of how jar files and apk files work, and their features, but almost nothing about what problems people using Python currently have, that the proposal could solve. And if I ignore the parts about signing and bundling cross-platform dependencies (which I've already said I think you should drop until the basics are sorted out) there seems to me to be almost nothing left in the way of a concrete proposal.
I'd strongly suggest that you re-formulate your proposal as a series of one or more sets of:
* Description of a problem that Python users currently face
* Review of currently available solutions, and how they fall short
* Details of what you propose, and how that improves on the current options
At the moment, it's very hard to tell what you're actually suggesting, and what benefits you think will be gained.
For what it's worth, I agree that the current options for bundling Python applications are not ideal. But to move forward, I want to see specifics, not just a generalised "things aren't great, we should throw more technology at the problem and it will help (somehow)" proposal with no actual plan of action.
Sorry if this sounds negative - I don't mean it to. You sound like you have some ideas here and I'm hoping you can find a way to explain them better so they don't get lost in the confusion.
Paul

Yours, Abdur-Rahmaan Janhangeer
pythonmembers.club <http://www.pythonmembers.club/> | github <https://github.com/Abdur-rahmaanJ>
Mauritius
On Thu, Jan 9, 2020 at 10:53 PM Paul Moore <p.f.moore@gmail.com> wrote:
On Thu, 9 Jan 2020 at 18:15, Abdur-Rahmaan Janhangeer < arj.python@gmail.com> wrote:
Maybe I'm missing something, but the draft which you link to has a lot of
Not negative at all, I'm going to do it. Further up in the thread Mr. Barry Scott proposed reviewing the available solutions, and further up I put it in the todo list. I'm doing it; it's just that there's no way of putting a Python thread in "stop mode - author addressing issues". As people talk, I think it best to reply rather than keep them waiting ^^_

On Wed, Jan 8, 2020 at 1:09 AM Abdur-Rahmaan Janhangeer < arj.python@gmail.com> wrote:
That's under-specified. What hash algorithm was used? How are you going to specify it?
Then by reading the info file and hashing the actual file and comparing, we can see which file was modified if any.
But then I can modify the signatures of any of these files by regenerating them. Please trust me, this isn't simple to get right, especially if you are shipping the hashes with the file if you're trying to protect tampering versus just verifying a blip in a download.
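The distinction being drawn here can be sketched in a few lines. The code below assumes SHA-256 (the algorithm named for the demo elsewhere in this thread) and a hypothetical manifest layout of `{member name: hex digest}`; the function names are illustrative only. It shows that per-file hashes do pinpoint which member changed, and also why that alone is only an integrity check: anyone able to rewrite the archive can call `build_manifest` again and ship fresh, matching hashes.

```python
import hashlib
import io
import zipfile


def make_archive(files: dict) -> bytes:
    """Build an in-memory zip from {name: text}; used here for demonstration."""
    buf = io.BytesIO()
    with zipfile.ZipFile(buf, "w") as zf:
        for name, data in files.items():
            zf.writestr(name, data)
    return buf.getvalue()


def build_manifest(archive_bytes: bytes) -> dict:
    """Map each archive member to the SHA-256 of its uncompressed content."""
    manifest = {}
    with zipfile.ZipFile(io.BytesIO(archive_bytes)) as zf:
        for name in sorted(zf.namelist()):
            manifest[name] = hashlib.sha256(zf.read(name)).hexdigest()
    return manifest


def overall_digest(manifest: dict) -> str:
    """Combine the per-file digests into a single digest for the archive."""
    h = hashlib.sha256()
    for name in sorted(manifest):
        h.update(f"{name}:{manifest[name]}\n".encode())
    return h.hexdigest()


def changed_files(expected: dict, actual: dict) -> list:
    """List members that were added, removed, or whose content differs."""
    names = expected.keys() | actual.keys()
    return sorted(n for n in names if expected.get(n) != actual.get(n))
```

For the hashes to protect against tampering rather than a corrupted download, the manifest (or its overall digest) would have to be signed with a key the attacker does not hold, or obtained over a separate trusted channel; that part is deliberately left out of this sketch.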
That actually doesn't work. You cannot load an extension module from memory; it *must* be from disk so this doesn't solve the extension module problem. -Brett

On Wed, 8 Jan 2020, 23:04 Brett Cannon, <brett@python.org> wrote:
That's under-specified. What hash algorithm was used? How are you going to specify it?
That was a sha256 demo.
But then I can modify the signatures of any of these files by regenerating
Well, I mentioned that "The hash value becomes the checking signature of the zipfile", meaning that it's just a structure to easily verify the integrity of a file in depth. The end hash becomes the verifying signature, but since we have the individual hashes as well, we can verify which file changed. I did not elaborate on signing as I'm still looking into it.
That actually doesn't work. You cannot load an extension module from memory; it *must* be from disk so this doesn't solve the extension module problem.
Oh, I mean physically generating another zip on disk (zip B) then executing it.
participants (10)
- Abdur-Rahmaan Janhangeer
- Andrew Barnert
- Barry
- Barry Scott
- Brett Cannon
- Chris Angelico
- Christopher Barker
- Paul Moore
- Rhodri James
- Stephen J. Turnbull