A new, experimental packaging tool: distil

I've created a new tool called distil which I'm using to experiment with packaging functionality.

Overview
--------

It's based on distlib and has IMO some interesting features. With it, one can:

* Install projects from PyPI and wheels (see PEP 427). Distil does not invoke setup.py, so projects that do significant computation in setup.py may not be installable by distil. However, a large number of projects on PyPI *can* be installed, and dependencies are detected, downloaded and installed. For those distributions that absolutely *have* to run setup.py, distil can create wheels using pip as a helper, and then install from those wheels.
* Optionally upgrade installed distributions, whether installed by distil or installed by pip.
* Uninstall distributions installed by distil or pip.
* Build source distributions in .tar.gz, .tar.bz2, and .zip formats.
* Build binary distributions in wheel format. These can be pure-Python, or have C libraries and extensions. Support for Cython and Fortran (using f2py) is possible, though currently distil cannot install Cython or Numpy directly because of how they use setup.py.
* Run tests on built distributions.
* Register projects on PyPI.
* Upload distributions to PyPI.
* Upload documentation to http://pythonhosted.org/.
* Display dependencies of a distribution - either as a list of what would be downloaded (and a suggested download order), or in Graphviz format suitable for conversion to an image.

Getting started is simple (documentation is at [2]):

* Very simple deployment - just copy distil.py[1] to a location on your path, optionally naming it to distil on POSIX platforms. There's no need to install distlib - it's all included.
* Uses either a system Python or one in a virtual environment, but by default installs to the user site rather than system Python library locations.
* Offers tab-completion and abbreviation of commands and parameters on Bash-compatible shells.

Logically, packaging activities can be divided into a number of categories or roles:

* Archiver - builds source distributions from a source tree
* Builder - builds binary distributions from source
* Installer - installs source or binary distributions

This version of distil incorporates (for convenience) all of the above roles. There is a school of thought which says that these roles should be fulfilled by separate programs, and that's fine for production quality tools - it's just more convenient for now to have everything in one package for an experimental tool like distil.

Actual Improvements
-------------------

Despite the fact that distil is in an alpha stage of development and has received no real-world exposure like the existing go-to packaging tools, it does offer some improvements over them:

* Dependency resolution can be performed without downloading any distributions. Unlike e.g. pip, you are told which additional dependencies will be downloaded and installed, before any download occurs.
* Better information is stored for uninstallation. This allows better feedback to be given to users during uninstallation.
* Dependency checking is done during uninstallation. Say you've installed a distribution A, which pulled in dependencies B and C. If you request an uninstallation of B (or C), distil will complain that you can't do this because A needs it. When you uninstall A, you are offered the option to uninstall B and C as well (assuming you didn't install something else that depends on B or C, after installing A).
* By default, installation is to the user site and not to the system Python, so you shouldn't need to invoke sudo to install distributions for personal use which are not for specific projects/virtual environments.
* There's no need to "install" distil - the exact same script will run with any system Python or any venv (subject to Python version constraints of 2.6, 2.7, 3.2 or greater).
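The uninstallation check described in the list above (A pulls in B and C) can be illustrated with a small sketch. It is not distil's code: the `REQUIRES` table and the `dependents` and `plan_uninstall` helpers are hypothetical names invented for this example, and a real tool would read the dependency data from installed metadata.

```python
# Hypothetical data: which installed distributions each one requires.
# Nothing here is distil's real API or on-disk format.
REQUIRES = {
    "A": {"B", "C"},
    "B": set(),
    "C": set(),
}


def dependents(name, requires):
    """Return the installed distributions that directly require `name`."""
    return sorted(d for d, deps in requires.items() if name in deps)


def plan_uninstall(name, requires):
    """Return (blockers, orphans) for uninstalling `name`.

    blockers: distributions that still need `name`; if non-empty, refuse.
    orphans:  dependencies of `name` that nothing else would need
              afterwards, which the user could be offered for removal.
    """
    blockers = dependents(name, requires)
    if blockers:
        return blockers, []
    orphans = []
    for dep in sorted(requires.get(name, ())):
        others = [d for d in dependents(dep, requires) if d != name]
        if not others:
            orphans.append(dep)
    return [], orphans


print(plan_uninstall("B", REQUIRES))  # B is still needed by A
print(plan_uninstall("A", REQUIRES))  # A can go; B and C become orphans
```

If something else installed later also depends on B, B no longer shows up as an orphan when A is removed, which matches the caveat in the last sentence of that bullet.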

Bootstrapping pip
-----------------

I've used distil to bootstrap pip, then used that pip to install other stuff. I created a fresh PEP 405 venv with nothing in it, used distil to install a wheel[3] for my distribute fork which runs on Python 2.x and 3.x, then used distil to install pip from PyPI. Finally, to test pip, I installed SQLAlchemy (using pip) from PyPI. See [4] for the transcript.

I would welcome any feedback you could give regarding distil/distlib. There is of course a lot more testing to do, but I consider these initial findings to be promising, and worth sharing. If you find any problems, you can raise issues at [5].

Regards,

Vinay Sajip

[1] https://bitbucket.org/vinay.sajip/docs-distil/downloads/distil.py
[2] https://pythonhosted.org/distil/
[3] https://bitbucket.org/vinay.sajip/distribute3/downloads/
[4] https://gist.github.com/vsajip/5243936
[5] https://bitbucket.org/vinay.sajip/distlib/issues/new

On 26 March 2013 08:54, Vinay Sajip <vinay_sajip@yahoo.co.uk> wrote:
I've created a new tool called distil which I'm using to experiment with packaging functionality.
Interesting! Is the source available anywhere? (Not distil.py, but the source of the big chunk of embedded zipfile that appears to contain the bulk of the functionality...) No big deal if not, I can easily enough unpack the data from distil.py.

Paul

On Tue, Mar 26, 2013 at 9:54 AM, Vinay Sajip <vinay_sajip@yahoo.co.uk> wrote:
I've created a new tool called distil which I'm using to experiment with packaging functionality.
Nice!
* Very simple deployment - just copy distil.py[1] to a location on your path, optionally naming it to distil on POSIX platforms. There's no need to install distlib - it's all included.
I see that you are using a pattern similar to the virtualenv.py script, embedding other code as a compressed byte array. See how virtualenv.py is turning out to be lately: https://github.com/pypa/virtualenv/blob/11ccab2698274f0e10b72da863f9efb73cf1...

I am in general fine with the approach, though I feel a bit uncomfy with this approach creeping in as "the" way to bootstrap things with one single file for core distribution-related tools.

Would anyone know of a better way to package things in a single python-executable bootstrapping script file without obfuscating the source contents in compressed/encoded/obfuscated byte arrays?

Also, in your code calling this binary payload STUFF feels a tad scary: this is arbitrary code that I cannot see nor inspect before running. I would not want to run unknown STUFFs on my machine ... and even more so since the corresponding sources are not available publicly yet in a source repo.

At the minimum, getting some comments or explicit variable names the virtualenv way on what this payload is would help IMHO:

"STUFF = """
eJyEm1OMLlC3Zb+ybbtO1Snbtm3brjpl27Zt27Zt27b6Tye3b9J9k37YK9kv+2FmPMxkrC0vBQKKCgA
AIAGIUeqCCqFgEh/4AACRBQCAD8AFGFs4OVtbGNLpGRoYWdnbOTrTObk7GdnZmlqY0dq7qyhDAUDqZP
......"

--
Cordially
Philippe Ombredanne

On 26 March 2013 09:49, Philippe Ombredanne <pombredanne@nexb.com> wrote:
Would anyone know of a better way to package things in a single python-executable bootstrapping script file without obfuscating the source contents in compressed/encoded/obfuscated byte arrays?
Packaging as a zip file is a good way - but on Windows the file needs to be named xxx.py (which is surprising, to say the least :-)) for the relevant file association to be triggered (and on Unix, a #! line needs to be prepended).

Windows users could define additional associations (pyz and pywz) for "zipped Python applications". Maybe the Python installer should include these in 3.4+, to improve the visibility of this approach.

Paul.
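As a rough illustration of the zip-file approach described above, the sketch below builds an archive with a `__main__.py` at its top level, prepends a `#!` line, and runs it with the current interpreter. The file names (`hello.pyz`, `greeting.py`) and their contents are made up for the example; this is not how distil itself is packaged.

```python
import os
import subprocess
import sys
import tempfile
import zipfile

# Build a tiny "zipped Python application": a zip archive whose top level
# contains __main__.py. File names and contents are invented for this demo.
workdir = tempfile.mkdtemp()
app_path = os.path.join(workdir, "hello.pyz")

with open(app_path, "wb") as f:
    # On POSIX, a #! line lets the archive be executed directly once it is
    # marked executable. Leading bytes are tolerated because the zip index
    # is stored at the end of the file.
    f.write(b"#!/usr/bin/env python\n")
    with zipfile.ZipFile(f, "w") as zf:
        zf.writestr("__main__.py", "import greeting\nprint(greeting.MESSAGE)\n")
        zf.writestr("greeting.py", "MESSAGE = 'hello from inside the zip'\n")

# Passing the archive to the interpreter runs its __main__.py; modules
# stored alongside it in the archive are importable.
output = subprocess.check_output([sys.executable, app_path])
print(output.decode().strip())
```

The source files stay readable with any zip tool, which is the property being asked for; the cost is the file-association issue on Windows mentioned above.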

Philippe Ombredanne <pombredanne <at> nexb.com> writes:
I see that you are using a pattern similar to the virtualenv.py script, embedding other code as a compressed byte array.
I am in general fine with the approach, though I feel a bit uncomfy with this approach creeping in as "the" way to bootstrap things with one single file for core distribution-related tools.
It's not a particularly new approach - it's just that way because it makes things easier for the user. If I had used a more conventional approach, I'm not sure as many people would be willing to try it. Like virtualenv, it's a tool that cannot rely on the presence of existing installation tools.
Would anyone know of a better way to package things in a single python-executable bootstrapping script file without obfuscating the source contents in compressed/encoded/obfuscated byte arrays?
It's only obfuscated as a side-effect - the other way would be to put all your code in a single module - not much fun to maintain, that way. But if someone has a better way, that would certainly be of interest.
Also, in your code calling this binary payload STUFF feels a tad scary: this is arbitrary code that I cannot see nor inspect before running.
Would you find it more trustworthy if it was called TRUST_ME_ITS_SAFE? ;-) Remember, it's just a Python script running without system privileges.
I would not want to run unknown STUFFs on my machine ... and even more so since the corresponding sources are not available publicly yet in a source repo.
The absence of a public source repo is a red herring. If you want to inspect the code, the time taken to add a pdb breakpoint after the .zip write, and to unzip the file to a folder of your choice (or to add code to distil.py to do this), is trivial compared to the time you would spend doing the actual code inspection. The code is open to inspection, but I'd hope that most users focus on whether the tool has useful qualities, how it could be used to move packaging forwards, what it demonstrates about distlib etc. Have you inspected setuptools or pip code to verify that they are safe? As well as everything you've ever downloaded from PyPI, which might or might not be exactly the same as what's shown in a project's public VCS repo?
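For readers who want to do that inspection, a minimal sketch of unpacking an embedded payload follows. It assumes the payload is a zip archive that has been zlib-compressed and base64-encoded; that encoding is an assumption made for the demo, not a documented description of distil.py, and `make_payload` and `unpack_payload` are hypothetical helper names.

```python
import base64
import io
import os
import tempfile
import zipfile
import zlib


def make_payload(files):
    """Pack {name: text} into a zip, then zlib-compress and base64-encode it.

    This encoding is assumed for the demo; it is not distil's documented format.
    """
    buf = io.BytesIO()
    with zipfile.ZipFile(buf, "w") as zf:
        for name, text in files.items():
            zf.writestr(name, text)
    return base64.b64encode(zlib.compress(buf.getvalue())).decode("ascii")


def unpack_payload(payload, target_dir):
    """Decode a payload made by make_payload and extract it for reading."""
    raw = zlib.decompress(base64.b64decode(payload))
    with zipfile.ZipFile(io.BytesIO(raw)) as zf:
        zf.extractall(target_dir)
        return sorted(zf.namelist())


# Stand-in for the long string embedded in a bootstrap script.
STUFF = make_payload({"pkg/__init__.py": "", "pkg/cli.py": "print('hi')\n"})

target = tempfile.mkdtemp()
names = unpack_payload(STUFF, target)
print(names)
```

Once extracted, the files can be read like any other source tree, which is where the real time goes.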
At the minimum, getting some comments or explicit variable names the virtualenv way on what this payload is would help IMHO:
"STUFF = """
eJyEm1OMLlC3Zb+ybbtO1Snbtm3brjpl27Zt27Zt27b6Tye3b9J9k37YK9kv+2FmPMxkrC0vBQKKCgA
AIAGIUeqCCqFgEh/4AACRBQCAD8AFGFs4OVtbGNLpGRoYWdnbOTrTObk7GdnZmlqY0dq7qyhDAUDqZP
......"
While virtualenv has a number of discrete files, I have just one zip file containing distlib, CLI support code and distil code - that's a lot of files, so I'm not sure a comment would be all that helpful. What would it really tell you? What "STUFF" is really saying, to most users, is "stuff you don't need to care about the details of". For the security-conscious, a mere comment from a potentially untrusted source is no substitute for that unzip + time-consuming code inspection.

Regards,

Vinay Sajip

On 26 March 2013 08:54, Vinay Sajip <vinay_sajip@yahoo.co.uk> wrote:
I would welcome any feedback you could give regarding distil/distlib. There is of course a lot more testing to do, but I consider these initial findings to be promising, and worth sharing. If you find any problems, you can raise issues at [5].
A couple of immediate points. I tried "distil install distribute pip wheel" which failed, because distribute requires 2to3 to be run as part of setup.py (no real surprise there). But distil *did* partially install wheel, leaving a broken installation (there was no METADATA file in wheel's dist-info directory). I had to manually delete what had been installed of the 3 projects. I'd suggest that distil needs to roll back anything it did after a failed install.

Secondly, when there is a C extension in the distribution (on Windows) the install fails even though I have Visual C installed. This is because cl.exe is not on my PATH - distil should do the same detection of the location of Visual C as distutils does. The install does work if cl.exe is on PATH - presumably, though, it doesn't check that it is the *right* cl.exe (2010 for Python 3.3, 2008 for 2.7, etc).

Also distil doesn't deal with packages with optional C extensions - but again, that's a case of a "too complex" setup.py (and I'm glad it picks the option of installing the C extension in that case, and not just the pure Python version).

But other than these niggles, it's impressively effective so far :-)

Paul.

PS I'm not entirely happy with the default of installing to the user packages directory. 99.9% of my time, I'm installing into a virtualenv, and this default is very wrong - as the installed packages will "infect" all of my virtualenvs.

On 26 March 2013 08:54, Vinay Sajip <vinay_sajip@yahoo.co.uk> wrote:
I would welcome any feedback you could give regarding distil/distlib. There is of course a lot more testing to do, but I consider these initial findings to be promising, and worth sharing. If you find any problems, you can raise issues at [5].
One other (slight) oddity. I installed coverage into an empty virtualenv based on Python 3.3. It installed coverage-2.7 and coverage2 executables into Scripts. Why 2.7? Where did it get the idea that this was a Python 2.7 installation? I ran distil with the python 3.3 that is installed in the virtualenv.

Paul

Paul Moore <p.f.moore <at> gmail.com> writes:
I installed coverage into an empty virtualenv based on Python 3.3.
It installed coverage-2.7 and coverage2 executables into Scripts. Why 2.7? Where did it get the idea that this was a Python 2.7 installation? I ran distil with the python 3.3 that is installed in the virtualenv.
Ah. The metadata (see [1] for an example) mentions "coverage-2.7" as a script, as it was built on 2.7. That shouldn't really be in the metadata - there should be a single declaration, which is used by distlib/distil to create version-specific aliases.

I've now removed it from the metadata from 3.6, you could try again using

distil install "coverage (3.6)"

to make sure you pick up the version I changed.

Regards,

Vinay Sajip

[1] http://www.red-dove.com/pypi/projects/C/coverage/package-3.6b3.json

On 26 March 2013 10:57, Vinay Sajip <vinay_sajip@yahoo.co.uk> wrote:
Ah. The metadata (see [1] for an example) mentions "coverage-2.7" as a script, as it was built on 2.7. That shouldn't really be in the metadata - there should be a single declaration, which is used by distlib/distil to create version-specific aliases.
I've now removed it from the metadata from 3.6, you could try again using
distil install "coverage (3.6)"
to make sure you pick up the version I changed.
Yes, now it just installs coverage.exe. (No coverage-3.3.exe? Not that it bothers me, I don't use the version-specific script wrappers anyway).
[1] http://www.red-dove.com/pypi/projects/C/coverage/package-3.6b3.json
So distil (or is it distlib?) uses metadata from www.red-dove.com as well as PyPI? That's a bit of a surprise. I presume this is a short-term fix, what's the longer-term plan for getting such metadata onto PyPI?

Paul.

Paul Moore <p.f.moore <at> gmail.com> writes:
Yes, now it just installs coverage.exe. (No coverage-3.3.exe? Not that it bothers me, I don't use the version-specific script wrappers anyway).
So distil (or is it distlib?) uses metadata from www.red-dove.com as well as PyPI? That's a bit of a surprise. I presume this is a short-term fix, what's the longer-term plan for getting such metadata onto PyPI?
Yes, it's a short-term fix because otherwise it would be no better than pip in the dependency resolution department: download each dist, run egg_info, look for dependencies, download them, rinse and repeat. I'd love to get this metadata onto PyPI, but that depends on the PyPI folks, and on the metadata being accepted as a format. For it to be proven as a useful format, it needs wider exposure ... so that's what I'm hoping for. No doubt the wider exposure will lead to improvements. You can think of the red-dove.com location as just a sort of unofficial early version of what could be on PyPI, if the relevant people agree it's useful.

Regards,

Vinay Sajip
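To make the contrast with the download-and-inspect loop concrete, here is a minimal sketch of resolving a download order purely from pre-published dependency metadata, plus a Graphviz rendering of the same data. The `METADATA` table, the project names in it, and the `download_order` and `to_dot` helpers are all hypothetical; the real JSON format served from red-dove.com is not shown in this thread and is not modelled here.

```python
# Hypothetical pre-published metadata: name -> list of direct requirements.
# No version constraints are modelled; a real resolver would need them.
METADATA = {
    "webapp": ["framework", "templates"],
    "framework": ["templates", "utils"],
    "templates": [],
    "utils": [],
}


def download_order(root, metadata):
    """Return every distribution needed for `root`, dependencies first."""
    order = []
    seen = set()
    in_progress = set()

    def visit(name):
        if name in seen:
            return
        if name in in_progress:
            raise ValueError("dependency cycle involving %r" % name)
        in_progress.add(name)
        for dep in sorted(metadata[name]):
            visit(dep)
        in_progress.discard(name)
        seen.add(name)
        order.append(name)

    visit(root)
    return order


def to_dot(root, metadata):
    """Render the dependency edges reachable from `root` as Graphviz DOT."""
    lines = ["digraph dependencies {"]
    for name in download_order(root, metadata):
        for dep in sorted(metadata[name]):
            lines.append('  "%s" -> "%s";' % (name, dep))
    lines.append("}")
    return "\n".join(lines)


print(download_order("webapp", METADATA))
print(to_dot("webapp", METADATA))
```

Because only the metadata table is consulted, the full list can be shown to the user before anything is fetched.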

Paul Moore <p.f.moore <at> gmail.com> writes:
A couple of immediate points. I tried "distil install distribute pip wheel" which failed, because distribute requires 2to3 to be run as part of setup.py (no real surprise there). But distil *did* partially install wheel, leaving a broken installation (there was no METADATA file in wheel's dist-info directory). I had to manually delete what had been installed of the 3 projects. I'd suggest that distil needs to roll back anything it did after a failed install.
There is code in distil to roll back when installation fails, but there could be a bug which prevents it kicking in. I'll investigate. Distil does invoke 2to3 automatically if the metadata indicates it, but the metadata for distribute might be wrong if it was built on a 2.x system. I'll investigate this.
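The rollback behaviour under discussion can be sketched as follows. This is not distil's implementation: `InstallTransaction` and `install_all` are hypothetical names, and a project whose file mapping is `None` simply stands in for one that fails to install.

```python
import os
import tempfile


class InstallTransaction(object):
    """Record files as they are written so a failed install can be undone."""

    def __init__(self):
        self.written = []

    def write_file(self, path, data):
        parent = os.path.dirname(path)
        if not os.path.isdir(parent):
            os.makedirs(parent)
        with open(path, "w") as f:
            f.write(data)
        self.written.append(path)

    def rollback(self):
        # Remove files in reverse order of creation. Directories that were
        # created along the way are left in place in this sketch.
        while self.written:
            path = self.written.pop()
            if os.path.exists(path):
                os.remove(path)


def install_all(projects, site_dir):
    """Install every project or none of them.

    `projects` is a list of (name, files) pairs, where files maps a relative
    path to its contents, or is None to simulate a failing project.
    """
    txn = InstallTransaction()
    try:
        for name, files in projects:
            if files is None:
                raise RuntimeError("cannot install %s" % name)
            for relpath, data in files.items():
                txn.write_file(os.path.join(site_dir, relpath), data)
    except Exception:
        txn.rollback()
        return False
    return True


site = tempfile.mkdtemp()
ok = install_all(
    [("wheel", {"wheel/__init__.py": ""}), ("distribute", None)],
    site,
)
leftover = [f for _, _, files in os.walk(site) for f in files]
print(ok, leftover)
```

The point of the design is that a failure in the second project also removes what the first one wrote, so a multi-project install never leaves a partial result behind.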
Secondly, when there is a C extension in the distribution (on Windows) the install fails even though I have Visual C installed. This is because cl.exe is not on my PATH - distil should do the same detection of the location of Visual C as distutils does. The install does work if cl.exe is on PATH - presumably, though, it doesn't check that it is the *right* cl.exe (2010 for Python 3.3, 2008 for 2.7, etc). Also distil doesn't deal with packages with optional C extensions - but again, that's a case of a "too complex" setup.py (and I'm glad it picks the option of installing the C extension in that case, and not just the pure Python version).
distil could certainly be improved in this area, but the documentation [1] mentions that C builds should be run in a Visual Studio command window. The checking for the right version of Visual Studio is for a little later, but it's on my list.
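A rough sketch of the two checks being discussed (is cl.exe on PATH, and which Visual Studio release the running Python expects) might look like this. The table only covers the Python versions named in this thread, the helper names are hypothetical, and nothing here reflects how distil or distutils actually locate the compiler.

```python
import shutil
import sys

# Visual Studio release used for the official Windows builds of each CPython
# minor version, limited to the versions mentioned in this thread.
EXPECTED_VISUAL_STUDIO = {
    (2, 6): "2008",
    (2, 7): "2008",
    (3, 2): "2008",
    (3, 3): "2010",
}


def expected_visual_studio(version_info=sys.version_info):
    """Return the expected Visual Studio release, or None if not listed."""
    return EXPECTED_VISUAL_STUDIO.get(tuple(version_info[:2]))


def find_compiler():
    """Return the full path of cl.exe if it is on PATH, else None."""
    return shutil.which("cl")


print(expected_visual_studio((3, 3, 0)))
print(find_compiler())
```

Finding cl.exe on PATH says nothing about which release it belongs to; confirming that would need a further step that this sketch does not attempt.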
PS I'm not entirely happy with the default of installing to the user packages directory. 99.9% of my time, I'm installing into a virtualenv, and this default is very wrong - as the installed packages will "infect" all of my virtualenvs.
In what way does "distil -e <venv> install distname" fall short of your expectations? If you have a venv activated, it should install in there - does it not do this?

Regards,

Vinay Sajip

[1] http://pythonhosted.org/distil/installing.html#distributions-which-include-c...

Paul Moore <p.f.moore <at> gmail.com> writes:
A couple of immediate points. I tried "distil install distribute pip wheel" which failed, because distribute requires 2to3 to be run as part of setup.py (no real surprise there). But distil *did* partially install wheel, leaving a broken installation (there was no METADATA file in wheel's dist-info directory). I had to manually delete what had been installed of the 3 projects. I'd suggest that distil needs to roll back anything it did after a failed install.
Another problem with distribute is that you can't install it directly off PyPI with distil, because it does stuff in setup.py in a post-installation step. You will need to use the special wheel I created [1], as I mentioned in my initial post where I showed how to bootstrap pip (or rather, linked to a Gist that shows it being done).

Regards,

Vinay Sajip

[1] https://bitbucket.org/vinay.sajip/distribute3/downloads/

participants (4)
- Lennart Regebro
- Paul Moore
- Philippe Ombredanne
- Vinay Sajip