PEX at Twitter (re: PEX - Twitter's multi-platform executable archive format for Python)
This is in response to Vinay's thread, but since I wasn't subscribed to distutils-sig, I couldn't easily respond directly to it. Vinay's right, the technology here isn't revolutionary, but what's notable is that we've been using it in production for almost 3 years at Twitter. It's also been open-sourced for a couple of years at https://github.com/twitter/commons/tree/master/src/python/twitter/common/pyt... though not widely announced (it is, after all, just a small subdirectory in a fairly large mono-repo, and was only recently published independently to PyPI as twitter.common.python).

PEX files are just executable zip files with hashbangs, containing a carefully constructed __main__.py and a PEX-INFO, which is a JSON-encoded dictionary describing how to scrub and bootstrap sys.path and the like. They work equally well unpacked into a standalone directory.

In practice PEX files are simultaneously our replacement for virtualenv and our way of distributing Python applications to production. We could use virtualenv to do this, but it's hard to argue with a deployment process that is literally "cp". Furthermore, most of our machines don't have compiler toolchains or external network access, so hermetically sealing all dependencies once at build time (possibly for multiple platforms, since all developers use Macs) has huge appeal. This is even more important at Twitter, where it's common to run a dozen different Python applications on the same box at the same time, some using 2.6, some 2.7, some PyPy, but all with varying versions of underlying dependencies.

Speaking to recent distutils-sig threads, we used to go way out of our way to never hit disk (going so far as building our own .egg packager and a pure-Python recursive zipimport implementation, so that we could import from eggs within zips and write ephemeral .so's to dlopen and unlink), but we've since moved away from that position for simplicity's sake. For practical reasons we've always needed "not zip-safe" PEX files where all code is written to disk prior to execution (ex: legacy Django applications that have __file__-relative business logic) so we decided to just throw away the magical zipimport stuff and embrace using disk as a pure cache. This seems more compatible philosophically with the direction wheels are going, for example.

Since there's been more movement in the PEP space recently, we've been evolving PEX in order to be as standards-compliant as possible, which is why I've been more visible recently re: talks, .whl support and the like. I'd also love to chat more about PEX and how it relates to things like PEP 441 and/or other attempts like pyzzer.

cheers,
brian
twitter.com/wickman
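To make the format concrete, here is a minimal sketch of the idea using nothing but the stdlib. It is illustrative only - this is not the actual twitter.common.python builder, and the metadata contents shown are made up - but it produces a runnable single-file zip of the same shape::

    import json, os, stat, zipfile

    def build_pex_like(out_path, main_py_source, pex_info):
        # A zip archive tolerates arbitrary prefix bytes, so write the
        # hashbang first and append the archive after it; the kernel reads
        # only the first line, and CPython (2.6+) will execute a zip file
        # that contains a __main__.py at its root.
        with open(out_path, "wb") as f:
            f.write(b"#!/usr/bin/env python\n")
        with zipfile.ZipFile(out_path, "a") as zf:
            zf.writestr("__main__.py", main_py_source)
            zf.writestr("PEX-INFO", json.dumps(pex_info))
        os.chmod(out_path, os.stat(out_path).st_mode | stat.S_IXUSR)

    build_pex_like("hello.pex",
                   "print('hello from a pex-like file')",
                   {"note": "illustrative metadata only"})

After this, ./hello.pex runs directly, and the same archive still works unpacked into a standalone directory, as Brian describes.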
Thanks for the info.
For practical reasons we've always needed "not zip-safe" PEX files where all code is written to disk prior to execution (ex: legacy Django applications that have __file__-relative business logic) so we decided to just throw away the magical zipimport stuff and embrace using disk as a pure cache.
I take it this is because of dependencies (whether your own legacy code or external dependencies) that haven't been designed to run from zip, and where it's not worth it (or otherwise not possible) to re-engineer to run from zip? If there were other reasons (apart from the need to maintain the magic machinery, or problems with it), can you say what they were?

Also, is your production usage confined to POSIX platforms, or does Windows have any role to play? If so, did you run into any particular issues relating to Windows that might not be common knowledge?

I certainly find single-file executables containing Python code a boon and hope they play a bigger part in how Python software is distributed. I will be looking into PEX for some good ideas :-)

Regards,
Vinay Sajip
On Fri, Jan 31, 2014 at 3:52 PM, Vinay Sajip <vinay_sajip@yahoo.co.uk> wrote:
Thanks for the info.
For practical reasons we've always needed "not zip-safe" PEX files where all code is written to disk prior to execution (ex: legacy Django applications that have __file__-relative business logic) so we decided to just throw away the magical zipimport stuff and embrace using disk as a pure cache.
I take it this is because of dependencies (whether your own legacy code or external dependencies) that haven't been designed to run from zip, and where it's not worth it (or otherwise not possible) to re-engineer to run from zip? If there were other reasons (apart from the need to maintain the magic machinery, or problems with it), can you say what they were?
Also, is your production usage confined to POSIX platforms, or does Windows have any role to play? If so, did you run into any particular issues relating to Windows that might not be common knowledge?
I certainly find single-file executables containing Python code a boon and hope they play a bigger part in how Python software is distributed. I will be looking into PEX for some good ideas :-)
Regards,
Vinay Sajip
PEP 441 is mostly about promoting the idea of archives with a #! line and a __main__ module, and giving them an extension. It is neat to be aware of PEX, and I look forward to some better tooling.
For practical reasons we've always needed "not zip-safe" PEX files where all code is written to disk prior to execution (ex: legacy Django applications that have __file__-relative business logic) so we decided to just throw away the magical zipimport stuff and embrace using disk as a pure cache.
I take it this is because of dependencies (whether your own legacy code or external dependencies) that haven't been designed to run from zip, and where it's not worth it (or otherwise not possible) to re-engineer to run from zip? If there were other reasons (apart from the need to maintain the magic machinery, or problems with it), can you say what they were?
The two major reasons here are interpreter compatibility and minimizing what we felt to be hacks, which falls under "maintain the magic machinery" I suppose. There are myriad other practical reasons. Here are some:

Our recursive zipimporter was not very portable. We relied upon things like a mutable zipimport._zip_directory_cache, but this was both undocumented and subject to change (and was always immutable in PyPy, so we couldn't support it). Code that touches _zip_directory_cache has since been removed in pkg_resources (that change also broke us, around distribute 0.6.42). Things like the change from find_in_zip to find_in_zips broke us just a few weeks ago (setuptools 1.1.7 => 2.x).

Our monkeypatches to fix bugs (or features, who knows?) exposed in setuptools by our recursive zipimporter, e.g. https://bitbucket.org/tarek/distribute/issue/274/disparity-in-metadata-resou... stopped working over time, since they were patching undocumented parts of the API. This has only accelerated recently, since more attention has been drawn to that code by PEPs 426/440 etc. Even today PyPy has bugs with large collections of zipimported namespace packages, regardless of zipimport implementation. We eventually surmounted some of these issues, but rather than be doomed to repeat them in the future, we decided to ditch our custom importer and rely upon the one in the stdlib when possible.

We also wrote our own stubs for .so files so that they could be extracted from nested zips: https://github.com/twitter/commons/blob/677780ea56af610359b3ccb30accb5b9ecd7... This turns out to be fairly memory intensive. One of our most production-critical Python applications, the Thermos executor component of Aurora (see https://github.com/apache/incubator-aurora), manages much of Twitter's infrastructure and as such needs to have as small a memory footprint as possible. The process of pulling out one .so file (a 40MB library in one of our core dependencies) took ~150MB of RAM due to zipimport making two copies of it in RAM (in order to do a string comparison) before writing it to disk as an eager resource. This became problematic because the executor itself only needs ~45MB RSS in steady state, which is the overhead we'd prefer to allocate to its kernel container, but the reality is that we have to allocate for peak. While 100MB of extra RSS doesn't sound like much, when you multiply by O(10s) of tasks per machine and O(1000s) of machines, it translates to O(TBs) of RAM, which is O( :-( $ ) to Twitter.

Lastly, there are social reasons. It's just hard to convince most engineers to use things like pkg_resources or pkgutil to manipulate resources when for them the status quo is just using __file__. Bizarrely the social challenges are just as hard as the abovementioned technical challenges.
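As a concrete illustration of the .so problem: dlopen() only works on real files, so an extension module inside a zip has to be written out to disk before it can be imported. A hypothetical sketch of that extraction step (not the actual twitter.common.python stub code, and using modern importlib for brevity) might look like::

    import importlib.util, os, zipfile

    def load_so_from_zip(zip_path, member, cache_dir, module_name):
        # Stream the .so out in chunks rather than reading it whole; the
        # zipimport behaviour described above buffered the entire file in
        # memory (twice) before it ever reached disk.
        os.makedirs(cache_dir, exist_ok=True)
        dest = os.path.join(cache_dir, os.path.basename(member))
        with zipfile.ZipFile(zip_path) as zf:
            with zf.open(member) as src, open(dest, "wb") as out:
                for chunk in iter(lambda: src.read(1 << 20), b""):
                    out.write(chunk)
        # Once on disk, the extension module can be loaded normally.
        spec = importlib.util.spec_from_file_location(module_name, dest)
        module = importlib.util.module_from_spec(spec)
        spec.loader.exec_module(module)
        return module

Treating the destination directory as a pure cache - the position described above - means the extraction cost is paid once per machine rather than once per process.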
Also, is your production usage confined to POSIX platforms, or does Windows have any role to play? If so, did you run into any particular issues relating to Windows that might not be common knowledge?
We only target Linux and OS X in practice. I've gotten PEX working with CPython (>=2.6 and >=3.2), PyPy and Jython in those situations. I dabbled with IronPython briefly but never got it to work. I don't see any really good reason why PEX files couldn't be launched on Windows, but I've never tried.

~brian
On Fri, 31/1/14, Brian Wickman <wickman@gmail.com> wrote:
There are myriad other practical reasons. Here are some:
Thanks for taking the time to respond with the details - they are good data points to think about!
Lastly, there are social reasons. It's just hard to convince most engineers to use things like pkg_resources or pkgutil to manipulate resources when for them the status quo is just using __file__. Bizarrely the social challenges are just as hard as the abovementioned technical challenges.
I agree it's bizarre, but sadly it's not surprising. People get used to certain ways of doing things, and a certain kind of collective myopia develops when it comes to looking at different ways of doing things. Having worked with fairly diverse systems in my time, ISTM that sections of the Python community have this myopia too. For example, the Java hatred and PEP 8 zealotry that you see here and there.

One of the things that's puzzled me, for example, is why people think it's reasonable or even necessary to have copies of pip and setuptools in every virtual environment - often the same people who will tell you that your code isn't DRY enough! It's certainly not a technical requirement, yet one of the reasons why PEP 405 venvs aren't that popular is that pip and setuptools aren't automatically put in there. It's a social issue - it's been decided that rather than exploring a technical approach to addressing any issue with installing into venvs, it's better to bundle pip and setuptools with Python 3.4, since that will seemingly be easier for people to swallow :-)

Regards,
Vinay Sajip
On 1 February 2014 18:23, Vinay Sajip <vinay_sajip@yahoo.co.uk> wrote:
On Fri, 31/1/14, Brian Wickman <wickman@gmail.com> wrote:
There are myriad other practical reasons. Here are some:
Thanks for taking the time to respond with the details - they are good data points to think about!
Lastly, there are social reasons. It's just hard to convince most engineers to use things like pkg_resources or pkgutil to manipulate resources when for them the status quo is just using __file__. Bizarrely the social challenges are just as hard as the abovementioned technical challenges.
I agree it's bizarre, but sadly it's not surprising. People get used to certain ways of doing things, and a certain kind of collective myopia develops when it comes to looking at different ways of doing things. Having worked with fairly diverse systems in my time, ISTM that sections of the Python community have this myopia too. For example, the Java hatred and PEP 8 zealotry that you see here and there.
One of the things that's puzzled me, for example, is why people think it's reasonable or even necessary to have copies of pip and setuptools in every virtual environment - often the same people who will tell you that your code isn't DRY enough! It's certainly not a technical requirement, yet one of the reasons why PEP 405 venvs aren't that popular is that pip and setuptools aren't automatically put in there. It's a social issue - it's been decided that rather than exploring a technical approach to addressing any issue with installing into venvs, it's better to bundle pip and setuptools with Python 3.4, since that will seemingly be easier for people to swallow :-)
FWIW, installing into a venv from outside it works fine (that's how ensurepip works in 3.4). However, it's substantially *harder* to explain to people how to use it correctly that way. In theory you could change activation so that it also affected the default install locations, but the advantage of just having them installed per venv is that you're relying more on the builtin Python path machinery rather than adding something new. So while it's wasteful of disk space and means needing to upgrade them in every virtualenv, it does actually categorically eliminate many potential sources of bugs.

Doing things the way pip and virtualenv do them also meant there was a whole pile of design work that *didn't need to be done* to get a functional system up and running. Avoiding work by leveraging existing capabilities is a time honoured engineering tradition, even when the simple way isn't the most elegant way. Consider also the fact that we had full virtual machines long before we had usable Linux containers: full isolation is actually *easier* than partial isolation, because there are fewer places for things to go wrong, and less integration work to do in the first place.

That said, something I mentioned to the OpenStack folks a while ago (and I think on this list, but potentially not), is that I have now realised the much-reviled (for good reason) *.pth files actually have a legitimate use case in allowing API compatible versions of packages to be shared between multiple virtual environments - you can trade reduced isolation for easier upgrades on systems containing multiple virtual environments by adding a suitable *.pth file to the venv rather than the package itself. While there's currently no convenient tooling around that, they're a feature CPython has supported for as long as I can remember, so tools built on that idea would comfortably work on all commonly supported Python versions.

Cheers,
Nick.

--
Nick Coghlan | ncoghlan@gmail.com | Brisbane, Australia
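There is no tooling for this yet, as Nick says, but the mechanism itself is tiny: a *.pth file is just a text file of extra sys.path entries dropped into site-packages. A purely hypothetical sketch of "sharing" a directory into a venv (all paths and names here are made up)::

    import os

    def share_dir_into_venv(venv_site_packages, shared_dir, name="shared-libs"):
        # Each non-comment line of a .pth file in site-packages is appended
        # to sys.path at startup, so anything installed into shared_dir
        # becomes importable from this venv without a per-venv copy.
        pth_path = os.path.join(venv_site_packages, name + ".pth")
        with open(pth_path, "w") as f:
            f.write(shared_dir + "\n")

    share_dir_into_venv("/path/to/venv/lib/python3.4/site-packages",
                        "/opt/shared/site-packages")

Upgrading a package in the shared directory then takes effect in every venv carrying the .pth file - the isolation-for-convenience trade Nick describes.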
On Feb 1, 2014, at 12:43 AM, Nick Coghlan <ncoghlan@gmail.com> wrote:
On 1 February 2014 18:23, Vinay Sajip <vinay_sajip@yahoo.co.uk> wrote:
On Fri, 31/1/14, Brian Wickman <wickman@gmail.com> wrote:
There are myriad other practical reasons. Here are some:
Thanks for taking the time to respond with the details - they are good data points to think about!
Lastly, there are social reasons. It's just hard to convince most engineers to use things like pkg_resources or pkgutil to manipulate resources when for them the status quo is just using __file__. Bizarrely the social challenges are just as hard as the abovementioned technical challenges.
I agree it's bizarre, but sadly it's not surprising. People get used to certain ways of doing things, and a certain kind of collective myopia develops when it comes to looking at different ways of doing things. Having worked with fairly diverse systems in my time, ISTM that sections of the Python community have this myopia too. For example, the Java hatred and PEP 8 zealotry that you see here and there.
One of the things that's puzzled me, for example, is why people think it's reasonable or even necessary to have copies of pip and setuptools in every virtual environment - often the same people who will tell you that your code isn't DRY enough! It's certainly not a technical requirement, yet one of the reasons why PEP 405 venvs aren't that popular is that pip and setuptools aren't automatically put in there. It's a social issue - it's been decided that rather than exploring a technical approach to addressing any issue with installing into venvs, it's better to bundle pip and setuptools with Python 3.4, since that will seemingly be easier for people to swallow :-)
FWIW, installing into a venv from outside it works fine (that's how ensurepip works in 3.4). However, it's substantially *harder* to explain to people how to use it correctly that way. In theory you could change activation so that it also affected the default install locations, but the advantage of just having them installed per venv is that you're relying more on the builtin Python path machinery rather than adding something new. So while it's wasteful of disk space and means needing to upgrade them in every virtualenv, it does actually categorically eliminate many potential sources of bugs.

Doing things the way pip and virtualenv do them also meant there was a whole pile of design work that *didn't need to be done* to get a functional system up and running. Avoiding work by leveraging existing capabilities is a time honoured engineering tradition, even when the simple way isn't the most elegant way. Consider also the fact that we had full virtual machines long before we had usable Linux containers: full isolation is actually *easier* than partial isolation, because there are fewer places for things to go wrong, and less integration work to do in the first place.
That said, something I mentioned to the OpenStack folks a while ago (and I think on this list, but potentially not), is that I have now realised the much-reviled (for good reason) *.pth files actually have a legitimate use case in allowing API compatible versions of packages to be shared between multiple virtual environments - you can trade reduced isolation for easier upgrades on systems containing multiple virtual environments by adding a suitable *.pth file to the venv rather than the package itself. While there's currently no convenient tooling around that, they're a feature CPython has supported for as long as I can remember, so tools built on that idea would comfortably work on all commonly supported Python versions.
In all but a tiny number of cases, you could use a symlink for this. Much less magic :-) --Noah
On Sat, 1/2/14, Noah Kantrowitz <noah@coderanger.net> wrote:
In all but a tiny number of cases, you could use a symlink for this. Much less magic :-)
That's "POSIX is all there is" myopia, right there. While recent versions of Windows have symlinks more like POSIX symlinks, XP only has a stunted version called "reparse points" or "junction points" which are not really fit for purpose. I think you'll find that XP environments are found in rather more than "a tiny number of cases", and even though Microsoft has end-of-lifed XP in terms of support, I fear it'll be around for a while yet. Regards, Vinay Sajip
On 1 February 2014 09:36, Vinay Sajip <vinay_sajip@yahoo.co.uk> wrote:
While recent versions of Windows have symlinks more like POSIX symlinks, XP only has a stunted version called "reparse points" or "junction points" which are not really fit for purpose. I think you'll find that XP environments are found in rather more than "a tiny number of cases", and even though Microsoft has end-of-lifed XP in terms of support, I fear it'll be around for a while yet.
One crippling thing about even recent Windows symlink support is that creating a symlink requires elevation. There is a user right that can be assigned to allow creation of symlinks without elevation, but bizarrely (and, I gather, deliberately!) Windows strips that right from administrative accounts - so you have a situation where a normal user with the right can create symlinks, but an administrator with the same right still needs elevation. And a major proportion of Windows accounts are administrators. Sigh. One day they might get it right.

Paul
On Feb 1, 2014, at 1:36 AM, Vinay Sajip <vinay_sajip@yahoo.co.uk> wrote:
On Sat, 1/2/14, Noah Kantrowitz <noah@coderanger.net> wrote:
In all but a tiny number of cases, you could use a symlink for this. Much less magic :-)
That's "POSIX is all there is" myopia, right there. While recent versions of Windows have symlinks more like POSIX symlinks, XP only has a stunted version called "reparse points" or "junction points" which are not really fit for purpose. I think you'll find that XP environments are found in rather more than "a tiny number of cases", and even though Microsoft has end-of-lifed XP in terms of support, I fear it'll be around for a while yet.
Junctions on Windows are actually more flexible than POSIX symlinks, and I have in fact used them for Python packages before when doing Django dev on Windows XP. --Noah
On Sat, 1/2/14, Noah Kantrowitz <noah@coderanger.net> wrote:
Junctions on Windows are actually more flexible than POSIX symlinks, and I have in fact used them for Python packages before when doing Django dev on Windows XP.
Why do you say they're more flexible? They work as directories/mount points and AFAIK you can't use them for files. Plus, what Paul said about the elevation issues (not a factor for XP, I know, but would need to be taken into account for any solution across multiple Windows versions). Regards, Vinay Sajip
On 1 February 2014 18:50, Noah Kantrowitz <noah@coderanger.net> wrote:
On Feb 1, 2014, at 12:43 AM, Nick Coghlan <ncoghlan@gmail.com> wrote:
That said, something I mentioned to the OpenStack folks a while ago (and I think on this list, but potentially not), is that I have now realised the much-reviled (for good reason) *.pth files actually have a legitimate use case in allowing API compatible versions of packages to be shared between multiple virtual environments - you can trade reduced isolation for easier upgrades on systems containing multiple virtual environments by adding a suitable *.pth file to the venv rather than the package itself. While there's currently no convenient tooling around that, they're a feature CPython has supported for as long as I can remember, so tools built on that idea would comfortably work on all commonly supported Python versions.
In all but a tiny number of cases, you could use a symlink for this. Much less magic :-)
Unfortunately, that has two problems:

- the PEP 376 installation database directories are versioned, so you would need at least *two* symlinks (and potentially more for packages that include multiple top level entries) for the installation metadata to be properly available, and the metadata one would still have to be updated in every virtual environment whenever the shared version changed
- Windows symlinks are still broken and hard to use compared to POSIX symlinks

If it wasn't for the "support execution of arbitrary Python code during interpreter startup" "feature", *.pth files likely wouldn't worry so many people, but we're stuck with that now for backwards compatibility reasons :P (Well, and because nobody thought to suggest reducing its power to just adding new sys.path entries in Python 3...)

Cheers,
Nick.

--
Nick Coghlan | ncoghlan@gmail.com | Brisbane, Australia
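For anyone who hasn't met the "feature" in question: site.py executes any .pth line that begins with "import", while ordinary lines are merely added to sys.path. A hypothetical example (the file name is made up)::

    # contents of startup-hook.pth, dropped into site-packages
    import os; os.environ.setdefault("RAN_AT_STARTUP", "1")

The comment line is skipped, but the import line runs on every interpreter start-up - which is why arbitrary .pth files shipped by arbitrary packages make people nervous.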
On Sat, 1/2/14, Nick Coghlan <ncoghlan@gmail.com> wrote:
FWIW, installing into a venv from outside it works fine
That's what I'm saying :-)
However, it's substantially *harder* to explain to people how to use it correctly that way.
But distil installs into venvs (both PEP 405 and virtualenv) without needing to be in there. It's as simple as::

    distil -e path/to/venv install XYZ

So I'm not sure what you mean by "use correctly" or what exactly is hard to explain. Distil doesn't do anything special - after all, installing a distribution just means moving certain files into certain locations. If you know that you're installing into one of (a) a specific venv, or (b) user site-packages, or (c) system site-packages, then you know what the installation locations are without doing anything especially clever, don't you?

Of course, building and installation are commingled in distutils and setuptools - that's "build-from-source-is-all-you-need" myopia right there - but I don't see how that prevents installation-from-without or forces the usage of installation-from-within, at least at a fundamental level.
In theory you could change activation so that it also affected the default install locations, but the advantage of just having them installed per venv is that you're relying more on the builtin Python path machinery rather than adding something new.
But virtualenv solves an analogous problem when creating venvs with different interpreters by reinvoking itself with a specific interpreter. Distil does something similar, but that's more to do with the fact that code in setup.py sometimes takes different paths depending on which version of Python is running it.
So while it's wasteful of disk space and
Disk space is cheap, so that's not the issue, it's the lack of elegance.
Avoiding work by leveraging existing capabilities is a time honoured engineering tradition, even when the simple way isn't the most elegant way.
I'm pretty pragmatic, so I agree with this point in general, but in this case I think you're side-stepping the issue of technical debt. We're talking about file-system manipulations, after all, not anything more esoteric than that. Any approach which perpetuates the "python setup.py install" way of doing things is, to me, not a good thing, which I think has been recognised from the good old days when Tarek started taking a fresh look at packaging.
I have now realised the much-reviled (for good reason) *.pth files actually have a legitimate use case in allowing API compatible versions of packages to be shared between multiple virtual environments
Do you mean ref files, as discussed on import-sig? Sure, but that's a tangential issue in my view. Worthwhile pursuing, certainly, but an incremental improvement over what we have now (and actually less necessary in the parallel universe where wheels are more like jars and not reviled for that just because of "Java - ewww"). Regards, Vinay Sajip
On 1 February 2014 19:29, Vinay Sajip <vinay_sajip@yahoo.co.uk> wrote:
On Sat, 1/2/14, Nick Coghlan <ncoghlan@gmail.com> wrote:
FWIW, installing into a venv from outside it works fine
That's what I'm saying :-)
However, it's substantially *harder* to explain to people how to use it correctly that way.
But distil installs into venvs (both PEP 405 and virtualenv) without needing to be in there. It's as simple as
distil -e path/to/venv install XYZ
So I'm not sure what you mean by "use correctly" or what exactly is hard to explain. Distil doesn't do anything special - after all, installing a distribution just means moving certain files into certain locations. If you know that you're installing into one of (a) a specific venv, or (b) user site-packages, or (c) system site-packages, then you know what the installation locations are without doing anything especially clever, don't you?
I'm talking about easing the cognitive burden on developers. The pip/virtualenv model is that once an environment is activated, developers don't need to think about it anymore - they're in that environment until they explicitly switch it off. By installing pip *into* the environment, then it is automatically affected by the same settings as all other Python components, so "pip install library" inside an activated virtual environment *just does the right thing*, and it does it in a way that doesn't require any additional custom machinery or testing.

Getting a substantial usability improvement *almost for free* (since requirements.txt makes it easy to discard and rebuild virtual environments as needed, reducing the maintainability burden) is a big win. Making any installer automatically respect an activated virtual environment without installing it into that environment is no doubt *feasible*, but just installing the installer *into* the environment has the same effect without requiring any additional engineering or testing effort.

Cheers,
Nick.

--
Nick Coghlan | ncoghlan@gmail.com | Brisbane, Australia
On Sat, 1/2/14, Nick Coghlan <ncoghlan@gmail.com> wrote:
I'm talking about easing the cognitive burden on developers. The pip/virtualenv model is that once an environment is activated, developers don't need to think about it anymore - they're in that environment until they explicitly switch it off.
But distil deals with this::

    $ pyvenv-3.3 /tmp/venv
    $ source /tmp/venv/bin/activate
    (venv) $ distil --ve
    [...]
    Python: 3.3
    [...]

So, distil is using the venv's Python. Let's deactivate and try again::

    (venv) $ deactivate
    $ distil --ve
    [...]
    Python: 2.7
    [...]

So we're now using the system Python - so far, so good.
By installing pip *into* the environment, then it is automatically affected by the same settings as all other Python components, so "pip install library" inside an activated virtual environment *just does the right thing*, and it does it in a way that doesn't require any additional custom machinery or testing.
Likewise, with distil::

    $ source /tmp/venv/bin/activate
    (venv) $ distil list
    (venv) $ distil install sarge
    Checking requirements for sarge (0.1.3) ... done.
    The following new packages will be downloaded and installed:
      sarge (0.1.3)
    Downloading sarge-0.1.3.tar.gz to /tmp/tmpn01ls2
    [...]
    Building sarge (0.1.3) ...
    [...]
    Installing sarge (0.1.3) ...
    [...]
    Installation completed.
    (venv) $ distil list
    sarge 0.1.3
    (venv) $

So, I am at a loss to see your point.
Making any installer automatically respect an activated virtual environment without installing it into that environment is no doubt *feasible*, but just installing the installer *into* the environment has the same effect without requiring any additional engineering or testing effort.
Not only feasible, there needing to be no doubt about it ;-), but also not rocket science, and demonstrated above. Distil isn't doing anything especially clever, so I don't understand what you mean by "any additional engineering or testing effort". ISTM you're making things seem more scary than they are.

Of course, if you find some specific *installation* scenario that distil doesn't cover, I'll be all ears. I emphasised *installation* to distinguish it from building - distil eschews running setup.py during installation, since if it did that then it would be no different from pip in that respect, so there'd be no point to it. So, "distil install" can't deal with cases of substantive logic in setup.py - but that's not to say that distil can't deal with setup.py at all. Instead, that functionality is provided in other distil commands which deal with building wheels from sdists. They could relatively easily be merged into an install command that works like pip, but I thought the point was to separate building from installation, so that's why distil is like it is at the moment.

Regards,
Vinay Sajip
On 1 February 2014 20:49, Vinay Sajip <vinay_sajip@yahoo.co.uk> wrote:
So, I am at a loss to see your point.
My point is that doing it the way virtualenv/pip did avoided a bunch of design work and associated testing, and reduced the opportunities for bugs - when you're trying to get things done with limited resources, that's a sensible engineering trade-off to make. As Paul noted in his reply, there's nothing that *inherently* limits pip to working that way, it just hasn't been a priority to change.

That said (and this is a point that hadn't occurred to me earlier), it's also worth noting that not only does the bootstrapping approach work well enough in most cases, but it also solves the problem of being able to easily have a newer (or older!) pip in virtual environments than is provided by the distro package manager on Linux systems, *without* needing to do a global install outside the package manager's control. On that last point, note these two recommendations that PEP 453 makes to downstream redistributors:

* ``pip install --upgrade pip`` in a global installation should not affect any already created virtual environments (but is permitted to affect future virtual environments, even though it will not do so when using the standard implementation of ensurepip).
* ``pip install --upgrade pip`` in a virtual environment should not affect the global installation.

If pip maintained perfect backwards compatibility across every release, those recommendations wouldn't be necessary. However, we're still in the process of actively hardening the security model, and that *does* mean we're going to break things at times. Having the version of pip in use be virtual environment dependent rather than Python installation dependent helps deal with that more gracefully.

Cheers,
Nick.

--
Nick Coghlan | ncoghlan@gmail.com | Brisbane, Australia
On Sat, 1/2/14, Nick Coghlan <ncoghlan@gmail.com> wrote:
My point is that doing it the way virtualenv/pip did avoided a bunch of design work and associated testing, and reduced the opportunities for bugs - when you're trying to get things done with limited resources, that's a sensible engineering trade-off to make.
A "bunch of design work" makes it seem a lot more complicated than it really is. Your suggestion comes across like an ex post facto rationalisation of a decision which was perhaps more truly based on social concerns than technical ones. Note that I've developed distil as part of my volunteer activities - it doesn't pay any of my bills - and on my own. And you're telling me about how to get the best out of "limited resources"? :-)
That said (and this is a point that hadn't occurred to me earlier), it's also worth noting that not only does the bootstrapping approach work well enough in most cases, but it also solves the problem of being able to easily have a newer (or older!) pip in virtual environments than is provided by the distro package manager on Linux systems
Eh? The problem of having a newer or older pip in a venv only exists because pip needs to be in a venv ... so I don't see the relevance of this to our earlier discussion. Since distil doesn't occupy a space in venvs, the concern of a system version being older or newer than that in a venv doesn't arise.

Regards,
Vinay Sajip
On 1 February 2014 22:20, Vinay Sajip <vinay_sajip@yahoo.co.uk> wrote:
On Sat, 1/2/14, Nick Coghlan <ncoghlan@gmail.com> wrote:
My point is that doing it the way virtualenv/pip did avoided a bunch of design work and associated testing, and reduced the opportunities for bugs - when you're trying to get things done with limited resources, that's a sensible engineering trade-off to make.
A "bunch of design work" makes it seem a lot more complicated than it really is. Your suggestion comes across like an ex post facto rationalisation of a decision which was perhaps more truly based on social concerns than technical ones.
I have no idea how most of the internal design decisions on pip were made (and I neither need nor particularly want to know - that would be rather missing the point of collaborative development). However, you're trying to make out that the distil approach is so clearly superior that you don't understand why anyone would ever have implemented it differently, and I'm trying to point out that there's an entirely plausible answer to that question that doesn't involve assuming poor engineering trade-offs on the part of the pip development team.
Note that I've developed distil as part of my volunteer activities - it doesn't pay any of my bills - and on my own. And you're telling me about how to get the best out of "limited resources"? :-)
No, things we do in our free time are the *best* opportunities to investigate things like that, because they're not necessarily goal oriented - they're more about satisfying our curiosity. Goal oriented development is a *terrible* environment for exploration, because we have to ruthlessly prune activities that don't obviously move us closer to our goal.

Consider the following two engineering design options:

A. We install an independent copy of the installer in every virtual environment, guaranteeing full isolation from the system Python and permitting the use of different installer versions for different projects, all using the exact same isolation mechanism as is used for any Python component.

B. We design a separate mechanism that allows a system installation of the installer to easily be used to install software into virtual environments, but requiring that the system and all virtual environments be using the same version of the installer, and needing to come up with a distinct targeting mechanism separate from the activation mechanism already provided by virtual environments.

Now *a priori*, you don't know how complex B is going to be. On the other hand, you already *know* A will work, because you already have working virtual environments. So, from a scope of isolation point of view, A has the advantage: the installer itself is isolated in addition to the installed components. And from a design uncertainty point of view, A also has the advantage: there is *no* design uncertainty, because you're just using the already-known-to-work virtual environment mechanism. So in a goal-oriented "provide isolation of Python applications" context, there's no *reason* to even explore B: it has more design uncertainty, and offers a lower level of isolation (because the installer is still shared).

Now, you *have* had time to explore option B in distil, and have discovered that it actually wasn't that hard after all. Cool, that's great and would be a nice feature to have in pip as well. However, the fact that *having already done it*, you discovered it wasn't all that difficult, doesn't change the fact that the level of effort involved was still greater than the nonexistent amount of work needed to get the existing solution working once virtual environments are available (and I noticed that you cut the part of the email pointing out the concrete technical benefits of being able to isolate the installer in addition to the installed components).

It's one thing to disagree with a design decision after we have made the absolutely best possible case that we can for why that decision was the right choice, and still failing to convince ourselves. It's something else entirely to dismiss someone else's use case as invalid and *then* argue that an arguably simpler design (that doesn't handle the dismissed use case and was still only hypothetical at the time the decision was made) would have been a better choice.

Regards,
Nick.

--
Nick Coghlan | ncoghlan@gmail.com | Brisbane, Australia
On Sat, 1/2/14, Nick Coghlan <ncoghlan@gmail.com> wrote:
I have no idea how most of the internal design decisions on pip were made (and I neither need nor particularly want to know - that would be rather missing the point of collaborative development). However, you're trying to make out that the distil approach is so clearly superior that you don't
Not "superior" in terms of "clever". I've made a point of saying that distil *doesn't do anything clever* in this area. If you're saying "a bunch of design work", making "a bunch" sound like "a lot", without any detailed knowledge of the design internals, then how do you know if it was a hard engineering issue or a trivial one? If I say "I didn't find it hard, and I'm not claiming to be especially clever" then at least I'm speaking from a knowledge of what it takes to achieve something, rather than perhaps taking someone else's word for it. Decisions should be properly informed, in my view. Oh, and collaborative development sometimes involves challenging prior work and assumptions. That's one way in which things get better - we build on the past, but don't remain enslaved to it.
understand why anyone would ever have implemented it differently, and I'm trying to point out that there's an entirely plausible answer to that question that doesn't involve assuming poor engineering trade-offs on the part of the pip development team.
I didn't (at least, not intentionally) say anything about poor engineering trade-offs made by anyone - what would be the point of that? I do say that better engineering approaches might exist, and suggest that the way distil does it is one such approach. If you can suggest technical flaws in the approach distil takes, that's one thing, and I will be keen to look at any points you make. If you have no time to examine distil and make pronouncements, that's fine too - we're all busy people. But defending the status quo seemingly on nebulous technical grounds like "a bunch", while avowing no knowledge of the technical details, seems wrong. How many milli-hodges are there in a bunch? ;-)
No, things we do in our free time are the *best* opportunities to investigate things like that, because they're not necessarily goal oriented - they're more about satisfying our curiousity.
I don't disagree - my clumsily made point was to say "volunteer" in the sense of "part-time", and therefore to strengthen the fact that I well understand "limited resources".
Consider the following two engineering design options: [details snipped] Now *a priori*, you don't know how complex B is going to be. On the other hand, you already *know* A will work, because you already have working virtual environments.
That is fine as an *a priori* position, and it's the position I was in when contemplating the development of distil. But we don't stay stuck in *a priori* forever: as I developed distil, it became clear that alternative approaches can work. Of course there were uncertainties at the beginning, but by doing the development work I was able to understand better the nature of those, and I could convert them into certainties of what would and wouldn't work.
And from a design uncertainty point of view, A also has the advantage: there is *no* design uncertainty, because you're just using the already-known-to-work virtual environment mechanism.
So you're saying PEP 405 was a mistake? We should just have stuck with the tried-and-tested virtualenv, which required changing every time a new Python version came out, because it used (of necessity) what might uncharitably be called "hacks"?
Now, you *have* had time to explore option B in distil, and have discovered that it actually wasn't that hard after all. Cool, that's great and would be a nice feature to have in pip as well. However, the fact that *having already done it*, you discovered it wasn't all that difficult, doesn't change the fact that the level of effort involved was still greater than the nonexistent amount of work needed to get the existing solution working
Yes, but it's not as if I'm just presenting this aspect of distil *now*. I certainly highlighted the salient features on this list *before* the PEP 453 ship set sail, and distil is pretty well documented on readthedocs.org.
and I noticed that you cut the part of the email pointing out the concrete technical benefits of being able to isolate the installer in addition to the installed components.
I wasn't trying to gloss over anything. If an installer maintains backward compatibility, then isolating the installer is not especially valuable. Even if the installer doesn't get it right, using different versions of the installer is perfectly possible (and easy with distil, as it's a single-file executable), and in my experience it's not been unheard of to have had to "downgrade" virtualenv or pip because a new release broke something.
It's one thing to disagree with a design decision after we have made the absolutely best possible case that we can for why that decision was the right choice, and still failing to convince ourselves. It's something else entirely to dismiss someone else's use case as invalid and *then* argue that an arguably simpler design (that doesn't handle the dismissed use case and was still only hypothetical at the time the decision was made)
Is this some revisionist view of history? I'm confused :-) The original announcement I made about distil was on 26 March, 2013:

http://thread.gmane.org/gmane.comp.python.distutils.devel/17336

The relevant features were fully documented in the version 0.1.0 documentation for distil, which was tagged on 22 March, 2013:

https://bitbucket.org/vinay.sajip/docs-distil/commits/04ac4badcab941e12ff6ec...

This was on readthedocs.org and pythonhosted.org at the time of the announcement. The date on PEP 453 is 10 August, 2013 - at least four months later:

http://www.python.org/dev/peps/pep-0453/

So, if no one who participated in the PEP 453 decision had time to look at distil in that four month period, fair enough. It doesn't mean that I'm somehow trying to bring something *new* to the discussion *now*. I certainly made my reservations about the pip-bundling route known at the time it was being discussed here, but if nobody was willing to look at alternatives, it's not something I could help.

Of course, once the decision is made, what can you do but defend it? I understand your position, and I'm not trying to change anything. If anyone wants to provide specific, technical feedback about any problems with distil's approach in any area, I'll welcome that feedback - but I've already expressed my view on non-technical battles.

Regards,
Vinay Sajip
It predates my involvement with pip, but pip used to have a feature where it did not require being installed into the virtualenv. It had the -E path/to/venv flag, and also had a PIP_RESPECT_VIRTUALENV env var that would have it automatically install into the activated virtualenv.

This was removed in the 1.1 release, which was released on 2012-02-16. Again, this predates my involvement with pip, however the changelog states that::

    * Removed ``-E``/``--environment`` option and ``PIP_RESPECT_VIRTUALENV``;
      both use a restart-in-venv mechanism that's broken, and neither one is
      useful since every virtualenv now has pip inside it. Replace
      ``pip -E path/to/venv install Foo`` with
      ``virtualenv path/to/venv && path/to/venv/pip install Foo``.

It was removed in this commit: https://github.com/pypa/pip/commit/87dd157.

So perhaps it isn't that pip is lacking a useful feature; it's that pip tried it, as you did with distil, and due to pip's much larger user base it got a much more thorough real world experience with it, and it turned out to be ultimately more trouble than it was worth? It could also have been a problem with how pip wrote it, of course, and perhaps you've discovered a better way of writing it. However, at the time the pain it was causing was clearly enough for the pip developers to break backwards compatibility to remove the feature.

Why exactly was it removed? Well, I can't find the reasoning beyond that commit, so we'd need to get Carl or perhaps Jannis to come and tell us if they still remember. However, it's clearly not that pip didn't try or think of this; it did, and it ultimately removed it.
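For anyone unfamiliar with the term, "restart-in-venv" roughly means re-exec'ing the installer under the target environment's interpreter, so that installs resolve against that environment's sys.path. A minimal hypothetical sketch - not pip's actual historical implementation::

    import os

    def restart_in_venv(venv_dir, args):
        # Re-exec under the venv's interpreter; from then on sys.prefix,
        # sys.path and the install locations are the venv's own.
        python = os.path.join(venv_dir, "bin", "python")  # Scripts/python.exe on Windows
        os.execv(python, [python] + args)

The fragility the changelog alludes to lives in everything this sketch skips: locating the interpreter portably, preserving options and environment state across the re-exec, and the quirks of execv on Windows.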
On Feb 1, 2014, at 9:04 AM, Vinay Sajip <vinay_sajip@yahoo.co.uk> wrote:

[quoted text snipped]

-----------------
Donald Stufft
PGP: 0x6E3CBCE93372DCFA // 7C6B 7C5D 5E2B 6356 A926 F04F 6E3C BCE9 3372 DCFA
On Feb 1, 2014, at 10:45 AM, Donald Stufft <donald@stufft.io> wrote:
It predates my involvement with pip, but pip used to have the feature where it did not require being installed into the virtualenv. It had the -E path/to/venv flag, and also had a PIP_RESPECT_VIRTUALENV env var that would have it automatically install into the activated virtualenv.
This was removed in the 1.1 release which was released on 2012-02-16. Again this predates my involvement with pip, however the changelog states that::
* Removed ``-E``/``--environment`` option and ``PIP_RESPECT_VIRTUALENV``; both use a restart-in-venv mechanism that's broken, and neither one is useful since every virtualenv now has pip inside it. Replace ``pip -E path/to/venv install Foo`` with ``virtualenv path/to/venv && path/to/venv/pip install Foo``.
It was removed in this commit: https://github.com/pypa/pip/commit/87dd157.
So perhaps it isn't that pip is lacking a useful feature, it's that pip tried it, as you did with distil, and due to pip's much larger user base it got a much more thorough real world experience with it and it turned out to be ultimately more trouble than it was worth? It could also have been a problem with how pip wrote it of course, and perhaps you've discovered a better way of writing it. However at the time it's obvious that the pain it was causing was enough to cause the pip developers to break backwards compatibility to remove that feature.
Why exactly was it removed? Well I can't find the reasoning beyond that commit so we'd need to get Carl or perhaps Jannis to come and tell us if they still remember. However it's clearly not that pip didn't just try or think of this, it did and it ultimately removed it.
On Feb 1, 2014, at 9:04 AM, Vinay Sajip <vinay_sajip@yahoo.co.uk> wrote:
On Sat, 1/2/14, Nick Coghlan <ncoghlan@gmail.com> wrote:
I have no idea how most of the internal design decisions on pip were made (and I neither need nor particularly want to know - that would be rather missing the point of collaborative development). However, you're trying to make out that the distil approach is so clearly superior that you don't
Not "superior" in terms of "clever". I've made a point of saying that distil *doesn't do anything clever* in this area. If you're saying "a bunch of design work", making "a bunch" sound like "a lot", without any detailed knowledge of the design internals, then how do you know if it was a hard engineering issue or a trivial one? If I say "I didn't find it hard, and I'm not claiming to be especially clever" then at least I'm speaking from a knowledge of what it takes to achieve something, rather than perhaps taking someone else's word for it. Decisions should be properly informed, in my view.
Oh, and collaborative development sometimes involves challenging prior work and assumptions. That's one way in which things get better - we build on the past, but don't remain enslaved to it.
understand why anyone would ever have implemented it differently, and I'm trying to point out that there's an entirely plausible answer to that question that doesn't involve assuming poor engineering trade-offs on the part of the pip development team.
I didn't (at least, not intentionally) say anything about poor engineering trade-offs made by anyone - what would be the point of that? I do say that better engineering approaches might exist, and suggest that the way distil does it one such approach. If you can suggest technical flaws in the approach distil takes, that's one thing, and I will be keen to look at any points you make. If you have no time to examine distil and make pronouncements, that's fine too - we're all busy people. But defending the status quo seemingly on nebulous technical grounds like "a bunch", while avowing no knowledge of the technical details, seems wrong. How many milli-hodges are there in a bunch? ;-)
No, things we do in our free time are the *best* opportunities to investigate things like that, because they're not necessarily goal oriented - they're more about satisfying our curiosity.
I don't disagree - my clumsily made point was to say "volunteer" in the sense of "part-time", and therefore to underline the fact that I well understand "limited resources".
Consider the following two engineering design options: [details snipped] Now *a priori*, you don't know how complex B is going to be. On the other hand, you already *know* A will work, because you already have working virtual environments.
That is fine as an *a priori* position, and it's the position I was in when contemplating the development of distil. But we don't stay stuck in *a priori* forever: as I developed distil, it became clear that alternative approaches can work. Of course there were uncertainties at the beginning, but by doing the development work I was able to understand better the nature of those, and I could convert them into certainties of what would and wouldn't work.
And from a design uncertainty point of view, A also has the advantage: there is *no* design uncertainty, because you're just using the already-known-to-work virtual environment mechanism.
So you're saying PEP 405 was a mistake? We should just have stuck with the tried-and-tested virtualenv, which required changing every time a new Python version came out, because it used (of necessity) what might be uncharitable to call "hacks"?
Now, you *have* had time to explore option B in distil, and have discovered that it actually wasn't that hard after all. Cool, that's great and would be a nice feature to have in pip as well. However, the fact that *having already done it*, you discovered it wasn't all that difficult, doesn't change the fact that the level of effort involved was still greater than the nonexistent amount of work needed to get the existing solution working
Yes, but it's not as if I'm just presenting this aspect of distil *now*. I certainly highlighted the salient features on this list *before* the PEP 453 ship set sail and distil is pretty well documented on readthedocs.org.
and I noticed that you cut the part of the email pointing out the concrete technical benefits of being able to isolate the installer in addition to the installed components.
I wasn't trying to gloss over anything. If an installer maintains backward compatibility, then isolating the installer is not especially valuable. Even if the installer doesn't get it right, using different versions of the installer is perfectly possible (and easy with distil, as it's a single-file executable), and in my experience it's not been unheard of to have had to "downgrade" virtualenv or pip because a new release broke something.
It's one thing to disagree with a design decision after we have made the absolutely best possible case that we can for why that decision was the right choice, and still failing to convince ourselves. It's something else entirely to dismiss someone else's use case as invalid and *then* argue that an arguably simpler design (that doesn't handle the dismissed use case and was still only hypothetical at the time the decision was made)
Is this some revisionist view of history? I'm confused :-) The original announcement I made about distil was on 26 March, 2013:
http://thread.gmane.org/gmane.comp.python.distutils.devel/17336
The relevant features were fully documented in the version 0.1.0 documentation for distil, which was tagged on 22 March, 2013:
https://bitbucket.org/vinay.sajip/docs-distil/commits/04ac4badcab941e12ff6ec...
This was on readthedocs.org and pythonhosted.org at the time of the announcement.
The date on PEP 453 is 10 August, 2013 - at least four months later:
http://www.python.org/dev/peps/pep-0453/
So, if no one who participated in the PEP 453 decision had time to look at distil in that four month period, fair enough. It doesn't mean that I'm somehow trying to bring something *new* to the discussion *now*. I certainly made my reservations about the pip-bundling route known at the time it was being discussed here, but if nobody was willing to look at alternatives, it's not something I could help.
Of course, once the decision is made, what can you do but defend it? I understand your position, and I'm not trying to change anything. If anyone wants to provide specific, technical feedback about any problems with distil's approach in any area, I'll welcome that feedback - but I've already expressed my view on non-technical battles.
Regards,
Vinay Sajip
----------------- Donald Stufft PGP: 0x6E3CBCE93372DCFA // 7C6B 7C5D 5E2B 6356 A926 F04F 6E3C BCE9 3372 DCFA
Additionally I'd point out that, looking at how you've implemented it, it seems to depend on the fact that distil is a single file and thus doesn't need to be installed into site-packages. Pip, however, is not a single file, and to use the same technique as you did would require installing pip into each virtualenv or doing some gross hacks to add pip to sys.path dynamically. Now it's possible that pip *could* be modified to fully support running as a single file, however that would also entail getting our dependencies to support that too. Some of them may work currently by accident, but I would be uncomfortable relying on support by accident and would want to have them committed to running like that.

Finally, I haven't touched distil because you've been less than forthcoming about its own code. I realize it's just using zip tech to bundle it and thus I can unpack it and read it, but:

A) That is a gigantic pain in the ass to do just because, for reasons unknown, you've been unwilling to publish its source.

B) I actually *did* that once, and found that there is no OSS license attached to the code and instead it's "All Rights Reserved", at which point I trashed the files.

----------------- Donald Stufft PGP: 0x6E3CBCE93372DCFA // 7C6B 7C5D 5E2B 6356 A926 F04F 6E3C BCE9 3372 DCFA
On Sat, 1/2/14, Donald Stufft <donald@stufft.io> wrote:
Additionally I'd point out that, looking at how you've implemented it, it seems to depend on the fact that distil is a single file and thus doesn't need to be installed into site-packages.
No, I don't believe that's the case - I developed (and continue to develop) distil without running it as a single-file executable, since most of its code is in a zip when deployed as a single file, so not readily debuggable. The deployment of distil as a single file, though seen by some as controversial (because unusual), is merely meant as a convenience. Any other deployment mechanism would mean using pip to install it, which kind of defeats the purpose :-)
Finally, I haven't touched distil because you've been less than forthcoming about its own code. I realize it's just using zip tech to bundle it and thus I can unpack it and read it, but:
Well, it takes less commitment to try it out as a black box and report failures or usability drawbacks than to dig into its source code, but there hasn't been a lot of feedback, so I really doubt if that's a big barrier to people trying it out. I don't believe pip users spend a lot of time looking into pip source code.

As you say, the code is inspectable, though not as readily as it would be if it were on GitHub, say. I haven't given particular thought to the license since I haven't published it, but there's no conspiracy to keep anything secret or proprietary, and nothing is obfuscated.

I've been looking for usability feedback and some of the features are still experimental (not so much on the installation side, more on the build side), so it's premature to open it up fully: I'm not looking for code contributions, as distil is still a test-bed for distlib and some ideas about how to improve the user experience. If that means some people lose interest, I'm OK with that: I've contributed plenty to open source and I expect to continue to do so. I just don't subscribe to the idea that somehow everything magically gets better when it's put on GitHub. Note that distlib is licensed under a valid OSS license, but it hasn't made much difference in terms of the amount of feedback I've received.

Regards, Vinay Sajip
On Feb 1, 2014, at 11:50 AM, Vinay Sajip <vinay_sajip@yahoo.co.uk> wrote:
On Sat, 1/2/14, Donald Stufft <donald@stufft.io> wrote:
Additionally I'd point out that, looking at how you've implemented it, it seems to depend on the fact that distil is a single file and thus doesn't need to be installed into site-packages.
No, I don't believe that's the case - I developed (and continue to develop) distil without running it as a single-file executable, since most of its code is in a zip when deployed as a single file, so not readily debuggable. The deployment of distil as a single file, though seen by some as controversial (because unusual), is merely meant as a convenience. Any other deployment mechanism would mean using pip to install it, which kind of defeats the purpose :-)
Finally, I haven't touched distil because you've been less than forthcoming about its own code. I realize it's just using zip tech to bundle it and thus I can unpack it and read it, but:
Well, it takes less commitment to try it out as a black box and report failures or usability drawbacks than to dig into its source code, but there hasn't been a lot of feedback, so I really doubt if that's a big barrier to people trying it out. I don't believe pip users spend a lot of time looking into pip source code.
As you say, the code is inspectable, though not as readily as it would be if it were on GitHub, say. I haven't given particular thought to the license since I haven't published it, but there's no conspiracy to keep anything secret or proprietary, and nothing is obfuscated.
I've been looking for usability feedback and some of the features are still experimental (not so much on the installation side, more on the build side), so it's premature to open it up fully: I'm not looking for code contributions, as distil is still a test-bed for distlib and some ideas about how to improve the user experience. If that means some people lose interest, I'm OK with that: I've contributed plenty to open source and I expect to continue to do so. I just don't subscribe to the idea that somehow everything magically gets better when it's put on GitHub.
I never said anything gets magically better on GitHub, I told you the reason why *I* haven’t bothered to try it. You can do what you want with that information, ignore it or otherwise. I have a ton of things to do in my packaging time from pip to PyPI to PEPs and most people tend to have limited time. If you want people to try your thing out the onus is on you to make it attractive for them to try. As it is right now it’s not attractive for me as I can’t look to see how it’s doing something and then go and try to add that to pip or even improve distil itself.
Note that distlib is licensed under a valid OSS license, but it hasn't made much difference in terms of the amount of feedback I've received.
Distlib is a significantly different case, it’s a library that is going to be directly useful for a very small population of people. The vast majority of people do not have any need or desire to directly work with distlib. I mean, think about it: it’s basically a library that is going to be useful primarily to people writing an installer, so you’ve got pip, zc.buildout, easy_install, distil, and maybe curdling as the bulk of your entire demographic. Of those, pip has roughly 3 active developers at the moment, zc.buildout I have no idea, easy_install has 1, distil has 1, and curdling again I have no idea. So your main demographic can probably be counted on your fingers alone. Of course you haven’t received a ton of feedback. I can guarantee you that if distlib wasn’t released under an OSS license that pip wouldn’t have used it, and then you likely wouldn’t have gotten at least half of the issues filed on the bug tracker, which came from the pip developers.
Regards,
Vinay Sajip
----------------- Donald Stufft PGP: 0x6E3CBCE93372DCFA // 7C6B 7C5D 5E2B 6356 A926 F04F 6E3C BCE9 3372 DCFA
On Sat, 1/2/14, Donald Stufft <donald@stufft.io> wrote:
I never said anything gets magically better on GitHub, I told you the reason why *I* haven’t bothered to try it.
And I'm OK with that.
Distlib is a significantly different case, it’s a library that is going to be directly useful for a very small population of people. The vast majority of people do not have any need or desire to directly work with distlib.
And your point is? This is distutils-sig, not python-list: it's the only reasonable place to try and solicit feedback for something like distlib. I'm not *complaining* that I haven't had much feedback - just stating it. To state the bleedin' obvious, I don't expect to have any say in how other people spend their time, and I hope I didn't give any contrary expectation.
I can guarantee you that if distlib wasn’t released under an OSS license that pip wouldn’t have used it
Sure, but I assume it wasn't used *just* because of the license. It has to be useful in some way too. Regards, Vinay Sajip
On 1 February 2014 17:40, Vinay Sajip <vinay_sajip@yahoo.co.uk> wrote:
I never said anything gets magically better on GitHub, I told you the reason why *I* haven't bothered to try it.
And I'm OK with that.
FWIW, the difficulty of seeing the source of distil is one of the reasons I've not done more with it. Not the only reason, I know, but certainly one of them for me. Just another data point for you. Paul
On Sat, Feb 1, 2014, at 12:40 PM, Vinay Sajip wrote:
On Sat, 1/2/14, Donald Stufft <donald@stufft.io> wrote:
I never said anything gets magically better on GitHub, I told you the reason why *I* haven’t bothered to try it.
And I'm OK with that.
Distlib is a significantly different case, it’s a library that is going to be directly useful for a very small population of people. The vast majority of people do not have any need or desire to directly work with distlib.
And your point is? This is distutils-sig, not python-list: it's the only reasonable place to try and solicit feedback for something like distlib. I'm not *complaining* that I haven't had much feedback - just stating it. To state the bleedin' obvious, I don't expect to have any say in how other people spend their time, and I hope I didn't give any contrary expectation.
I can guarantee you that if distlib wasn’t released under an OSS license that pip wouldn’t have used it
Sure, but I assume it wasn't used *just* because of the license. It has to be useful in some way too.
Sure, a useless tool/experiment is not made useful by a reasonable license, but a useful tool/experiment can be made useless by an unreasonable license. distil isn't licensed so that I can even legally *use* it as far as I understand the law.
Regards,
Vinay Sajip
On Sat, 1/2/14, Donald Stufft <donald@stufft.io> wrote:
distil isn't licensed so that I can even legally *use* it as far as I understand the law.
Oh really? I hadn't realised. I had thought my announcement in March inviting people to try it out would suffice, but IANAL. With a single-file deployment, including a comprehensive license statement is tricky, but I will think on it. Not that it makes a difference to you, of course, but thanks for pointing it out. Regards, Vinay Sajip
On Sat, 1/2/14, Donald Stufft <donald@stufft.io> wrote:
distil isn't licensed so that I can even legally *use* it as far as I understand the law.
From a quick Google search, I got the following from Daniel J. Bernstein's website at http://cr.yp.to/softwarelaw.html - the page entitled "Software user's rights":
Free software
-------------

What does all this mean for the free software world? Once you've legally downloaded a program, you can compile it. You can run it. You can modify it. You can distribute your patches for other people to use. If you think you need a license from the copyright holder, you've been bamboozled by Microsoft. As long as you're not distributing the software, you have nothing to worry about.

I don't see anything illegal about downloading from the BitBucket page I linked to in my original post. So, it doesn't seem all that clear cut ... I suppose it depends on the jurisdiction, too.

Regards, Vinay Sajip
On Sat, 1/2/14, Donald Stufft <donald@stufft.io> wrote:
So perhaps it isn't that pip is lacking a useful feature; it's that pip tried it, as you did with distil, and due to pip's much larger user base it got much more thorough real-world experience with it, and it turned out to be ultimately more trouble than it was worth? It could also have been a problem with how pip wrote it, of course, and perhaps you've discovered a better way of writing it. However, at the time, the pain it was causing was evidently enough for the pip developers to break backwards compatibility and remove that feature.
Why exactly was it removed? Well, I can't find the reasoning beyond that commit, so we'd need to get Carl or perhaps Jannis to come and tell us if they still remember. However, it's clearly not that pip didn't try or think of this: it did, and it ultimately removed it.
Sure, and if they did find some problem which is also present in distil (i.e. due to a fundamental drawback of the approach) then I'll put my thinking cap back on. It'll be worth having that problem on record to avoid any repetition of work in the future. Regards, Vinay Sajip
Hi Vinay, On 02/01/2014 09:25 AM, Vinay Sajip wrote:
On Sat, 1/2/14, Donald Stufft <donald@stufft.io> wrote:
So perhaps it isn't that pip is lacking a useful feature; it's that pip tried it, as you did with distil, and due to pip's much larger user base it got much more thorough real-world experience with it, and it turned out to be ultimately more trouble than it was worth? It could also have been a problem with how pip wrote it, of course, and perhaps you've discovered a better way of writing it. However, at the time, the pain it was causing was evidently enough for the pip developers to break backwards compatibility and remove that feature.
Why exactly was it removed? Well, I can't find the reasoning beyond that commit, so we'd need to get Carl or perhaps Jannis to come and tell us if they still remember. However, it's clearly not that pip didn't try or think of this: it did, and it ultimately removed it.
Sure, and if they did find some problem which is also present in distil (i.e. due to a fundamental drawback of the approach) then I'll put my thinking cap back on. It'll be worth having that problem on record to avoid any repetition of work in the future.
Sadly I can't offer that specific problem, because AFAIK we never tracked it down. It was a recurring problem in IRC where people would report using -E and pip would fail to take its action within the indicated virtualenv, instead impacting their global environment. After trying and failing a number of times to isolate and reproduce that problem, I decided not to continue spending time on that, and that it would be more robust to rely on the natural isolation provided by just using pip inside the env, which required no additional code to debug and maintain. (Pip had already been installed inside every virtualenv for quite a while by then.)

I regret the way I handled that particular backwards-incompatibility (not providing a deprecation path), but I think the technical choice was a good one. Identifying code that is a bug-report-magnet and choosing a viable approach where that code can be removed entirely is a sound technical choice, not a matter of "politics" or "religion" (I had no political or religious motivations in removing pip -E).

As far as I can see, you haven't presented any technical issues with the current approach, just a feeling that it's inelegant (is this perhaps a religious feeling? :P). My feeling differs; I find it elegant to rely only on the virtualenv's built-in isolation, and inelegant to require extra restart-in-env code.

Carl
On Tue, 4/2/14, Carl Meyer <carl@oddbird.net> wrote:
Sadly I can't offer that specific problem, because AFAIK we never tracked it down. It was a recurring problem in IRC where people would report using -E and pip would fail to take its action within the indicated virtualenv, instead impacting their global environment. After trying and failing a number of times to isolate and reproduce that problem, I decided not to continue spending time on that, and that it would be more robust to rely on the natural isolation provided by just using pip inside the env, which required no additional code to debug and maintain. (Pip had already been installed inside every virtualenv for quite a while by then.)
I regret the way I handled that particular backwards-incompatibility (not providing a deprecation path), but I think the technical choice was a good one. Identifying code that is a bug-report-magnet and choosing a viable approach where that code can be removed entirely is a sound technical choice, not a matter of "politics" or "religion" (I had no political or religious motivations in removing pip -E).
I haven't made any comment about pip -E and how / why it was removed - until Donald's post about it, I didn't even know about it. To me, the correct solution would have been to isolate the bug and remove it, but I completely understand the pragmatic approach you took. But that leaves a "dark corner" where people might be afraid to experiment in this area with pip, because it's somehow "scary". This illustrates in part why working *on* pip is painful for me - it's not exactly a large code-base, but the way it and its tests are structured makes debugging it needlessly hard. In an environment where this is all time-constrained volunteer activity, it is not at all surprising that you took the action you did.
As far as I can see, you haven't presented any technical issues with the current approach, just a feeling that it's inelegant (is this perhaps a religious feeling? :P). My feeling differs; I find it elegant to rely only on the virtualenv's built-in isolation, and inelegant to require extra restart-in-env code.
I didn't say that the current approach didn't work - it's just that IMO it's inelegant to require something to be installed into every virtual environment, just to be able to install packages into it. Seemingly we've gotten so used to it, like "python setup.py install", that it seems like part of the natural order of the universe :-)

It's not especially elegant to have to restart in a venv, but IMO it's the lesser of two "evils". It is something that we currently have to do (virtualenv does it too, with -p) but if a more elegant approach were found, I'd certainly use it. The other inelegance (install pip in every venv) is not essential, as I've shown.

I don't know exactly what approach pip took to implement -E - was it similar to the approach in my snippet? Certainly, if anyone were to point out any drawbacks with the logic in that snippet, I'd be interested. But if pip took a different approach to implement -E, it's possible that there was a problem with the details of that particular approach, and the problems with pip -E don't provide any evidence to invalidate the restart approach in general.

I agree that tastes differ, and technical arguments can perfectly validly be about aesthetics in my view, but I don't regard these sorts of arguments as "religious". If "religion" is to be mentioned in connection with pip, then perhaps it's fair to say that pip is the "sacred cow" of Python packaging ;-)

Regards, Vinay Sajip
On 4 February 2014 19:23, Vinay Sajip <vinay_sajip@yahoo.co.uk> wrote:
It's not especially elegant to have to restart in a venv, but IMO it's the lesser of two "evils". It is something that we currently have to do (virtualenv does it too, with -p) but if a more elegant approach were found, I'd certainly use it. The other inelegance (install pip in every venv) is not essential, as I've shown.
I've pointed out the technical problem with trying to rely solely on a global install on Linux: it makes it hard to use a newer version of pip than the OS provides. Installing pip into the virtual environments avoids that problem without conflicting with the OS package manager when it comes to the system Python installation. That decoupling from the distro version is a worthwhile technical benefit of the current approach. While it does mean that multiple virtual environments may need to be updated when a new version of pip is released, it *also* makes testing with different versions straightforward. Cheers, Nick. -- Nick Coghlan | ncoghlan@gmail.com | Brisbane, Australia
On Tue, 4/2/14, Nick Coghlan <ncoghlan@gmail.com> wrote:
I've pointed out the technical problem with trying to rely solely on a global install on Linux: it makes it hard to use a newer version of pip than the OS provides. Installing pip into the virtual environments avoids that problem without conflicting with the OS package manager when it comes to the system Python installation.
That decoupling from the distro version is a worthwhile technical benefit of the current approach. While it does mean that multiple virtual environments may need to be updated when a new version of pip is released, it *also* makes testing with different versions straightforward.
Well, it's not a bad thing IMO that distil leaves no trace of itself in a venv, so no updating of any venv is required, were a new release of distil to come out.

In terms of distro version isolation, I'm still not sure I get the practical problem you're talking about, so let's get into specifics. In a parallel universe, let's assume there's a tool that can install packages into venvs, without needing to be installed into them first. (It seems a desirable feature, which prompted the development of pip -E in the first place.) Let's say that this tool is shipped as part of a distro and generally kept up to date by the distro's package manager. Let's say that a new version of this tool is available upstream, but not yet integrated into the distro, and that someone wants to use it instead of the distro-supplied version.

Where's the problem exactly? It could just be a single file to download and run, and you could have any number of different versions of it co-existing without any particular problem. It needn't be any worse than any other type of tool where a more-recent-than-distro version is desired.

I'm probably being dense, so perhaps you could outline any specific problem scenarios along the lines of "a user wants to do X, but can't, because of Y", where X is some aspect of installation / upgrading / uninstallation of packages in a venv and Y is something which intrinsically arises from the installation tool not being installed in the venv. That would certainly help me understand where you're coming from. Regards, Vinay Sajip
Fedora 19 still ships pip 1.3.1. Fedora 20 ships pip 1.4.1. Fedora 21 will probably ship 1.5.2 (although that hasn't actually happened yet, so Rawhide still has 1.4.1). The relative alignment of the release cycles is such that the pip version in the system Python will always be about 6 months behind upstream.

At the moment, this doesn't really matter, because people can just upgrade pip in their virtual environment(s) instead. And the way of handling that is completely consistent cross-distro, cross-platform and cross-version. It's a higher level of isolation than that offered by *not* installing pip into the virtual environments - the two approaches are *not* equivalent, and one is *not* clearly superior to the other, but rather there are pros and cons to both. Cheers, Nick.
On Tue, 4/2/14, Nick Coghlan <ncoghlan@gmail.com> wrote:
It's a higher level of isolation than that offered by *not* installing pip into the virtual environments - the two approaches are *not* equivalent, and one is *not* clearly superior to the other, but rather there are pros and cons to both.
Obviously the two approaches aren't equivalent, or there wouldn't be anything to discuss. What you haven't yet done is answer my question about what the cons are of the not-in-venv approach in the practical "you can't do X, because of Y" sense. What *practical* problems are caused by a lower level of isolation exactly, which make the higher level of isolation necessary?

ISTM the pip developers tried to implement -E because it was (independently of me) seen as a desirable feature, and Paul said as much, too. If it didn't work for pip, that could well be because of the specific implementation approach used, as I've said in my response to Carl.

Rather than talk in abstractions like "higher levels of isolation", which have an implied sense of goodness, it would be helpful to spell out specific scenarios (in the X/Y sense I mentioned before) that illustrate that goodness versus the badness of other approaches. Regards, Vinay Sajip
On 5 February 2014 00:00, Vinay Sajip <vinay_sajip@yahoo.co.uk> wrote:
On Tue, 4/2/14, Nick Coghlan <ncoghlan@gmail.com> wrote:
It's a higher level of isolation than that offered by *not* installing pip into the virtual environments - the two approaches are *not* equivalent, and one is *not* clearly superior to the other, but rather there are pros and cons to both.
Obviously the two approaches aren't equivalent, or there wouldn't be anything to discuss. What you haven't yet done is answer my question about what the cons are of the not-in-venv approach in the practical "you can't do X, because of Y" sense. What *practical* problems are caused by a lower level of isolation exactly, which make the higher level of isolation necessary?
ISTM the pip developers tried to implement -E because it was (independently of me) seen as a desirable feature, and Paul said as much, too. If it didn't work for pip, that could well be because of the specific implementation approach used, as I've said in my response to Carl.
Rather than talk in abstractions like "higher levels of isolation", which have an implied sense of goodness, it would be helpful to spell out specific scenarios (in the X/Y sense I mentioned before) that illustrate that goodness versus the badness of other approaches.
I'm not sure how I can explain it more clearly, but I'll try.

Scenario A: one tool to rule them all. Not versioned per virtualenv. If you want multiple versions on a system, you must manage that independently of both virtualenv and the system package manager. You may end up manipulating a virtualenv with version X.Y one day and version X.Z the next day, and then go back to version X.Y.

Scenario B: the installation tool is installed into the virtual environment when it is created. No external event will impact the behaviour of installation commands in that virtual environment - you must upgrade the tool within the environment. There are only two packaging systems in use: the system package manager and virtualenv. There is no third mechanism needed to version the package management tool, because that is handled using virtualenv. If one client is using pip 1.3, and the other pip 1.5 (with its much stricter default security settings), then having separate virtual environments for separate projects will deal with that automatically.

I'm not saying what you want to do *can't* be done (it obviously can). I'm saying it adds additional complexity, because you're adding a second isolation mechanism into the mix, with a completely independent set of ways of interacting with it. For an example of that consider that "pip install" and "python -m pip install" are essentially equivalent commands. The status quo preserves that equivalence even inside a virtual environment, because there is no separate mechanism to be considered. If the system Python site-packages is disabled in an active virtual environment, you don't need to carefully ensure that doesn't affect your install tool, because it's right there in the virtual environment with you.

Is it possible that a tool could be written that *did* adequately address all of these cases, with the test suite to prove it? Yes, I'm sure that's possible. What I'm saying is that I *don't really see the point*, since it doesn't achieve anything that couldn't be achieved with a utility script that simply iterates over all defined virtual environments and executes "pip install --upgrade" in each of them (with the wheel cache, every venv after the first should be quick). Cheers, Nick. -- Nick Coghlan | ncoghlan@gmail.com | Brisbane, Australia
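[A minimal, runnable sketch of the kind of utility script Nick describes - purely illustrative, and it assumes all the virtual environments live under one directory such as ~/.virtualenvs, which is a convention, not anything pip or virtualenv mandates:]

    import os
    import subprocess
    import sys

    def find_pip(env_dir):
        # Cover both the POSIX (bin/) and Windows (Scripts/) layouts.
        for subdir in ('bin', 'Scripts'):
            for exe in ('pip', 'pip.exe'):
                path = os.path.join(env_dir, subdir, exe)
                if os.path.isfile(path):
                    return path
        return None

    def upgrade_all(envs_root):
        # Run each venv's own pip, so the upgrade happens inside that venv.
        for name in sorted(os.listdir(envs_root)):
            pip = find_pip(os.path.join(envs_root, name))
            if pip:
                subprocess.check_call([pip, 'install', '--upgrade', 'pip'])

    if __name__ == '__main__':
        root = sys.argv[1] if len(sys.argv) > 1 else '~/.virtualenvs'
        upgrade_all(os.path.expanduser(root))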
On Tue, 4/2/14, Nick Coghlan <ncoghlan@gmail.com> wrote:
I'm not sure how I can explain it more clearly, but I'll try. [snip]
I understood the scenarios you outlined.
I'm not saying what you want to do *can't* be done (it obviously can). I'm saying it adds additional complexity, because you're adding a second isolation mechanism into the mix, with a completely independent set of ways of interacting with it.
There is no separate isolation mechanism in my approach - distil runs completely in the context of the virtualenv, so that's all the isolation mechanism there is, in both cases. The difference is that distil doesn't need to be installed in the venv - but otherwise, its sys.path is the same as for any other tool in the venv, as it is running *with the venv's interpreter*, just as the pip in a venv is. Perhaps I haven't made that explicitly clear (I assumed it was implicit in "restart in venv"). For example, the distil run with -e has no access to system site-packages if the venv itself is isolated from system site-packages.
For an example of that consider that "pip install" and "python -m pip install" are essentially equivalent commands. The status quo preserves that equivalence even inside a virtual environment, because there is no separate mechanism to be considered.
Well, distil doesn't have a __main__.py and is not designed to be called with -m, but I don't see an essential difference, because the main vehicle of isolation is *the venv machinery* in *both* pip and distil cases.
If the system Python site-packages is disabled in an active virtual environment, you don't need to carefully ensure that doesn't affect your install tool, because it's right there in the virtual environment with you.
As I said above, distil respects venv isolation: if a venv is isolated from system site-packages, then distil code run with -e can't see system site-packages, automatically, without any "careful ensuring" being necessary. So from what I can see, your argument is based on an imperfect understanding of how distil works. If you'd like to understand it better I'll provide the support, but if you don't want to or can't do that - well, that's a shame, but such is life :-) Regards, Vinay Sajip
On 5 February 2014 01:00, Vinay Sajip <vinay_sajip@yahoo.co.uk> wrote:
As I said above, distil respects venv isolation: if a venv is isolated from system site-packages, then distil code run with -e can't see system site-packages, automatically, without any "careful ensuring" being necessary. So from what I can see, your argument is based on an imperfect understanding of how distil works. If you'd like to understand it better I'll provide the support, but if you don't want to or can't do that - well, that's a shame, but such is life :-)
So now we're back to the point Daniel (I believe) made of "hey, if you restrict yourself to a single file script or a zip archive with a __main__.py, you don't need to install it before running it"? Well yes, that's how Python works. So what you're really arguing for is for pip to become a single file executable like distil, and deal with the consequences of cert file unpacking, etc. Somehow, I don't see that getting anywhere near the top of anyone's priority list any time soon. Regards, Nick. -- Nick Coghlan | ncoghlan@gmail.com | Brisbane, Australia
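[For anyone unfamiliar with the mechanism Nick refers to, a minimal sketch - emphatically not distil's actual build process - of producing such a single-file tool: a hashbang line followed by a zip whose root contains a __main__.py. The zip directory sits at the end of the file, so the prepended hashbang doesn't stop Python from executing the archive:]

    import os
    import stat
    import zipfile

    def build_single_file_tool(src_dir, out_path):
        # src_dir must contain a __main__.py at its root for Python to
        # execute the resulting archive.
        with open(out_path, 'wb') as out:
            out.write(b'#!/usr/bin/env python\n')
            with zipfile.ZipFile(out, 'w', zipfile.ZIP_DEFLATED) as zf:
                for root, _, files in os.walk(src_dir):
                    for name in files:
                        full = os.path.join(root, name)
                        zf.write(full, os.path.relpath(full, src_dir))
        # Mark the result executable so the hashbang takes effect on POSIX.
        st = os.stat(out_path)
        os.chmod(out_path, st.st_mode | stat.S_IXUSR | stat.S_IXGRP | stat.S_IXOTH)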
On 4 February 2014 09:23, Vinay Sajip <vinay_sajip@yahoo.co.uk> wrote:
To me, the correct solution would have been to isolate the bug and remove it, but I completely understand the pragmatic approach you took. But that leaves a "dark corner" where people might be afraid to experiment in this area with pip, because it's somehow "scary"
The big problem with this type of thing is that pip tends to be used on lots of systems with sometimes *extremely* obscure and odd setups - what some of the Linux distros do to a standard Python install amazes me (I'm sure they have good reasons, of course...). So it's not so much a case of "scary dark corners" as lots of unanticipated and extremely difficult to reproduce use cases that we have to support but where the only knowledge of what workarounds are needed is encapsulated in the code. It's a classic maintain vs reimplement issue, unfortunately... Paul
On Tue, 4/2/14, Paul Moore <p.f.moore@gmail.com> wrote:
The big problem with this type of thing is that pip tends to be used on lots of systems with sometimes *extremely* obscure and odd setups - what some of the Linux distros do to a standard Python install amazes me (I'm sure they have good reasons, of course...). So it's not so much a case of "scary dark corners" as lots of unanticipated and extremely difficult to reproduce use cases that we have to support but where the only knowledge of what workarounds are needed is encapsulated in the code.
You've just restated "scary dark corners" in a different way which sounds like there's more to it technically. But there's still no hard data on the table!

Surely, with all the convolutions that these distros go through, can one still assume that pip-the-program-that-they-all-run assumes control via some notional main program, which is passed some arguments? If we can't assume this, please tell me why not. If we can, then it would be like this:

    def main():
        # Do stuff with sys.argv

Now, distil has a similar entry point (as innumerable other scripts do) and all I did to restart in venv was to look for a -e argument and process it before the main event:

    def main():
        if '-e' found in sys.argv:
            validate the passed parameter, prepare a new command line using
                the interpreter from the venv and the sys.argv passed in,
                but minus the -e bit, since we've dealt with that here
            call subprocess with that command line (the restart)
            return - our work here is done
        # Do stuff with sys.argv

It doesn't get much simpler. Of course there could be bugs in the code called in the if suite, but it's not doing a complicated job, so it should be easy to identify and eradicate them.

Now, are you saying that the bit of code represented by that if suite, which only has to process sys.argv, look for an identifiable venv in the file system and check that it seems to have a runnable Python interpreter, can be bamboozled by something distros do to their Python installations? I'm having trouble buying that as an argument, unless there is more hard data on the table. What "unanticipated and extremely difficult to reproduce use cases" could interfere with the operations in that if suite?

It's hard to argue against voodoo, but anyone on this list could put me right with a particular circumstance that I haven't thought of, but that they have run into, which undermines my argument. It's not uncommon to see the "well ... we're not quite sure what's happening in there ... better play it safe" attitude, but that seems out of place in an engineering context. Regards, Vinay Sajip
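[A runnable rendering of that pseudocode, as a sketch only - it assumes the usual bin/ (POSIX) or Scripts/ (Windows) venv layout, and the error handling is deliberately minimal:]

    import os
    import subprocess
    import sys

    def main():
        if '-e' in sys.argv:
            i = sys.argv.index('-e')
            if i + 1 >= len(sys.argv):
                raise SystemExit('-e requires an argument')
            env = sys.argv[i + 1]
            for subdir in ('bin', 'Scripts'):
                for exe in ('python', 'python.exe'):
                    python = os.path.join(env, subdir, exe)
                    if os.path.isfile(python) and os.access(python, os.X_OK):
                        # Restart this script under the venv's interpreter,
                        # minus the -e option we've just consumed.
                        args = sys.argv[:i] + sys.argv[i + 2:]
                        return subprocess.call([python] + args)
            raise SystemExit('No such environment: %r' % env)
        # Do stuff with sys.argv as normal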
On 5 February 2014 00:33, Vinay Sajip <vinay_sajip@yahoo.co.uk> wrote:
On Tue, 4/2/14, Paul Moore <p.f.moore@gmail.com> wrote:
The big problem with this type of thing is that pip tends to be used on lots of systems with sometimes *extremely* obscure and odd setups - what some of the Linux distros do to a standard Python install amazes me (I'm sure they have good reasons, of course...). So it's not so much a case of "scary dark corners" as lots of unanticipated and extremely difficult to reproduce use cases that we have to support but where the only knowledge of what workarounds are needed is encapsulated in the code.
You've just restated "scary dark corners" in a different way which sounds like there's more to it technically. But there's still no hard data on the table!
http://www.curiousefficiency.org/posts/2011/02/status-quo-wins-stalemate.htm...

Vinay, please let it drop. Installing pip into every virtualenv works, and you haven't provided a concrete end user *benefit* for removing that layer of isolation - for upgrading pip after an upstream update to be burdensome, a user would need both to have a lot of virtual environments *and* to fail to implement a script that automated the task of iterating over them all and upgrading pip.

You accuse us of FUD, yet have presented no evidence that your approach works flawlessly across multiple versions of Fedora, Debian, openSUSE, RHEL, CentOS, Ubuntu, Windows, Mac OS X, etc, despite the fact that many distros are known to ship old versions of pip (and would likely end up shipping an old system version of any new tool that performed a similar role, if they shipped it natively at all), and that pip used to have this feature, but it was removed due to the creation of hard-to-reproduce and hard-to-debug scenarios.

You yourself stated that end users would need to manage parallel installs of the virtualenv-independent tool, since it would be designed not to be installed directly inside each virtualenv. And if it *was* intended to be installed inside virtualenvs for parallel install support, then users would need to deal with the fact that sometimes it would be in the virtualenv and sometimes not.

The status quo is simple, consistent, and comprehensible for developers and end users alike. If the other participants in *distutils-sig* think your suggested approach sounds confusing to use, what chance is there for end users? Regards, Nick. -- Nick Coghlan | ncoghlan@gmail.com | Brisbane, Australia
On Tue, 4/2/14, Nick Coghlan <ncoghlan@gmail.com> wrote:
Vinay, please let it drop.
You accuse us of FUD, yet have presented no evidence that your approach works flawlessly across multiple versions of Fedora, Debian, openSUSE, RHEL, CentOS, Ubuntu, Windows, Mac OS X, etc,
I merely asked for a hearing, and made no claims I couldn't substantiate, and with actual code in places. But you seem determined not to give me that hearing, so I'll drop it. It's a shame you have to resort to an argument from authority. Regards, Vinay Sajip
On 5 February 2014 01:59, Vinay Sajip <vinay_sajip@yahoo.co.uk> wrote:
On Tue, 4/2/14, Nick Coghlan <ncoghlan@gmail.com> wrote:
Vinay, please let it drop.
You accuse us of FUD, yet have presented no evidence that your approach works flawlessly across multiple versions of Fedora, Debian, openSUSE, RHEL, CentOS, Ubuntu, Windows, Mac OS X, etc,
I merely asked for a hearing, and made no claims I couldn't substantiate, and with actual code in places. But you seem determined not to give me that hearing, so I'll drop it. It's a shame you have to resort to an argument from authority.
No hearing? You haven't answered *any* of our technical objections, instead dismissing them all as irrelevant.

- doesn't work for packages that require installation (which I had forgotten until you reminded me)
- doesn't address the distro isolation problem (except by ducking it entirely and saying "don't install it", which isn't a good end user experience, especially on Windows - hence the whole PEP 453 thing, so Windows users didn't need to download and run get-pip.py)
- doesn't provide a consistent end user experience (what if they *do* install it in some virtual environments and not others?)
- doesn't provide any assistance in maintaining a correspondence between pip versions and projects, which at least some freelancers and other contractors see as a valuable capability
- should work *in theory* on multiple platforms, but has not been tested broadly on them the way the current approach has

Hence, "potentially interesting from a technological perspective, but doesn't solve any of the significant problems currently affecting end users, so not actually all that interesting from a practical perspective" is the natural assessment.

It's neat that distil is a standalone script, and hence can be easily run in a venv from outside it, just as venv in Py 3.4 can bootstrap pip without first activating the environment by running "<venv python> -m ensurepip". That's a property of "standalone script" vs "installable package", though, and one that has trade-offs of its own (some of which are noted above).

Regards, Nick. -- Nick Coghlan | ncoghlan@gmail.com | Brisbane, Australia
On Tue, 4/2/14, Nick Coghlan <ncoghlan@gmail.com> wrote:
No hearing? You haven't answered *any* of our technical objections, instead dismissing them all as irrelevant.
Because you advanced no specific technical objections, instead invoking generalisations. If you prefer the status quo because it's easier to explain (i.e. there's nothing to explain), that's not a technical objection in my book. However you want to look at it, that's fine by me. The example of distil shows that the status quo is not the only way of doing things, and I'll remain interested in any feedback from anyone regarding specific failures of distil in terms of practical workflows that aren't possible. "Doesn't need to be installed in a venv" or "runs as a single file" isn't a failure from my POV, so we'll have to agree to disagree. But, as you say, let's drop it. Jonathan Swift's observation applies. Regards, Vinay Sajip
On 5 February 2014 00:33, Vinay Sajip <vinay_sajip@yahoo.co.uk> wrote:
def main():
    if '-e' found in sys.argv:
        validate the passed parameter, prepare a new command line using
            the interpreter from the venv and the sys.argv passed in,
            but minus the -e bit, since we've dealt with that here
        call subprocess with that command line (the restart)
        return - our work here is done
    # Do stuff with sys.argv
That only works if the system site packages is configured to be visible inside the virtual environment - it usually isn't these days. Cheers, Nick. -- Nick Coghlan | ncoghlan@gmail.com | Brisbane, Australia
On Tue, 4/2/14, Nick Coghlan <ncoghlan@gmail.com> wrote:
That only works if the system site packages is configured to be visible inside the virtual environment - it usually isn't these days.
If that were true, distil wouldn't work in such venvs, but it seems to work fine. Can you show otherwise? Regards, Vinay Sajip
On 5 February 2014 01:48, Vinay Sajip <vinay_sajip@yahoo.co.uk> wrote:
On Tue, 4/2/14, Nick Coghlan <ncoghlan@gmail.com> wrote:
That only works if the system site packages is configured to be visible inside the virtual environment - it usually isn't these days.
If that were true, distil wouldn't work in such venvs, but it seems to work fine. Can you show otherwise?
I wrote that before you reminded me that distil avoids most of the tricky isolation problems by collapsing everything into a single file to provide natural isolation. Cheers, Nick. -- Nick Coghlan | ncoghlan@gmail.com | Brisbane, Australia
On Feb 05, 2014, at 01:05 AM, Nick Coghlan wrote:
That only works if the system site packages is configured to be visible inside the virtual environment - it usually isn't these days.
Really? I do this all the time. It prevents downloading gobs of stuff from PyPI that my system already provides in reasonably up-to-date versions. -Barry
On 5 February 2014 02:25, Barry Warsaw <barry@python.org> wrote:
On Feb 05, 2014, at 01:05 AM, Nick Coghlan wrote:
That only works if the system site packages is configured to be visible inside the virtual environment - it usually isn't these days.
Really? I do this all the time. It prevents downloading gobs of stuff from PyPI that my system already provides in reasonably up-to-date versions.
So do I - there's a bunch of RPMs in the system packages that are a pain to install from source (if they're on PyPI at all - assorted Fedora specific utilities aren't). But virtualenv doesn't work that way by default, and the web/PaaS focused devs that write a lot of the upstream "best practices" guidelines aren't typically fans of distro provided packages because they're too old (scientific users often have a similar complaint). Playing peacemaker between the "move fast and break things" and "if it's less than 5 years old it's still just a fad" schools of thought is one of the things that makes packaging so "interesting" ;) Cheers, Nick.
-Barry
-- Nick Coghlan | ncoghlan@gmail.com | Brisbane, Australia
On 4 February 2014 16:47, Nick Coghlan <ncoghlan@gmail.com> wrote:
Playing peacemaker between the "move fast and break things" and "if it's less than 5 years old it's still just a fad" schools of thought is one of the things that makes packaging so "interesting" ;)
:-) +1 QOTW Paul
On 4 February 2014 14:33, Vinay Sajip <vinay_sajip@yahoo.co.uk> wrote:
You've just restated "scary dark corners" in a different way which sounds like there's more to it technically. But there's still no hard data on the table!
But I specifically stated that the hard data has been *lost* and the only remaining evidence is how pip works. I'm not presenting that as a good thing, just as "how life is". There's at least one article somewhere on the internet discussing "why you shouldn't rewrite" in these terms - I think it might have been Joel on Software or maybe Coding Horror - but I can't find it right now.
Surely, with all the convolutions that these distros go through, can one still assume that pip-the-program-that-they-all-run assumes control via some notional main program, which is passed some arguments? If we can't assume this, please tell me why not. If we can, then it would be like this:
On at least one system (OSX?) there's some arrangement where the system Python is not actually the executable, but redirects through a shell script in some manner. I know this has caused bug reports in the past, but I do not have details, nor do I intend to go looking and see if it's applicable here.

*If* you were suggesting that pip implement "re-exec in venv" the way you suggest (and I understand that you aren't, even if you are saying it should be simple to do so), then the onus is on you to confirm that doing so will not cause regressions or bugs in this and similar known risk areas. Otherwise, the assessment of the suggestion will remain "interesting, but insufficient benefit to justify the risk".

Anyway, never mind. I agree the technique is interesting, and we have no demonstrated bugs in it. But I don't imagine anyone will take the risk on a widely-used project like pip any time in the near future. Paul
Carl Meyer <carl@oddbird.net> writes:
Sadly I can't offer that specific problem, because AFAIK we never tracked it down. It was a recurring problem in IRC where people would report using -E and pip would fail to take its action within the
After posting my first response to your post, I remembered that Donald had provided a link to the commit that removed the -E. From a quick look at that commit, ISTM that distil's approach to restarting is somewhat easier to reason about and debug than the approach taken by pip to implement -E: it takes place before any of the rest of the logic in the restarted program is run, so a question like "why didn't restart_in_venv get called, when it should have been?" doesn't arise. There are no rabbit-holes to go down. The approach in that snippet could apply to any script - it's not especially distil-specific. Regards, Vinay Sajip
On 1 February 2014 09:29, Vinay Sajip <vinay_sajip@yahoo.co.uk> wrote:
However, it's substantially *harder* to explain to people how to use it correctly that way.
But distil installs into venvs (both PEP 405 and virtualenv) without needing to be in there. It's as simple as
distil -e path/to/venv install XYZ
I would like it if pip worked the same way (even though the tide of installing pip into each virtualenv may be unstoppable by now). Would you be interested in writing a patch to do this? I don't have the time to commit to it and nobody else seems particularly interested in doing it, which is one reason it's not happened, I suspect. It's certainly not a fundamental limitation of pip. If you'd rather have this as a selling point for distil, then that's fair. But the vast weight of existing pip users and documentation, plus the core support for pip as the preferred default installer encapsulated in PEP 453, is going to make this much less useful to people in practice. Paul.
On Sat, 1/2/14, Paul Moore <p.f.moore@gmail.com> wrote:
I would like it if pip worked the same way (even though the tide of installing pip into each virtualenv may be unstoppable by now). Would you be interested in writing a patch to do this? I don't have the time to commit to it and nobody else seems particularly interested in doing it, which is one reason it's not happened, I suspect. It's certainly not a fundamental limitation of pip.
It's not, and virtualenv does it with -p, right? Here's the distil code that does it:

    # (imports added here for completeness; in distil itself they live
    # elsewhere in the module)
    import logging
    import os
    import re
    import sys

    logger = logging.getLogger(__name__)

    ARG_RE = re.compile(r'-[ep](=?.*)?')

    # Use encode rather than br'' to avoid syntax error in 2.5, so we can
    # print a more helpful message.
    PYTHON_RE = re.compile(r'Python \d(\.\d+)+'.encode('utf-8'))

    def find_python(env_or_python, is_env):
        result = None
        if not is_env:
            result = env_or_python
        else:
            env = os.path.abspath(env_or_python)
            if os.path.isdir(env):
                from distutils import sysconfig
                exe = 'python%s' % sysconfig.get_config_var('EXE')
                for subdir in ('bin', 'Scripts'):
                    p = os.path.join(env, subdir, exe)
                    if os.path.isfile(p) and os.access(p, os.X_OK):
                        result = p
                        break
        if result:
            import subprocess
            try:
                cmd = [result, '--version']
                p = subprocess.Popen(cmd, stdout=subprocess.PIPE,
                                     stderr=subprocess.STDOUT)
                out, _ = p.communicate()
                if p.returncode:
                    logger.error('Command %r returned non-zero status: %d',
                                 ' '.join(cmd), p.returncode)
                    result = None
                elif not PYTHON_RE.match(out):
                    logger.error('Command %r - unexpected output: %r',
                                 ' '.join(cmd), out)
                    result = None
            except Exception as e:
                logger.exception('Command %r - unexpected exception: %r',
                                 ' '.join(cmd), e)
                result = None
        return result

    def find_env_or_python(args=None):
        result = None
        args = (args or sys.argv)[1:]
        nargs = len(args)
        for i, arg in enumerate(args):
            m = ARG_RE.match(arg)
            if m:
                value = m.groups()[0]
                start = i + 1
                if value:
                    if value[0] in '=:':
                        argval = value[1:]
                    else:
                        argval = value
                    end = i + 2
                elif i < (nargs - 1):
                    argval = args[i + 1]
                    end = i + 3
                else:
                    # (guard added: bail out if the option was given
                    # without a value)
                    raise ValueError('Option %r needs a value' % arg)
                # if result is set, then start and end indicate the slice
                # to lose from sys.argv. First, validate the env.
                result = find_python(argval, arg[1] == 'e')
                if result:
                    del sys.argv[start:end]
                else:
                    if arg[1] == 'e':
                        msg = 'No such environment: %r' % argval
                    else:
                        msg = 'No such Python interpreter: %r' % argval
                    raise ValueError(msg)
        return result

    def main(args=None):
        python = find_env_or_python()
        if python:
            cmd = [python] + sys.argv
            import subprocess
            return subprocess.call(cmd)
        # the rest of the main() logic goes here as normal
If you'd rather have this as a selling point for distil, then that's fair. But the vast weight of existing pip users and documentation, plus the core support for pip as the preferred default installer encapsulated in PEP 453, is going to make this much less useful to people in practice.
I'm not trying to "sell" distil versus pip, as I've said before, though I do want it to be better than pip, by trying out different ideas on how to do things. In my admittedly biased opinion it already is better in numerous respects, but if people won't see for lack of looking, that's not my loss :-)

The reasons I wouldn't volunteer to add this to pip are:

1. I don't believe I really need to, since the code above is pretty self-contained and shouldn't take too long to adapt. So, any obstacles to adding it would surely not be technical. Anyone can feel free to add the above to pip.

2. I find working on pip painful because of how it works. I have no animus against pip; I have overcome this pain in the past and did a fair chunk of the work in getting it to work on 2.x and 3.x in a single code-base. But that was when I thought it was important for 3.x adoption to get pip and virtualenv working under 3.x in a low-maintenance fashion (i.e. no 2to3). Of course, I now have the use of distil, so I seldom need to use pip :-)

3. I try to avoid "politics" and "religion" in software development when I can, and I feel their presence more in the pip project than I'm comfortable with. I don't really want to fight non-technical battles.

Re. PEP 453, I voiced my reservations at the time, but if "that ship has sailed", then so be it - "nobody ever got fired for recommending pip". The Python community will get the packaging infrastructure that it deserves, just like a populace does with elections :-)

Regards, Vinay Sajip
On Sat, Feb 1, 2014 at 3:23 AM, Vinay Sajip <vinay_sajip@yahoo.co.uk> wrote:
On Fri, 31/1/14, Brian Wickman <wickman@gmail.com> wrote:
There are myriad other practical reasons. Here are some:
Thanks for taking the time to respond with the details - they are good data points to think about!
Lastly, there are social reasons. It's just hard to convince most engineers to use things like pkg_resources or pkgutil to manipulate resources when for them the status quo is just using __file__. Bizarrely the social challenges are just as hard as the abovementioned technical challenges.
I agree it's bizarre, but sadly it's not surprising. People get used to certain ways of doing things, and a certain kind of collective myopia develops when it comes to looking at different ways of doing things. Having worked with fairly diverse systems in my time, ISTM that sections of the Python community have this myopia too. For example, the Java hatred and PEP 8 zealotry that you see here and there.
PEP 302 tried to unify this with get_data() and set_data() on loaders, but prior to Python 3.3 you just didn't have any guarantee that __loader__ would even be set, let alone have a loader with those methods. Paul can tell me if my hunch is off, but I assume the dream was that people would do `__loader__.get_data('asset.txt')` instead of `os.path.join(os.path.dirname(__file__), 'asset.txt')` to read something bundled with your package. But as Brian has pointed out, people just have habits at this point of assuming unpacked files and working off of __file__. I know Daniel has said he wanted some concept of a listdir() on loaders so that they could basically act as a virtual file system for packages, but it would also require locking down what relative vs. absolute paths meant in get_data/set_data (i.e. is it relative to the loader in the package or the package itself? My pref is the latter for the case of reused loaders) and really pushing people to use the loader APIs for reading intra-package "files".
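To make the contrast concrete, here's a minimal sketch of the two styles being discussed (the package layout and 'asset.txt' are made up for illustration):

import os

# The status quo: assumes the package is unpacked on the file system.
def read_asset_via_file():
    path = os.path.join(os.path.dirname(__file__), 'asset.txt')
    with open(path, 'rb') as f:
        return f.read()

# The PEP 302 approach: also works from a zip, but only when __loader__
# is set and implements the optional get_data() API - something only
# guaranteed from Python 3.3 onwards.
def read_asset_via_loader():
    loader = globals().get('__loader__')
    if loader is not None and hasattr(loader, 'get_data'):
        # zipimport's get_data() accepts the same archive-qualified path
        # that __file__ carries inside a zip.
        return loader.get_data(os.path.join(os.path.dirname(__file__),
                                            'asset.txt'))
    return read_asset_via_file()  # fall back to the status quo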
One of the things that's puzzled me, for example, is why people think it's reasonable or even necessary to have copies of pip and setuptools in every virtual environment - often the same people who will tell you that your code isn't DRY enough! It's certainly not a technical requirement, yet one of the reasons why PEP 405 venvs aren't that popular is that pip and setuptools aren't automatically put in there. It's a social issue - it's been decided that rather than exploring a technical approach to addressing any issue with installing into venvs, it's better to bundle pip and setuptools with Python 3.4, since that will seemingly be easier for people to swallow :-)
I suspect it's also ignorance and differences in deployment strategies. Some people have such small deployments that hitting PyPI every time works, for others like Brian it can't be more than a cp w/ an unzip. Maybe the Packaging Users Guide could have a Recommended Deployment Strategies section or something. I doubt there are that many common needs beyond "pull from PyPI every time", "pull from your own wheel repo", "copy everything over in a single wheel/zip you unpack" so that people at least have a starting point to work from (especially if the instructions work on all platforms, i.e. no symlink discussions).
On Sat, Feb 1, 2014, at 12:26 PM, Brett Cannon wrote:
On Sat, Feb 1, 2014 at 3:23 AM, Vinay Sajip <vinay_sajip@yahoo.co.uk> wrote:
On Fri, 31/1/14, Brian Wickman <wickman@gmail.com> wrote:
There are myriad other practical reasons. Here are some:
Thanks for taking the time to respond with the details - they are good data points to think about!
Lastly, there are social reasons. It's just hard to convince most engineers
to use things like pkg_resources or pkgutil to manipulate resources
when for them the status quo is just using __file__. Bizarrely the social
challenges are just as hard as the abovementioned technical challenges.
I agree it's bizarre, but sadly it's not surprising. People get used to certain ways of doing things, and a certain kind of collective myopia develops when it comes to looking at different ways of doing things. Having worked with fairly diverse systems in my time, ISTM that sections of the Python community have this myopia too. For example, the Java hatred and PEP 8 zealotry that you see here and there.

PEP 302 tried to unify this with get_data() and set_data() on loaders, but prior to Python 3.3 you just didn't have any guarantee that __loader__ would even be set, let alone have a loader with those methods. Paul can tell me if my hunch is off, but I assume the dream was that people would do `__loader__.get_data('asset.txt')` instead of `os.path.join(os.path.dirname(__file__), 'asset.txt')` to read something bundled with your package. But as Brian has pointed out, people just have habits at this point of assuming unpacked files and working off of __file__. I know Daniel has said he wanted some concept of a listdir() on loaders so that they could basically act as a virtual file system for packages, but it would also require locking down what relative vs. absolute paths meant in get_data/set_data (i.e. is it relative to the loader in the package or the package itself? My pref is the latter for the case of reused loaders) and really pushing people to use the loader APIs for reading intra-package "files".

There's also the problem when you need to give a file that you have packaged as part of your thing to an API that only accepts filenames. The standard ssl module is one of these that i've run into recently.

One of the things that's puzzled me, for example, is why people think it's reasonable or even necessary to have copies of pip and setuptools in every virtual environment - often the same people who will tell you that your code isn't DRY enough! It's certainly not a technical requirement, yet one of the reasons why PEP 405 venvs aren't that popular is that pip and setuptools aren't automatically put in there. It's a social issue - it's been decided that rather than exploring a technical approach to addressing any issue with installing into venvs, it's better to bundle pip and setuptools with Python 3.4, since that will seemingly be easier for people to swallow :-)

I suspect it's also ignorance and differences in deployment strategies. Some people have such small deployments that hitting PyPI every time works, for others like Brian it can't be more than a cp w/ an unzip. Maybe the Packaging Users Guide could have a Recommended Deployment Strategies section or something. I doubt there are that many common needs beyond "pull from PyPI every time", "pull from your own wheel repo", "copy everything over in a single wheel/zip you unpack" so that people at least have a starting point to work from (especially if the instructions work on all platforms, i.e. no symlink discussions).
On Sat, 1/2/14, Donald Stufft <donald@stufft.io> wrote:
There's also the problem when you need to give a file that you have packaged as part of your thing to an API that only accepts filenames. The standard ssl module is one of these that i've run into recently.
Yes, this is one of the pkgutil gaps I referred to in my reply to Brett, which pkg_resources and distlib.resources both address. Regards, Vinay Sajip
On Sat, Feb 1, 2014 at 12:34 PM, Donald Stufft <donald@stufft.io> wrote:
On Sat, Feb 1, 2014, at 12:26 PM, Brett Cannon wrote:
On Sat, Feb 1, 2014 at 3:23 AM, Vinay Sajip <vinay_sajip@yahoo.co.uk>wrote:
On Fri, 31/1/14, Brian Wickman <wickman@gmail.com> wrote:
There are myriad other practical reasons. Here are some:
Thanks for taking the time to respond with the details - they are good data points to think about!
Lastly, there are social reasons. It's just hard to convince most engineers to use things like pkg_resources or pkgutil to manipulate resources when for them the status quo is just using __file__. Bizarrely the social challenges are just as hard as the abovementioned technical challenges.
I agree it's bizarre, but sadly it's not surprising. People get used to certain ways of doing things, and a certain kind of collective myopia develops when it comes to looking at different ways of doing things. Having worked with fairly diverse systems in my time, ISTM that sections of the Python community have this myopia too. For example, the Java hatred and PEP 8 zealotry that you see here and there.
PEP 302 tried to unify this with get_data() and set_data() on loaders, but prior to Python 3.3 you just didn't have any guarantee that __loader__ would even be set, let alone have a loader with those methods. Paul can tell me if my hunch is off, but I assume the dream was that people would do `__loader__.get_data('asset.txt')` instead of `os.path.join(os.path.dirname(__file__), 'asset.txt')` to read something bundled with your package. But as Brian has pointed out, people just have habits at this point of assuming unpacked files and working off of __file__.
I know Daniel has said he wanted some concept of a listdir() on loaders so that they could basically act as a virtual file system for packages, but it would also require locking down what relative vs. absolute paths meant in get_data/set_data (i.e. is it relative to the loader in the package or the package itself? My pref is the latter for the case of reused loaders) and really pushing people to use the loader APIs for reading intra-package "files".
There's also the problem when you need to give a file that you have packaged as part of your thing to an API that only accepts filenames. The standard ssl module is one of these that i've run into recently.
Yes, that is definitely a design flaw in the ssl module that should get remedied. Did you file a bug to add a new API (whether new function or new parameters) to accept a file-like object or string instead? -Brett
One of the things that's puzzled me, for example, is why people think it's reasonable or even necessary to have copies of pip and setuptools in every virtual environment - often the same people who will tell you that your code isn't DRY enough! It's certainly not a technical requirement, yet one of the reasons why PEP 405 venvs aren't that popular is that pip and setuptools aren't automatically put in there. It's a social issue - it's been decided that rather than exploring a technical approach to addressing any issue with installing into venvs, it's better to bundle pip and setuptools with Python 3.4, since that will seemingly be easier for people to swallow :-)
I suspect it's also ignorance and differences in deployment strategies. Some people have such small deployments that hitting PyPI every time works, for others like Brian it can't be more than a cp w/ an unzip. Maybe the Packaging Users Guide could have a Recommended Deployment Strategies section or something. I doubt there are that many common needs beyond "pull from PyPI every time", "pull from your own wheel repo", "copy everything over in a single wheel/zip you unpack" so that people at least have a starting point to work from (especially if the instructions work on all platforms, i.e. no symlink discussions).
-- Donald Stufft donald@stufft.io
On Sat, Feb 1, 2014, at 01:06 PM, Brett Cannon wrote:
On Sat, Feb 1, 2014 at 12:34 PM, Donald Stufft <donald@stufft.io> wrote:
On Sat, Feb 1, 2014, at 12:26 PM, Brett Cannon wrote:
On Sat, Feb 1, 2014 at 3:23 AM, Vinay Sajip <vinay_sajip@yahoo.co.uk> wrote:
On Fri, 31/1/14, Brian Wickman <wickman@gmail.com> wrote:
There are myriad other practical reasons. Here are some:
Thanks for taking the time to respond with the details - they are good data points to think about!
Lastly, there are social reasons. It's just hard to convince most engineers
to use things like pkg_resources or pkgutil to manipulate resources
when for them the status quo is just using __file__. Bizarrely the social
challenges are just as hard as the abovementioned technical challenges.
I agree it's bizarre, but sadly it's not surprising. People get used to certain ways of doing things, and a certain kind of collective myopia develops when it comes to looking at different ways of doing things. Having worked with fairly diverse systems in my time, ISTM that sections of the Python community have this myopia too. For example, the Java hatred and PEP 8 zealotry that you see here and there.

PEP 302 tried to unify this with get_data() and set_data() on loaders, but prior to Python 3.3 you just didn't have any guarantee that __loader__ would even be set, let alone have a loader with those methods. Paul can tell me if my hunch is off, but I assume the dream was that people would do `__loader__.get_data('asset.txt')` instead of `os.path.join(os.path.dirname(__file__), 'asset.txt')` to read something bundled with your package. But as Brian has pointed out, people just have habits at this point of assuming unpacked files and working off of __file__. I know Daniel has said he wanted some concept of a listdir() on loaders so that they could basically act as a virtual file system for packages, but it would also require locking down what relative vs. absolute paths meant in get_data/set_data (i.e. is it relative to the loader in the package or the package itself? My pref is the latter for the case of reused loaders) and really pushing people to use the loader APIs for reading intra-package "files".

There's also the problem when you need to give a file that you have packaged as part of your thing to an API that only accepts filenames. The standard ssl module is one of these that i've run into recently.

Yes, that is definitely a design flaw in the ssl module that should get remedied. Did you file a bug to add a new API (whether new function or new parameters) to accept a file-like object or string instead? -Brett

I don't recall if I did or not; I know there's one open for allowing in-memory client side certs, no idea for a CA bundle. It's easy enough to work around of course, but it requires the author of a tool that uses such an API to anticipate the need to work around it, and many authors simply don't care to support zipped usage.

One of the things that's puzzled me, for example, is why people think it's reasonable or even necessary to have copies of pip and setuptools in every virtual environment - often the same people who will tell you that your code isn't DRY enough! It's certainly not a technical requirement, yet one of the reasons why PEP 405 venvs aren't that popular is that pip and setuptools aren't automatically put in there. It's a social issue - it's been decided that rather than exploring a technical approach to addressing any issue with installing into venvs, it's better to bundle pip and setuptools with Python 3.4, since that will seemingly be easier for people to swallow :-)

I suspect it's also ignorance and differences in deployment strategies. Some people have such small deployments that hitting PyPI every time works, for others like Brian it can't be more than a cp w/ an unzip. Maybe the Packaging Users Guide could have a Recommended Deployment Strategies section or something. I doubt there are that many common needs beyond "pull from PyPI every time", "pull from your own wheel repo", "copy everything over in a single wheel/zip you unpack" so that people at least have a starting point to work from (especially if the instructions work on all platforms, i.e. no symlink discussions).
On Sat, 1/2/14, Brett Cannon <brett@python.org> wrote:
Yes, that is definitely a design flaw in the ssl module that should get remedied. Did you file a bug to add a new API (whether new function or new parameters) to accept a file-like object or string instead?
While that might improve flexibility in how you use the ssl module, it sadly won't solve the problem in general. Some third party APIs will expect file paths, whereas others will require passing OS-level handles, rather than file-like objects - thus effectively requiring a file which you can open with os.open(). There are other Python APIs which need a pathname - I know imp.load_dynamic() will be familiar to you :-) Regards, Vinay Sajip
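For completeness, a minimal sketch of the workaround this forces - materialising a bundled resource to a real file so its path can be handed to such an API (pkgutil.get_data is stdlib; 'mypkg' and 'cacert.pem' are made up for illustration):

import atexit
import os
import pkgutil
import tempfile

def materialize(package, resource):
    # Read the resource whether the package is zipped or unpacked...
    data = pkgutil.get_data(package, resource)
    # ...then write it out, treating the extracted copy as a pure cache.
    fd, path = tempfile.mkstemp(suffix='-' + os.path.basename(resource))
    with os.fdopen(fd, 'wb') as f:
        f.write(data)
    atexit.register(os.unlink, path)
    return path

# e.g. for a path-only API:
# ctx.load_verify_locations(cafile=materialize('mypkg', 'cacert.pem'))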
On Sat, Feb 1, 2014 at 2:00 PM, Vinay Sajip <vinay_sajip@yahoo.co.uk> wrote:
On Sat, 1/2/14, Brett Cannon <brett@python.org> wrote:
Yes, that is definitely a design flaw in the ssl module that should get remedied. Did you file a bug to add a new API (whether new function or new parameters) to accept a file-like object or string instead?
While that might improve flexibility in how you use the ssl module, it sadly won't solve the problem in general. Some third party APIs will expect file paths, whereas others will require passing OS-level handles, rather than file-like objects - thus effectively requiring a file which you can open with os.open().
No one is talking about solving this problem overnight. But if we don't want to bother putting the effort in now then the zipimport module might as well get tossed as this is a constant issue that people are not designing APIs to support. So if people are going to implicitly not support it because no one is asking for them to care then we shouldn't try to support it ourselves and make our lives easier (gods know the zipimport module is a pain to keep working).
There are other Python APIs which need a pathname - I know imp.load_dynamic() will be familiar to you :-)
Sure, but that's for technical reasons so it gets a pass (plus I didn't design that API =).
On Sat, 1/2/14, Brett Cannon <brett@python.org> wrote:
No one is talking about solving this problem overnight. But if we don't want to bother putting the effort in now then the zipimport module might as well get tossed as this is a constant issue that people are not designing APIs to support.
Whoa - nobody said that in-Python APIs of this nature shouldn't be improved (as Christian says he's already done with this one). I was pointing out that there are also third-party APIs that we don't control which people will want to use, meaning that the need [for an API] to copy in-zip resources to the file system will be around for quite some time. Regards, Vinay Sajip
On Sat, Feb 1, 2014 at 3:00 PM, Vinay Sajip <vinay_sajip@yahoo.co.uk> wrote:
On Sat, 1/2/14, Brett Cannon <brett@python.org> wrote:
No one is talking about solving this problem overnight. But if we don't want to bother putting the effort in now then the zipimport module might as well get tossed as this is a constant issue that people are not designing APIs to support.
Whoa - nobody said that in-Python APIs of this nature shouldn't be improved (as Christian says he's already done with this one). I was pointing out that there are also third-party APIs that we don't control which people will want to use, meaning that the need [for an API] to copy in-zip resources to the file system will be around for quite some time.
Sorry for the misunderstanding. Obviously I read your text in a way you didn't mean for it to be read. =)
On 01.02.2014 19:06, Brett Cannon wrote:
Yes, that is definitely a design flaw in the ssl module that should get remedied. Did you file a bug to add a new API (whether new function or new parameters) to accept a file-like object or string instead?
Python 3.4 will allow loading CA certs from a string (PEM) or binary buffer (DER). I haven't fixed load_cert_chain() yet. http://docs.python.org/3.4/library/ssl.html#ssl.SSLContext.load_verify_locat...
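Concretely, the 3.4 change means a CA bundle shipped inside a zip can now be fed to ssl without touching disk. A minimal sketch ('mypkg' and 'cacert.pem' are made up; cadata accepts a PEM string or DER bytes):

import pkgutil
import ssl

# Read a PEM bundle packaged with the application - no real file needed.
pem_bundle = pkgutil.get_data('mypkg', 'cacert.pem').decode('ascii')

ctx = ssl.SSLContext(ssl.PROTOCOL_SSLv23)
ctx.verify_mode = ssl.CERT_REQUIRED
ctx.load_verify_locations(cadata=pem_bundle)  # cadata is new in Python 3.4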
On Sat, 1/2/14, Brett Cannon <brett@python.org> wrote:
PEP 302 tried to unify this with get_data() and set_data() on loaders, but prior to Python 3.3 you just didn't have any guarantee that __loader__ would even be set, let alone have a loader with those methods. Paul can tell me if my hunch is off, but I assume the dream was that people would do `__loader__.get_data('asset.txt')` instead of `os.path.join(os.path.dirname(__file__), 'asset.txt')` to read something bundled with your package. But as Brian has pointed out, people just have habits at this point of assuming unpacked files and working off of __file__.
Right, and one question is whether we need to educate people away from this way of doing things. In the distlib.resources module I've tried to work around things like the absence of __loader__ that you mention, and to provide a uniform API for accessing package resources whether in file system or zip, using the PEP 302 machinery whenever possible, and filling in some of the gaps that pkgutil has and which pkg_resources attempts to fill.
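As a sketch of what that uniform access looks like in use (this follows the distlib documentation's finder/Resource pattern; 'mypkg' and 'asset.txt' are made up, with stdlib pkgutil shown alongside for comparison):

# stdlib pkgutil: reads bytes, but offers little more.
import pkgutil
data = pkgutil.get_data('mypkg', 'asset.txt')

# distlib.resources: the same read plus streams and container listings,
# working the same way for file system and zip deployments.
from distlib.resources import finder
f = finder('mypkg')
r = f.find('asset.txt')
data = r.bytes          # resource contents as bytes
stream = r.as_stream()  # binary file-like object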
My pref is [...] and really pushing people to use the loader APIs for reading intra-package "files".
+1
Maybe the Packaging Users Guide could have a Recommended Deployment Strategies section [...] (especially if the instructions work on all platforms, i.e. no symlink discussions)
+1 again. Regards, Vinay Sajip
On 1 February 2014 17:53, Vinay Sajip <vinay_sajip@yahoo.co.uk> wrote:
My pref is [...] and really pushing people to use the loader APIs for reading intra-package "files".
+1
Maybe the Packaging Users Guide could have a Recommended Deployment Strategies section [...] (especially if the instructions work on all platforms, i.e. no symlink discussions)
+1 again.
+1 to both of these. Paul
On Sat, Feb 1, 2014 at 1:00 PM, Paul Moore <p.f.moore@gmail.com> wrote:
On 1 February 2014 17:53, Vinay Sajip <vinay_sajip@yahoo.co.uk> wrote:
My pref is [...] and really pushing people to use the loader APIs for reading intra-package "files".
+1
Maybe the Packaging Users Guide could have a Recommended Deployment Strategies section [...] (especially if the instructions work on all platforms, i.e. no symlink discussions)
+1 again.
+1 to both of these.
While I'm not in a good position to launch the latter suggestion, the former one I can help with. =) I'll start a thread over on import-sig about how to handle get_data/set_data and anything missing so that intra-package data reading can be covered by loaders.
On Sat, Feb 1, 2014 at 1:16 PM, Brett Cannon <brett@python.org> wrote:
On Sat, Feb 1, 2014 at 1:00 PM, Paul Moore <p.f.moore@gmail.com> wrote:
On 1 February 2014 17:53, Vinay Sajip <vinay_sajip@yahoo.co.uk> wrote:
My pref is [...] and really pushing people to use the loader APIs for reading intra-package "files".
+1
Maybe the Packaging Users Guide could have a Recommended Deployment Strategies section [...] (especially if the instructions work on all platforms, i.e. no symlink discussions)
+1 again.
+1 to both of these.
While I'm not in a good position to launch the latter suggestion, the former one I can help with. =) I'll start a thread over on import-sig about how to handle get_data/set_data and anything missing so that intra-package data reading can be covered by loaders.
But I can file bugs =) https://github.com/pypa/python-packaging-user-guide/issues/24
On Sat, Feb 1, 2014 at 1:16 PM, Brett Cannon <brett@python.org> wrote:
On Sat, Feb 1, 2014 at 1:00 PM, Paul Moore <p.f.moore@gmail.com> wrote:
On 1 February 2014 17:53, Vinay Sajip <vinay_sajip@yahoo.co.uk> wrote:
My pref is [...] and really pushing people to use the loader APIs for reading intra-package "files".
+1
Maybe the Packaging Users Guide could have a Recommended Deployment Strategies section [...] (especially if the instructions work on all platforms, i.e. no symlink discussions)
+1 again.
+1 to both of these.
While I'm not in a good position to launch the latter suggestion, the former one I can help with. =) I'll start a thread over on import-sig about how to handle get_data/set_data and anything missing so that intra-package data reading can be covered by loaders.
https://mail.python.org/pipermail/import-sig/2014-February/000770.html
On 1 February 2014 17:26, Brett Cannon <brett@python.org> wrote:
PEP 302 tried to unify this with get_data() and set_data() on loaders, but prior to Python 3.3 you just didn't have any guarantee that __loader__ would even be set, let alone have a loader with those methods. Paul can tell me if my hunch is off, but I assume the dream was that people would do `__loader__.get_data('asset.txt')` instead of `os.path.join(os.path.dirname(__file__), 'asset.txt')` to read something bundled with your package. But as Brian has pointed out, people just have habits at this point of assuming unpacked files and working off of __file__.
That's right. Unfortunately, (a) people have habits, as you say, (b) the loader API was too limited - for good reason, we couldn't add any more without breaking more than we gained at the time, and (c) things like ssl that need files as Donald points out are always a problem. Setuptools (and distlib more recently) tried to address this with unpack-on-demand tricks, but honestly non-filesystem loaders are still such an uncommon use case that most people don't bother. It's just a very, very slow process of change.
I know Daniel has said he wanted some concept of a listdir() on loaders so that they could basically act as a virtual file system for packages, but it would also require locking down what relative vs. absolute paths meant in get_data/set_data (i.e. is it relative to the loader in the package or the package itself? My pref is the latter for the case of reused loaders) and really pushing people to use the loader APIs for reading intra-package "files".
That is probably the most annoying omission from PEP 302. IIRC, we did want to add it, but couldn't. But I can't recall the reason now. Maybe it could be remedied - it really is a pain. Paul
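To make the omission concrete, here's a purely hypothetical sketch of the sort of listing API being wished for - listdir() is not a real PEP 302 method, and the class below is illustrative only:

import zipfile

class ZipResourceLister(object):
    # Hypothetical: enumerate intra-package "files" by package-relative
    # name, so zipped and unpacked packages look the same to callers.
    def __init__(self, archive, package):
        self.zf = zipfile.ZipFile(archive)
        self.prefix = package.replace('.', '/') + '/'

    def listdir(self, relative_path=''):
        base = self.prefix + relative_path
        if not base.endswith('/'):
            base += '/'
        names = set()
        for name in self.zf.namelist():
            if name.startswith(base) and name != base:
                names.add(name[len(base):].split('/', 1)[0])
        return sorted(names)

    def get_data(self, relative_path):
        # Package-relative, per the preference expressed earlier in the
        # thread; real zipimport loaders take archive-qualified paths.
        return self.zf.read(self.prefix + relative_path)

# e.g. ZipResourceLister('app.zip', 'mypkg').listdir() -> ['asset.txt', ...]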
On 1 February 2014 05:31, Brian Wickman <wickman@gmail.com> wrote:
This is in response to Vinay's thread but since I wasn't subscribed to distutils-sig, I couldn't easily respond directly to it.
Vinay's right, the technology here isn't revolutionary but what's notable is that we've been using it in production for almost 3 years at Twitter. It's also been open-sourced for a couple years at https://github.com/twitter/commons/tree/master/src/python/twitter/common/pyt... but not widely announced (it is, after all, just a small subdirectory in a fairly large mono-repo, and was only recently published independently to PyPI as twitter.common.python.)
PEX files are just executable zip files with hashbangs containing a carefully constructed __main__.py and a PEX-INFO, which is json-encoded dictionary describing how to scrub and bootstrap sys.path and the like. They work equally well unpacked into a standalone directory.
In practice PEX files are simultaneously our replacement for virtualenv and also our way of distributing Python applications to production. Now we could use virtualenv to do this but it's hard to argue with a deployment process that is literally "cp". Furthermore, most of our machines don't have compiler toolchains or external network access, so hermetically sealing all dependencies once at build time (possibly for multiple platforms since all developers use Macs) has huge appeal. This is even more important at Twitter where it's common to run a dozen different Python applications on the same box at the same time, some using 2.6, some 2.7, some PyPy, but all with varying versions of underlying dependencies.
Ah, very interesting - this is exactly the kind of thing we were trying to enable with the executable directory/zip file support in Python 2.6, and then we went and forgot to cover it in the original "What's New in Python 2.6?" doc, so an awful lot of people never learned about the existence of the feature :P As Daniel noted, PEP 441 is at least as much about letting people know "Hey, Python has supported direct execution of directories and zip archives since Python 2.6!" as it is about actually providing some tools that better support doing things that way :)
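For anyone who missed the feature, here's a minimal sketch of building such an executable archive by hand, as one would have done before any PEP 441 tooling (the file names are illustrative):

import os
import stat
import zipfile

# A zip whose root contains __main__.py is directly runnable by
# Python 2.6+ ("python app.zip" executes __main__.py).
with zipfile.ZipFile('app.zip', 'w') as zf:
    zf.writestr('__main__.py', 'print("hello from inside a zip")\n')

# Prepending a hashbang still leaves a valid zip, since the zip index
# is located from the end of the file - the same trick PEX relies on.
with open('hello.pyz', 'wb') as out:
    out.write(b'#!/usr/bin/env python\n')
    with open('app.zip', 'rb') as zipped:
        out.write(zipped.read())
os.remove('app.zip')

st = os.stat('hello.pyz')
os.chmod('hello.pyz', st.st_mode | stat.S_IEXEC | stat.S_IXGRP | stat.S_IXOTH)
# Now "./hello.pyz" runs the embedded __main__.py directly.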
Speaking to recent distutils-sig threads, we used to go way out of our way to never hit disk (going so far as building our own .egg packager and pure python recursive zipimport implementation so that we could import from eggs within zips, write ephemeral .so's to dlopen and unlink) but we've since moved away from that position for simplicity's sake. For practical reasons we've always needed "not zip-safe" PEX files where all code is written to disk prior to execution (ex: legacy Django applications that have __file__-relative business logic) so we decided to just throw away the magical zipimport stuff and embrace using disk as a pure cache. This seems more compatible philosophically with the direction wheels are going for example.
Since there's been more movement in the PEP space recently, we've been evolving PEX in order to be as standards-compliant as possible, which is why I've been more visible recently re: talks, .whl support and the like. I'd also love to chat about more about PEX and how it relates to things like PEP 441 and/or other attempts like pyzzer.
I think it's very interesting and relevant indeed :) The design in PEP 441 just involves providing some very basic infrastructure around making use of the direct execution support, mostly in order to make the feature more discoverable in the first place, but it sounds like Twitter have a much richer and battle-hardened approach in PEX. It may still prove to be more appropriate to keep the stdlib infrastructure very basic (i.e. more at the PEP 441 level), leaving something like PEX free to provide cross-version consistency. I'll wait and see how the public documentation for PEX evolves before forming a firm opinion one way or the other, but one possible outcome would be a situation akin to the pyvenv vs virtualenv distinction, where pyvenv has the benefit of always being available even in environments where it is difficult to get additional third party tools approved, but also has the inherent downside of Python version dependent differences in available features and behaviour. Cheers, Nick. -- Nick Coghlan | ncoghlan@gmail.com | Brisbane, Australia
participants (11)
- Barry Warsaw
- Brett Cannon
- Brian Wickman
- Carl Meyer
- Christian Heimes
- Daniel Holth
- Donald Stufft
- Nick Coghlan
- Noah Kantrowitz
- Paul Moore
- Vinay Sajip