PEP 376, the INSTALLER file, and system packages
PEP 376 added a file to the .dist-info directory called "INSTALLER" which was supposed to be:
This option is the name of the tool used to invoke the installation.
However, nothing has really ever implemented it and it's gone largely ignored until just recently, when pip 8.0 started writing the INSTALLER file into the metadata directories with a value of "pip".
I'd like to propose adding a special-cased value for the INSTALLER file that will tell projects like pip that this particular installed thing is being managed by someone else, and we should keep our hands off of it. According to PEP 376 the supported values for this file are r"[a-z0-9_-.]"; however, I think since nobody has ever implemented it, we could expand that so that you can also have a special value of "dpkg (system)", or maybe that's not worth it and we could just have "system" as a special value.
The benefit of doing this is that with a special value in that file that says "this file belongs to the OS", pip could start looking for that file and require a --force flag before it modifies any files belonging to that project. Then distributors like Debian, Fedora, etc. could simply write out the INSTALLER file with the correct value, and pip would start to respect their files by default.
Thoughts? Does this sound reasonable to everyone? Do we think it needs a new PEP or can we just amend PEP 376 to add this extra bit of information?
-----------------
Donald Stufft
PGP: 0x6E3CBCE93372DCFA // 7C6B 7C5D 5E2B 6356 A926 F04F 6E3C BCE9 3372 DCFA
On 22 January 2016 at 18:10, Donald Stufft
The benefit of doing this, is that with a special value in that file that says "this file belongs to the OS", then pip could start looking for that file and require a --force flag before it modifies any files belonging to that project. Then distributors like Debian, Fedora, etc could simply write out the INSTALLER file with the correct value, and pip would start to respect their files by default.
Thoughts? Does this sound reasonable to everyone? Do we think it needs a new PEP or can we just amend PEP 376 to add this extra bit of information?
Using "system" as the name of the "installer" sounds like it conforms pretty well to PEP 376 as it stands. Paul
On Fri, Jan 22, 2016 at 1:10 PM, Donald Stufft
Does this sound reasonable to everyone? Do we think it needs a new PEP or can we just amend PEP 376 to add this extra bit of information?
Identifying something special like "system" doesn't seem bad, and conforms (assuming PEP 376 really meant to include "+" at the end of the regular expression).
If tools find the INSTALLER file and it identifies some other tool, though, it shouldn't matter if there's a special "system" value. Debian systems could use "dpkg" or "apt", RPM-based systems could use "rpm" or "yum", and it sounds like the new pip would be just as happy to do the right thing.
I don't think a new or updated PEP is needed.
I'm glad to see this being addressed. Thanks for all your hard work, Donald!
-Fred
--
Fred L. Drake, Jr. <fred at fdrake.net>
"A storm broke loose in my mind." --Albert Einstein
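To make the point about the regular expression concrete, here is a minimal sketch of checking candidate INSTALLER values. It assumes the character class quoted in this thread with the trailing "+" that Fred mentions, and with the hyphen escaped so it is read as a literal character; the helper name is_conforming is made up for illustration and is not part of pip or PEP 376.

```python
import re

# Assumption: the character class quoted in the thread, plus the "+"
# quantifier, with "-" escaped so it is a literal hyphen.
INSTALLER_VALUE = re.compile(r"[a-z0-9_\-.]+")


def is_conforming(value: str) -> bool:
    """Return True if the whole value consists of permitted characters."""
    return INSTALLER_VALUE.fullmatch(value) is not None


# Print which of the values floated in this thread would conform.
for candidate in ("pip", "system", "dpkg", "dpkg (system)", "system(Debian)"):
    print(candidate, is_conforming(candidate))
```

Under that reading, plain lowercase names such as "system" or "dpkg" fit the existing rule, while forms containing spaces, parentheses, or uppercase letters would require widening it.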
On Fri, Jan 22, 2016 at 12:10 PM, Donald Stufft
PEP 376 added a file to the .dist-info directory called "INSTALLER" which was supposed to be:
This option is the name of the tool used to invoke the installation.
However, nothing has really ever implemented it and it's gone largely ignored until just recently pip 8.0 started writing the INSTALLER file into the metadata directories with a value of "pip".
I'd like to propose adding a special cased value to add to the installer file that will tell projects like pip that this particular installed thing is being managed by someone else, and we should keep our hands off of it. According to PEP 376 the supported values for this file are r"[a-z0-9_-.]", however I think since nobody has ever implemented it, we could expand that so that it so you can also have a special value, of "dpkg (system)" or maybe that's not worth it and we could just have "system" as a special value.
Why not use something like JSON to allow us to have a little more information about the installer, e.g., {"name": "pip", "system": true, ...}
The benefit of doing this, is that with a special value in that file that says "this file belongs to the OS", then pip could start looking for that file and require a --force flag before it modifies any files belonging to that project. Then distributors like Debian, Fedora, etc could simply write out the INSTALLER file with the correct value, and pip would start to respect their files by default.
Thoughts? Does this sound reasonable to everyone? Do we think it needs a new PEP or can we just amend PEP 376 to add this extra bit of information?
I think amending the PEP makes sense.
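As a rough illustration of the JSON idea in this reply, a reader of the file might accept both the bare name that pip 8.0 writes today and a JSON object. This is only a sketch: the function parse_installer is hypothetical, and the "name" and "system" keys are taken from the example object above rather than from any agreed format.

```python
import json


def parse_installer(text: str) -> dict:
    """Parse INSTALLER content as either a bare tool name or a JSON object."""
    stripped = text.strip()
    if stripped.startswith("{"):
        # Hypothetical JSON form, e.g. {"name": "dpkg", "system": true}
        data = json.loads(stripped)
        return {
            "name": data.get("name"),
            "system": bool(data.get("system", False)),
        }
    # Bare-name form, as currently written by pip 8.0.
    return {"name": stripped or None, "system": False}


print(parse_installer("pip\n"))
print(parse_installer('{"name": "dpkg", "system": true}'))
```

Accepting both forms would keep existing INSTALLER files readable, at the cost of a file whose content no longer matches the single-name format in PEP 376.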
On Jan 22, 2016 10:11 AM, "Donald Stufft"
PEP 376 added a file to the .dist-info directory called "INSTALLER" which was supposed to be:
This option is the name of the tool used to invoke the installation.
However, nothing has really ever implemented it and it's gone largely ignored until just recently pip 8.0 started writing the INSTALLER file into the metadata directories with a value of "pip".
I'd like to propose adding a special cased value to add to the installer file that will tell projects like pip that this particular installed thing is being managed by someone else, and we should keep our hands off of it. According to PEP 376 the supported values for this file are r"[a-z0-9_-.]", however I think since nobody has ever implemented it, we could expand that so that it so you can also have a special value, of "dpkg (system)" or maybe that's not worth it and we could just have "system" as a special value.
The benefit of doing this, is that with a special value in that file that says "this file belongs to the OS", then pip could start looking for that file and require a --force flag before it modifies any files belonging to that project.
I think we want more than just "system", because the same user could have some packages managed by dpkg and some by conda, both of which have their own dependency resolution mechanisms that are outside pip's and could get broken if pip removes stuff willy-nilly. And when pip errors out, you want to be able to explain to the user "this package is managed by conda, and using pip on it may break your conda setup..." versus "this package is managed by Debian, and using pip on it may break your Debian setup...".
(Actually I'm not sure what the status these days is of mixing pip and conda -- they've gotten somewhat better at handling it. Is the proposed behavior in pip when it sees this flag something that distribution maintainers have asked for? Are they present in this thread?)
Then distributors like Debian, Fedora, etc could simply write out the INSTALLER file with the correct value, and pip would start to respect their files by default.
I'd like a little more clarity on exactly what circumstances justify setting this flag. If I write a new python package manager, then should I set this flag on all my packages because I don't trust anyone else to get things right? :-) Maybe the relevant thing is what I said above, that there is some system tracking these files that is not using the dist-info directory as its source-of-truth about what's installed, dependencies, etc. -n
On Jan 22, 2016, at 1:46 PM, Nathaniel Smith wrote:
On Jan 22, 2016 10:11 AM, "Donald Stufft" <donald@stufft.io> wrote:
PEP 376 added a file to the .dist-info directory called "INSTALLER" which was supposed to be:
This option is the name of the tool used to invoke the installation.
However, nothing has really ever implemented it and it's gone largely ignored until just recently pip 8.0 started writing the INSTALLER file into the metadata directories with a value of "pip".
I'd like to propose adding a special cased value to add to the installer file that will tell projects like pip that this particular installed thing is being managed by someone else, and we should keep our hands off of it. According to PEP 376 the supported values for this file are r"[a-z0-9_-.]", however I think since nobody has ever implemented it, we could expand that so that it so you can also have a special value, of "dpkg (system)" or maybe that's not worth it and we could just have "system" as a special value.
I think we want more than just "system", because the same user could have some packages managed by dpkg and some by conda, both of which have their own dependency resolution mechanisms that are outside pip's and could get broken if pip removes stuff willy-nilly. And when pip errors out, you want to be able to explain to the user "this package is managed by conda, and using pip on it may break your conda setup..." versus "this package is managed by Debian, and using pip on it may break your Debian setup...".
(Actually I'm not sure what the status these days is of mixing pip and conda -- they've gotten somewhat better at handling it. Is the proposed behavior in pip when it sees this flag something that distribution maintainers have asked for? Are they present in this thread?)
Yea, that’s why I thought about dpkg (system) or system(Debian) or something. The main reason I can think of for preferring “system” is if we don’t want to change the valid characters for a value in this file. Then you can have system(Debian) and system(Conda) and everything will work just fine.
The benefit of doing this, is that with a special value in that file that says "this file belongs to the OS", then pip could start looking for that file and require a --force flag before it modifies any files belonging to that project. Then distributors like Debian, Fedora, etc could simply write out the INSTALLER file with the correct value, and pip would start to respect their files by default.
I'd like a little more clarity on exactly what circumstances justify setting this flag. If I write a new python package manager, then should I set this flag on all my packages because I don't trust anyone else to get things right? :-)
Maybe the relevant thing is what I said above, that there is some system tracking these files that is not using the dist-info directory as its source-of-truth about what's installed, dependencies, etc.
-n
----------------- Donald Stufft PGP: 0x6E3CBCE93372DCFA // 7C6B 7C5D 5E2B 6356 A926 F04F 6E3C BCE9 3372 DCFA
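The "system(Debian)" spelling floated in the reply above could be recognized roughly as follows. This is purely a sketch of one possible syntax: nothing in PEP 376 defines it, and both the pattern and the helper name parse_system_value are assumptions made for illustration.

```python
import re
from typing import Optional, Tuple

# Hypothetical syntax: the word "system", optionally followed by the
# managing tool or distribution in parentheses, e.g. "system(Debian)".
_SYSTEM_VALUE = re.compile(r"system(?:\((?P<manager>[^()]+)\))?")


def parse_system_value(value: str) -> Optional[Tuple[str, Optional[str]]]:
    """Return ("system", manager_or_None), or None if the value is not a system marker."""
    match = _SYSTEM_VALUE.fullmatch(value.strip())
    if match is None:
        return None
    return ("system", match.group("manager"))


print(parse_system_value("system(Debian)"))
print(parse_system_value("system"))
print(parse_system_value("pip"))
```

A tool could then treat any non-None result as "hands off", and use the parenthesized part only to word its message to the user.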
On Fri, Jan 22, 2016 at 11:28 AM, Donald Stufft
On Jan 22, 2016, at 1:46 PM, Nathaniel Smith wrote:
On Jan 22, 2016 10:11 AM, "Donald Stufft" wrote:
PEP 376 added a file to the .dist-info directory called "INSTALLER" which was supposed to be:
This option is the name of the tool used to invoke the installation.
However, nothing has really ever implemented it and it's gone largely ignored until just recently pip 8.0 started writing the INSTALLER file into the metadata directories with a value of "pip".
I'd like to propose adding a special cased value to add to the installer file that will tell projects like pip that this particular installed thing is being managed by someone else, and we should keep our hands off of it. According to PEP 376 the supported values for this file are r"[a-z0-9_-.]", however I think since nobody has ever implemented it, we could expand that so that it so you can also have a special value, of "dpkg (system)" or maybe that's not worth it and we could just have "system" as a special value.
I think we want more than just "system", because the same user could have some packages managed by dpkg and some by conda, both of which have their own dependency resolution mechanisms that are outside pip's and could get broken if pip removes stuff willy-nilly. And when pip errors out, you want to be able to explain to the user "this package is managed by conda, and using pip on it may break your conda setup..." versus "this package is managed by Debian, and using pip on it may break your Debian setup...".
(Actually I'm not sure what the status these days is of mixing pip and conda -- they've gotten somewhat better at handling it. Is the proposed behavior in pip when it sees this flag something that distribution maintainers have asked for? Are they present in this thread?)
Yea, that’s why I thought about dpkg (system) or system(Debian) or something. The main reason I can think of for preferring “system” is if we don’t want to change the valid characters for a value in this file. Then you can have system(Debian) and system(Conda) and everything will work just fine.
Maybe to simplify the discussion we should just forget about INSTALLER, leave it in peace to be what the PEP says ("fyi this is the last program to touch this, in case anyone cares"), and add a new file with whatever syntax/semantics make sense. Filenames are cheap and plentiful, and no reason to complicate this discussion with hypothetical backcompat worries.
-n
--
Nathaniel J. Smith -- https://vorpus.org
On 23 January 2016 at 08:17, Nathaniel Smith
On Fri, Jan 22, 2016 at 11:28 AM, Donald Stufft wrote:
Yea, that’s why I thought about dpkg (system) or system(Debian) or something. The main reason I can think of for preferring “system” is if we don’t want to change the valid characters for a value in this file. Then you can have system(Debian) and system(Conda) and everything will work just fine.
Maybe to simplify the discussion we should just forget about INSTALLER, leave it in peace to be what the PEP says ("fyi this is the last program to touch this, in case anyone cares"), and add a new file with whatever syntax/semantics make sense. Filenames are cheap and plentiful, and no reason to complicate this discussion with hypothetical backcompat worries.
Right, that was my theory in implementing INSTALLER just as PEP 376 defined it - having pip write "pip" to that file is enough to let us determine:
1. Controlled in a pip-compatible way
2. Controlled in a pip-incompatible way
3. pip compatibility not specified
Paul's right that at the time the assumption was each tool would just write *its own* name into the file, but given the other changes that have happened since then, it now makes sense that other tools that are fully interoperable with another more-widely used installer may choose to write that installer's name instead of their own, and the only states we *really* care about from a UX perspective are "definitely compatible", "definitely incompatible" and "don't know".
We may decide to do something more sophisticated later, but that would be as part of someone finding the roundtuits to actually design a plugin system for delegating to system package managers when running as root.
From a downstream perspective, the main thing we need to know in order to choose a suitable value to put into that file is how pip decides to use whatever we write there. For example, if pip were to emit a warning message looking something like:
"<project> is not managed by pip, skipping (use '<INSTALLER content>' or '--force' to update it)"
Then the distro could update our packaging policies such that we write our respective package management command names into the INSTALLER files when creating system packages.
(As far as the regex defining the permitted content goes, appropriately caveating PEP 376 to better match the reality of how current generation tools work is one of my main motivations for revising the specification process)
Cheers, Nick.
--
Nick Coghlan | ncoghlan@gmail.com | Brisbane, Australia
On 22 January 2016 at 18:46, Nathaniel Smith
I'd like to propose adding a special cased value to add to the installer file that will tell projects like pip that this particular installed thing is being managed by someone else, and we should keep our hands off of it. According to PEP 376 the supported values for this file are r"[a-z0-9_-.]", however I think since nobody has ever implemented it, we could expand that so that it so you can also have a special value, of "dpkg (system)" or maybe that's not worth it and we could just have "system" as a special value.
I think we want more than just "system", because the same user could have some packages managed by dpkg and some by conda, both of which have their own dependency resolution mechanisms that are outside pip's and could get broken if pip removes stuff willy-nilly. And when pip errors out, you want to be able to explain to the user "this package is managed by conda, and using pip on it may break your conda setup..." versus "this package is managed by Debian, and using pip on it may break your Debian setup...".
Well, I would expect conda to be specifying "conda" in the INSTALLER file. That's what the file is intended for, surely? To record what tool installed the package?
Equally, there seems to me to be no reason dpkg couldn't use "dpkg", and yum use "yum", etc. I just see Donald's suggestion of using "system" as a simple flag saying "the OS" in a generic way for distributions that don't (for some reason) *want* to name the specific installer in the INSTALLER file. Well, that and the fact that they could have done that for years now, so being able to say "just put "system" in there and be done with it please!" is a nice easy message to give to distributions.
Honestly, this seems to me to be a bit of a non-discussion.
My recollection of PEP 376 and the discussions around INSTALLER are that it was simply a place where install tools could say "hey, it was me that did this". It never got used largely because nobody seemed to care about having that information. Now pip is starting to care, but as far as I can see there are only 3 cases pip should care about:
1. The file says "pip". Do just as we do now, we control that package.
2. The file says something *other* than pip. We don't own the files, take appropriate action. Warn, require an extra flag, whatever - we can use what's in the file to say "files owned by FOO" but other than that there's no reason we should behave differently depending on what specific non-pip value we see.
3. There's no INSTALLER file. This is the awkward one, as we can't (yet) say that this means the file isn't owned by pip. In a few years maybe, but for now we have to assume it could be either of the above cases.
So the message should be "if you want pip to be careful with your files, put something (probably your installer name, but in practice anything other than "pip") in INSTALLER".
Longer term, other tools might take the same approach as pip, at which point not using the same name as other tools becomes useful - at that point, a generic term like "system" could be a bad choice...
Paul
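The three cases listed above can be sketched in a few lines. This is an illustration under stated assumptions, not pip's actual code: the function names and the "pip" / "other" / "unknown" labels are invented here, and an empty INSTALLER file is treated the same as a missing one.

```python
import tempfile
from pathlib import Path
from typing import Optional


def read_installer(dist_info: Path) -> Optional[str]:
    """Return the stripped INSTALLER content, or None if absent or empty."""
    installer_file = dist_info / "INSTALLER"
    if not installer_file.is_file():
        return None
    return installer_file.read_text(encoding="utf-8").strip() or None


def ownership(installer: Optional[str]) -> str:
    """Map INSTALLER content onto the three cases (labels are hypothetical)."""
    if installer is None:
        return "unknown"  # case 3: no INSTALLER file
    return "pip" if installer == "pip" else "other"  # cases 1 and 2


def describe(project: str, installer: Optional[str]) -> str:
    """Build a message; any non-pip value is handled the same way."""
    case = ownership(installer)
    if case == "other":
        return f"{project}: files owned by {installer}; warn or require an extra flag"
    if case == "unknown":
        return f"{project}: no INSTALLER file; ownership cannot be determined yet"
    return f"{project}: managed by pip; proceed as today"


# Small demonstration against a throwaway .dist-info directory.
with tempfile.TemporaryDirectory() as tmp:
    dist_info = Path(tmp) / "requests-2.9.1.dist-info"
    dist_info.mkdir()
    print(describe("requests", read_installer(dist_info)))
    (dist_info / "INSTALLER").write_text("dpkg\n", encoding="utf-8")
    print(describe("requests", read_installer(dist_info)))
```

Note that the specific non-pip value only affects the wording of the message, never the decision, which matches the second case as described.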
On Jan 22, 2016, at 3:38 PM, Paul Moore
wrote: On 22 January 2016 at 18:46, Nathaniel Smith
wrote: I'd like to propose adding a special cased value to add to the installer file that will tell projects like pip that this particular installed thing is being managed by someone else, and we should keep our hands off of it. According to PEP 376 the supported values for this file are r"[a-z0-9_-.]", however I think since nobody has ever implemented it, we could expand that so that it so you can also have a special value, of "dpkg (system)" or maybe that's not worth it and we could just have "system" as a special value.
I think we want more than just "system", because the same user could have some packages managed by dpkg and some by conda, both of which have their own dependency resolution mechanisms that are outside pip's and could get broken if pip removes stuff willy-nilly. And when pip errors out, you want to be able to explain to the user "this package is managed by conda, and using pip on it may break your conda setup..." versus "this package is managed by Debian, and using pip on it may break your Debian setup...".
Well, I would expect conda to be specifying "conda" in the INSTALLER file. That's what the file is intended for, surely? To record what tool installed the package?
Equally, there seems to me to be no reason dpkg couldn't use "dpkg", and yum use "yum", etc. I just see Donald's suggestion of using "system" as a simple flag saying "the OS" in a generic way for distributions that don't (for some reason) *want* to name the specific installer in the INSTALLER file. Well, that and the fact that they could have done that for years now, so being able to say "just put "system" in there and be done with it please!" is a nice easy message to give to distributions.
Honestly, this seems to me to be a bit of a non-discussion.
My recollection of PEP 376 and the discussions around INSTALLER are that it was simply a place where install tools could say "hey, it was me that did this". It never got used largely because nobody seemed to care about having that information. Now pip is starting to care, but as far as I can see there are only 3 cases pip should care about:
1. The file says "pip". Do just as we do now, we control that package.
2. The file says something *other* than pip. We don't own the files, take appropriate action. Warn, require an extra flag, whatever - we can use what's in the file to say "files owned by FOO" but other than that there's no reason we should behave differently depending on what specific non-pip value we see.
3. There's no INSTALLER file. This is the awkward one, as we can't (yet) say that this means the file isn't owned by pip. In a few years maybe, but for now we have to assume it could be either of the above cases.
So the message should be "if you want pip to be careful with your files, put something (probably your installer name, but in practice anything other than "pip") in INSTALLER".
Longer term, other tools might take the same approach as pip, at which point not using the same name as other tools becomes useful - at that point, a generic term like "system" could be a bad choice...
Paul
Hmm, in my head the three cases were more like:
1) The installed project is managed by a Python level tool which uses the Python metadata as its database. Thus if you’re this kind of tool too, then you can muck around here because things will be fine.
2) The installed project is managed by something that isn’t a Python level tool, and it has its own database and the .(egg|dist)-info files are not understood by it.
3) We don’t know what is managing the project.
The first would be things like pip, easy_install, and distil. These can all easily interoperate with each other because they’re all going to lay down .egg-info or .dist-info files and read those same files. The second is things like Conda, dpkg, yum, etc. which are going to treat .egg-info or .dist-info files as just another opaque file that is included in their package that they have to lay down.
-----------------
Donald Stufft
PGP: 0x6E3CBCE93372DCFA // 7C6B 7C5D 5E2B 6356 A926 F04F 6E3C BCE9 3372 DCFA
On 22 January 2016 at 20:45, Donald Stufft
Hmm, in my head the three cases were more like:
1) The installed project is managed by a Python level tool which uses the Python metadata as its database. Thus if you’re this kind of tool too, then you can muck around here because things will be fine.
2) The installed project is managed by something that isn’t a Python level tool, and it has its own database and the .(egg|dist)-info files are not understood by it.
3) We don’t know what is managing the project.
The first would be things like pip, easy_install, and distil. These can all easily interoperate with each other because they’re all going to lay down .egg-info or .dist-info files and read those same files. The second is things like Conda, dpkg, yum, etc. which are going to treat .egg-info or .dist-info files as just another opaque file that is included in their package that they have to lay down.
Good point - maybe that's how it *should* have been. My recollection of the discussions at the time was "put your tool name in here" and nobody really thought about the possibility of more than one tool co-operating. This was in the bad old days of "my tool is better than yours" flamewars :-(
Switching to something like that model probably does need a PEP. But INSTALLER doesn't seem the right concept for that - it's more like wanting a file that defines how the project metadata is stored - "Metadata 2.0" or "OS" or "Conda" or whatever.
Paul
On 23 January 2016 at 07:10, Donald Stufft
PEP 376 added a file to the .dist-info directory called "INSTALLER" which was supposed to be:
This option is the name of the tool used to invoke the installation.
However, nothing has really ever implemented it and it's gone largely ignored until just recently pip 8.0 started writing the INSTALLER file into the metadata directories with a value of "pip".
I'd like to propose adding a special cased value to add to the installer file that will tell projects like pip that this particular installed thing is being managed by someone else, and we should keep our hands off of it. According to PEP 376 the supported values for this file are r"[a-z0-9_-.]", however I think since nobody has ever implemented it, we could expand that so that it so you can also have a special value, of "dpkg (system)" or maybe that's not worth it and we could just have "system" as a special value.
The benefit of doing this, is that with a special value in that file that says "this file belongs to the OS", then pip could start looking for that file and require a --force flag before it modifies any files belonging to that project. Then distributors like Debian, Fedora, etc could simply write out the INSTALLER file with the correct value, and pip would start to respect their files by default.
Thoughts? Does this sound reasonable to everyone? Do we think it needs a new PEP or can we just amend PEP 376 to add this extra bit of information?
I think asking other distros to export the installing information to
us is a good thing.
I think requiring a force flag to replace files installed by another
packaging tool is an unbreakme option as things stand, and so I'm very
concerned about the resulting user experience.
If someone runs 'sudo pip install -U pip' they are already signalling
precisely what they want, and as a user I very much doubt that:
"Refusing to uninstall pip because it was not installed by pip"
will make even 5% of them pause longer than required to look up the
--force-alternative-installer flag (or whatever it is) and pass it in.
The only reason we're touching these other installer installed files
is because there is no location on the Python and exec paths where we
can install the new pip without overwriting the one dpkg (or conda or
whatever) installed. If there was such a place we could just write
(and upgrade things) there and not have to deal with shadowing, or
ever uninstalling dpkg etc installed files. In the absence of such a
place, removing the other-install-system's installed files is the
right thing to do when a user asks us to, without needing an
additional option.
We could add an option folk could set to turn *on* such a safety belt
- but I don't know who it would /really/ be helping. (I'm fairly sure
I know the places folk think it would help, I just don't agree that it
would :)).
Having an interface to the system package manager as has been
previously mooted - and I'm going to get back to that - might help a
little, but uninstalls are quite different to installs. Uninstalls get
blocked by things like 'app-foo requires requests', and I would be
super surprised (and delighted!) if we were able to override that when
upgrading requests via pip :)
-Rob
--
Robert Collins
On Jan 22, 2016, at 2:09 PM, Robert Collins wrote:
I think requiring a force flag to replace files installed by another packaging tool is an unbreakme option as things stand, and so I'm very concerned about the resulting user experience.
I don't think the current behavior makes *any* sense; it represents nothing but a footgun where you end up with two systems fighting over what the content of a particular file is. Currently there are two sorts of scenarios where something like this comes up:
Debuntu
-------
Current behavior is we'll uninstall the files sitting at /usr/lib/pythonX.Y/dist-packages/* and then we'll go ahead and install our own files into /usr/local/lib/pythonX.Y/dist-packages/*. Then, the next time that package gets an update the files are put back into place by dpkg into /usr/lib/pythonX.Y/dist-packages/* and we're finally at a consistent state where both package managers know the state of their respective systems.
In the "you need a flag" future world, we would just skip uninstalling the files from /usr/lib/pythonX.Y/dist-packages/* and just install our own files into /usr/local/lib/pythonX.Y/dist-packages. Thus we never touch anything that isn't owned by us and Debian's internal systems stay consistent. If someone did only ``pip uninstall requests`` and that was located in the OS files, then we would refuse to uninstall it, directing them to the platform tooling instead.
Everyone Else
-------------
The /usr/lib and /usr/local/lib split is Debian specific (though at some point I want to bring it to upstream Python) so on other systems you have a single location, typically in /usr/lib/.../site-packages where both the OS and pip will be trying to modify files. Right now this is pure chaos: if python-requests is installed by the system, pip will come along and uninstall it then upgrade requests, then the next time an update is released, the system tooling will come along and write all over the files that pip dropped into place, possibly corrupting things depending on how their patching works. Even for just ``pip uninstall foo``, it's really only uninstalled until the next time that package gets updated.
In the "you need a flag" future, if python-requests is installed by the OS and it's in the same location that *we* want to write a file to, then we'll error out and require a --force flag. If it isn't already installed and nothing is in our way, then we'll go ahead and install to that location. This isn't as nice as the Debuntu world, but nothing will be without the patch that Debian has.
Essentially though, ``pip install -U anything-installed-by-the-os`` is at best a no-op that deletes files for no good reason (Debuntu) or it's a field full of footguns where you're going to have two independent systems shitting all over each other's files trying to do what they think the user wants.
-----------------
Donald Stufft
PGP: 0x6E3CBCE93372DCFA // 7C6B 7C5D 5E2B 6356 A926 F04F 6E3C BCE9 3372 DCFA
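The "you need a flag" behavior described in this message can be summarized as a small decision function. This is a sketch of the proposal as worded above, not of anything pip implements: the function names, the owner labels ("pip", "system", "none") and the returned action strings are all hypothetical.

```python
def plan_install(owner: str, same_location: bool, force: bool = False) -> str:
    """Decide what to do when installing or upgrading a project.

    owner: "pip", "system", or "none" if nothing is installed yet.
    same_location: True if the existing files sit where pip wants to write
    (the single site-packages case); False for a Debian-style split between
    /usr/lib and /usr/local/lib.
    """
    if owner in ("pip", "none"):
        return "install"  # we own it, or nothing is in our way
    if not same_location:
        # Debian-style split: leave the OS files alone, install our own copy.
        return "install-without-touching-os-files"
    # Single shared location: refuse unless the user explicitly forces it.
    return "overwrite" if force else "error-require-force"


def plan_uninstall(owner: str) -> str:
    """Decide what to do for a plain uninstall request."""
    if owner == "system":
        return "refuse-use-platform-tooling"
    return "uninstall"


print(plan_install("system", same_location=False))
print(plan_install("system", same_location=True))
print(plan_install("system", same_location=True, force=True))
print(plan_uninstall("system"))
```

The split-location branch is what makes the Debian and Ubuntu case cleaner: pip never has to modify a file it does not own, so no flag is needed there.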
On 2016-01-23 08:09:47 +1300 (+1300), Robert Collins wrote: [...]
Having an interface to the system package manager as has been previously mooted - and I'm going to get back to that - might help a little, but uninstalls are quite different to installs. Uninstalls get blocked by things like 'app-foo requires requests', and I would be super surprised (and delighted!) if we were able to override that when upgrading requests via pip :)
Add to this the fact that whatever system-distributed files pip removes/replaces will get quietly overwritten again the next time you update your system packages and there's a new version of whatever you've replaced... perhaps being able to set a hold state through the distro package manager from pip at that point for whatever package owned the files it replaced would be a helpful middle ground (I know how I'd go about it on Debian derivatives, less so on all those "other" distros). -- Jeremy Stanley
participants (8)
- Donald Stufft
- Fred Drake
- Ian Cordasco
- Jeremy Stanley
- Nathaniel Smith
- Nick Coghlan
- Paul Moore
- Robert Collins