Hi all, Is anyone working on pruning old packages from PyPI? I found something last updated in 2014 which, looking at the source, appears half-done. The GitHub link doesn't work any longer, there's no description, etc. I managed to find the author's email address out of band, and he responded that he can't remember the password, yada yada. I wonder if some basic automation is possible here -- check whether URLs are reachable and whether the existing package satisfies basic requirements, and failing that, mark it as "possibly out of date". d.
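[Editorial sketch: the automation Dima proposes could start as a small heuristic like the one below. The two-year threshold and the input shape are assumptions, not anything PyPI exposes; keeping the network I/O out of the decision function keeps the rule testable.]

```python
from datetime import datetime, timedelta

def possibly_out_of_date(last_release, homepage_reachable, now,
                         max_age=timedelta(days=2 * 365)):
    """Flag a project as "possibly out of date" when its newest release
    is old *and* its homepage link is dead. The caller supplies all
    inputs, so the rule itself needs no network access to test."""
    stale = (now - last_release) > max_age
    return stale and not homepage_reachable

now = datetime(2016, 7, 12)
print(possibly_out_of_date(datetime(2014, 3, 1), False, now))  # True: old + dead link
print(possibly_out_of_date(datetime(2016, 6, 1), False, now))  # False: recent release
```

A real crawler would feed this from project metadata plus an HTTP reachability probe; the heuristic itself stays a pure function.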
On Jul 12, 2016, at 4:55 AM, Dima Tisnek wrote:
[SNIP]
My feeling is that there should be a "dead man's switch" sort of mechanism for this. Require manual intervention from at least one package owner at least once a year. I believe if you dig around in the archives there's been quite a bit of discussion around messaging to package owners and that sort of thing - and the main sticking point is that someone needs to volunteer to do the work on Warehouse. Are you that person? :) -glyph
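[Editorial sketch: the yearly check-in Glyph describes can be modeled as a three-state switch. The state names and the 30-day grace window below are invented for illustration, not a Warehouse design.]

```python
from datetime import datetime, timedelta

def switch_state(last_confirmed, now,
                 period=timedelta(days=365), grace=timedelta(days=30)):
    """Three-state dead man's switch: owners must check in once a year;
    past the deadline there is a grace window (reminder emails, say)
    before the project is flagged."""
    age = now - last_confirmed
    if age <= period:
        return "active"
    if age <= period + grace:
        return "needs-confirmation"
    return "flagged-unmaintained"

now = datetime(2016, 7, 12)
print(switch_state(datetime(2016, 1, 1), now))   # active
print(switch_state(datetime(2015, 7, 1), now))   # needs-confirmation
print(switch_state(datetime(2014, 1, 1), now))   # flagged-unmaintained
```

Note that "confirmed" need not mean "released" -- a simple logged-in click would reset the clock, which matters for the finished-software cases discussed below.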
On Jul 12, 2016, at 4:45 PM, Glyph Lefkowitz wrote:
[SNIP]
I suspect any change like this will require some sort of PEP or something similar to it. It’s something that I think is going to be hard to get just right (if it’s something we want to do at all). Software can be “finished” without needing more releases, and sometimes projects stop getting updates until the maintainer has more time (or a new maintainer comes along). An example is setuptools, which had no releases between Oct 2009 and Jun 2013. Another nice example is ``wincertstore``, which has had two releases, one in 2013 and one in 2014, and is one of the most downloaded projects on PyPI. It doesn’t need any updates because it’s just a wrapper around Windows APIs via ctypes. Another thing we need to be careful about is what we do once said dead man’s switch triggers. We can’t just release the name and allow anyone to register it; that’s just pointing a security-shaped footgun at the foot of every person using that project. It doesn’t make sense to block new uploads for that project, since there’s no point to disallowing new uploads. Flagging it to allow someone to “take over” (possibly with some sort of review) has some of the same security-shaped footguns, as well as the problem of deciding who to trust with a name. — Donald Stufft
On Tue, 12 Jul 2016 at 21:54 Donald Stufft
wrote:
[SNIP]
Another thing we need to be careful about is what we do once said dead man’s switch triggers. We can’t just release the name and allow anyone to register it; that’s just pointing a security-shaped footgun at the foot of every person using that project. It doesn’t make sense to block new uploads for that project, since there’s no point to disallowing new uploads. Flagging it to allow someone to “take over” (possibly with some sort of review) has some of the same security-shaped footguns, as well as the problem of deciding who to trust with a name.
My assumption was that if a project was flagged as no longer maintained, then it would literally just get some clear banner/label/whatever to let people know that if they start using the project that they shouldn't necessarily expect bug-fixes. And if people wanted to get really fancy, expose this metadata such that some tool could easily warn you that you have dependencies that have been flagged as unsupported code.
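[Editorial sketch: the tooling half of this idea, assuming PyPI one day exposed an "unmaintained" flag (it does not today). The check is just a cross-reference of installed names against the flagged set; all names below are made up.]

```python
def unsupported_dependencies(installed, flagged):
    """Cross-reference what a user has installed against a hypothetical
    set of projects PyPI has flagged as unmaintained; return the sorted
    names that deserve a warning."""
    return sorted(name for name in installed if name in flagged)

installed = {"requests", "olddb", "leftpadder"}     # what pip sees locally
flagged = {"olddb", "leftpadder", "somethingelse"}  # the PyPI-side flag
for name in unsupported_dependencies(installed, flagged):
    print("warning: dependency %r is flagged as unmaintained" % name)
```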
Well said.
IMO, package names shouldn't be reused. Also, IMO, we have a
namespace problem, for which there's a common solution that we avoid
(domain based names, which can also be reused, but ...).
OTOH, here's an idea. What if in addition to the project name, we also
assigned a unique id. When a package was added to a consuming project,
we'd store both the package's name and its project id. When looking
up a package, we'd supply both the project name and id. If a name was
reused, the new project with the same name would have a new project id
and wouldn't be confused with the old one. We could even still serve
the old project's releases without advertising them. This way, if we
did decide to reuse a name, we could do so pretty safely.
Jim
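[Editorial sketch: Jim's (name, id) scheme as a toy in-memory registry. Integer ids, the class shape, and the method names are all illustrative, not an actual Warehouse design.]

```python
import itertools

class Registry:
    """Every registration mints a fresh project id; consumers pin both
    name and id, and old releases stay served under the original id even
    if the name is later reused."""

    def __init__(self):
        self._ids = itertools.count(1)
        self._by_name = {}    # current owner of each name: name -> id
        self._releases = {}   # id -> list of release files

    def register(self, name):
        pid = next(self._ids)
        self._by_name[name] = pid
        self._releases[pid] = []
        return pid

    def upload(self, pid, filename):
        self._releases[pid].append(filename)

    def lookup(self, name, pid):
        # The id is authoritative; the name is a human-friendly label.
        # A consumer that pinned (name, old_id) keeps resolving to the
        # old project even after the name points at a new id.
        return list(self._releases[pid])

reg = Registry()
old = reg.register("foobar")
reg.upload(old, "foobar-1.0.tar.gz")
new = reg.register("foobar")        # same name, brand-new project
reg.upload(new, "foobar-2.0.tar.gz")
print(reg.lookup("foobar", old))    # still the old project's files
```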
_______________________________________________ Distutils-SIG maillist - Distutils-SIG@python.org https://mail.python.org/mailman/listinfo/distutils-sig
-- Jim Fulton http://jimfulton.info
On Jul 12, 2016, at 9:54 PM, Donald Stufft
wrote:
[SNIP]
I suspect any change like this will require some sort of PEP or something similar to it. It’s something that I think is going to hard to get just right (if it’s something we want to do at all).
Software can be “finished” without needing more releases,
"The software isn't finished until the last user is dead." :-)
and sometimes projects stop getting updates until the maintainer has more time (or a new maintainer comes along).
Yes; the whole point here is to have some way for people to know that a new maintainer is needed.
An example is setuptools which had no releases between Oct 2009 and Jun 2013.
Arguably setuptools _was_ badly broken though, and if it had been obvious earlier on that it was in a bad situation perhaps we'd be further along by now :-).
Another nice example is ``wincertstore`` which has had two releases one in 2013 and one in 2014 and is one of the most downloaded projects on PyPI. It doesn’t need any updates because it’s just a wrapper around Windows APIs via ctypes.
Except it does need testing against new versions of Python. No Python :: 3.5 classifier on it, for example! And right at the top of its description, a security fix. The point of such a switch is to be able to push it and respond; not to tell the maintainer "you have to do a new release!" but rather to prompt the maintainer to explicitly acknowledge "the reason I have not done a new release is not that I haven't been paying attention; I am alive, I'm paying attention, and we don't need any maintenance, someone is still watching".
Another thing we need to be careful about is what we do once said dead man’s switch triggers. We can’t just release the name and allow anyone to register it; that’s just pointing a security-shaped footgun at the foot of every person using that project. It doesn’t make sense to block new uploads for that project, since there’s no point to disallowing new uploads. Flagging it to allow someone to “take over” (possibly with some sort of review) has some of the same security-shaped footguns, as well as the problem of deciding who to trust with a name.
The primary thing would be to have a banner on the page and a warning from `pip install`. Those of us close to the heart of the Python community already have various ways of reading the tea leaves to know that things are likely to be unmaintained or bitrotting; the main purpose of such a feature would be to have an automated way for people who don't personally know all the prominent package authors and see them at conferences and meetups all the time to get this information. For example: nobody should be using PIL, they should be using pillow. Yet there's no way for a new user to figure this out by just looking at https://pypi.io/project/PIL/ :). I think that the adjudication process for stealing a name from an existing owner is something that still bears discussion, but separately. Whatever that process is, you'd have to go through it fully after a package becomes thusly "abandoned", and for the reasons you cite, it absolutely should not be automated. Perhaps it shouldn't even be the way to deal with it - maybe the most you should be able to do in this case is to expand the "this is unmaintained" warning with a pointer to a different replacement name. -glyph
On 13Jul2016 1252, Glyph Lefkowitz wrote:
[SNIP]
I like this. Maybe if a maintainer doesn't trigger the switch/publish anything for a year, a banner appears on the page with a publicly editable (votable?) list of alternative packages - thinking something similar to a reviews system with an "I found this review helpful" button. Possibly such user-contributed content would be valuable anyway, but the "probably abandoned" state just moves it to the top of the page instead of the bottom. Cheers, Steve
On Jul 13, 2016, at 1:54 PM, Steve Dower
wrote: Possibly such user-contributed content would be valuable anyway
https://alternativeto.net but for PyPI? :) -glyph
On 13Jul2016 1456, Glyph Lefkowitz wrote:
On Jul 13, 2016, at 1:54 PM, Steve Dower
wrote: Possibly such user-contributed content would be valuable anyway
https://alternativeto.net but for PyPI? :)
Or just more general reviews/warnings/info. "Doesn't work with IronPython", "Works fine on 3.5 even though it doesn't say so", etc. Restrict it to 140 chars, signed-in users, only allow linking to other PyPI packages, let the maintainer delete comments (or mark them as disputed), and I think you'd avoid abuse (or rants/detailed bug reports/etc.). Maybe automatically clear all comments on each new release as well. Doesn't have to be complicated and fancy - just enough that users can help each other when maintainers disappear. Cheers, Steve
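[Editorial sketch: Steve's constraints are concrete enough to express as a validator. The regexes and rule set below are one guess at an implementation, not a spec; the feature itself is hypothetical.]

```python
import re

MAX_LEN = 140
ANY_LINK = re.compile(r"https?://\S+")
PYPI_LINK = re.compile(r"https?://pypi\.(io|org)/project/[A-Za-z0-9._-]+/?")

def comment_allowed(text, signed_in):
    """Validate a user comment against the proposed rules: author is
    signed in, at most 140 characters, and any links must point at
    other PyPI projects."""
    if not signed_in or len(text) > MAX_LEN:
        return False
    return all(PYPI_LINK.fullmatch(link) for link in ANY_LINK.findall(text))

print(comment_allowed("Use https://pypi.io/project/Pillow/ instead", True))
```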
On Tue, Jul 12, 2016 at 7:55 AM, Dima Tisnek
wrote:
[SNIP]
I'm curious why you view this as a problem that needs to be solved? - Do you want to take over the name yourself? - Are you afraid someone will stumble on this package and use it? Something else? Jim -- Jim Fulton http://jimfulton.info
I came across a package by accident.
A mate made a reasonable mistake typing in a pip command, and
something odd got installed.
For a moment I even suspected that the package in question was some kind
of malware, so I went to download it manually (not via pip install),
and realised that the package hadn't been updated for a long while, didn't
have a description, and its GitHub link was broken.
That overall got me thinking about namespace pollution in PyPI: that
once something is pushed in, it's likely to stay there forever. I
figured, with so many packages on PyPI, what percentage cannot be
removed by the author, for the simple reason that, in the worst case,
the author is dead?
Btw., I'm not a fan of domain names (too finicky, they change more often
than short names) or unique ids (humans don't handle them well).
I'd rather see something similar to Linux distributions where there's
a curated repository "core" and a few semi-official, like "extra" and
"community," and for some, "testing."
A name foobar resolves to core/foobar-<latest> if that exists, and if
not some subset of other repositories is used.
This way, an outdated package can be moved to another repo without
breaking install base.
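[Editorial sketch: the fallback resolution Dima describes, as a lookup over channels in priority order. None of these channels exist on PyPI, and the contents are made up.]

```python
# Hypothetical channel layout in priority order.
REPOS = {
    "core":      {"requests": "2.10.0"},
    "extra":     {"leftpadder": "0.3"},
    "community": {"leftpadder": "0.1", "toything": "0.0.1"},
}

def resolve(name, repos=REPOS):
    """Return (repo, version) from the first repo carrying `name`, so
    core/foobar always shadows extra/foobar and community/foobar."""
    for repo, packages in repos.items():
        if name in packages:
            return repo, packages[name]
    raise LookupError("no repo provides %r" % name)

print(resolve("requests"))    # ('core', '2.10.0')
print(resolve("leftpadder"))  # ('extra', '0.3'): the community copy is shadowed
```

Moving a package between channels changes which copy wins, but every spelling of `pip install foobar` keeps resolving to something.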
In fact, curation without namespaces would already be pretty good.
d.
On 13 July 2016 at 19:08, Dima Tisnek
wrote:
[SNIP]
Who would take on the work of maintaining a curated repository? To do anything like a reasonable job would be a *lot* of work. Paul
On 14 July 2016 at 05:04, Paul Moore
wrote:
[SNIP]
Who would take on the work of maintaining a curated repository? To do anything like a reasonable job would be a *lot* of work.
Work that Linux distros already do, hence ideas like https://fedoraproject.org/wiki/Env_and_Stacks/Projects/LanguageSpecificRepos... and https://fedoraproject.org/wiki/Env_and_Stacks/Projects/PackageReviewProcessR... Convincing Linux distros to change their review processes to be less precious about package formats is a long political battle to fight, but it's more sustainable in the long run than trying to build *new* curation focused communities for each different language ecosystem that pops up :) Cheers, Nick. P.S. The conda community would be another example of a collaborative project curation effort, albeit one closer to the traditional Linux distro model (where review is accompanied by a change in packaging format) -- Nick Coghlan | ncoghlan@gmail.com | Brisbane, Australia
On Jul 13, 2016, at 2:08 PM, Dima Tisnek
wrote:
[SNIP]
PyPI is unlikely to *ever* become a curated repository. The time and effort it would take to do that, even if we decided we wanted to, is not something that we have available to us. Beyond that though, I think that would fundamentally change PyPI in a way that is for the worse. One of the goals of PyPI is to enable anyone to publish a package, whether they’re as well known and trusted as Guido, or some unknown person from the backwoods of Pennsylvania. We try very hard to remain neutral in terms of whether one package is “better” than another package and try to present largely unbiased information [1]. It would not be particularly hard, technically speaking, for someone to maintain a curated set of packages on top of what PyPI provides already. This would not need to be an official PyPI thing, but if one rose to some prominence it would be easy enough to direct folks to it who want that sort of thing. [1] To the extent that any information at all is unbiased. — Donald Stufft
Getting to this thread late, but it didn't seem that it was resolved in the least, so I'll add my $0.02.
That overall got me thinking about namespace pollution in pip, that once something is pushed in, it's like to stay there forever.
This REALLY is a problem, and one that will only get worse. It would be nice to work out an approach before it gets too bad.
I'd rather see something similar to Linux distributions where there's a curated repository "core"
As pointed out, that's not going to happen.
and a few semi-official, like "extra" and "community," and for some, "testing."
But namespaces like these could still be implemented without curation. Right now, the barrier to entry to putting a package up on PyPI is very, very low. There are a lot of 'toy', or 'experimental', or 'well intentioned but abandoned' projects on there. If there was a clear and obvious place for folks to put these packages while they work out the kinks, and determine whether the package is really going to fly, I think people would use it. And we could have mostly-automatic policies about seemingly abandoned projects as well -- moving them to the "unmaintained" channel, or....

As for the 'dead man's switch' idea -- that's a great idea. Sure, there are projects with no active maintainer that people rely on, but almost by definition, if folks care about it, we'd be able to find someone willing to put their finger on the dead man's button once a year....

As for issues like setuptools and PIL, where there is a shift in maintenance of a large, high-profile package, we really need a benevolent oligarchy for PyPA that can resolve these issues by hand. As pointed out -- this really doesn't come up often.

However, in these discussions, I've observed a common theme: folks in the community bring up issues about unmaintained packages, namespace pollution, etc.; the core PyPA folks respond with generally well reasoned arguments why proposed solutions won't fly. But it's totally unclear to me whether the core devs don't think these are problems worth addressing, or think they can only be addressed with major effort that no one has time for. If the core devs think it's fine and dandy as it is, we can all stop talking about it.

By the way, there is an experiment underway with a "curated" community package repository for conda packages: https://conda-forge.github.io

It's semi-curated, in the sense that only the core devs can approve a package being added, but once added, anyone can maintain it. It's going very well, but I'm not sure how it's going to scale.
So far, 900 or so packages, 80 or so maintainers. Which I think is very impressive for a young system, but still a lot smaller than it could be. But I think PyPA should keep an eye on it -- I'm sure there will be lessons learned. -CHB
On 22 July 2016 at 16:47, Chris Barker - NOAA Federal
But it's totally unclear to me whether the core devs don't think these are problems worth addressing, or think they can only be addressed with major effort that no one has time for.
Speaking for myself, it's the latter. I think there are issues worth resolving, but the implementation would need more effort than we have available. Also, there's a lot of sensitivity in this area. Any solution is almost certain to ruffle some feathers, so a lot of the effort needed could well be in the form of diplomatically managing project maintainers' expectations/views. Which is pretty draining, and the people doing the work are likely to have a high burnout rate. (In my opinion). Paul
On Jul 22, 2016, at 11:47 AM, Chris Barker - NOAA Federal
wrote: If the core devs think it's fine and dandy like it is, we can all stop talking about it.
I think these are certainly problems. The current solutions that have been proposed have their own problems of course, and problems enough that I don’t feel comfortable implementing them. Personally I don’t currently have the time to work on a better solution, but if someone did, that’d be fine with me. I would mention, though, that it’s possible there *is* no solution to this problem that doesn’t bring with it its own problems that are worse than the problem at hand. I’m not saying that’s the case, but just mentioning that it may be so.
By the way, there is an experiment underway with a "curated" community package repository for conda packages:
It's semi-curated, in the sense that only the core devs can approve a package being added, but once added, anyone can maintain it.
It's going very well, but I'm not sure how it's going to scale. So far, 900 or so packages, 80 or so maintainers. Which I think is very impressive for a young system, but still a lot smaller than it could be.
But I think PyPA should keep an eye on it -- I'm sure there will be lessons learned.
conda-forge and PyPI are not really similar. For conda-forge they’re largely just repackaging other software for the conda system. This works better with just about anyone of the core team able to work on stuff, because all they’re doing is adjusting the build, upgrading versions, etc. They’re not working on the projects themselves. Curation also goes against the sort of “spirit” of PyPI. It’s a place where anyone can release something and immediately be able to be part of the ecosystem. Adding curation on top of that would change the nature of PyPI, maybe for the better or maybe for the worse, but it would fundamentally change PyPI. — Donald Stufft
We've been discussing here at least two different problems related to
package maintainership:
1. Abandoned/no-longer-maintained, but previously useful packages
2. namespace and package idea space pollution due to tests/aborted
attempts/packaging inexperience.
I don't have a good idea about 1, and it's clear from this discussion that
curation as a solution for 2 is currently non-workable/undesirable due to
both lack of manpower and a desire not to discourage contributions.
Even if curation is not involved, we also don't want to up the barrier to
entry such that we discourage new entrants into the ecosystem.
A level of namespacing on the project name (but not (necessarily) on the
python module/package level) was suggested as a way to reduce 2.
I just thought of another, orthogonal, suggestion for 2:
There is a TestPyPI server but it feels like few people use it (I certainly
never did).
What if you were only allowed to push a new project onto PyPI proper if you
already have a similarly named project on TestPyPI and at least one release
of that project there?
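[Editorial sketch: Leo's gate as a pure predicate. The index mapping is a stand-in for a real TestPyPI query; the name normalization, though, is the actual PEP 503 rule, so "My_Tool" and "my-tool" count as the same project.]

```python
import re

def normalize(name):
    # PEP 503 name normalization: runs of -, _, . collapse to "-",
    # then case-fold.
    return re.sub(r"[-_.]+", "-", name).lower()

def may_create_on_pypi(name, test_pypi_releases):
    """A brand-new project name is accepted on PyPI only if TestPyPI
    already carries the normalized name with at least one release.
    `test_pypi_releases` maps normalized name -> release count."""
    return test_pypi_releases.get(normalize(name), 0) >= 1

staged = {"my-tool": 2}                       # pretend TestPyPI contents
print(may_create_on_pypi("My_Tool", staged))  # True: staged as my-tool
print(may_create_on_pypi("other", staged))    # False: never staged
```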
Notice that I'm not suggesting that all files need to touch TestPyPI before
going to PyPI, although that could be considered in the future.
I'm also not suggesting a waiting period between publishing on TestPyPI and
PyPI, but this could also be considered if we see people abusing the fact
that dumping some garbage on TestPyPI is the only step necessary before
dumping some garbage on PyPI.
And we could make TestPyPI a bit more flexible with files than
PyPI itself. For instance, we could allow deletion and re-upload of files
with the same name there, which is currently disallowed on PyPI, so people can
test-drive their releases before pushing files onto PyPI proper (I don't
know if this is already the case).
"Hey!" I hear you saying "If we're just moving the namespace reservation
from PyPI to TestPyPI" aren't we just moving problem number 2 around
without really solving it?"
Well, if someone does some test on TestPyPI and later abandons it, we don't
need to have as many qualms about removing the project as we do with a
project on PyPI proper.
We could even have an official grace period, like a few months, before we
consider that a project on TestPyPI without a PyPI equivalent can be
considered abandoned.
And if we do implement a minimum period on TestPyPI before graduation to
PyPI, this would also give us a measure of protection from malicious typo
squatting:
http://incolumitas.com/2016/06/08/typosquatting-package-managers/
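[Editorial sketch: a crude first line of defence against typosquats would be a similarity check at registration time. The popular-project list and the 0.85 cutoff below are arbitrary; a real check would rank by download statistics and use a proper edit distance.]

```python
import difflib

POPULAR = ["requests", "numpy", "django", "setuptools", "six"]

def looks_like_typosquat(name, popular=POPULAR, cutoff=0.85):
    """Flag a new registration whose name is suspiciously close to an
    already-popular project. difflib's similarity ratio is a cheap
    stand-in for a proper edit-distance check."""
    if name in popular:
        return False  # the real project uploading again, not a squat
    return bool(difflib.get_close_matches(name, popular, n=1, cutoff=cutoff))

print(looks_like_typosquat("reqeusts"))  # True: one transposition away
print(looks_like_typosquat("flask"))     # False: a genuinely distinct name
```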
Regards,
Leo
On 22 July 2016 at 13:39, Donald Stufft
On Jul 22, 2016, at 11:47 AM, Chris Barker - NOAA Federal < chris.barker@noaa.gov> wrote:
If the core devs think it's fine and dandy like it is, we can all stop talking about it.
I think they’re certainly a problem. The current solutions that have been proposed have their own problems of course, and problems enough that I don’t feel comfortable implementing them. Personally I don’t currently have the time to work on a better solution but if someone did that’d be fine with me.
I would mention though that it’s possible there *is* no solution to this problem that doesn’t bring with it it’s own problems that are worse then the problem at hand. I’m not saying that’s the case, but just mentioning that it may be so.
By the way, there is an experiment underway with a "curated" community package repository for conda packages:
It's semi-curated, in the sense that only the core devs can approve a package being added, but once added, anyone can maintain it.
It's going very well, but I'm not sure how it's going to scale. So far, 900 or so packages, 80 or so maintainers. Which I think is very impressive for a young system, but still a lot smaller than it could be.
But I think PyPA should keep an eye on it -- I'm sure there will be lessons learned.
conda-forge and PyPI are not really similar. For conda-forge they’re largely just repackaging other software for the conda system. This makes it function better when you have just anyone of the core team able to work on stuff, because all they’re doing is adjusting the build, upgrading versions, etc. They’re not working on the projects themselves.
Curated also goes against the sort of “spirit” of PyPI. It’s a place where anyone can release something and immediately be able to be part of the ecosystem. Adding curation on top of that would change the nature of PyPI, maybe for the better or maybe for the worse, but it would fundamentally change PyPI.
— Donald Stufft
_______________________________________________ Distutils-SIG maillist - Distutils-SIG@python.org https://mail.python.org/mailman/listinfo/distutils-sig
On 07/22/2016 12:39 PM, Donald Stufft wrote:
On Jul 22, 2016, at 11:47 AM, Chris Barker - NOAA Federal
wrote: If the core devs think it's fine and dandy like it is, we can all stop talking about it.
I think they’re certainly a problem. The current solutions that have been proposed have their own problems of course, and problems enough that I don’t feel comfortable implementing them. Personally I don’t currently have the time to work on a better solution but if someone did that’d be fine with me.
I would mention though that it’s possible there *is* no solution to this problem that doesn’t bring with it its own problems that are worse than the problem at hand. I’m not saying that’s the case, but just mentioning that it may be so.
Is there a place where the currently proposed solutions are briefly outlined?
One solution that seems apparent to me is to move to an org/package hierarchy like what GitHub has. By default, packages get published under a default namespace:
default/flask legacy/flask (you get the point, probably need a better name)
unless the user has registered on pypi for an organization and publishes the package under that org:
pallets/flask
You would still have contention at the org level, but my guess is this contention would be much less significant than the current contention that is faced with only having a single-level namespace for package names. You could further improve this by having org creation requests either A) approved to prevent name squatting, or B) have an appeal process for org name squatting that is blatant (e.g. I register the "google" or "pypa" org), and/or C) expire orgs that are no longer maintained.
The details of A, B & C would be tricky to get right, but the rules would at least be decided on from the beginning, so people know what the conditions are. If they don't like those conditions, then they don't get an org, and the situation they are in with name contention is exactly the same as it is now. All legacy packages operate under the current ruleset. All orgs and their packages operate under the new ruleset. Hopefully avoiding complaints of "you changed the game on us." You could also operate the org registration idea under "beta" conditions for the first couple of years to work out kinks in the process and warn people up-front that the rules could change during that time.
By mapping all current packages under some "legacy" namespace, there should be room for backwards compatibility. So, if my project requires "flask", either pip or Warehouse knows to return "legacy/flask."
Has this been proposed before? Any interest?
*Randy Syring* Husband | Father | Redeemed Sinner
/"For what does it profit a man to gain the whole world and forfeit his soul?" (Mark 8:36 ESV)/
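The backwards-compatibility lookup Randy describes (bare, pre-namespace names falling back to a "legacy" namespace) might look roughly like this. Everything here is invented for illustration: the registry contents, the version strings, and the resolve function are not part of any actual pip or Warehouse behaviour.

```python
# Hypothetical registry mapping namespaced project names to their latest
# version. "legacy/" holds everything published before namespaces existed.
REGISTRY = {
    "pallets/flask": "9.9",   # org-published release (made-up version)
    "legacy/flask": "0.11",   # pre-namespace package, mapped under legacy/
}

def resolve(requirement):
    """Return the (namespaced_name, version) a bare or org-qualified
    requirement would resolve to under the proposed scheme."""
    if "/" in requirement:
        # Caller asked for an explicit org, e.g. "pallets/flask".
        name = requirement
    else:
        # Bare name: preserve old behaviour via the legacy mapping.
        name = "legacy/" + requirement
    if name not in REGISTRY:
        raise KeyError("no such project: " + name)
    return name, REGISTRY[name]
```

The key property is that existing requirement files keep working unchanged, since a bare "flask" still resolves to the package that held that name before namespaces were introduced.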
A more conservative approach might be to flag high-risk, typo-prone package
names as requiring moderator approval to register. Some combination of
looking at common 404s (or whatever happens when a client asks for a
non-existent package), some string metrics (Levenshtein, Jaro, whatever) to
an existing package, etc. could be used. Light moderation should be able to
tell that someone who wants to register "requsets" should maybe be
challenged as to why.
Someone who wants to typo-attack would then need to go after many more
low-value names. That said, is there some sort of rate limiter in place to
prevent someone from registering hundreds of names in a short span? It's
probably easy to defeat with multiple accounts, though. In any event,
figuring out how one would best attack PyPI/pip users is the best way to
improve their security.
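The moderation heuristic suggested above could be sketched with the stdlib's difflib standing in for a proper Levenshtein or Jaro metric. The threshold, the popular-name list, and the function itself are invented here for illustration; nothing like this runs on PyPI.

```python
import difflib

# Hypothetical watch list of popular names worth protecting.
POPULAR = ["requests", "numpy", "django", "flask", "setuptools"]

def needs_review(candidate, existing=POPULAR, threshold=0.85):
    """Return the existing names this registration request is suspiciously
    close to; a non-empty result would route it to a moderator."""
    candidate = candidate.lower()
    hits = []
    for name in existing:
        if candidate == name:
            # Exact matches are handled by normal registration rules.
            continue
        # difflib's ratio stands in for Levenshtein/Jaro similarity here.
        if difflib.SequenceMatcher(None, candidate, name).ratio() >= threshold:
            hits.append(name)
    return hits
```

With an assumed threshold of 0.85, a request for "requsets" would be flagged against "requests", while an unrelated new name would sail through unmoderated.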
[Good replies from Donald, Paul, et al already, but rather than
replying to individual points, I figure it's best to just respond to
Chris's original question with my own thoughts]
On 23 July 2016 at 01:47, Chris Barker - NOAA Federal
Right now, the barrier to entry to putting a package up on PyPI is very, very low. There are a lot of 'toy', or 'experimental', or 'well intentioned but abandoned' projects on there.
If there was a clear and obvious place for folks to put these packages while they work out the kinks, and determine whether the package is really going to fly, I think people would use it.
That place is PyPI. Having a separate "maybe good, maybe bad" location for experimental packages (which is the way a lot of people use GitHub repos these days, relying on direct-from-VCS installs) leads to a persistent problem where folks later decide "I want to publish this officially", go to claim the name on PyPI as well, and find they have a conflict.
As further examples of similar "multiple authoritative namespaces" problems, we sometimes see folks creating Python projects specifically for Linux distros rather than upstream Python, causing name collisions when the upstream project is later packaged for that distro (e.g. python-mock conflicting with Fedora's RPM "mock" build tool - resolved by the Fedora library being renamed to "mockbuild" while keeping "mock" as the CLI name), and we also see cross-ecosystem conflicts (e.g. python-pip and perl-pip conflicting on the "pip" CLI name, resolved by the good graces of the perl-pip maintainer in ceding the unqualified name to the Python package).
You can also look at the number of semantically versioned packages that enjoy huge adoption even before the author(s) pull the trigger on a 1.0 release (e.g. requests, SQLAlchemy, and the TOML spec, off the top of my head), which reveals that package authors often have higher standards for "good enough for 1.0" than their prospective users do (the standard for users is generally "it's good enough to solve my current problem", while the standard for maintainers is more likely to be "it isn't a hellish nightmare to maintain as people start finding more corner cases that didn't previously occur to me/us").
The other instinctive answer is "namespaces solve this!", but the truth is they don't, as:
1. Namespaces tend to belong to organisations, and particularly for utility projects, "this utility happened to be developed by this organisation" is entirely arbitrary and mostly irrelevant (except insofar as if you trust a particular org you may trust their libraries more, but you can get that from the metadata).
2. If you want to build a genuinely inclusive open source project, branding it with the name of your company is one of the *worst* things you can do (since it prevents any chance of a feeling of shared ownership by the entire community).
3. Python already allows distribution package names to differ from import package names, as well as supporting namespace packages, which means folks *could* have adopted namespaces-by-convention if they were a genuinely compelling solution. That hasn't happened, which suggests there's an inherent user experience problem with the idea.
Would requests be more discoverable if Kenneth had called it "kreitz-requests" instead? What if we had "org-pocoo-flask" instead of just plain "flask"? Or "ljworld-django"? How many folks haven't even looked at "zc.interface" because the association with Zope prompts them to dismiss it out of hand?
Organisational namespaces on a service like GitHub are *absolutely* useful, but what they're replacing is the model where different organisations run their own version control server (just as different organisations may run their own private Python index today), rather than being a good thing to expose directly to end users that just want to locate and use a piece of software.
However, in these discussions, I've observed a common theme: folks in the community bring up issues about unmaintained packages, namespace pollution, etc., and the core PyPA folks respond with generally well reasoned arguments why the proposed solutions won't fly.
But it's totally unclear to me whether the core devs don't think these are problems worth addressing, or think they can only be addressed with major effort that no one has time for.
If we accept my premise that "single global namespace, flat except by convention" really does offer the most attractive overall user experience for a software distribution ecosystem, what's missing from the status quo on PyPI?
In a word? Gardening.
Post-publication curation like that provided by Linux distros and other redistributor communities (including conda-forge) can help with the "What's worth my time and attention?" question (we can think of this as filtering the output of an orchard, and only passing along the best fruit), but it can't address the problem of undesirable name collisions and other problems on the main index itself (we can think of this as actually weeding the original orchard, and pruning the trees when necessary).
However, we don't have anyone that's specifically responsible for looking after the shared orchard that is PyPI, and this isn't something we can reasonably ask volunteers to do, as it's an intensely political and emotionally draining task where your primary responsibility is deciding if and when it's appropriate to *take people's toys away*.
As an added bonus, becoming more active in content curation as a platform provider means potentially opening yourself up to lawsuits as well, if folks object to either your reclamation of a name they previously controlled, or else your refusal to reclaim a particular name for *their* purposes. So while first-come-first-served namespace management definitely doesn't provide the best possible user experience for Pythonistas, it *does* minimise the volume of ongoing namespace maintenance work required, as well as the PSF's exposure to legal liability as the platform provider.
These concerns aren't something that "policy enforcement can be automated" really addresses either, as even if you have algorithmic enforcement, those "The robots have decided to take your project away from you" emails are still going to be going to real people, and those folks may still be understandably upset. "Our algorithms did it, not our staff" is also a pretty thin legal reed to pin your hopes on if somebody turns out to be upset enough to sue you about it.
This means the entire situation changes if the PSF's Packaging Working Group receives sufficient funding (either directly through https://donate.pypi.io or through general PSF sponsorships) to staff at least one full-time "PyPI Publisher Support" role (in addition to the other points on the WG's TODO list), as well as to pay for analyses of the legal implications of having more formal content curation policies. However, in the absence of such ongoing funding support, the current laissez-faire policies will necessarily remain in place, as they're the only affordable option.
Regards,
Nick.
P.S. If folks who work at end user organisations, for whom contributing something like $10k a year to the PSF for a Silver sponsorship (https://www.python.org/psf/sponsorship/) would be a rounding error in the annual budget, want to see change in this kind of area, then advocating within your organisation to become PSF sponsor members on the basis of "We want to enable the PSF to work on community infrastructure improvement activities it's currently not pursuing for lack of resources" is probably one of *the most helpful things you can do* for the wider Python community.
As individuals, we can at most sustainably contribute a few hours a week as volunteers, or maybe wrangle a full-time upstream contribution role if we're lucky enough to have or find a supportive employer. By contrast, the PSF is in a position to look for ways to let volunteers focus their time and energy on activities that are more inherently rewarding, by directing funding towards the many necessary-but-draining activities that go into supporting a collaborative software development community. The more resources the PSF has available, the more time and energy it will be able to devote towards identifying and addressing unwanted sources of friction in the collaborative process.
-- Nick Coghlan | ncoghlan@gmail.com | Brisbane, Australia
participants (12)
- Brett Cannon
- Chris Barker - NOAA Federal
- Dima Tisnek
- Donald Stufft
- Glyph Lefkowitz
- Jim Fulton
- Leonardo Rochael Almeida
- Nick Coghlan
- Nick Timkovich
- Paul Moore
- Randy Syring
- Steve Dower