Re: [Distutils] PyPi not allowing duplicate filenames
On Wed, Oct 14, 2015 at 9:56 AM, Dave Forgac
This was discussed recently here: https://github.com/pypa/packaging-problems/issues/74
and on this list at other times. Though the above issue was pretty focused
on restoring a deleted file without any changes -- which seems like a
no-brainer to me, as long as someone wants to put the effort into the
infrastructure.
(the soft delete option seems like a good idea to me).
But I'm talking about the cases of "whoops! I really wish I hadn't uploaded
that one". We can improve the tooling (some discussion on this in this
thread right now...), but people are people and some of us are stupid
and/or careless. So this WILL happen.
And it just seems pedantic to say: hey -- you've already put that one there
-- maybe even two minutes ago, so there is NO WAY to fix your mistake. If
it happens quickly, then no one has downloaded it, it hasn't made its way
to the mirrors, etc...
And again -- we are all adults here: if you as the package maintainer want
to do something that is confusing to your users, is it up to PyPI to never
let you do that? (I think it IS up to PyPI to strongly recommend that you
don't -- i.e. make it hard to do, and impossible to do thoughtlessly)
On Wed, Oct 14, 2015 at 10:00 AM, Jeremy Stanley
You should have to jump through all sorts of hoops, and make it really clear that it is a BAD IDEA in the general case, but it'd be good to have it be possible. [...]
It used to be possible.
Was it also easy to do without very careful consideration? Or were the hoops I mentioned in place?
I can't find it right now, but I think someone in this thread suggested a "staging area", so we could essentially do a trial run: upload to PyPI, tell a few trusted friends about it, have them test it, then, and only then, push it to the official channel. Maybe the infrastructure for that would be more work than it's worth, but I like it.
This would fit into the basic principle that you should always be able to test something in as close to the final environment as possible, before you commit to it.
-CHB
--
Christopher Barker, Ph.D. Oceanographer
Emergency Response Division NOAA/NOS/OR&R (206) 526-6959 voice 7600 Sand Point Way NE (206) 526-6329 fax Seattle, WA 98115 (206) 526-6317 main reception
Chris.Barker@noaa.gov
On October 14, 2015 at 3:58:52 PM, Chris Barker (chris.barker@noaa.gov) wrote:
On Wed, Oct 14, 2015 at 9:56 AM, Dave Forgac wrote:
This was discussed recently here: https://github.com/pypa/packaging-problems/issues/74
and on this list at other times. Though the above issue was pretty focused on restoring a deleted file without any changes -- which seems like a no-brainer to me, as long as someone wants to put the effort into the infrastructure.
(the soft delete option seems like a good idea to me).
I plan on implementing a soft delete in Warehouse.
But I'm talking about the cases of "whoops! I really wish I hadn't uploaded that one". We can improve the tooling (some discussion on this in this thread right now...), but people are people and some of us are stupid and/or careless. So this WILL happen.
And it just seems pedantic to say: hey -- you've already put that one there -- maybe even two minutes ago, so there is NO WAY to fix your mistake. If it happens quickly, then no one has downloaded it, it hasn't made its way to the mirrors, etc…
Generally within 60-120 seconds it’s available in mirrors (most of them resync once a minute). If anyone has downloaded it then they will have pretty much permanently cached the package, first in the download cache and then again in the wheel cache (assuming it wasn’t a wheel already, and they had that enabled).
The original package was NumPy. It had 30,982 downloads in the last day, so we can average that out to 1290 downloads an hour or 21 downloads a minute. If it takes you two minutes to notice it and delete it, then there are ~40 people who already have the original version cached and who will not notice the updated version.
Version numbers are free, use more of them. If you can’t just issue new releases quickly, then test your release before you upload it (and then upload it with twine) and you can even upload it to Test PyPI to test things earlier than that.
----------------- Donald Stufft PGP: 0x6E3CBCE93372DCFA // 7C6B 7C5D 5E2B 6356 A926 F04F 6E3C BCE9 3372 DCFA
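The arithmetic in the message above can be reproduced with a short sketch. The function name and parameters are made up for illustration. The only inputs are the figures quoted in the message (30,982 downloads in a day, two minutes to notice), and the estimate assumes downloads are spread evenly over the day, which real traffic will not be.

```python
def estimate_cached_copies(downloads_per_day: int, minutes_to_notice: int) -> dict:
    """Rough estimate of how many users fetched a file before it was deleted.

    Assumes downloads are spread evenly across the day (a simplification).
    Integer division mirrors the rounded-down figures quoted in the thread.
    """
    per_hour = downloads_per_day // 24
    per_minute = per_hour // 60
    return {
        "per_hour": per_hour,
        "per_minute": per_minute,
        "already_cached": per_minute * minutes_to_notice,
    }


if __name__ == "__main__":
    # Figures taken from the message: NumPy, 30,982 downloads in the last day.
    print(estimate_cached_copies(30_982, minutes_to_notice=2))
```

For a far less popular package the same calculation may well round down to zero, which is the point made later in the thread about most packages not being as popular as numpy.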
On Wed, Oct 14, 2015 at 11:04 PM, Donald Stufft
Generally within 60-120 seconds it’s available in mirrors (most of them resync once a minute). If anyone has downloaded it then they will have pretty much permanently cached the package, first in the download cache and then again in the wheel cache (assuming it wasn’t a wheel already, and they had that enabled). The original package was NumPy. It had 30,982 downloads in the last day, so we can average that out to 1290 downloads an hour or 21 downloads a minute. If it takes you two minutes to notice it and delete it, then there are ~40 people who already have the original version cached and who will not notice the updated version.
This reminds me of Gmail's "unsend" feature where email would be delayed 10 seconds or something, giving a window to press the unsend button. Maybe something like that could be implemented? Like a minute or two in which you could unpublish? And a --no-regrets mode of course, for people that want to live the moment :-) Thanks, -- Ionel Cristian Mărieș, http://blog.ionelmc.ro
On Wed, Oct 14, 2015 at 1:20 PM, Ionel Cristian Mărieș
This reminds me of Gmail's "unsend" feature where email would be delayed 10 seconds or something, giving a window to press the unsend button.
FWIW, I am a big fan of that -- and use it with remarkable frequency -- maybe I'm just twitchier than most...
-CHB
On Oct 14, 2015, at 1:04 PM, Donald Stufft wrote:
Generally within 60-120 seconds it’s available in mirrors (most of them resync once a minute). If anyone has downloaded it then they will have pretty much permanently cached the package, first in the download cache and then again in the wheel cache (assuming it wasn’t a wheel already, and they had that enabled). The original package was NumPy. It had 30,982 downloads in the last day, so we can average that out to 1290 downloads an hour or 21 downloads a minute. If it takes you two minutes to notice it and delete it, then there are ~40 people who already have the original version cached and who will not notice the updated version.
While I don't think PyPI should allow modification of uploaded packages necessarily, I do think that Pip's caching is (A) too aggressive and (B) too opaque. For example:
https://github.com/pypa/pip/issues/3127 https://github.com/pypa/pip/issues/3034 https://github.com/pypa/pip/issues/3025 https://github.com/pypa/pip/issues/2908 https://github.com/pypa/pip/issues/2882
etc, etc.
I know there are some platform-specific directories I can delete, but almost once a day I want a command like `pip cache show` which can show me what is cached and when/where it was built, `pip cache clear` or `pip cache remove twisted` or `pip cache remove cffi>=1.0`. I don't want to have to care if it's in the HTTP cache or the wheel cache, or how it got there; I also don't want to have to bust a ~200 megabyte cache that saves me hours a day just because there's one bad entry in there.
-glyph
On 15 October 2015 at 10:01, Glyph Lefkowitz
On Oct 14, 2015, at 1:04 PM, Donald Stufft wrote:
Generally within 60-120 seconds it’s available in mirrors (most of them resync once a minute). If anyone has downloaded it then they will have pretty much permanently cached the package, first in the download cache and then again in the wheel cache (assuming it wasn’t a wheel already, and they had that enabled). The original package was NumPy. It had 30,982 downloads in the last day, so we can average that out to 1290 downloads an hour or 21 downloads a minute. If it takes you two minutes to notice it and delete it, then there are ~40 people who already have the original version cached and who will not notice the updated version.
While I don't think PyPI should allow modification of uploaded packages necessarily, I do think that Pip's caching is (A) too aggressive and (B) too opaque. For example:
https://github.com/pypa/pip/issues/3127 https://github.com/pypa/pip/issues/3034 https://github.com/pypa/pip/issues/3025 https://github.com/pypa/pip/issues/2908 https://github.com/pypa/pip/issues/2882
etc, etc.
I know there are some platform-specific directories I can delete, but almost once a day I want a command like `pip cache show` which can show me what is cached and when/where it was built, `pip cache clear` or `pip cache remove twisted` or `pip cache remove cffi>=1.0`. I don't want to have to care if it's in the HTTP cache or the wheel cache, or how it got there; I also don't want to have to bust a ~200 megabyte cache that saves me hours a day just because there's one bad entry in there.
So - none of those need PEPs - just dive in and put a patch forward.
-Rob
--
Robert Collins
On October 14, 2015 at 5:01:49 PM, Glyph Lefkowitz (glyph@twistedmatrix.com) wrote:
On Oct 14, 2015, at 1:04 PM, Donald Stufft wrote:
Generally within 60-120 seconds it’s available in mirrors (most of them resync once a minute). If anyone has downloaded it then they will have pretty much permanently cached the package, first in the download cache and then again in the wheel cache (assuming it wasn’t a wheel already, and they had that enabled). The original package was NumPy. It had 30,982 downloads in the last day, so we can average that out to 1290 downloads an hour or 21 downloads a minute. If it takes you two minutes to notice it and delete it, then there are ~40 people who already have the original version cached and who will not notice the updated version.
While I don't think PyPI should allow modification of uploaded packages necessarily, I do think that Pip's caching is (A) too aggressive and (B) too opaque. For example:
https://github.com/pypa/pip/issues/3127 https://github.com/pypa/pip/issues/3034
These two are the same thing, and a bug. Fix the bug and the http cache is perfectly fine IMO. It never caches without headers or an Etag saying that it can. PyPI uses really long cache-control headers (1y+) on a package, and it can do this because it can promise that this URL will either continue to exist, or it’ll just go away. There are not, AFAIK, any problems with the download cache being too aggressive other than one bug where it doesn’t correctly identify a failed download.
This is a problem with the wheel cache, with a path to fix it on the ticket. Just needs someone to implement the PR.
I think this is probably just a wontfix issue tbh. The cache isn’t designed to be used that way really.
Another problem with the wheel cache :/
etc, etc.
I know there are some platform-specific directories I can delete, but almost once a day I want a command like `pip cache show` which can show me what is cached and when/where it was built, `pip cache clear` or `pip cache remove twisted` or `pip cache remove cffi>=1.0`. I don't want to have to care if it's in the HTTP cache or the wheel cache, or how it got there; I also don't want to have to bust a ~200 megabyte cache that saves me hours a day just because there's one bad entry in there.
Absent that one bug, I’m not aware of any reason why someone would need to ever purge their http cache except for if they were using a non PyPI index and they sent cache headers when they shouldn’t have (which is not pip’s fault in the slightest). The wheel cache is new and still has some issues to work out. I suspect it will need some method to inspect and remove items from the cache, but that work won’t get done until someone does it.
FWIW https://github.com/pypa/pip/pull/3146 exists for the wheel cache.
Concerning http cache and https://github.com/ionrock/cachecontrol the main
issue is that the cached request is not present in the cache. We only have
a "hash -> response" mapping.
I plan to make a PR asking to also keep the request for introspection and
move to a "hash -> (request, response)" mapping.
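To make the difference between the two mappings concrete, here is a minimal sketch of a cache that stores "hash -> (request, response)". It is not CacheControl's or pip's actual code: the `Request`, `Response`, and `IntrospectableCache` names, the key function, and the example URL are all hypothetical. It only illustrates why keeping the request alongside the response would allow entries to be listed and removed by URL.

```python
import hashlib
from typing import Dict, List, NamedTuple, Optional, Tuple


class Request(NamedTuple):
    method: str
    url: str


class Response(NamedTuple):
    status: int
    body: bytes


def cache_key(request: Request) -> str:
    """Hash the request into an opaque key (hypothetical key scheme)."""
    raw = f"{request.method} {request.url}".encode("utf-8")
    return hashlib.sha224(raw).hexdigest()


class IntrospectableCache:
    """Stores hash -> (request, response) instead of hash -> response."""

    def __init__(self) -> None:
        self._entries: Dict[str, Tuple[Request, Response]] = {}

    def set(self, request: Request, response: Response) -> None:
        self._entries[cache_key(request)] = (request, response)

    def get(self, request: Request) -> Optional[Response]:
        entry = self._entries.get(cache_key(request))
        return entry[1] if entry is not None else None

    def urls(self) -> List[str]:
        """Introspection: only possible because the request is kept."""
        return sorted(req.url for req, _ in self._entries.values())

    def remove(self, url_substring: str) -> int:
        """Drop every entry whose URL contains the substring; return the count."""
        doomed = [key for key, (req, _) in self._entries.items()
                  if url_substring in req.url]
        for key in doomed:
            del self._entries[key]
        return len(doomed)
```

With only a "hash -> response" mapping, `urls()` and `remove()` could not be written, because the hash cannot be reversed to recover which URL an entry belongs to.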
On Wed, Oct 14, 2015 at 11:01 PM, Glyph Lefkowitz
On Oct 14, 2015, at 1:04 PM, Donald Stufft wrote:
Generally within 60-120 seconds it’s available in mirrors (most of them resync once a minute). If anyone has downloaded it then they will have pretty much permanently cached the package, first in the download cache and then again in the wheel cache (assuming it wasn’t a wheel already, and they had that enabled). The original package was NumPy. It had 30,982 downloads in the last day, so we can average that out to 1290 downloads an hour or 21 downloads a minute. If it takes you two minutes to notice it and delete it, then there are ~40 people who already have the original version cached and who will not notice the updated version.
While I don't think PyPI should allow modification of uploaded packages necessarily, I do think that Pip's caching is (A) too aggressive and (B) too opaque. For example:
https://github.com/pypa/pip/issues/3127 https://github.com/pypa/pip/issues/3034 https://github.com/pypa/pip/issues/3025 https://github.com/pypa/pip/issues/2908 https://github.com/pypa/pip/issues/2882
etc, etc.
I know there are some platform-specific directories I can delete, but almost once a day I want a command like `pip cache show` which can show me what is cached and when/where it was built, `pip cache clear` or `pip cache remove twisted` or `pip cache remove cffi>=1.0`. I don't want to have to care if it's in the HTTP cache or the wheel cache, or how it got there; I also don't want to have to bust a ~200 megabyte cache that saves me hours a day just because there's one bad entry in there.
-glyph
_______________________________________________ Distutils-SIG maillist - Distutils-SIG@python.org https://mail.python.org/mailman/listinfo/distutils-sig
On Wed, Oct 14, 2015 at 1:04 PM, Donald Stufft
And it just seems pedantic to say: hey -- you've already put that one there -- maybe even two minutes ago, so there is NO WAY to fix your mistake. If it happens quickly, then no one has downloaded it, it hasn't made its way to the mirrors, etc…
Generally within 60-120 seconds it’s available in mirrors (most of them resync once a minute).
interesting -- it doesn't show up in the PyPI search nearly that fast :-)
If anyone has downloaded it
Honestly, I'm really not worried about people having downloaded it -- but if it gets mirrored almost instantaneously, then there's a problem.
The original package was NumPy. It had 30,982 downloads in the last day,
well, let's face it, most packages are not nearly as popular as numpy... But again, if it's been mirrored, we're stuck.
Version numbers are free, use more of them. If you can’t just issue new releases quickly, then test your release before you upload it (and then upload it with twine) and you can even upload it to Test PyPI to test things earlier than that.
Bingo! I'll check out Test PyPI -- maybe that's just what I've been looking for.
-CHB
On October 14, 2015 at 6:13:14 PM, Chris Barker (chris.barker@noaa.gov) wrote:
On Wed, Oct 14, 2015 at 1:04 PM, Donald Stufft wrote:
And it just seems pedantic to say: hey -- you've already put that one there -- maybe even two minutes ago, so there is NO WAY to fix your mistake. If it happens quickly, then no one has downloaded it, it hasn't made its way to the mirrors, etc…
Generally within 60-120 seconds it’s available in mirrors (most of them resync once a minute).
interesting -- it doesn't show up in the PyPi search nearly that fast :-)
That’s mostly because search is barely being held together right now. When all the new stuff shakes out, that should update within a second or two. For right now it’s on a 15 minute cron job.
If anyone has downloaded it
Honestly, I'm really not worried about people having downloaded it -- but if it gets mirrored almost instantaneously, then there's a problem.
The original package was NumPy. It had 30,982 downloads in the last day,
well, let's face it, most packages are not nearly as popular as numpy...
But again, if it's been mirrored, we're stuck.
Version numbers are free, use more of them. If you can’t just issue new releases quickly, then test your release before you upload it (and then upload it with twine) and you can even upload it to Test PyPI to test things earlier than that.
Bingo! I'll check out Test PyPI -- maybe that's just what I've been looking for.
-CHB
On Wed, Oct 14, 2015 at 8:52 PM, Chris Barker
On Wed, Oct 14, 2015 at 9:56 AM, Dave Forgac wrote:
This was discussed recently here: https://github.com/pypa/packaging-problems/issues/74
and on this list at other times. Though the above issue was pretty focused on restoring a deleted file without any changes -- which seems like a no-brainer to me, as long as someone wants to put the effort into the infrastructure.
(the soft delete option seems like a good idea to me).
But I'm talking about the cases of "whoops! I really wish I hadn't uploaded that one". We can improve the tooling (some discussion on this in this thread right now...), but people are people and some of us are stupid and/or careless. So this WILL happen.
And it just seems pedantic to say: hey -- you've already put that one there -- maybe even two minutes ago, so there is NO WAY to fix your mistake. If it happens quickly, then no one has downloaded it, it hasn't made its way to the mirrors, etc...
It is not pedantic, for reasons mentioned by Donald.
I have not done numpy releases for half a decade now, but it was already automated enough that putting out a new version was not very costly then. And you did not have travis-ci, appveyor, tox, ubiquitous AWS 5 years ago ...
I am sure there are things we can do to improve numpy's release process to avoid this in the future.
David
And again -- we are all adults here: if you as the package maintainer want to do something that is confusing to your users, is it up to PyPI to never let you do that? (I think it IS up to PyPI to strongly recommend that you don't -- i.e. make it hard to do, and impossible to do thoughtlessly)
On Wed, Oct 14, 2015 at 10:00 AM, Jeremy Stanley wrote:
You should have to jump through all sorts of hoops, and make it really clear that it is a BAD IDEA in the general case, but it'd be good to have it be possible. [...]
It used to be possible.
Was it also easy to do without very careful consideration? Or were the hoops I mentioned in place?
I can't find it right now, but I think someone in this thread suggested a "staging area", so we could essentially do a trial run: upload to PyPI, tell a few trusted friends about it, have them test it, then, and only then, push it to the official channel. Maybe the infrastructure for that would be more work than it's worth, but I like it.
This would fit into the basic principle that you should always be able to test something in as close to the final environment as possible, before you commit to it.
-CHB
On Wed, Oct 14, 2015 at 1:59 PM, David Cournapeau
But I'm talking about the cases of "whoops! I really wish I hadn't
uploaded that one". We can improve the tooling (some discussion on this in this thread right now...), but people are people and some of us are stupid and/or careless. So this WILL happen.
I have not done numpy releases for half a decade now, but it was already
automated enough that putting a new version was not very costly then.
yeah, I suppose releases are cheap -- though part of the problem is that your users are likely to think that you actually fixed a bug or something. And maybe wonder why you went from 1.2.3 to 1.3.5 seemingly all at once...
another note -- conda has the concept of a "build" that's tacked on the release for conda packages. So if I updated something about how the package is built, but am using the same underlying version of the package, I update the build number, get a new "version" of the package, but it's clear that the package itself is the same version.
for instance, I'm messing around right now with building libgd for conda, and the latest version I have up on anaconda.org is:
libgd-2.1.1
but the actual file is:
libgd-2.1.1-1.tar.bz2
(and there is a libgd-2.1.1-0.tar.bz2 there too...)
Maybe this would be helpful for PyPI, too?
I am sure there are things we can do to improve numpy's release process to
avoid this in the future.
numpy was just an example -- we are all likely to make mistakes in the future - it's human nature.
-CHB
On October 14, 2015 at 6:24:55 PM, Chris Barker (chris.barker@noaa.gov) wrote:
another note -- conda has the concept of a "build" that's tacked on the release for conda packages.
So if I updated something about how the package is built, but am using the same underlying version of the package, I update the build number, get a new "version" of the package, but it's clear that the package itself is the same version.
Wheels have a build number concept which is independent of the version and can be used to upload a new build without creating a new release… at least in theory. I’m not sure if anyone has ever tried to actually use it though.
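For reference, the wheel filename convention in PEP 427 is `{distribution}-{version}(-{build tag})?-{python tag}-{abi tag}-{platform tag}.whl`, where the optional build tag must start with a digit. The sketch below splits a filename along those lines. The `parse_wheel_filename` helper and the example filename are hypothetical, and this is not how pip itself parses wheels.

```python
from typing import NamedTuple, Optional


class WheelName(NamedTuple):
    distribution: str
    version: str
    build: Optional[str]  # None when the filename has no build tag
    python_tag: str
    abi_tag: str
    platform_tag: str


def parse_wheel_filename(filename: str) -> WheelName:
    """Split a wheel filename into its PEP 427 components."""
    suffix = ".whl"
    if not filename.endswith(suffix):
        raise ValueError(f"not a wheel filename: {filename!r}")
    parts = filename[: -len(suffix)].split("-")
    if len(parts) == 5:
        distribution, version, python_tag, abi_tag, platform_tag = parts
        build = None
    elif len(parts) == 6:
        distribution, version, build, python_tag, abi_tag, platform_tag = parts
        # PEP 427: the build tag must start with a digit.
        if not build[:1].isdigit():
            raise ValueError(f"invalid build tag in {filename!r}")
    else:
        raise ValueError(f"unexpected number of components in {filename!r}")
    return WheelName(distribution, version, build,
                     python_tag, abi_tag, platform_tag)


if __name__ == "__main__":
    # Hypothetical filenames: same version, second one carries build tag "1".
    print(parse_wheel_filename("example_pkg-1.2.3-cp27-none-win32.whl"))
    print(parse_wheel_filename("example_pkg-1.2.3-1-cp27-none-win32.whl"))
```

Both example files report version 1.2.3; only the build field differs, which is the same idea as the conda build number discussed earlier in the thread.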
On Wed, Oct 14, 2015 at 3:26 PM, Donald Stufft
On October 14, 2015 at 6:24:55 PM, Chris Barker (chris.barker@noaa.gov) wrote:
another note -- conda has the concept of a "build" that's tacked on the release for conda packages.
So if I updated something about how the package is built, but am using the same underlying version of the package, I update the build number, get a new "version" of the package, but it's clear that the package itself is the same version.
Wheels have a build number concept which is independent of the version and can be used to upload a new build without creating a new release… at least in theory.
sounds like exactly what conda is doing (and the same context)
I’m not sure if anyone has ever tried to actually use it though.
I expect they will -- it's pretty darn useful in conda. And it's all too easy to build a binary wheel that works on the developer machine, but not everywhere else... (OK, the CI systems are making that less likely..)
-Chris
On Wed, Oct 14, 2015 at 11:23 PM, Chris Barker
On Wed, Oct 14, 2015 at 1:59 PM, David Cournapeau wrote:
But I'm talking about the cases of "whoops! I really wish I hadn't uploaded that one". We can improve the tooling (some discussion on this in this thread right now...), but people are people and some of us are stupid and/or careless. So this WILL happen.
I have not done numpy releases for half a decade now, but it was already
automated enough that putting a new version was not very costly then.
yeah, I suppose releases are cheap -- though part of the problem is that your users are likely to think that you actually fixed a bug or something. And maybe wonder why you went from 1.2.3 to 1.3.5 seemingly all at once...
another note -- conda has the concept of a "build" that's tacked on the release for conda packages.
Note that build is a feature for binaries: different builds refer to the same upstream source. Most linux distributions have the notion of downstream version which is a generalization of that. David
So if I updated something about how the package is built, but am using the same underlying version of the package, I update the build number, get a new "version" of the package, but it's clear that the package itself is the same version.
for instance, I'm messing around right now with building libgd for conda, and the latest version I have up on anaconda.org is:
libgd-2.1.1
but the actual file is:
libgd-2.1.1-1.tar.bz2
(and there is a libgd-2.1.1-0.tar.bz2 there too...)
Maybe this would be helpful for PyPi, too?
I am sure there are things we can do to improve numpy's release process to
avoid this in the future.
numpy was just an example -- we are all likely to make mistakes in the future - it's human nature.
-CHB
participants (7)
- Chris Barker
- David Cournapeau
- Donald Stufft
- Glyph Lefkowitz
- Ionel Cristian Mărieș
- Robert Collins
- Xavier Fernandez