[Distutils] PyPi not allowing duplicate filenames

Donald Stufft donald at stufft.io
Wed Oct 14 23:46:17 CEST 2015

On October 14, 2015 at 5:01:49 PM, Glyph Lefkowitz (glyph at twistedmatrix.com) wrote:
> > On Oct 14, 2015, at 1:04 PM, Donald Stufft wrote:
> >
> > Generally within 60-120 seconds it’s available in mirrors (most of them resync once
> > a minute). If anyone has downloaded it then they will have pretty much permanently cached
> > the package, first in the download cache and then again in the wheel cache (assuming it
> > wasn’t a wheel already, and they had that enabled). The original package was NumPy. It
> > had 30,982 downloads in the last day, so we can average that out to 1290 downloads an hour
> > or 21 downloads a minute. If it takes you two minutes to notice it and delete it, then there
> > are ~40 people who already have the original version cached and who will not notice the
> > updated version.
>
> While I don't think PyPI should allow modification of uploaded packages necessarily,  
> I do think that Pip's caching is (A) too aggressive and (B) too opaque. For example:
> https://github.com/pypa/pip/issues/3127
> https://github.com/pypa/pip/issues/3034

These two are the same thing, and a bug. Fix the bug and the HTTP cache is perfectly fine IMO. It never caches without headers or an ETag saying that it can. PyPI uses really long cache-control headers (1y+) on a package, and it can do this because it can promise that this URL will either continue to exist, or it’ll just go away.
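
To make the "never caches without headers or an ETag" rule concrete, here is a rough sketch in Python. It is a simplified illustration only, not pip's actual caching code; the function names `max_age_seconds` and `may_cache` are made up for this example, and real HTTP caching has many more rules than this.

```python
ONE_YEAR = 365 * 24 * 60 * 60  # seconds; roughly the lifetime PyPI sends for package files


def max_age_seconds(cache_control):
    """Return the max-age value from a Cache-Control header, or None if absent."""
    for directive in cache_control.split(","):
        name, _, value = directive.strip().partition("=")
        value = value.strip('"')
        if name.lower() == "max-age" and value.isdigit():
            return int(value)
    return None


def may_cache(headers):
    """Hypothetical check: cache only when the response explicitly allows it."""
    # Header names are case-insensitive, so normalise them first.
    lowered = {name.lower(): value for name, value in headers.items()}
    cache_control = lowered.get("cache-control", "")
    directives = {
        part.strip().partition("=")[0].lower()
        for part in cache_control.split(",")
        if part.strip()
    }
    if "no-store" in directives:
        return False
    age = max_age_seconds(cache_control)
    if age is not None and age > 0:
        return True
    # No usable max-age: an ETag still lets the client revalidate later.
    return "etag" in lowered
```

With this sketch, `may_cache({"Cache-Control": "max-age=%d" % ONE_YEAR})` is true, while a response with no caching headers at all is never stored.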

There are not, AFAIK, any problems with the download cache being too aggressive other than one bug where it doesn’t correctly identify a failed download.

> https://github.com/pypa/pip/issues/3025

This is a problem with the wheel cache, with a path to fix it on the ticket. Just needs someone to implement the PR.

> https://github.com/pypa/pip/issues/2908

I think this is probably just a wontfix issue tbh. The cache isn’t designed to be used that way really.

> https://github.com/pypa/pip/issues/2882

Another problem with the wheel cache :/

> etc, etc.
> I know there are some platform-specific directories I can delete, but almost
> once a day I want a command like `pip cache show` which can show me what is cached and when/where
> it was built, `pip cache clear` or `pip cache remove twisted` or `pip cache remove cffi>=1.0`.
> I don't want to have to care if it's in the HTTP cache or the wheel cache, or how it got there;
> I also don't want to have to bust a ~200 megabyte cache that saves me hours a day just because
> there's one bad entry in there.

Absent that one bug, I’m not aware of any reason why someone would ever need to purge their HTTP cache, unless they were using a non-PyPI index that sent cache headers when it shouldn’t have (which is not pip’s fault in the slightest).

The wheel cache is new and still has some issues to work out. I suspect it will need some method to inspect and remove items from the cache, but that work won’t get done until someone does it.
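
For anyone who wants a stopgap, inspecting and pruning a directory of cached wheels can be sketched in a few lines of Python. This is an assumption-laden illustration, not a pip feature: `list_cached_wheels` and `remove_cached_wheels` are hypothetical helpers, the cache directory has to be supplied by the caller, and the name matching only handles the simple case where a wheel filename starts with the project name followed by a hyphen.

```python
import os


def list_cached_wheels(cache_root):
    """Return sorted (filename, size_bytes, mtime, path) tuples for .whl files."""
    entries = []
    for dirpath, _dirnames, filenames in os.walk(cache_root):
        for filename in filenames:
            if filename.endswith(".whl"):
                path = os.path.join(dirpath, filename)
                info = os.stat(path)
                entries.append((filename, info.st_size, info.st_mtime, path))
    return sorted(entries)


def remove_cached_wheels(cache_root, project):
    """Delete cached wheels for one project; return the paths that were removed."""
    # Assumption: wheel filenames use underscores where the project name has hyphens.
    prefix = project.lower().replace("-", "_") + "-"
    removed = []
    for filename, _size, _mtime, path in list_cached_wheels(cache_root):
        if filename.lower().startswith(prefix):
            os.remove(path)
            removed.append(path)
    return removed
```

Calling `remove_cached_wheels(cache_root, "twisted")` drops only the Twisted wheels and leaves the rest of the cache alone, which avoids throwing away everything because of one bad entry.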

Donald Stufft
PGP: 0x6E3CBCE93372DCFA // 7C6B 7C5D 5E2B 6356 A926 F04F 6E3C BCE9 3372 DCFA
