[Python-Dev] Yearly PyPI breakage

Donald Stufft donald at stufft.io
Thu May 5 22:31:48 EDT 2016


> On May 5, 2016, at 8:35 PM, David Wilson <dw+python-dev at hmmz.org> wrote:
> 
> On Fri, May 06, 2016 at 12:03:48AM +0000, Brett Cannon wrote:
> 
>>    Is there something to contemplate in here? I dislike posting questions
>>    instead of answers, but it seems apparent there is a problem here and it
>>    continues to remain unaddressed.
> 
>> This whole thread is off-topic precisely because all of this is
>> discussed -- in the open -- on distutils-sig and decided there. If
>> people feel changes need to be made like broadcasting to a wider
>> audience when a change occurs, then please bring it up on
>> distutils-sig.
> 
> I respectfully disagree, as this has been the default response applied
> for years, and it seems friction and dissemination have not been
> improved by it. Packaging is not some adjunct technicality, anyone
> learning Python in the past few years at least has been taught pip
> within the first week.
> 
> 
>> But if people choose not to participate then they are implicitly
>> delegating decision powers to those who do participate
> 
> I believe this is also practically rhetorical in nature. I've watched
> the wars on distutils-sig for many years now, and the general strategy
> is that beyond minor outside influence, the process there is occupied by
> few individuals who are resistant to outside change. Outside influence
> is regularly met with essay-length responses and tangential minutiae until
> the energy of the challenge is expended.

Honestly, I just don't think this is an honest characterization. It is true
that in general there are few people who bother to put in the effort to take a
proposal from start to finish including actually writing the code to make such
a thing happen.

The problem of packaging is a particularly hard one where it's difficult to
make trade-offs because, unlike other systems, people are sort of locked into
using whatever the popular system is. If asyncio doesn't suit everyone, that's
fine; it's not hard for them to switch and use Twisted, Tornado, gevent,
eventlet, curio, etc., but it's not very realistic for someone to opt out of
the packaging ecosystem.

In addition, it's much like HTTP or SMTP in that once you add some feature, it
becomes incredibly difficult to ever remove it if you end up needing to (case in
point: this thread and off-site hosting), so we tend to over-scrutinize changes
wherever we can to make sure we *really* understand the trade-offs we're
making. Just as an example, it took a year and a half for PEP 440 to be
standardized, and that's for something as bite-sized as version numbers; PEP
440 was itself a continuation of the stalled PEP 386. The most time-consuming
part of PEP 440 was trying out our new rules against every single version that
existed on PyPI and minimizing the breakage.

Even so, we've had several recent PEPs go through (manylinux1, environment
markers, requirements) from people willing to put in that effort.

> 
> As an example, one common argument is that "Donald is overworked",
> however as an example, I offered a very long time ago to implement full
> text indexing for PyPI search. At the time I believe I was told such
> things weren't necessary, only to learn a few years later that Donald
> himself implemented the same function, and it suffers from huge latency
> and accuracy issues in the meantime. The solution to those problems is
> of course the ever-delayed rewrite.

I don't remember the specific details around your proposal, but I'm pretty sure
that *I* never told you that "such things were not necessary" since I've never
held that opinion to my memory. What I have found [1][2] (to try and refresh my
memory) are threads from you that went largely unanswered in April of 2013,
which was right at the time I was heavily focused on getting PyPI setup behind
Fastly and wasn't really paying attention to much else. I don't see anything
else from you about it until September of 2015, when you offered your help again,
and at that point I told you that we had already switched to using
Elasticsearch. In the two years between your initial offer and your
follow-up in 2015, I do not see any pull request to the PyPI code base from you
(which, assuming they were reasonable, would have been merged); however, I do
see the PR [3] from Ernest (No, I didn't implement the search) which got merged
and deployed.

My experience is that people are often willing to *offer* help with PyPI, but
then they quickly disappear once they start to actually try and hack on its
code base and realize how difficult it is to work with. That experience tends
to mean that I don't really get super excited when someone shows interest in
helping because it rarely actually manifests. I think we've had more people
contribute to Warehouse in the last year than we've *ever* had contribute to
the legacy PyPI code base, which I think says a lot about the decision to
switch.

> 
> Over on distutils-sig, one will learn that a large amount of effort has
> been poured into a rewrite of PyPI (an effort going on years now),
> however the original codebase was not far from rescue (I had a local
> copy almost entirely ported to Flask in a few days). There is no reason
> why this effort nor any other (like full text search) should be used, as
> it often is, as an argument in the decision-making process that largely
> governs how PyPI and pip have worked in the recent years, yet it only
> takes a few glances at the archives to demonstrate that it regularly is.
> 


I find the statement that "the original code base is not far from rescue" a bit
interesting, since you had previously stated [2] that:

    Again PyPI has been growing organically for a very long time, and
    dumping even more features in there doesn't seem a great idea. I
    looked at retrofitting PyPI with Flask, but there is simply too much
    custom code to be sure things won't be broken by doing it in a hurry.

In any case, yes, we're rewriting PyPI, and it's something I've put a
lot of effort into. That also means that I'm not particularly enthused about
spending a bunch of time and effort working on legacy PyPI, because doing so
is incredibly demotivating to me, to the point that the more time I spend in
that code base, the more I want to quit altogether. Even so, I still have
reviewed, and would still review, PRs to that code base; it's just that very
few people ever suffer through that code base long enough to actually submit one.

I don't believe we've ever told someone that something can't happen because of
Warehouse, only that *I* won't implement something until after Warehouse. That
often means that something won't happen until after Warehouse because of
the severe shortage of people with enough time and motivation to work on this
stuff, but if someone did step up, more things would get done.

[1] https://mail.python.org/pipermail/distutils-sig/2013-April/020553.html
[2] https://groups.google.com/d/msg/pypa-dev/ZjUNkczsKos/SzwNOckisXUJ
[3] https://bitbucket.org/pypa/pypi/pull-requests/81/implement-an-elasticsearch-index-for-pypi/diff

-----------------
Donald Stufft
PGP: 0x6E3CBCE93372DCFA // 7C6B 7C5D 5E2B 6356 A926 F04F 6E3C BCE9 3372 DCFA
