On Sep 18, 2014, at 2:26 PM, Paul Moore email@example.com wrote:
Maybe this can't be solved in any meaningful sense, and maybe it's not something the "Python project" should take responsibility for, but without any doubt, it's the single most significant improvement that could be made to my experience with PyPI.
Package discovery is absolutely a thing we stink at, and something we should do better at. This is squarely on the PyPI side of things; I don't think that python-dev needs to recommend a stdlib++ nor any hand-picked group of people... maybe.
The PyPI search is kinda grody: it's a horribly inefficient SQL query that uses a bunch of LIKEs and regexes, if I recall. Warehouse has switched to an Elasticsearch backend and I've attempted to do a little tuning of it; however, I haven't done a whole lot, largely because I'm not an expert and it wasn't that high of a priority.
Fundamentally, though, the problem is what we use to determine whether a package is "good" or not. Folks may or may not remember the great ratings/comments war of yore, which was an attempt to add some end-user-driven ratings to packages on PyPI. That didn't go particularly well, and they've long since been disabled.
One of the problems is that the names of packages are basically either extremely relevant or not relevant at all. Taking a look at Django migrations prior to Django 1.7, you had "South", which was the de facto standard, and "django-migrations", which was a more or less defunct project. Then you'd have a ton of django-* packages which mention both Django and the fact that they have migrations shipped within their long_description. That made it somewhat hard to get South to score fairly high on "django migrations".
Popularity is a reasonable metric. If I recall, Warehouse uses the download counts of a project to weight the search results (although it should probably use rolling counts and not total counts). The idea here is that something that is downloaded more often is likely to be a better overall choice for most people.
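To make the rolling-versus-total distinction concrete, here is a minimal sketch. It is not Warehouse's actual scoring: the function names, the 30-day window, the log damping, and the input shape (a text relevance score plus per-day download counts) are all assumptions for illustration.

```python
import math
from datetime import date, timedelta


def rolling_downloads(daily_counts, today, window_days=30):
    """Sum the downloads that fall within the last `window_days` days."""
    cutoff = today - timedelta(days=window_days)
    return sum(n for day, n in daily_counts.items() if cutoff < day <= today)


def weighted_score(text_relevance, recent_downloads):
    """Boost a text relevance score by a log-damped popularity factor.

    The log keeps a hugely popular project from drowning out everything
    else purely on download volume (an assumed design choice).
    """
    return text_relevance * (1.0 + math.log10(1 + recent_downloads))


def rank(candidates, today, window_days=30):
    """Return candidate names ordered from best to worst score."""
    scored = []
    for c in candidates:
        recent = rolling_downloads(c["daily_downloads"], today, window_days)
        scored.append((weighted_score(c["relevance"], recent), c["name"]))
    return [name for _, name in sorted(scored, reverse=True)]


# Hypothetical data: one project with a large historical total but little
# recent activity, and one with a smaller total that is active right now.
candidates = [
    {
        "name": "old-popular",
        "relevance": 1.0,
        "daily_downloads": {date(2012, 1, 1): 1_000_000, date(2014, 9, 10): 10},
    },
    {
        "name": "new-active",
        "relevance": 1.0,
        "daily_downloads": {date(2014, 9, 10): 5_000},
    },
]

today = date(2014, 9, 18)
by_rolling = rank(candidates, today, window_days=30)
by_total = rank(candidates, today, window_days=100_000)
```

With the 30-day window the currently active project ranks first; widening the window until it covers all history (which is what a total count amounts to) puts the historically popular but now quiet project back on top.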
On Crate I had "favorites", which were functionally equivalent to stars on GitHub and which would influence the search results as well.
That's about all that I've been able to think of to glean information from. Any sort of ratings (or what have you) system needs to be done very carefully so that it actually provides value and isn't just a pain point.
--- Donald Stufft PGP: 7C6B 7C5D 5E2B 6356 A926 F04F 6E3C BCE9 3372 DCFA