On Sep 18, 2014, at 2:26 PM, Paul Moore <p.f.moore@gmail.com> wrote:

> Maybe this can't be solved in any meaningful sense, and maybe it's not
> something the "Python project" should take responsibility for, but
> without any doubt, it's the single most significant improvement that
> could be made to my experience with PyPI.

Package discovery is absolutely a thing we stink at, and something we should
do better at. This is squarely on the PyPI side of things; I don't think that
python-dev needs to recommend a stdlib++ or any hand-picked group of people...
maybe.

The PyPI search is kinda grody; it's a horribly inefficient SQL query that
uses a bunch of LIKEs and regexes, if I recall. Warehouse has switched to an
Elasticsearch backend and I've attempted to do a little tuning of it. However,
I haven't done a whole lot, largely because I'm not an expert and it wasn't
that high of a priority.

Fundamentally, though, the problem is: what do we use to determine if a
package is "good" or not? Folks may or may not remember the great
ratings/comments war of yore, which was an attempt to add some end-user-driven
ratings to packages on PyPI. That didn't go particularly well, and they've
long since been disabled.

One of the problems is that the names of packages are basically either
extremely relevant or not relevant at all. Take Django migrations: prior to
Django 1.7 you had "South", which was the de facto standard, and
"django-migrations", which was a more or less defunct project. Then you'd have
a ton of django-* packages which mention, in their long_description, both
Django and the fact that they ship migrations. That made it somewhat hard
to get South to score fairly high on "django migrations"[1][2].
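
To make the name problem concrete, here's a toy sketch of a scorer that counts
query terms in the name and the description, with name matches weighted more
heavily. It is not what PyPI or Warehouse actually does; the function names,
the weights, the "django-example-app" project and all three descriptions are
made up for illustration.

```python
import re


def tokens(text):
    """Lowercase the text and split it into a set of alphanumeric words."""
    return set(re.findall(r"[a-z0-9]+", text.lower()))


def naive_score(query, name, description, name_weight=5, description_weight=1):
    """Count query terms found in each field; weights are arbitrary."""
    q = tokens(query)
    return (name_weight * len(q & tokens(name))
            + description_weight * len(q & tokens(description)))


# Hypothetical descriptions, not the real project metadata.
projects = {
    "South": "schema migrations for django apps",
    "django-migrations": "a migrations tool",
    "django-example-app": "an example django app that ships migrations",
}

query = "django migrations"
ranked = sorted(
    projects,
    key=lambda name: naive_score(query, name, projects[name]),
    reverse=True,
)
print(ranked)
```

With these made-up inputs "South" ends up last, because its name contributes
nothing to the score even though its description is the most on topic.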

Popularity is a reasonable metric; if I recall, Warehouse uses the download
counts of a project to weight the search results (although it should probably
use rolling counts and not total counts). The idea here is that something that
is downloaded more often is likely to be a better overall choice for most
people.
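
As a rough sketch of the rolling-count idea (an assumption about how it could
be done, not Warehouse's actual code), the snippet below sums downloads over a
recent window and uses a log-dampened version of that number to boost a text
relevance score. The helper names, the 30-day window and the log10 dampening
are all hypothetical choices.

```python
import math
from datetime import date, timedelta


def rolling_downloads(daily_counts, today, window_days=30):
    """Sum downloads for days within the last `window_days` days."""
    cutoff = today - timedelta(days=window_days)
    return sum(n for day, n in daily_counts.items() if cutoff < day <= today)


def weighted_score(text_score, downloads):
    """Boost a text relevance score by log-dampened download volume."""
    return text_score * (1.0 + math.log10(1 + downloads))


today = date(2014, 9, 18)

# Made-up numbers: one project was popular long ago, one is popular now.
old_project = {date(2012, 1, 1): 100000, date(2014, 9, 1): 10}
new_project = {date(2014, 9, 1): 5000, date(2014, 9, 10): 5000}

old_recent = rolling_downloads(old_project, today)
new_recent = rolling_downloads(new_project, today)

# Same text relevance; only recent popularity differs.
print(weighted_score(1.0, old_recent), weighted_score(1.0, new_recent))
```

With total counts the older project would win on its 2012 downloads; with a
rolling window only the recent ones count, so the currently used project ranks
higher.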

On Crate I had "favorites", which were functionally equivalent to stars on
GitHub and which would influence the search results as well.

That's about all that I've been able to think of to glean information from.
Any sort of ratings system, or what have you, needs to be done very carefully
so that it actually provides value and isn't just a pain point.


[1] https://pypi.python.org/pypi?%3Aaction=search&term=django+migrations&submit=search
[2] https://warehouse.python.org/search/project/?q=django+migrations

---
Donald Stufft
PGP: 7C6B 7C5D 5E2B 6356 A926 F04F 6E3C BCE9 3372 DCFA