[Distutils] FYI - "Trending" on Warehouse

Donald Stufft donald at stufft.io
Mon Mar 13 11:23:25 EDT 2017


Just an FYI, I’ve replaced the long-stagnant “top downloads” list on the Warehouse / pypi.org homepage with “Trending” projects. Since “trending” can be computed in a lot of different ways, here’s how I’m currently doing it [1]:

Using a look-back over the last 30 days of downloads, I compute a “zscore” for each project for yesterday (effectively, how many standard deviations yesterday’s total downloads for that project were away from its mean). The trending projects are then the top 5 projects by zscore for yesterday, recomputed every day at ~3am UTC.

Because it’s a lot easier for a project averaging 5 downloads to jump to 100 than it is for a project averaging 50,000 downloads to jump to 1,000,000, I have tried to exclude projects with very few downloads: to qualify as trending, a project must receive at least 5,000 downloads in a month.
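A minimal Python sketch of the scheme described above, under stated assumptions: the input is a hypothetical dict mapping each project name to its last 30 daily download totals (oldest first, yesterday last), and the mean and standard deviation are taken over the whole window including yesterday. The linked task [1] may compute the baseline differently; the function names here are illustrative only.

```python
from statistics import mean, pstdev

MIN_MONTHLY_DOWNLOADS = 5_000  # qualification threshold from the post
TOP_N = 5                      # number of trending projects shown

def zscore_yesterday(daily_counts):
    """Z-score of the last entry (yesterday) relative to the whole window.

    Assumption: mean and population stddev cover all entries, yesterday included.
    """
    mu = mean(daily_counts)
    sigma = pstdev(daily_counts)
    if sigma == 0:
        return None  # perfectly flat history: z-score is undefined
    return (daily_counts[-1] - mu) / sigma

def trending(projects, top_n=TOP_N, min_downloads=MIN_MONTHLY_DOWNLOADS):
    """projects: mapping of project name -> list of daily download counts."""
    scored = []
    for name, counts in projects.items():
        if sum(counts) < min_downloads:
            continue  # too few downloads in the window to qualify
        z = zscore_yesterday(counts)
        if z is not None:
            scored.append((name, z))
    # Highest z-score first; keep only the top N names.
    scored.sort(key=lambda item: item[1], reverse=True)
    return [name for name, _ in scored[:top_n]]

data = {
    "steady": [190, 210] * 15,                # noisy but unremarkable
    "spiking": [190, 210] * 14 + [200, 2000], # big jump yesterday
    "flat": [300] * 30,                       # no variance, skipped
    "tiny": [1] * 29 + [500],                 # under 5,000/month, skipped
}
print(trending(data))  # → ['spiking', 'steady']
```

One quirk worth knowing: a project whose history is perfectly flat has a standard deviation of zero, so it gets no score at all rather than an infinite one.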

If you happen to be some sort of sciencey person and know of a better way to query what is effectively a table with a row for every download of every project to determine which ones are trending, feel free to open an issue, create a PR, or something. I don’t really know what I’m doing here :)
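A z-score computation needs a per-project series of daily totals, not raw download rows. For reference, a minimal Python sketch of collapsing a one-row-per-download table into such series, assuming a hypothetical in-memory list of (project, date) tuples rather than whatever store Warehouse actually queries; `daily_series` and the project names are made up for illustration.

```python
from collections import defaultdict
from datetime import date, timedelta

def daily_series(rows, yesterday, days=30):
    """Collapse one-row-per-download data into per-project daily totals.

    rows: iterable of (project_name, download_date) tuples (hypothetical
    shape; each tuple stands for a single download).
    Returns {project: [oldest_day, ..., yesterday]}, each list `days` long,
    with days that saw no downloads filled in as 0.
    """
    start = yesterday - timedelta(days=days - 1)
    series = defaultdict(lambda: [0] * days)
    for project, day in rows:
        if start <= day <= yesterday:
            series[project][(day - start).days] += 1
    return dict(series)

rows = [
    ("example-a", date(2017, 3, 12)),
    ("example-a", date(2017, 3, 12)),
    ("example-a", date(2017, 3, 10)),
    ("example-b", date(2017, 1, 1)),  # outside the 30-day window, ignored
]
counts = daily_series(rows, yesterday=date(2017, 3, 12))
print(counts["example-a"][-3:])  # → [1, 0, 2]
```

The zero-filling matters: a plain GROUP BY on day silently drops days with no downloads, which would skew both the mean and the standard deviation for low-traffic projects.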

Anyways, that’s all! 



[1] https://github.com/pypa/warehouse/blob/a36435b9865000cdaae97b948af48c33f7d8fe8e/warehouse/packaging/tasks.py#L19-L102

—
Donald Stufft


