On Mar 5, 2017, at 8:51 AM, Donald Stufft <donald@stufft.io> wrote:
So, as most folks are aware, PyPI has long had a cumulative download count available in its API. This has been broken on and off for a *long* time, and arguably the numbers in there have been “wrong” even when it was working, because we had no way to reproduce them from scratch (and thus whenever a bug occurred we’d flat out lose data or add incorrect data with no way to correct it).
In the meantime, we’ve gotten a much better source of download information, which can be queried inside Google’s BigQuery database [1][2]. Not only can this be recreated “from scratch” so we can, if needed, fix massive data bugs, but it also provides MUCH more information than the previous download counts and a very powerful query language to go along with it.
Unless there is some sort of massive outcry, I plan to deprecate and ultimately remove the download counts available in the PyPI API, preferring that people start using the BigQuery data instead. This more or less reflects the current state of things, since the download counts have been broken on and off (typically broken) for something like a year now.
I fully realize that if I really wanted this, I could do it myself, and the last thing you need is someone signing you up for more work :). But, as someone who's been vaguely annoyed that `vanity` doesn't work for a while, I wonder: shouldn't it be easy for someone familiar with both systems to simply implement the existing "download count" API as a legacy / compatibility wrapper around BigQuery? If that isn't trivial, doesn't that point to something flawed in the way the data is presented in BigQuery? That said, I'm fully OK with the answer that even a tiny bit of work is too much, and the limited volunteer effort of PyPI should be spent elsewhere.
-glyph
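To make the compatibility-wrapper idea a bit more concrete, here is a minimal sketch in Python. Nearly everything in it is an assumption rather than anything specified in this thread: the table name, the `file.project` column, and the `run_query` callable (a hypothetical stand-in for whatever BigQuery client the real service would use) are all illustrative, and the function names are made up.

```python
import re
from typing import Callable, Iterable, Mapping

# Assumption: the table that holds per-download rows. This thread does not
# name it, so treat it as a placeholder to be replaced with the real one.
DEFAULT_TABLE = "bigquery-public-data.pypi.file_downloads"

# Conservative check on project names so the value can be embedded in SQL.
_NAME_RE = re.compile(r"[A-Za-z0-9]([A-Za-z0-9._-]*[A-Za-z0-9])?")


def build_download_count_query(project: str, table: str = DEFAULT_TABLE) -> str:
    """Build a query for the cumulative download count of one project."""
    if not _NAME_RE.fullmatch(project):
        raise ValueError(f"unexpected project name: {project!r}")
    # Assumption: the project name is stored in a column called file.project.
    return (
        "SELECT COUNT(*) AS downloads "
        f"FROM `{table}` "
        f"WHERE file.project = '{project}'"
    )


def legacy_download_count(
    project: str,
    run_query: Callable[[str], Iterable[Mapping[str, int]]],
) -> int:
    """Return a single cumulative count, as the old API field did.

    run_query is a hypothetical callable that executes SQL and yields rows
    as mappings; it is injected here so no particular client is assumed.
    """
    rows = list(run_query(build_download_count_query(project)))
    return int(rows[0]["downloads"]) if rows else 0
```

A real wrapper would presumably need to cache results, since counting every row for a project on each request is unlikely to be cheap, which may itself be part of the answer to whether this is "trivial".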