[Distutils] does pypi or red-dove have a better firehose API than "download all the packages"?

Daniel Holth dholth at gmail.com
Fri May 17 03:50:17 CEST 2013


On Thu, May 16, 2013 at 3:46 PM, David Wilson <dw at botanicus.net> wrote:
> Would something like http://pypi.h1.botanicus.net/static/dump.txt.gz be
> useful to you? (warning: 57 MB expanding to 540 MB). Each line is a
> JSON-encoded dict containing a single package release.
>
> import gzip
> import json
>
> for line in gzip.open('dump.txt.gz'):
>     dct = json.loads(line)  # one package release per line
>     ...
>
> etc
>
> The code for it is very simple; I would be willing to clean it up and turn
> it into a cron job if people found it useful.
>
> Note that the dump above is outdated; I only made it as a test.

Seems like a useful format.
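
To make the one-JSON-dict-per-line idea concrete, here is a minimal sketch of
a reader for that kind of file. It assumes Python 3 and nothing about the keys
inside each dict; the helper names iter_releases and count_keys are
hypothetical, not part of the dump's code.

```python
import gzip
import json
from collections import Counter


def iter_releases(path):
    """Yield one dict per non-empty line of a gzip-compressed JSON-lines file."""
    # "rt" decodes the compressed bytes to text so json.loads gets a str.
    with gzip.open(path, "rt", encoding="utf-8") as fh:
        for line in fh:
            line = line.strip()
            if line:
                yield json.loads(line)


def count_keys(path):
    """Count how often each top-level key appears across all records."""
    counts = Counter()
    for dct in iter_releases(path):
        counts.update(dct.keys())
    return counts
```

Something like count_keys('dump.txt.gz').most_common() would then list the
keys that actually occur, without holding the whole 540 MB in memory.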

https://bitbucket.org/dholth/pypi_stats is a prototype that parses
requires.txt and other metadata out of all the sdists in a folder,
putting them into a sqlite3 database. It may be interesting for
experimentation. For example, I can easily tell you how many different
version numbers there are and which are the most popular, or I can
tell you which metadata keys and version numbers have been used. The
database winds up being 1.6 GB, or about 200 MB if you delete the
unparsed files.
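
As a rough illustration of the "most popular version numbers" kind of query,
here is a sketch against an in-memory sqlite3 database. The releases table,
its columns, and the sample rows are all made up for the example; the actual
pypi_stats schema may look quite different.

```python
import sqlite3


def most_common_versions(conn, limit=3):
    """Return (version, count) pairs, most frequent first."""
    return conn.execute(
        "SELECT version, COUNT(*) AS n FROM releases "
        "GROUP BY version ORDER BY n DESC, version ASC LIMIT ?",
        (limit,),
    ).fetchall()


# Hypothetical schema and invented rows, only to exercise the query.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE releases (name TEXT, version TEXT)")
conn.executemany(
    "INSERT INTO releases VALUES (?, ?)",
    [
        ("alpha", "1.0"),
        ("beta", "1.0"),
        ("gamma", "0.1"),
        ("alpha", "0.1"),
        ("delta", "1.0"),
    ],
)

print(most_common_versions(conn))
```

Counting distinct version strings would be a similar one-liner with
SELECT COUNT(DISTINCT version) FROM releases.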

