[Distutils] does pypi or red-dove have a better firehose API than "download all the packages"?

David Wilson dw at botanicus.net
Fri May 17 03:55:39 CEST 2013


Interesting! I produced that dump as part of a demo of using Xapian for
cheese shop search (still a work in progress, when I get a free moment).
Adding e.g. a "depends:" operator is something I'd like, and your database
sounds very useful for achieving that goal.

Thanks for the link. I may be e-mailing you shortly ;)


On 17 May 2013 02:50, Daniel Holth <dholth at gmail.com> wrote:

> On Thu, May 16, 2013 at 3:46 PM, David Wilson <dw at botanicus.net> wrote:
> > Would something like http://pypi.h1.botanicus.net/static/dump.txt.gz be
> > useful to you? (warning: 57 MB expanding to 540 MB). Each line is a
> > JSON-encoded dict containing a single package release.
> >
> > import gzip, json
> > for line in gzip.open('dump.txt.gz'):
> >     dct = json.loads(line)  # one release per line
> >     ...
> >
> > etc
> >
> > The code for it is very simple, would be willing to clean it up and turn it
> > into a cron job if people found it useful.
> >
> > Note the dump above is outdated, I only made it as a test.
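
A slightly fuller sketch of the loop above, for anyone who wants to poke at
the dump. It assumes Python 3 and that each record carries a "name" key; the
helper names and that key are assumptions for illustration, not a documented
layout of the dump.

```python
import gzip
import json
from collections import Counter


def iter_releases(path):
    """Yield one dict per non-blank line of a gzip-compressed JSON-lines file."""
    with gzip.open(path, "rt", encoding="utf-8") as fh:
        for line in fh:
            line = line.strip()
            if line:  # tolerate blank lines
                yield json.loads(line)


def count_releases_per_package(path, name_key="name"):
    """Count releases per package. `name_key` is an assumed field name."""
    counts = Counter()
    for release in iter_releases(path):
        counts[release.get(name_key, "<unknown>")] += 1
    return counts
```

Streaming line by line keeps memory use flat, which matters when the file
expands to several hundred megabytes.
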
>
> Seems like a useful format.
>
> https://bitbucket.org/dholth/pypi_stats is a prototype that parses
> requires.txt and other metadata out of all the sdists in a folder,
> putting them into a sqlite3 database. It may be interesting for
> experimentation. For example, I can easily tell you how many different
> version numbers there are and which are the most popular, or I can
> tell you which metadata keys and version numbers have been used. The
> database winds up being 1.6 GB, or about 200 MB if you delete the
> unparsed files.
>
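
To make the idea concrete, here is a rough sketch of loading requires.txt
contents into sqlite3 and asking which version numbers are most common. The
table and function names are hypothetical and are not the pypi_stats schema,
and the parser only handles plain requirement lines and "[extra]" headers.

```python
import sqlite3


def parse_requires_txt(text):
    """Return (extra, requirement) pairs; extra is "" outside any section."""
    extra = ""
    pairs = []
    for raw in text.splitlines():
        line = raw.strip()
        if not line:
            continue
        if line.startswith("[") and line.endswith("]"):
            extra = line[1:-1]  # section header such as [test]
            continue
        pairs.append((extra, line))
    return pairs


def build_db(releases):
    """Load (name, version, requires_txt_text) tuples into an in-memory db."""
    db = sqlite3.connect(":memory:")
    db.execute("CREATE TABLE release (name TEXT, version TEXT)")
    db.execute(
        "CREATE TABLE requirement "
        "(name TEXT, version TEXT, extra TEXT, spec TEXT)"
    )
    for name, version, requires in releases:
        db.execute("INSERT INTO release VALUES (?, ?)", (name, version))
        db.executemany(
            "INSERT INTO requirement VALUES (?, ?, ?, ?)",
            [(name, version, extra, spec)
             for extra, spec in parse_requires_txt(requires)],
        )
    db.commit()
    return db


def popular_versions(db, limit=10):
    """Return (version, count) rows, most common version string first."""
    return db.execute(
        "SELECT version, COUNT(*) AS n FROM release "
        "GROUP BY version ORDER BY n DESC, version ASC LIMIT ?",
        (limit,),
    ).fetchall()
```

Once the rows are in place, questions like "which packages depend on X" become
a single SELECT against the requirement table.
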

