[Distutils] does pypi or red-dove have a better firehose API than "download all the packages"?
dw at botanicus.net
Fri May 17 03:55:39 CEST 2013
Interesting! I produced that dump as part of a demo of using Xapian for
cheese shop search (still a work in progress, when I get a free moment).
Adding e.g. a "depends:" operator is something I'd like, and your database
sounds very useful for achieving that goal.
Thanks for the link. I may be e-mailing you shortly ;)
On 17 May 2013 02:50, Daniel Holth <dholth at gmail.com> wrote:
> On Thu, May 16, 2013 at 3:46 PM, David Wilson <dw at botanicus.net> wrote:
> > Would something like http://pypi.h1.botanicus.net/static/dump.txt.gz be
> > useful to you? (Warning: 57 MB, expanding to 540 MB.) Each line is a
> > JSON-encoded dict containing a single package release.
> >     import gzip
> >     import json
> >
> >     # One JSON-encoded release per line of the gzipped dump.
> >     for line in gzip.open('dump.txt.gz'):
> >         dct = json.loads(line)
> >         ...  # etc.
> > The code for it is very simple; I would be willing to clean it up and
> > turn it into a cron job if people found it useful.
> > Note that the dump above is outdated. I only made it as a test.
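For anyone wanting to experiment with the same layout, here is a minimal sketch of writing and reading a dump with one JSON-encoded dict per line, gzip-compressed. The function names and the record fields in the usage note are made up for illustration; this is not the code that produced dump.txt.gz.

```python
import gzip
import json


def write_dump(path, releases):
    """Write each release dict as one JSON line in a gzipped text file."""
    with gzip.open(path, "wt", encoding="utf-8") as fp:
        for release in releases:
            # json.dumps escapes embedded newlines, so each record
            # occupies exactly one line.
            fp.write(json.dumps(release) + "\n")


def read_dump(path):
    """Yield one release dict per non-empty line of the dump."""
    with gzip.open(path, "rt", encoding="utf-8") as fp:
        for line in fp:
            if line.strip():
                yield json.loads(line)
```

Usage would look something like write_dump('dump.txt.gz', [{'name': 'example', 'version': '1.0'}]) followed by iterating over read_dump('dump.txt.gz'). Reading line by line keeps memory use flat even when the file expands to hundreds of megabytes.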
> Seems like a useful format.
> https://bitbucket.org/dholth/pypi_stats is a prototype that parses
> requires.txt and other metadata out of all the sdists in a folder,
> putting them into a sqlite3 database. It may be interesting for
> experimentation. For example, I can easily tell you how many different
> version numbers there are and which are the most popular, or I can
> tell you which metadata keys and version numbers have been used. The
> database winds up being 1.6 GB or about 200MB if you delete the
> unparsed files.
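As a rough illustration of the kind of question such a database can answer, here is a sketch using sqlite3 with a made-up two-table schema. The table and column names are assumptions for the example and are not taken from pypi_stats.

```python
import sqlite3


def build_db(releases):
    """Load (name, version, requirements) tuples into an in-memory database.

    The schema here is hypothetical, chosen only for this example.
    """
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE release (name TEXT, version TEXT)")
    conn.execute("CREATE TABLE requirement (name TEXT, version TEXT, req TEXT)")
    for name, version, reqs in releases:
        conn.execute("INSERT INTO release VALUES (?, ?)", (name, version))
        conn.executemany(
            "INSERT INTO requirement VALUES (?, ?, ?)",
            [(name, version, req) for req in reqs],
        )
    return conn


def distinct_version_count(conn):
    """Return how many different version strings appear across releases."""
    row = conn.execute("SELECT COUNT(DISTINCT version) FROM release").fetchone()
    return row[0]


def popular_versions(conn, limit=3):
    """Return (version, count) pairs, most frequently used version first."""
    return conn.execute(
        "SELECT version, COUNT(*) AS n FROM release "
        "GROUP BY version ORDER BY n DESC, version LIMIT ?",
        (limit,),
    ).fetchall()
```

The requirement table is filled from each release's requirement strings, so a "depends:"-style lookup could be built on it, although parsing those strings into package names is left out here.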