<div dir="ltr"><div>Interesting! I produced that dump as part of a demo of using Xapian for cheese shop search (still a work in progress, when I get a free moment). Adding e.g. a "depends:" operator is something I'd like, and your database sounds very useful for achieving that goal.<br>
<br></div>Thanks for the link. I may be e-mailing you shortly ;)<br></div><div class="gmail_extra"><br><br><div class="gmail_quote">On 17 May 2013 02:50, Daniel Holth <span dir="ltr"><<a href="mailto:dholth@gmail.com" target="_blank">dholth@gmail.com</a>></span> wrote:<br>
<blockquote class="gmail_quote" style="margin:0 0 0 .8ex;border-left:1px #ccc solid;padding-left:1ex"><div class="im">On Thu, May 16, 2013 at 3:46 PM, David Wilson <<a href="mailto:dw@botanicus.net">dw@botanicus.net</a>> wrote:<br>
> Would something like <a href="http://pypi.h1.botanicus.net/static/dump.txt.gz" target="_blank">http://pypi.h1.botanicus.net/static/dump.txt.gz</a> be<br>
> useful to you? (warning: 57 MB expanding to 540 MB). Each line is a<br>
> JSON-encoded dict containing a single package release.<br>
><br>
> import gzip, json<br>
><br>
> for line in gzip.open('dump.txt.gz'):<br>
> &nbsp;&nbsp;&nbsp;&nbsp;dct = json.loads(line)<br>
> &nbsp;&nbsp;&nbsp;&nbsp;....<br>
><br>
> etc<br>
><br>
> The code for it is very simple; I would be willing to clean it up and turn it<br>
> into a cron job if people found it useful.<br>
><br>
> Note the dump above is outdated; I only made it as a test.<br>
<br>
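A slightly fuller sketch of the loop above, in case it helps anyone experimenting with the dump: it streams the file line by line and tallies which top-level keys appear in the records. The function name count_keys is made up for illustration, and nothing is assumed about which keys the dump actually contains.

```python
import gzip
import json
from collections import Counter


def count_keys(path):
    """Count how often each top-level key appears across all records.

    Hypothetical helper: assumes only that each non-blank line of the
    gzip file is one JSON-encoded dict, as described for dump.txt.gz.
    """
    counts = Counter()
    # "rt" makes gzip decode the stream to text, so we get one str per line
    # and never hold the whole expanded file in memory.
    with gzip.open(path, "rt", encoding="utf-8") as fh:
        for line in fh:
            line = line.strip()
            if not line:
                continue  # tolerate blank lines
            record = json.loads(line)
            counts.update(record.keys())
    return counts
```

Calling count_keys('dump.txt.gz').most_common() would then list the keys from most to least frequent.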
</div>Seems like a useful format.<br>
<br>
<a href="https://bitbucket.org/dholth/pypi_stats" target="_blank">https://bitbucket.org/dholth/pypi_stats</a> is a prototype that parses<br>
requires.txt and other metadata out of all the sdists in a folder,<br>
putting them into a sqlite3 database. It may be interesting for<br>
experimentation. For example, I can easily tell you how many different<br>
version numbers there are and which are the most popular, or I can<br>
tell you which metadata keys and version numbers have been used. The<br>
database winds up being 1.6 GB, or about 200 MB if you delete the<br>
unparsed files.<br>
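As a rough illustration of the kind of query described above (how many different version numbers there are and which are the most popular), here is a sketch using the standard sqlite3 module. The releases table and its columns are a hypothetical stand-in, not the actual pypi_stats schema, and the rows are toy data.

```python
import sqlite3


def distinct_versions(conn):
    """Return how many different version strings appear in the table."""
    row = conn.execute("SELECT COUNT(DISTINCT version) FROM releases").fetchone()
    return row[0]


def version_popularity(conn, limit=10):
    """Return (version, count) pairs, most common version strings first."""
    return conn.execute(
        "SELECT version, COUNT(*) AS n FROM releases "
        "GROUP BY version ORDER BY n DESC, version LIMIT ?",
        (limit,),
    ).fetchall()


# Hypothetical schema and toy rows, only to make the sketch runnable.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE releases (name TEXT, version TEXT)")
conn.executemany(
    "INSERT INTO releases VALUES (?, ?)",
    [("a", "1.0"), ("b", "1.0"), ("c", "0.1")],
)

print(distinct_versions(conn))
print(version_popularity(conn))
```

Against a real database the table and column names would need adjusting to whatever the prototype actually creates.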
</blockquote></div><br></div>