<div dir="ltr"><br><br><div class="gmail_quote"><div dir="ltr">On Wed, Jul 22, 2015 at 9:12 PM Wes Turner <<a href="mailto:wes.turner@gmail.com">wes.turner@gmail.com</a>> wrote:<br></div><blockquote class="gmail_quote" style="margin:0 0 0 .8ex;border-left:1px #ccc solid;padding-left:1ex"><p dir="ltr"><br>
On Jul 22, 2015 5:12 PM, "Brett Cannon" <<a href="mailto:bcannon@gmail.com" target="_blank">bcannon@gmail.com</a>> wrote:<br>
><br>
><br>
><br>
> On Wed, Jul 22, 2015 at 2:19 PM Wes Turner <<a href="mailto:wes.turner@gmail.com" target="_blank">wes.turner@gmail.com</a>> wrote:<br>
>><br>
>> <a href="https://github.com/dstufft/pypi-stats" target="_blank">https://github.com/dstufft/pypi-stats</a><br>
>><br>
>> <a href="https://github.com/dstufft/pypi-external-stats" target="_blank">https://github.com/dstufft/pypi-external-stats</a><br>
><br>
><br>
> I'm not quite sure what I'm supposed to get from those links, Wes. That code still scrapes every project individually and downloads its files, whereas I'm trying to avoid scraping PyPI at all and instead download a single file (and I don't want the files, just the metadata already returned by the JSON API).</p>
<p dir="ltr">An online query or an offline dump?</p></blockquote><div><br></div><div>Offline dump. I literally just want a single file to download.</div><div><br></div><div>Anyway, it's sounding like there isn't one currently so it would need to be a new feature for Warehouse.</div><div><br></div><div>-Brett</div><div> </div><blockquote class="gmail_quote" style="margin:0 0 0 .8ex;border-left:1px #ccc solid;padding-left:1ex">
<p dir="ltr">><br>
> -Brett<br>
>  <br>
>><br>
>> - [ ] a flat BigQuery table, queryable w/ pandas.io.gbq à la GitHub Archive, would be great</p>
<p dir="ltr"><a href="http://pandas.pydata.org/pandas-docs/version/0.16.2/io.html#io-bigquery" target="_blank">http://pandas.pydata.org/pandas-docs/version/0.16.2/io.html#io-bigquery</a></p>
<p dir="ltr">>> - [ ] it's probably worth it to add RDFa to PyPI and Warehouse pages (in addition to the auxiliary executed/extracted JSON) for #search</p>
<p dir="ltr"><a href="https://github.com/pypa/warehouse/blob/master/warehouse/packaging/models.py" target="_blank">https://github.com/pypa/warehouse/blob/master/warehouse/packaging/models.py</a></p>
<p dir="ltr"><a href="https://github.com/pypa/warehouse/blob/master/tests/unit/packaging/test_models.py" target="_blank">https://github.com/pypa/warehouse/blob/master/tests/unit/packaging/test_models.py</a></p>
<p dir="ltr"><a href="https://github.com/pypa/warehouse/blob/master/warehouse/packaging/views.py" target="_blank">https://github.com/pypa/warehouse/blob/master/warehouse/packaging/views.py</a></p>
<p dir="ltr"><a href="https://github.com/pypa/warehouse/blob/master/warehouse/templates/packaging/detail.html" target="_blank">https://github.com/pypa/warehouse/blob/master/warehouse/templates/packaging/detail.html</a></p>
<p dir="ltr"><a href="https://github.com/pypa/warehouse/blob/master/warehouse/routes.py" target="_blank">https://github.com/pypa/warehouse/blob/master/warehouse/routes.py</a></p>
<p dir="ltr"><a href="https://github.com/pypa/warehouse/blob/master/tests/unit/legacy/api/test_json.py" target="_blank">https://github.com/pypa/warehouse/blob/master/tests/unit/legacy/api/test_json.py</a></p>
<p dir="ltr"><a href="https://github.com/pypa/warehouse/blob/master/warehouse/legacy/api/json.py" target="_blank">https://github.com/pypa/warehouse/blob/master/warehouse/legacy/api/json.py</a></p>
<p dir="ltr">>><br>
>> On Jul 22, 2015 4:08 PM, "Brett Cannon" <<a href="mailto:bcannon@gmail.com" target="_blank">bcannon@gmail.com</a>> wrote:<br>
>>><br>
>>> When I wrote <a href="https://nothingbutsnark.svbtle.com/python-3-support-on-pypi" target="_blank">https://nothingbutsnark.svbtle.com/python-3-support-on-pypi</a> I wrote a script to download every project's JSON metadata by scraping the simple index and then making the appropriate GET request for each project's JSON metadata. It worked, but it was somewhat of a hassle.<br>
>>><br>
>>> Is there some dump somewhere that is built daily, weekly, or monthly of all the metadata on PyPI for offline analysis?<br>
>>><br>
>>> _______________________________________________<br>
>>> Distutils-SIG maillist  -  <a href="mailto:Distutils-SIG@python.org" target="_blank">Distutils-SIG@python.org</a><br>
>>> <a href="https://mail.python.org/mailman/listinfo/distutils-sig" target="_blank">https://mail.python.org/mailman/listinfo/distutils-sig</a><br>
>>><br>
</p>
</blockquote></div></div>
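<div dir="ltr">For reference, the scrape-then-GET approach described in the original message could be sketched roughly as follows. This is not Brett's actual script. The URLs (<code>https://pypi.org/simple/</code> and <code>https://pypi.org/pypi/&lt;name&gt;/json</code>) are assumptions based on the current PyPI layout rather than anything stated in the thread, and every function name here is hypothetical.</div>

```python
import json
from html.parser import HTMLParser
from urllib.error import HTTPError
from urllib.request import urlopen

# Assumed endpoints; the thread itself does not name them.
SIMPLE_INDEX_URL = "https://pypi.org/simple/"
JSON_URL_TEMPLATE = "https://pypi.org/pypi/{name}/json"


class SimpleIndexParser(HTMLParser):
    """Collect the link text of every <a> element on a simple-index page."""

    def __init__(self):
        super().__init__()
        self.names = []
        self._in_anchor = False

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            self._in_anchor = True

    def handle_endtag(self, tag):
        if tag == "a":
            self._in_anchor = False

    def handle_data(self, data):
        # Only text inside an anchor is treated as a project name.
        if self._in_anchor and data.strip():
            self.names.append(data.strip())


def project_names(index_html):
    """Return the project names listed in a simple-index HTML page."""
    parser = SimpleIndexParser()
    parser.feed(index_html)
    parser.close()
    return parser.names


def json_url(name):
    """Build the JSON metadata URL for one project."""
    return JSON_URL_TEMPLATE.format(name=name)


def fetch_metadata(name):
    """GET and decode the JSON metadata for one project."""
    with urlopen(json_url(name)) as response:
        return json.loads(response.read().decode("utf-8"))


def dump_all(path, limit=None):
    """Write one JSON document per line to `path`, one request per project."""
    with urlopen(SIMPLE_INDEX_URL) as response:
        names = project_names(response.read().decode("utf-8"))
    with open(path, "w", encoding="utf-8") as out:
        for name in names[:limit]:
            try:
                metadata = fetch_metadata(name)
            except HTTPError:
                continue  # skip projects whose metadata cannot be fetched
            out.write(json.dumps(metadata) + "\n")
```

<div dir="ltr">Calling <code>dump_all("pypi-metadata.jsonl", limit=100)</code> would fetch the first 100 projects. The one-request-per-project loop is exactly the hassle a single downloadable dump would remove.</div>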