[Distutils] dump of all PyPI project metadata available?

Brett Cannon bcannon at gmail.com
Thu Jul 23 16:30:31 CEST 2015


On Wed, Jul 22, 2015 at 9:12 PM Wes Turner <wes.turner at gmail.com> wrote:

>
> On Jul 22, 2015 5:12 PM, "Brett Cannon" <bcannon at gmail.com> wrote:
> >
> >
> >
> > On Wed, Jul 22, 2015 at 2:19 PM Wes Turner <wes.turner at gmail.com> wrote:
> >>
> >> https://github.com/dstufft/pypi-stats
> >>
> >> https://github.com/dstufft/pypi-external-stats
> >
> >
> > I'm not quite sure what I'm supposed to get from those links, Wes, as
> > that code still scrapes every project individually and downloads them,
> > while all I'm trying to do is avoid scraping PyPI and instead just
> > download a single file (plus I don't want the files, just the metadata
> > already returned by the JSON API).
>
> An online query or an offline dump?
>

Offline dump. I literally just want a single file to download.

Anyway, it's sounding like there isn't one currently so it would need to be
a new feature for Warehouse.

-Brett


> >
> > -Brett
> >
> >>
> >> - [ ] a flat bigquery w/ pandas.io.gbq ala GitHub Archive would be great
>
> http://pandas.pydata.org/pandas-docs/version/0.16.2/io.html#io-bigquery
>
> >> - [ ] it's probably worth it to add RDFa to PyPI and Warehouse pages
> >> (in addition to the auxiliary executed/extracted JSON) for #search
>
> https://github.com/pypa/warehouse/blob/master/warehouse/packaging/models.py
>
>
> https://github.com/pypa/warehouse/blob/master/tests/unit/packaging/test_models.py
>
> https://github.com/pypa/warehouse/blob/master/warehouse/packaging/views.py
>
>
> https://github.com/pypa/warehouse/blob/master/warehouse/templates/packaging/detail.html
>
> https://github.com/pypa/warehouse/blob/master/warehouse/routes.py
>
>
> https://github.com/pypa/warehouse/blob/master/tests/unit/legacy/api/test_json.py
>
> https://github.com/pypa/warehouse/blob/master/warehouse/legacy/api/json.py
>
> >>
> >> On Jul 22, 2015 4:08 PM, "Brett Cannon" <bcannon at gmail.com> wrote:
> >>>
> >>> When I wrote
> >>> https://nothingbutsnark.svbtle.com/python-3-support-on-pypi I wrote a
> >>> script to download every project's JSON metadata by scraping the simple
> >>> index and then making the appropriate GET request for the JSON metadata.
> >>> It worked, but it was somewhat of a hassle.
> >>>
> >>> Is there some dump somewhere that is built daily, weekly, or monthly
> >>> of all the metadata on PyPI for offline analysis?
> >>>
> >>> _______________________________________________
> >>> Distutils-SIG maillist  -  Distutils-SIG at python.org
> >>> https://mail.python.org/mailman/listinfo/distutils-sig
> >>>
>
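
For readers following the thread, the scraping approach described in the
original message (read the simple index, then make one GET request per
project for its JSON metadata) can be sketched roughly as below. This is
not the script from the blog post. The host names, the helper names, and
the `fetch` callable are assumptions made for illustration only.

```python
import json
from html.parser import HTMLParser
from urllib.parse import quote

# Assumed endpoints; the original script may have used different hosts.
SIMPLE_INDEX_URL = "https://pypi.org/simple/"
JSON_URL_TEMPLATE = "https://pypi.org/pypi/{name}/json"


class SimpleIndexParser(HTMLParser):
    """Collect the link text of every <a> element on a simple-index page."""

    def __init__(self):
        super().__init__()
        self.project_names = []
        self._in_anchor = False

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            self._in_anchor = True

    def handle_endtag(self, tag):
        if tag == "a":
            self._in_anchor = False

    def handle_data(self, data):
        # Only text inside an anchor is treated as a project name.
        if self._in_anchor and data.strip():
            self.project_names.append(data.strip())


def project_names_from_index(html_text):
    """Return the project names listed in simple-index HTML."""
    parser = SimpleIndexParser()
    parser.feed(html_text)
    parser.close()
    return parser.project_names


def json_metadata_url(name):
    """Build the per-project JSON metadata URL (assumed URL layout)."""
    return JSON_URL_TEMPLATE.format(name=quote(name, safe=""))


def fetch_all_metadata(fetch):
    """Download metadata for every project, one request per project.

    `fetch` is a hypothetical hook: any callable mapping a URL to the
    response body as text. Keeping it injectable makes the per-project
    request cost explicit and lets the sketch run without a network.
    """
    index_html = fetch(SIMPLE_INDEX_URL)
    return {
        name: json.loads(fetch(json_metadata_url(name)))
        for name in project_names_from_index(index_html)
    }
```

In real use, `fetch` could be something like
`lambda url: urllib.request.urlopen(url).read().decode("utf-8")`. The
one-request-per-project loop is the hassle that a single downloadable
dump would avoid.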