[Python-ideas] PyPI JSON Metadata Standardization for Mirrors

Wes Turner wes.turner at gmail.com
Thu Aug 10 22:46:05 EDT 2017


On Wednesday, August 9, 2017, Cooper Ry Lees <lists at cooperlees.com> wrote:

> Hi all,
>
> First time emailer, so please be kind. Also, if this is not the right
> mailing list for PyPA talk, I apologize. Please point me in the right
> direction if so.
>

Here are some notes re: changing metadata:

https://github.com/pypa/interoperability-peps/issues/31

https://www.google.com/search?q=pep426jsonld

Moving towards JSON-LD is the best approach, I think. That means that if
you need to add additional metadata and must key things by name, it would
be best to also copy the key into the object itself:

    {"thing1": {"@id": "thing1", "url": "..."}}

Instead of just:

    {"thing1": {"url": "..."}}

https://github.com/pypa/interoperability-peps/issues/31#issuecomment-233195564
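
A minimal sketch of that transformation, assuming the metadata is a plain
dict of dicts. The helper name `add_ids` is made up for illustration and is
not part of any PyPA tool. The idea is that each object still identifies
itself once it is lifted out of the enclosing map:

```python
def add_ids(keyed: dict) -> dict:
    """Return a copy in which every nested object carries its key as "@id".

    Hypothetical helper, for illustration only. An "@id" that is already
    present in a nested object is kept rather than overwritten.
    """
    return {key: {"@id": key, **value} for key, value in keyed.items()}
```

With this, `add_ids({"thing1": {"url": "..."}})` produces the first form
shown above.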


> The main reason I have emailed here is that I believe it may be time for
> a PEP to standardize the JSON metadata that PyPI makes available, like
> what was done for the `simple API` described in PEP 503.
>
> I've been doing a bit of work on `bandersnatch` (I didn't name it), which
> is a PEP 381 mirroring package and wanted to enhance it to also mirror the
> handy JSON metadata PyPI generates and makes available @
> https://pypi.python.org/pypi/PKG_NAME/json.
>
> I've done a PR on bandersnatch as a POC that mirrors both the PyPI
> directory structure (URL/pypi/PKG_NAME/json) and created a standardizable
> URL/json/PKG_NAME that the former symlinks to (to be served by NGINX / some
> other proxy). I'm also contemplating naming the directory 'metadata' rather
> than 'json', so that if some new hotness comes along or we want to change
> the format down the line, we're not stuck with json as the dirname. This PR
> can be found here:
> https://bitbucket.org/pypa/bandersnatch/pull-requests/33/save-json-metadata-to-mirror
>
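
For readers following along, the layout described above could be sketched
roughly like this. This is not the code from the PR. The function name
`save_metadata` and the `web_root` argument are assumptions made for
illustration. It writes the JSON once under json/PKG_NAME and makes
pypi/PKG_NAME/json a relative symlink to it:

```python
import json
import os
from pathlib import Path


def save_metadata(web_root: Path, name: str, metadata: dict) -> Path:
    """Write metadata to json/NAME and symlink pypi/NAME/json to it.

    Hypothetical sketch of the layout, not bandersnatch's implementation.
    """
    canonical = web_root / "json" / name
    canonical.parent.mkdir(parents=True, exist_ok=True)
    canonical.write_text(json.dumps(metadata, sort_keys=True))

    legacy = web_root / "pypi" / name / "json"
    legacy.parent.mkdir(parents=True, exist_ok=True)
    if legacy.is_symlink() or legacy.exists():
        legacy.unlink()
    # Relative target, so the link stays valid if the tree is copied elsewhere.
    legacy.symlink_to(os.path.relpath(canonical, legacy.parent))
    return legacy
```

A relative link target is used here on the assumption that the tree gets
rsynced to hosts where the mirror root may live at a different path.
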
> My main use case is to write a very simple async 'verifier' tool that will
> crawl all the JSON files and then ensure the packages directory on each of
> my internal mirrors (I have a mirror per region / datacenter) has all the
> files it should. I sync centrally (to save resources on the PyPI
> infrastructure) and then rsync out all the diffs to each region /
> datacenter, and under some failure scenarios I could miss a file or many.
> So I feel using JSON pulled down from the authoritative source will allow
> an async job to verify the MD5 of all the package files on each mirror.
>
> What are people's thoughts here? Is it worth a PEP similar to PEP 503 going
> forward? Can people enhance / share some thoughts on this idea?
>
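
A rough sketch of the per-project check such a verifier might do, under a
few assumptions: the JSON has the `releases` mapping PyPI currently serves,
with `url` and `md5_digest` on each file entry, and the mirror stores each
file at the path component of its upstream URL. The names `md5_of` and
`verify_project` are made up for illustration. It is synchronous for
brevity, so an async tool would need to run it in an executor:

```python
import hashlib
from pathlib import Path
from urllib.parse import urlparse


def md5_of(path: Path, chunk_size: int = 1 << 20) -> str:
    """Hash a file in chunks so large sdists are not read into memory at once."""
    digest = hashlib.md5()
    with path.open("rb") as fh:
        for chunk in iter(lambda: fh.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def verify_project(metadata: dict, web_root: Path) -> list:
    """Return (relative_path, problem) pairs for one project's JSON metadata."""
    problems = []
    for files in metadata.get("releases", {}).values():
        for entry in files:
            # Assumption: the local path mirrors the path of the upstream URL.
            rel = urlparse(entry["url"]).path.lstrip("/")
            local = web_root / rel
            if not local.is_file():
                problems.append((rel, "missing"))
            elif md5_of(local) != entry["md5_digest"]:
                problems.append((rel, "md5 mismatch"))
    return problems
```

An empty list means every file listed for the project is present on that
mirror and matches its recorded MD5.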

Here are some notes on making this more efficient:

"Add API endpoint to get latest version of all projects"
https://github.com/pypa/warehouse/issues/347


... This is probably better suited to distutils-sig:
http://markmail.org/search/?q=list:org.python.distutils-sig


>
> Thanks,
> Cooper Lees
> me at cooperlees.com
> https://cooperlees.com/
>