[Python-ideas] PyPI JSON Metadata Standardization for Mirrors
Cooper Ry Lees
lists at cooperlees.com
Thu Aug 10 00:21:56 EDT 2017
First time emailer, so please be kind. Also, if this is not the right
mailing list for PyPA talk, I apologize. Please point me in the right
direction if so. The main reason I'm emailing is that I believe it may be
time for a PEP to standardize the JSON metadata that PyPI makes available,
like what was done for the 'simple' API described in PEP 503.
I've been doing a bit of work on `bandersnatch` (I didn't name it), which
is a PEP 381 mirroring package, and I wanted to enhance it to also mirror the
handy JSON metadata PyPI generates and makes available @
I've done a PR on bandersnatch as a POC that mirrors the PyPI directory
structure (URL/pypi/PKG_NAME/json) and also creates a standardizable
URL/json/PKG_NAME path that the former symlinks to (to be served by NGINX or
some other proxy). I'm also contemplating naming the directory 'metadata'
rather than 'json', so that if a newer format comes along, or we otherwise
want to change the format down the line, we're not stuck with 'json' as the
directory name. This PR can be found here:
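A minimal sketch of that layout, assuming the mirror writes each project's JSON once to the standardizable json/PKG_NAME path and points the PyPI-compatible pypi/PKG_NAME/json path at it with a symlink. `store_metadata` and its `metadata_dirname` parameter (which would cover the 'metadata' naming idea) are hypothetical names for illustration, not bandersnatch's actual code.

```python
import json
import os
from pathlib import Path


def store_metadata(mirror_root: Path, pkg_name: str, metadata: dict,
                   metadata_dirname: str = "json") -> Path:
    """Hypothetical helper: write <root>/<metadata_dirname>/<pkg> and
    link <root>/pypi/<pkg>/json to it. Returns the symlink path."""
    # Canonical, standardizable location: URL/json/PKG_NAME
    canonical = mirror_root / metadata_dirname / pkg_name
    canonical.parent.mkdir(parents=True, exist_ok=True)
    canonical.write_text(json.dumps(metadata), encoding="utf-8")

    # PyPI-compatible location: URL/pypi/PKG_NAME/json
    legacy = mirror_root / "pypi" / pkg_name / "json"
    legacy.parent.mkdir(parents=True, exist_ok=True)
    # Replace any stale link or file left over from a previous sync
    if legacy.is_symlink() or legacy.exists():
        legacy.unlink()
    # Relative target (e.g. ../../json/PKG_NAME) so the link stays valid
    # when the whole tree is rsynced to a different mount point
    legacy.symlink_to(os.path.relpath(canonical, legacy.parent))
    return legacy
```

Using a relative link target is a design choice: `rsync --links` copies the link as-is, so each regional mirror ends up with a working link regardless of where its web root lives.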
My main use case is to write a very simple async 'verifier' tool that will
crawl all the JSON files and then ensure that the packages directory on each
of my internal mirrors (I have a mirror per region / datacenter) has all the
files it should. I sync centrally (to save resources on the PyPI
infrastructure) and then rsync all the diffs out to each region /
datacenter, and under some failure scenarios I could miss one or many files.
So I feel that using JSON pulled down from the authoritative source would
allow an async job to verify the MD5 of every package file on each mirror.
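A rough sketch of such a verifier, assuming each mirrored JSON document has PyPI's shape ("releases" mapping each version to a list of file entries with "url" and "md5_digest"), and assuming files are stored under the mirror's packages directory at the URL path that follows "/packages/". That path mapping and the function names are hypothetical; adjust them for your layout. `asyncio.to_thread` needs Python 3.9+.

```python
import asyncio
import hashlib
import json
from pathlib import Path
from urllib.parse import urlparse


def _md5_of(path: Path) -> str:
    # Stream in 1 MiB chunks so large sdists/wheels aren't loaded into memory
    h = hashlib.md5()
    with path.open("rb") as f:
        for chunk in iter(lambda: f.read(1 << 20), b""):
            h.update(chunk)
    return h.hexdigest()


def expected_files(metadata: dict, packages_root: Path):
    """Yield (local_path, md5) for every release file in one JSON document."""
    for files in metadata.get("releases", {}).values():
        for entry in files:
            # Hypothetical mapping: local path = URL path after '/packages/'
            _, sep, rel = urlparse(entry["url"]).path.partition("/packages/")
            if sep:
                yield packages_root / rel, entry["md5_digest"]


async def verify_mirror(json_dir: Path, packages_root: Path, limit: int = 8):
    """Return (missing, mismatched) lists of file paths for one mirror."""
    sem = asyncio.Semaphore(limit)  # cap concurrent disk reads
    missing, mismatched = [], []

    async def check(path: Path, md5: str) -> None:
        if not path.exists():
            missing.append(path)
            return
        async with sem:
            # hashlib is blocking, so hash in a worker thread
            actual = await asyncio.to_thread(_md5_of, path)
        if actual != md5:
            mismatched.append(path)

    tasks = []
    for doc in json_dir.iterdir():
        if not doc.is_file():
            continue
        metadata = json.loads(doc.read_text(encoding="utf-8"))
        tasks.extend(check(p, md5) for p, md5 in expected_files(metadata, packages_root))
    await asyncio.gather(*tasks)
    return missing, mismatched
```

Each regional mirror could then run something like `asyncio.run(verify_mirror(web / "json", web / "packages"))` (where `web` is that mirror's web root) and re-fetch anything reported as missing or mismatched.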
What are people's thoughts here? Is it worth a PEP similar to PEP 503 going
forward? Can people enhance or share some thoughts on this idea?
me at cooperlees.com