PyPI JSON Metadata Standardization for Mirrors

Hi all,

First time emailer, so please be kind. Also, if this is not the right mailing list for PyPA talk, I apologize; please point me in the right direction.

The main reason I have emailed here is that I believe it may be PEP time to standardize the JSON metadata that PyPI makes available, like what was done for the 'simple API' described in PEP 503.

I've been doing a bit of work on `bandersnatch` (I didn't name it), which is a PEP 381 mirroring package, and wanted to enhance it to also mirror the handy JSON metadata PyPI generates and makes available at https://pypi.python.org/pypi/PKG_NAME/json. I've done a PR on bandersnatch as a POC that mirrors the PyPI directory structure (URL/pypi/PKG_NAME/json) and also creates a standardizable URL/json/PKG_NAME that the former symlinks to (to be served by NGINX or some other proxy). I'm also contemplating naming the directory 'metadata' rather than 'json', so that if some new hotness comes along or we want to change the format down the line, we're not stuck with json as the dirname. This PR can be found here: https://bitbucket.org/pypa/bandersnatch/pull-requests/33/save-json-metadata-...

My main use case is to write a very simple async 'verifier' tool that will crawl all the JSON files and then ensure the packages directory on each of my internal mirrors (I have a mirror per region / datacenter) has all the files it should. I sync centrally (to save resources on the PyPI infrastructure) and then rsync out all the diffs to each region / datacenter, and under some failure scenarios I could miss a file or many. So I feel that using JSON pulled down from the authoritative source will allow an async job to verify the MD5 of all the package files on each mirror.

What are people's thoughts here? Is it worth a PEP similar to PEP 503 going forward? Can people enhance / share some thoughts on this idea?

Thanks,
Cooper Lees
me@cooperlees.com <me@copperlees.com>
https://cooperlees.com/
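The 'verifier' use case described above could be sketched roughly as follows. This is a minimal, synchronous illustration rather than the async tool proposed in the message, and it rests on several assumptions that are not confirmed by the thread: that each mirrored JSON document has a "releases" mapping whose file entries carry "url" and "md5_digest" keys, and that package files sit on the mirror at the same relative path as the path component of "url". The names md5_of, verify_package, and verify_mirror are hypothetical.

```python
import hashlib
import json
from pathlib import Path
from urllib.parse import urlparse


def md5_of(path: Path, chunk_size: int = 1 << 20) -> str:
    """Return the hex MD5 of a file, reading it in chunks."""
    digest = hashlib.md5()
    with path.open("rb") as fh:
        for chunk in iter(lambda: fh.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def verify_package(metadata: dict, mirror_root: Path) -> list[str]:
    """Check every file listed in one package's JSON against the mirror.

    Assumes (not confirmed by the thread) that each file entry has
    "url" and "md5_digest" keys, and that the mirror stores the file
    at the URL's path relative to mirror_root.
    """
    problems = []
    for files in metadata.get("releases", {}).values():
        for entry in files:
            rel_path = urlparse(entry["url"]).path.lstrip("/")
            local = mirror_root / rel_path
            if not local.is_file():
                problems.append(f"missing: {rel_path}")
            elif md5_of(local) != entry["md5_digest"]:
                problems.append(f"md5 mismatch: {rel_path}")
    return problems


def verify_mirror(json_dir: Path, mirror_root: Path) -> dict[str, list[str]]:
    """Crawl a directory of per-package JSON files (the hypothetical
    URL/json/PKG_NAME layout) and report only packages with problems."""
    report = {}
    for json_file in sorted(json_dir.iterdir()):
        metadata = json.loads(json_file.read_text())
        problems = verify_package(metadata, mirror_root)
        if problems:
            report[json_file.name] = problems
    return report
```

Run once per regional mirror, verify_mirror(json_dir, mirror_root) would return a mapping from package name to the files that are missing or fail the MD5 check, which is the list an rsync repair pass would need.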

On Wednesday, August 9, 2017, Cooper Ry Lees <lists@cooperlees.com> wrote:
Here are some notes re: changing metadata:
https://github.com/pypa/interoperability-peps/issues/31
https://www.google.com/search?q=pep426jsonld

Moving towards JSON-LD is the best approach, I think. So if you need to add additional metadata (?) and must key things, it would be best to also copy the key into the object:

    {"thing1": {"@id": "thing1", "url": "..."}}

instead of just:

    {"thing1": {"url": "..."}}

https://github.com/pypa/interoperability-peps/issues/31#issuecomment-2331955...
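As a small illustration of the key-copying suggestion, a keyed mapping could be rewritten so that each object also carries its own key as "@id". This is only a sketch of the idea; the helper name add_ids is hypothetical and not part of any existing tool.

```python
def add_ids(mapping: dict) -> dict:
    """Copy each key into its object as "@id".

    If an object already has an "@id", that existing value is kept.
    """
    return {key: {"@id": key, **value} for key, value in mapping.items()}


plain = {"thing1": {"url": "..."}}
with_ids = add_ids(plain)
```

Here with_ids has the first shape shown above, while plain is left unmodified.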
Here are some notes on making this more efficient:
"Add API endpoint to get latest version of all projects"
https://github.com/pypa/warehouse/issues/347
...
To http://markmail.org/search/?q=list:org.python.distutils-sig

Participants (3): Brett Cannon, Cooper Ry Lees, Wes Turner