On Jul 19, 2016 8:44 AM, "Nick Coghlan" <ncoghlan@gmail.com> wrote:
> On 19 July 2016 at 18:13, Wes Turner wrote:
>> so, there's a need for specifying the {PyPI} package URI in setup.py
> Not really - tools can make a reasonable guess about the source PyPI URL based purely on the name and version. For non-PyPI hosted packages, the extra piece of info needed is the index server URL.
So, the index server URL is in pip.conf or .pydistutils.cfg or setup.cfg, OR specified on the command line?
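For reference, a minimal sketch of that guess, assuming the legacy JSON API route (pypi.org as the base URL is illustrative; pip takes the index URL from [global] index-url in pip.conf, or from --index-url on the command line):

    # Sketch: derive a metadata URL purely from name and version.
    # The /pypi/<name>/<version>/json route is the legacy PyPI JSON API;
    # the base URL is whichever index server is configured.
    def json_metadata_url(name, version, index="https://pypi.org"):
        return "{0}/pypi/{1}/{2}/json".format(index, name, version)

    print(json_metadata_url("pip", "8.1.2"))
    # -> https://pypi.org/pypi/pip/8.1.2/json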
>> and then generating meta.jsonld from setup.py
> No, a JSON-LD generator would start with a rendered metadata format, not the raw setup.py.
"pydist.json", my mistake https://github.com/pypa/interoperability-peps/issues/31#issuecomment-1396572... - pydist.json - metadata.json (wheel) - pydist.jsonld
>> and then generating JSONLD in a warehouse/pypa view; because that's where
>> they keep the actual metadata (package platform versions, checksums,
>> potentially supersededBy redirects)
> No, there is no requirement for this to be a PyPI feature. Absolutely none.
>> and then a signing key for a) package maintainer-supplied metadata and
>> b) package repository metadata (which is/would be redundant but comforting)
> This is already covered (thoroughly) in PEPs 458 and 480, and has nothing to do with metadata linking.
ld-signatures can be used to sign {RDF, JSONLD, RDFa} and to attach the signature to the document itself:
https://web-payments.org/specs/source/ld-signatures/
- JWS only works with JSON formats (and not RDF)
https://www.python.org/dev/peps/pep-0480/
- Does this yet include signing the potentially cached JSON metadata used by actual tools like e.g. pip?
- How do you feel about redirects, for when a package is superseded and nobody can convince the maintainer to update the long_description?
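For illustration, the overall shape of an ld-signatures signed document, sketched from the draft linked above (the type, key URL, and signatureValue are placeholders; the exact fields depend on the signature suite used):

    {
      "@context": "https://w3id.org/security/v1",
      "name": "example-package",
      "signature": {
        "type": "LinkedDataSignature2015",
        "creator": "https://example.org/keys/1",
        "created": "2016-07-19T08:44:00Z",
        "signatureValue": "...base64..."
      }
    }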
>> and then third-party services like NVD, CVEdetails, and stack metadata aggregation services
> And this is the other reason why it doesn't make sense to do this on PyPI itself - the publisher-provided metadata from PyPI is only one piece of the project metadata puzzle (issue trackers and source code repositories are others, as are the communication metrics collected by the likes of Bitergia).
AFAIU, the extra load of responsibly fielding vulnerability reports for PyPI-hosted packages is beyond the scope of the PyPI and Warehouse projects.
> For a data aggregator, supporting multiple language ecosystems, multiple issue trackers, and multiple code hosting sites is an M+N+O scale problem (where M is the number of language ecosystems supported, etc.). By contrast, if you try to solve this problem in the package publication service for each individual language, you turn it into an M*(N+O) scale problem, where you need to give each language-specific service the ability to collect metadata from all those other sources.
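(Putting illustrative numbers on that estimate: with M = 10 language ecosystems, N = 20 issue trackers, and O = 10 code hosting sites, one aggregator maintains 10 + 20 + 10 = 40 integrations, while doing the same collection inside each ecosystem's publication service needs 10 * (20 + 10) = 300.)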
Are you saying that, for release-monitoring.org (a service you are somehow financially associated with), you have already invested the time to read the existing PyPI metadata, but not e.g. the 'python' or 'python-dev' OS package metadata? Debian has an RDF endpoint:
- https://packages.qa.debian.org/p/python-defaults.html
- https://packages.qa.debian.org/p/python-defaults.ttl
But there's as yet no easy way to JOIN metadata down the graph from downstream OS packages to PyPI archives to source repository changesets; not without RDF, and not without writing unnecessary language- and packaging-community-specific {INI, JSON, TOML, YAML-LD} parsers. O-estimations aside, when a data publisher publishes web-standard data, everyone can benefit, because the upper bound on network effects is N**2 (Metcalfe's Law).
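e.g. a sketch of consuming that endpoint with rdflib (assuming rdflib is installed; this just prints raw triples rather than assuming anything about Debian's vocabulary):

    import rdflib  # pip install rdflib

    g = rdflib.Graph()
    # Debian's package tracking system publishes per-package RDF as Turtle:
    g.parse("https://packages.qa.debian.org/p/python-defaults.ttl",
            format="turtle")

    # No Debian-specific parser needed; the data is just triples.
    for subj, pred, obj in list(g)[:10]:
        print(subj, pred, obj)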
> This means that since we don't have a vested interest in adding more functionality to PyPI that doesn't specifically *need* to be there (and in fact actively want to avoid doing so), we can say "Conformance to semantic web standards is a problem for aggregation services like libraries.io and release-monitoring.org to solve, not for us to incorporate directly into PyPI".
A view producing JSONLD. Probably right about here:
https://github.com/pypa/warehouse/blob/master/warehouse/packaging/views.py
Because there are a few (possibly backwards-compatible) changes that could be made here, so that we could just add @context to the existing JSON record (thus making it JSONLD, which anyone can read and index without a domain-specific parser):
https://github.com/pypa/warehouse/blob/master/warehouse/legacy/api/json.py
IIRC: https://github.com/pypa/interoperability-peps/issues/31#issuecomment-2331955...
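i.e. something like the following for the existing /pypi/<name>/<version>/json response (response fields abbreviated; the @context mapping is illustrative, not a settled vocabulary):

    {
      "@context": {
        "name": "http://schema.org/name",
        "version": "http://schema.org/softwareVersion",
        "home_page": "http://schema.org/url"
      },
      "info": {
        "name": "pip",
        "version": "8.1.2",
        "home_page": "https://pip.pypa.io/"
      }
    }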
>> sorry to hijack the thread; i hear "more links and metadata in an auxiliary schema" and think 'RDF is the semantic web solution for this graph problem'
> I know, and you're not wrong about that. Where you're running into trouble is that you're trying to insist that it is the responsibility of the initial data *publishers* to conform to the semantic web standards, and it *isn't* - that job is one for the data aggregators that have an interest in making it easier for people to work across multiple data sets managed by different groups of people.
No, after-the-fact transformation is wasteful and late. A bit of advice for data publishers: http://5stardata.info/en/
> For publication platforms managing a single subgraph, native support for JSON-LD and RDFa introduces unwanted complexity by expanding the data model to incorporate all of the relational concepts defined in those standards. Well-funded platforms may have the development capacity to spare for such activities, but PyPI isn't such a platform.
This is Warehouse: https://github.com/pypa/warehouse
It is maintainable: https://www.pypa.io/en/latest/help/
> By contrast, for aggregators managing a graph-of-graphs problem, JSON-LD and RDFa introduce normalisation across data sets that *reduces* overall complexity, since most of the details of the subgraphs can be ignored, as you focus instead on the links between the entities they contain.
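That normalisation is mechanical once the contexts exist. A sketch with the PyLD library (pip install PyLD); both contexts and field names here are invented for illustration:

    from pyld import jsonld  # pip install PyLD

    # Two publishers, two field names, one shared vocabulary term.
    pypi_doc = {"@context": {"home_page": "http://schema.org/url"},
                "home_page": "https://pip.pypa.io/"}
    debian_doc = {"@context": {"homepage": "http://schema.org/url"},
                  "homepage": "https://pip.pypa.io/"}

    # Expansion erases the per-publisher field names, leaving only the
    # shared term, so an aggregator can compare like with like.
    print(jsonld.expand(pypi_doc) == jsonld.expand(debian_doc))  # True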
> Cheers,
> Nick.
>
> --
> Nick Coghlan | ncoghlan@gmail.com | Brisbane, Australia