[Distutils] Outdated packages on pypi

Nick Coghlan ncoghlan at gmail.com
Tue Jul 19 08:44:23 EDT 2016


On 19 July 2016 at 18:13, Wes Turner <wes.turner at gmail.com> wrote:
> so, there's a need for specifying the {PyPI} package URI in setup.py

Not really - tools can make a reasonable guess about the source PyPI
URL based purely on the name and version. For non-PyPI hosted
packages, the extra piece of info needed is the index server URL.
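
As a rough illustration of what "reasonable guess" means in practice (a
sketch on my part; the URL patterns below match pypi.org's current
project-page and JSON API layouts, but treat that as an assumption
rather than a stable contract):

# Illustrative sketch only: deriving likely PyPI URLs from just a
# project name and version. For a non-PyPI index, pass its base URL.
def guess_pypi_urls(name, version, index="https://pypi.org"):
    return {
        "project_page": f"{index}/project/{name}/{version}/",
        "json_metadata": f"{index}/pypi/{name}/{version}/json",
    }

# e.g. guess_pypi_urls("requests", "2.31.0")["json_metadata"]
# -> "https://pypi.org/pypi/requests/2.31.0/json"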

> and then generating meta.jsonld from setup.py

No, a JSON-LD generator would start with a rendered metadata format,
not the raw setup.py.
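
A minimal sketch of what that might look like, assuming the rendered
metadata has already been parsed into a PKG-INFO style mapping and that
schema.org is the target vocabulary (both are assumptions on my part,
for illustration only):

import json

# Minimal sketch: generating JSON-LD from already-rendered core
# metadata (e.g. a parsed PKG-INFO/METADATA file), rather than from
# executing setup.py. The schema.org mapping is a hypothetical choice.
def metadata_to_jsonld(meta):
    doc = {
        "@context": "https://schema.org",
        "@type": "SoftwareApplication",
        "name": meta["Name"],
        "softwareVersion": meta["Version"],
        "url": meta.get("Home-page"),
    }
    return json.dumps(doc, indent=2)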

> and then generating JSONLD in a warehouse/pypa view; because that's where
> they keep the actual metadata (package platform versions, checksums,
> potentially supersededBy redirects)

No, there is no requirement for this to be a PyPI feature. Absolutely none.

> and then a signing key for a) package maintainer-supplied metadata and b)
> package repository metadata (which is/would be redundant but comforting)

This is already covered (thoroughly) in PEPs 458 and 480, and has
nothing to do with metadata linking.

> and then third-party services like NVD, CVEdetails, and stack metadata
> aggregation services

And this is the other reason why it doesn't make sense to do this on
PyPI itself - the publisher-provided metadata from PyPI is only one
piece of the project metadata puzzle (issue trackers and source code
repositories are others, as are the communication metrics collected by
the likes of Bitergia).

For a data aggregator, supporting multiple language ecosystems,
multiple issue trackers, and multiple code hosting sites is an M+N+O
scale problem (where M is the number of language ecosystems, N the
number of issue trackers, and O the number of code hosting sites). By
contrast, if you try to solve this problem in the package publication
service for each individual language, you turn it into an M*(N+O)
scale problem, since you need to give each language-specific service
the ability to collect metadata from all of those other sources.
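
To put illustrative numbers on that: with M=10 ecosystems, N=5 issue
trackers, and O=5 code hosting sites, a single aggregator needs
10+5+5 = 20 integrations, while pushing the job onto each publication
service needs 10*(5+5) = 100.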

This means that since we don't have a vested interest in adding more
functionality to PyPI that doesn't specifically *need* to be there
(and in fact actively want to avoid doing so), we can say "Conformance
to semantic web standards is a problem for aggregation services like
libraries.io and release-monitoring.org to solve, not for us to
incorporate directly into PyPI".

> sorry to hijack the thread; I hear "more links and metadata in an
> auxiliary schema" and think 'RDF is the semantic web solution for this
> graph problem'

I know, and you're not wrong about that. Where you're running into
trouble is that you're trying to insist that it is the responsibility
of the initial data *publishers* to conform to the semantic web
standards, and it *isn't* - that job is one for the data aggregators
that have an interest in making it easier for people to work across
multiple data sets managed by different groups of people.

For publication platforms managing a single subgraph, native support
for JSON-LD and RDFa introduces unwanted complexity, since it expands
the data model to incorporate all of the relational concepts defined
in those standards. Well-funded platforms may have spare development
capacity to spend on such activities, but PyPI isn't such a platform.

By contrast, for aggregators managing a graph-of-graphs problem,
JSON-LD and RDFa introduce normalisation across data sets that
*reduces* overall complexity, since most of the details of the
subgraphs can be ignored, as you focus instead on the links between
the entities they contain.
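
Purely as a hypothetical illustration, an aggregator's normalised
record for a single project might keep nothing but the cross-subgraph
links (the property names below are made up for the example, loosely
in the spirit of schema.org and CodeMeta):

# Hypothetical: an aggregator's normalised view of one project,
# keeping only cross-subgraph links and ignoring subgraph internals.
project_node = {
    "@context": "https://schema.org",  # vocabulary choice is illustrative
    "@id": "https://pypi.org/project/example-project/",           # packaging subgraph
    "@type": "SoftwareSourceCode",
    "codeRepository": "https://github.com/example/example",       # code hosting subgraph
    "issueTracker": "https://github.com/example/example/issues",  # tracker subgraph
}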

Cheers,
Nick.

-- 
Nick Coghlan   |   ncoghlan at gmail.com   |   Brisbane, Australia
