On Jul 19, 2016 8:44 AM, "Nick Coghlan" <ncoghlan@gmail.com> wrote:
> On 19 July 2016 at 18:13, Wes Turner wrote:
>> so, there's a need for specifying the {PyPI} package URI in setup.py
> Not really - tools can make a reasonable guess about the source PyPI URL based purely on the name and version. For non-PyPI hosted packages, the extra piece of info needed is the index server URL.
So, the index server URL is in pip.conf or .pydistutils.cfg or setup.cfg, OR specified on the command line?
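For reference, a minimal sketch of that guess, assuming the legacy JSON API route (pypi.org as the base URL is illustrative; pip takes the index URL from [global] index-url in pip.conf, or from --index-url on the command line):

    # Sketch: derive a metadata URL purely from name and version.
    # The /pypi/<name>/<version>/json route is the legacy PyPI JSON API;
    # the base URL is whichever index server is configured.
    def json_metadata_url(name, version, index="https://pypi.org"):
        return "{0}/pypi/{1}/{2}/json".format(index, name, version)

    print(json_metadata_url("pip", "8.1.2"))
    # -> https://pypi.org/pypi/pip/8.1.2/json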
>> and then generating meta.jsonld from setup.py
> No, a JSON-LD generator would start with a rendered metadata format, not the raw setup.py.
"pydist.json", my mistake https://github.com/pypa/interoperability-peps/issues/31#issuecomment-1396572... - pydist.json - metadata.json (wheel) - pydist.jsonld
>> and then generating JSONLD in a warehouse/pypa view; because that's where
>> they keep the actual metadata (package platform versions, checksums,
>> potentially supersededBy redirects)
> No, there is no requirement for this to be a PyPI feature. Absolutely none.
>> and then a signing key for a) package maintainer-supplied metadata and
>> b) package repository metadata (which is/would be redundant but comforting)
> This is already covered (thoroughly) in PEPs 458 and 480, and has nothing to do with metadata linking.
ld-signatures can be used to sign {RDF, JSONLD, RDFa} and to attach the signature to the document itself:
https://web-payments.org/specs/source/ld-signatures/
- JWS only works with JSON formats (and not RDF)
https://www.python.org/dev/peps/pep-0480/
- Does this yet include signing the potentially cached JSON metadata used by actual tools like e.g. pip?
- How do you feel about redirects, for when a package is superseded and nobody can convince the maintainer to update the long_description?
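For illustration, the overall shape of an ld-signatures signed document, sketched from the draft linked above (the type, key URL, and signatureValue are placeholders; the exact fields depend on the signature suite used):

    {
      "@context": "https://w3id.org/security/v1",
      "name": "example-package",
      "signature": {
        "type": "LinkedDataSignature2015",
        "creator": "https://example.org/keys/1",
        "created": "2016-07-19T08:44:00Z",
        "signatureValue": "...base64..."
      }
    }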
>> and then third-party services like NVD, CVEdetails, and stack metadata aggregation services
> And this is the other reason why it doesn't make sense to do this on PyPI itself - the publisher-provided metadata from PyPI is only one piece of the project metadata puzzle (issue trackers and source code repositories are others, as are the communication metrics collected by the likes of Bitergia).
AFAIU, the extra load of responsibly fielding vulnerability reports for PyPI-hosted packages is beyond the scope of the PyPI and Warehouse projects.
> For a data aggregator, supporting multiple language ecosystems, multiple issue trackers, and multiple code hosting sites is an M+N+O scale problem (where M is the number of language ecosystems supported, etc.). By contrast, if you try to solve this problem in the package publication service for each individual language, you turn it into an M*(N+O) scale problem, where you need to give each language-specific service the ability to collect metadata from all those other sources.
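(Putting illustrative numbers on that estimate: with M = 10 language ecosystems, N = 20 issue trackers, and O = 10 code hosting sites, one aggregator maintains 10 + 20 + 10 = 40 integrations, while doing the same collection inside each ecosystem's publication service needs 10 * (20 + 10) = 300.)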
Are you saying that, for release-monitoring.org (a service you are somehow financially associated with), you have already invested the time to read the existing PyPI metadata, but not e.g. the 'python' or 'python-dev' OS package metadata? Debian has an RDF endpoint:
- https://packages.qa.debian.org/p/python-defaults.html
- https://packages.qa.debian.org/p/python-defaults.ttl
But there's as yet no easy way to JOIN metadata down the graph from downstream OS packages to PyPI archives to source repository changesets; not without RDF, and not without writing unnecessary language- and packaging-community-specific {INI, JSON, TOML, YAML-LD} parsers. O-estimations aside, when a data publisher publishes web-standard data, everyone can benefit, because the upper bound on network effects is N**2 (Metcalfe's Law).
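e.g. a sketch of consuming that endpoint with rdflib (assuming rdflib is installed; this just prints raw triples rather than assuming anything about Debian's vocabulary):

    import rdflib  # pip install rdflib

    g = rdflib.Graph()
    # Debian's package tracking system publishes per-package RDF as Turtle:
    g.parse("https://packages.qa.debian.org/p/python-defaults.ttl",
            format="turtle")

    # No Debian-specific parser needed; the data is just triples.
    for subj, pred, obj in list(g)[:10]:
        print(subj, pred, obj)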
> This means that since we don't have a vested interest in adding more functionality to PyPI that doesn't specifically *need* to be there (and in fact actively want to avoid doing so), we can say "Conformance to semantic web standards is a problem for aggregation services like libraries.io and release-monitoring.org to solve, not for us to incorporate directly into PyPI".
A view producing JSONLD. Probably right about here:
https://github.com/pypa/warehouse/blob/master/warehouse/packaging/views.py
Because there are a few (possibly backwards-compatible) changes that could be made here, so that we could just add @context to the existing JSON record (thus making it JSONLD, which anyone can read and index without a domain-specific parser):
https://github.com/pypa/warehouse/blob/master/warehouse/legacy/api/json.py
IIRC: https://github.com/pypa/interoperability-peps/issues/31#issuecomment-2331955...
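i.e. something like the following for the existing /pypi/<name>/<version>/json response (response fields abbreviated; the @context mapping is illustrative, not a settled vocabulary):

    {
      "@context": {
        "name": "http://schema.org/name",
        "version": "http://schema.org/softwareVersion",
        "home_page": "http://schema.org/url"
      },
      "info": {
        "name": "pip",
        "version": "8.1.2",
        "home_page": "https://pip.pypa.io/"
      }
    }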
>> sorry to hijack the thread; i hear "more links and metadata in an auxiliary schema" and think 'RDF is the semantic web solution for this graph problem'
> I know, and you're not wrong about that. Where you're running into trouble is that you're trying to insist that it is the responsibility of the initial data *publishers* to conform to the semantic web standards, and it *isn't* - that job is one for the data aggregators that have an interest in making it easier for people to work across multiple data sets managed by different groups of people.
No, after-the-fact transformation is wasteful and late. A bit of advice for data publishers: http://5stardata.info/en/
> For publication platforms managing a single subgraph, native support for JSON-LD and RDFa introduces unwanted complexity by expanding the data model to incorporate all of the relational concepts defined in those standards. Well-funded platforms may have the development capacity to spare for such activities, but PyPI isn't such a platform.
This is Warehouse: https://github.com/pypa/warehouse
It is maintainable: https://www.pypa.io/en/latest/help/
> By contrast, for aggregators managing a graph-of-graphs problem, JSON-LD and RDFa introduce normalisation across data sets that *reduces* overall complexity, since most of the details of the subgraphs can be ignored, as you focus instead on the links between the entities they contain.
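That normalisation is mechanical once the contexts exist. A sketch with the PyLD library (pip install PyLD); both contexts and field names here are invented for illustration:

    from pyld import jsonld  # pip install PyLD

    # Two publishers, two field names, one shared vocabulary term.
    pypi_doc = {"@context": {"home_page": "http://schema.org/url"},
                "home_page": "https://pip.pypa.io/"}
    debian_doc = {"@context": {"homepage": "http://schema.org/url"},
                  "homepage": "https://pip.pypa.io/"}

    # Expansion erases the per-publisher field names, leaving only the
    # shared term, so an aggregator can compare like with like.
    print(jsonld.expand(pypi_doc) == jsonld.expand(debian_doc))  # True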
> Cheers,
> Nick.
>
> --
> Nick Coghlan | ncoghlan@gmail.com | Brisbane, Australia