[Distutils] Outdated packages on pypi

Wes Turner wes.turner at gmail.com
Tue Jul 19 11:41:25 EDT 2016


On Jul 19, 2016 8:44 AM, "Nick Coghlan" <ncoghlan at gmail.com> wrote:
>
> On 19 July 2016 at 18:13, Wes Turner <wes.turner at gmail.com> wrote:
> > so, there's a need for specifying the {PyPI} package URI in setup.py
>
> Not really - tools can make a reasonable guess about the source PyPI
> URL based purely on the name and version. For non-PyPI hosted
> packages, the extra piece of info needed is the index server URL.

So, the index server URL is specified in pip.conf, .pydistutils.cfg, or
setup.cfg, or on the command line?
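For illustration, this is roughly what a persistent non-PyPI index
configuration looks like in pip.conf (the index URL here is a placeholder,
not a real server):

```ini
; ~/.config/pip/pip.conf (pip.ini on Windows)
; Point pip at an alternate index server; extra-index-url adds a fallback.
[global]
index-url = https://example.org/simple/
extra-index-url = https://pypi.org/simple/
```

The same thing can be done per-invocation with
`pip install --index-url https://example.org/simple/ SomePackage`.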

>
> > and then generating meta.jsonld from setup.py
>
> No, a JSON-LD generator would start with a rendered metadata format,
> not the raw setup.py.

"pydist.json", my mistake

https://github.com/pypa/interoperability-peps/issues/31#issuecomment-139657247
- pydist.json
- metadata.json (wheel)

- pydist.jsonld

>
> > and then generating JSONLD in a warehouse/pypa view; because that's
> > where they keep the actual metadata (package platform versions,
> > checksums, potentially supersededBy redirects)
>
> No, there is no requirement for this to be a PyPI feature. Absolutely
> none.
>
> > and then a signing key for a) package maintainer-supplied metadata and
> > b) package repository metadata (which is/would be redundant but
> > comforting)
>
> This is already covered (thoroughly) in PEPs 458 and 480, and has
> nothing to do with metadata linking.

ld-signatures can be used to sign {RDF, JSON-LD, RDFa} and to attach the
signature to the document itself.

https://web-payments.org/specs/source/ld-signatures/

- JWS only works with JSON formats (and not RDF)

https://www.python.org/dev/peps/pep-0480/

- Does this yet include signing the potentially-cached JSON metadata used
by actual tools like e.g. pip?
- How do you feel about redirects for superseded packages, when nobody can
convince the maintainer to update the long_description?

>
> > and then third-party services like NVD, CVEdetails, and stack metadata
> > aggregation services
>
> And this is the other reason why it doesn't make sense to do this on
> PyPI itself - the publisher provided metadata from PyPI is only one
> piece of the project metadata puzzle (issue trackers and source code
> repositories are another one, as are the communication metrics
> collected by the likes of Bitergia).

AFAIU, the extra load of fielding responsibly-disclosed vulnerability
reports for PyPI-hosted packages is beyond the scope of the PyPI and
Warehouse projects.

>
> For a data aggregator, supporting multiple language ecosystems, and
> multiple issue trackers, and multiple code hosting sites is an M+N+O
> scale problem (where M is the number of language ecosystems supported,
> etc). By contrast, if you try to solve this problem in the package
> publication services for each individual language, you turn it into an
> M*(N+O) scale problem, where you need to give each language-specific
> service the ability to collect metadata from all those other sources.
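The scale argument above can be made concrete with made-up numbers (M, N,
and O are illustrative, not measured):

```python
# M language ecosystems, N issue trackers, O code hosting sites.
M, N, O = 10, 20, 30

# One aggregator that supports every source directly needs roughly
# one integration per source:
aggregator_integrations = M + N + O      # 10 + 20 + 30 = 60

# If instead each of the M language-specific package services collects
# metadata from all N + O external sources itself:
per_language_integrations = M * (N + O)  # 10 * 50 = 500

print(aggregator_integrations, per_language_integrations)
```

So centralizing the cross-referencing in aggregators keeps the integration
count additive rather than multiplicative.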

Are you saying that, for release-monitoring.org (a service you are somehow
financially associated with), you have already invested the time to read
the existing PyPI metadata, but not e.g. the 'python' or 'python-dev' OS
package metadata?

Debian has an RDF endpoint:
- https://packages.qa.debian.org/p/python-defaults.html
- https://packages.qa.debian.org/p/python-defaults.ttl

But there's as yet no easy way to JOIN metadata down the graph, from
downstream OS packages to PyPI archives to source repository changesets;
not without RDF, and not without writing unnecessary
language/packaging-community-specific {INI, JSON, TOML, YAML-LD} parsers.

Big-O estimations aside, when a data publisher publishes web-standard
data, everyone can benefit, because the upper bound on network effects is
N**2 (Metcalfe's Law).

>
> This means that since we don't have a vested interest in adding more
> functionality to PyPI that doesn't specifically *need* to be there
> (and in fact actively want to avoid doing so), we can say "Conformance
> to semantic web standards is a problem for aggregation services like
> libraries.io and release-monitoring.org to solve, not for us to
> incorporate directly into PyPI".

A view producing JSON-LD.

Probably right about here:
https://github.com/pypa/warehouse/blob/master/warehouse/packaging/views.py

Because there are a few (possibly backwards-compatible) changes that could
be made here, so that we could just add @context to the existing JSON
record (thus making it JSON-LD, which anyone can read and index without a
domain-specific parser):
https://github.com/pypa/warehouse/blob/master/warehouse/legacy/api/json.py
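To sketch the idea: adding an @context mapping is, in principle, all it
takes to turn a plain JSON record into JSON-LD. The record fields below are
an abbreviated imitation of the PyPI JSON API response, and the context
mapping to schema.org terms is a made-up illustration, not an agreed-upon
vocabulary:

```python
import json

# Hypothetical fragment of a /pypi/<name>/json-style response:
record = {
    "info": {
        "name": "example-package",
        "version": "1.0.0",
        "home_page": "https://example.org",
    },
}

# Attach a JSON-LD @context mapping field names to IRIs, so generic
# JSON-LD tooling can interpret the keys without a PyPI-specific parser:
record["@context"] = {
    "name": "https://schema.org/name",
    "version": "https://schema.org/softwareVersion",
    "home_page": "https://schema.org/url",
}

# The result is still ordinary JSON for existing consumers:
print(json.dumps(record, indent=2))
```

Existing clients that ignore unknown keys would keep working, which is why
the change could plausibly be backwards-compatible.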

IIRC:
https://github.com/pypa/interoperability-peps/issues/31#issuecomment-233195564

>
> > sorry to hijack the thread; I hear "more links and metadata in an
> > auxiliary schema" and think 'RDF is the semantic web solution for this
> > graph problem'
>
> I know, and you're not wrong about that. Where you're running into
> trouble is that you're trying to insist that it is the responsibility
> of the initial data *publishers* to conform to the semantic web
> standards, and it *isn't* - that job is one for the data aggregators
> that have an interest in making it easier for people to work across
> multiple data sets managed by different groups of people.

No, after-the-fact transformation is wasteful and late.

A bit of advice for data publishers:
http://5stardata.info/en/

>
> For publication platforms managing a single subgraph, native support
> for JSON-LD and RDFa introduces unwanted complexity by expanding the
> data model to incorporate all of the relational concepts defined in
> those standards. Well funded platforms may have the development
> capacity to spare to spend time on such activities, but PyPI isn't
> such a platform.

This is Warehouse:
https://github.com/pypa/warehouse

It is maintainable.

https://www.pypa.io/en/latest/help/

>
> By contrast, for aggregators managing a graph-of-graphs problem,
> JSON-LD and RDFa introduce normalisation across data sets that
> *reduces* overall complexity, since most of the details of the
> subgraphs can be ignored, as you focus instead on the links between
> the entities they contain.
>
> Cheers,
> Nick.
>
> --
> Nick Coghlan   |   ncoghlan at gmail.com   |   Brisbane, Australia

