Re: [Distutils] Outdated packages on pypi

On Jul 14, 2016, at 8:19 PM, Steve Dower steve.dower@python.org wrote:
I'm still keen to find a way to redirect people to useful forks or alternative packages that doesn't require thousands of mentions at conferences for all time, a la PIL.
I’m not opposed to this but we’ll want to make sure we’re careful about how we do it. PIL is an easy example where the maintainer is gone and there is a community fork of it. But I struggle to come up with very many more examples of this where there is something that is:
- Popular enough that enough people are tripping over it to make it worth it.
- There is a clear successor (or successors).
Off the top of my head I can only really think of PIL, and *maybe* suds. Unless there’s a lot of these maybe all we really need is a policy for when administrators can/will edit the page to direct people towards a different project or a way to add an admin message directing people to another project.
— Donald Stufft

I forget the exact names but there's a range of SQL Server packages that also fit in here. Perhaps I get to hear more complaints about those because of where I work :)
But you're right, it may be a small enough problem to handle it that way.
Top-posted from my Windows Phone

On Fri, Jul 15, 2016, at 01:25 AM, Donald Stufft wrote:
Off the top of my head I can only really think of PIL, and *maybe* suds. Unless there’s a lot of these maybe all we really need is a policy for when administrators can/will edit the page to direct people towards a different project or a way to add an admin message directing people to another project.
Proposal: let's put some such manual intervention policy in place for now. Apply it for PIL to point to Pillow, and query the active suds forks to see if there's a generally agreed successor.
If this works well, great! If the admins are flooded with 'successor requests', then we can come back to the question of an automated mechanism. If there are too many abandoned packages with competing successors, that's a trickier problem to solve, but at least we'd be considering it with more information.
As further examples: pydot, pexpect and python-modernize have all been unmaintained, leading to forks springing up. In all three cases, some of the forkers eventually coordinated to contact the original maintainer, get upload rights, and make new releases with the original name. It would certainly have been nice if that could have happened sooner in each case, but I doubt that any technical fix would have made a big difference.
Thomas

On 15 July 2016 at 23:59, Thomas Kluyver thomas@kluyver.me.uk wrote:
On Fri, Jul 15, 2016, at 01:25 AM, Donald Stufft wrote:
Off the top of my head I can only really think of PIL, and *maybe* suds. Unless there’s a lot of these maybe all we really need is a policy for when administrators can/will edit the page to direct people towards a different project or a way to add an admin message directing people to another project.
Proposal: let's put some such manual intervention policy in place for now. Apply it for PIL to point to Pillow, and query the active suds forks to see if there's a generally agreed successor.
If this works well, great! If the admins are flooded with 'successor requests', then we can come back to the question of an automated mechanism. If there are too many abandoned packages with competing successors, that's a trickier problem to solve, but at least we'd be considering it with more information.
+1, although I'd propose decoupling the policy aspect ("Project X has been declared unmaintained, with Project Y as its official successor") from the implementation aspect of how that policy is applied in PyPI and pip. That way we wouldn't be adding to the existing workload of the PyPI admins - their involvement would just be implementing the collective policy decisions of the PyPA, rather than being directly tasked with making those policy decisions themselves.
For example, suppose we had a "Replacement Packages" page on packaging.python.org, that documented cases like PIL -> Pillow, where:
- a de facto community standard package has become unmaintained
- attempts to reach the existing maintainers to transfer ownership have failed
- a de facto replacement package has emerged
- the overall newcomer experience for the Python ecosystem is being harmed by legacy documentation that still recommends the old de facto standard
Adding new entries to that page would then require filing an issue at https://github.com/pypa/python-packaging-user-guide/issues/ to establish:
- the package being replaced is a de facto community standard
- the package being replaced is important to the user experience of newcomers to the Python ecosystem
- the package being replaced has become unmaintained
- a newer community fork has gained sufficient traction to be deemed a de facto successor
- the existing maintainer has been contacted, and is unresponsive to requests to accept help with maintenance
If *all* of those points are credibly established, *then* the package replacement would be added to the "Replacement Packages" list on packaging.python.org.
How that list was utilised in PyPI and pip, as well as in other package introspection tools (e.g. IDEs, VersionEye), would then be the decision of the designers of those tools.
As further examples: pydot, pexpect and python-modernize have all been unmaintained, leading to forks springing up. In all three cases, some of the forkers eventually coordinated to contact the original maintainer, get upload rights, and make new releases with the original name. It would certainly have been nice if that could have happened sooner in each case, but I doubt that any technical fix would have made a big difference.
The PyCA folks obtaining maintenance access to PyOpenSSL would be another example of this being navigated successfully without a long term split.
One of the longest running eventually resolved examples to date would be the multi-year setuptools/distribute split, and I'd actually consider that the ideal outcome of this process in general: while we understand entirely that folks may need to step away from open source software maintenance for a wide range of reasons, we strongly prefer to see projects providing critical functionality handed over to a new set of maintainers that have earned the trust of either the original maintainer or the wider community rather than letting them languish indefinitely.
We can't mandate that any given project invest time in succession planning though, so having a system in place to designate successor projects at the ecosystem level when maintainers aren't able to resolve it at a project level makes sense.
Cheers, Nick.

On Jul 16, 2016 3:36 AM, "Nick Coghlan" ncoghlan@gmail.com wrote:
On 15 July 2016 at 23:59, Thomas Kluyver thomas@kluyver.me.uk wrote:
On Fri, Jul 15, 2016, at 01:25 AM, Donald Stufft wrote:
Off the top of my head I can only really think of PIL, and *maybe* suds. Unless there’s a lot of these maybe all we really need is a policy for when administrators can/will edit the page to direct people towards a different project or a way to add an admin message directing people to another project.
Proposal: let's put some such manual intervention policy in place for now. Apply it for PIL to point to Pillow, and query the active suds forks to see if there's a generally agreed successor.
If this works well, great! If the admins are flooded with 'successor requests', then we can come back to the question of an automated mechanism. If there are too many abandoned packages with competing successors, that's a trickier problem to solve, but at least we'd be considering it with more information.
+1, although I'd propose decoupling the policy aspect ("Project X has been declared unmaintained, with Project Y as its official successor") from the implementation aspect of how that policy is applied in PyPI and pip. That way we wouldn't be adding to the existing workload of the PyPI admins - their involvement would just be implementing the collective policy decisions of the PyPA, rather than being directly tasked with making those policy decisions themselves.
For example, suppose we had a "Replacement Packages" page on packaging.python.org, that documented cases like PIL -> Pillow, where:
- a de facto community standard package has become unmaintained
- attempts to reach the existing maintainers to transfer ownership have failed
- a de facto replacement package has emerged
- the overall newcomer experience for the Python ecosystem is being harmed by legacy documentation that still recommends the old de facto standard
Adding new entries to that page would then require filing an issue at https://github.com/pypa/python-packaging-user-guide/issues/ to establish:
- the package being replaced is a de facto community standard
- the package being replaced is important to the user experience of newcomers to the Python ecosystem
- the package being replaced has become unmaintained
- a newer community fork has gained sufficient traction to be deemed a de facto successor
- the existing maintainer has been contacted, and is unresponsive to requests to accept help with maintenance
If *all* of those points are credibly established, *then* the package replacement would be added to the "Replacement Packages" list on packaging.python.org.
How that list was utilised in PyPI and pip, as well as in other package introspection tools (e.g. IDEs, VersionEye), would then be the decision of the designers of those tools.
So, there could be RDFa in the project detail pages and a JSONLD key/dict in the project metadata indicating this community-reviewed edge or edges.
As an unreified triple: (pypi:PIL pypa:recommendsPackageInstead pypi:pillow)
As a reified edge:

    _:1234 a pypa:SupersededByEdge ;
        schema:dateCreated "iso8601" ;
        schema:description "reason" ;
        schema:url <https://github.com/pypa/python-packaging-user-guide/issues/1234> ;
        pypa:origPackage pypi:PIL ;
        pypa:otherPackage pypi:pillow .
These are in N3/Turtle syntax, which is expressible as a JSON-LD block in HTML and/or as RDFa in HTML. (#PEP426JSONLD)
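For comparison, roughly the same reified edge as JSON-LD, sketched here as a plain Python dict (the pypa: and pypi: namespace URLs are placeholders, no such @context is published yet, and the dateCreated and issue number are the same placeholders as above):

    import json

    # Sketch only: a hypothetical JSON-LD rendering of the supersededBy edge.
    superseded_edge = {
        "@context": {
            "schema": "http://schema.org/",
            "pypa": "https://www.pypa.io/ns#",       # placeholder namespace
            "pypi": "https://pypi.org/project/",     # placeholder namespace
        },
        "@id": "_:1234",
        "@type": "pypa:SupersededByEdge",
        "schema:dateCreated": "iso8601",
        "schema:description": "reason",
        "schema:url": "https://github.com/pypa/python-packaging-user-guide/issues/1234",
        "pypa:origPackage": {"@id": "pypi:PIL"},
        "pypa:otherPackage": {"@id": "pypi:pillow"},
    }

    print(json.dumps(superseded_edge, indent=2))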

On 16 July 2016 at 23:47, Wes Turner wes.turner@gmail.com wrote:
So, there could be RDFa in the project detail pages and a JSONLD key/dict in the project metadata indicating this community-reviewed edge or edges.
Wes, once again, please stop attempting to inject JSON-LD into every metadata discussion we have on this list. We already know you like it.
However, despite being simpler than RDFa, JSON-LD is still overengineered for our purposes, so we're not going to ask people to go read the JSON-LD spec in order to understand any particular aspect of our APIs or metadata formats.
If you'd like to set up a PyPI dataset in one of the semantic web projects, then please feel free to do so, but adding that kind of information to the upstream metadata isn't a priority, any more than it's a priority to add native support for things like CPE [1] or SWID [2].
Regards, Nick.
[1] https://scap.nist.gov/specifications/cpe/ [2] http://tagvault.org/swid-tags/what-are-swid-tags/

If you have an alternate way to represent a graph with JSON, which is indexable as RDF named graph quads and cryptographically signable irrespective of data ordering or representation format (RDFa, JSONLD) with ld-signatures, I'd be interested to hear how said format solves that problem.
These contain checksums.
https://web-payments.org/specs/source/ld-signatures/
RDFa (with http://schema.org/SoftwareApplication and pypa:-specific classes and properties (types and attributes) for things like SupersededBy) would be advantageous because, then, public and local RDF(a) search engines could easily assist with structured data search.
Adding data-request-pythonver *is* a stopgap solution which, initially, seems less bandwidth-burdensome, but it requires all downstream data consumers to re-implement an ad-hoc parser; that is unnecessary if it's already understood that a graph description semantics which works in JSON (as JSONLD) and in HTML has already solved that problem many times over.

On 18 July 2016 at 02:56, Wes Turner wes.turner@gmail.com wrote:
If you have an alternate way to represent a graph with JSON, which is indexable as RDF named graph quads and cryptographically signable irrespective of data ordering or representation format (RDFa, JSONLD) with ld-signatures, I'd be interested to hear how said format solves that problem.
It doesn't, but someone *that isn't PyPI* can still grab the data set, throw it into a graph database like Neo4j, calculate the cross references, and then republish the result as a publicly available data set for the semantic web. That way, the semantic linking won't need to be limited just to the Python ecosystem, it will be able to span ecosystems, as happens with cases like npm build dependencies (where node-gyp is the de facto C extension build toolchain for Node.js, and that's written in Python, so NPM dependency analysis needs to be able to cross the gap into the Python packaging world) and with frontend asset pipelines in Python (where applications often want to bring in additional JavaScript dependencies via npm rather than vendoring them).
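A rough sketch of the kind of third-party transformation described above, using rdflib rather than Neo4j (the pypa namespace and the package values are placeholders, and an aggregator would pull the real data from the PyPI JSON API rather than hard-coding it):

    from rdflib import Graph, Literal, Namespace
    from rdflib.namespace import RDF

    SCHEMA = Namespace("http://schema.org/")
    PYPA = Namespace("https://www.pypa.io/ns#")       # placeholder namespace
    PYPI = Namespace("https://pypi.org/project/")

    g = Graph()
    pillow = PYPI["Pillow"]
    g.add((pillow, RDF.type, SCHEMA.SoftwareApplication))
    g.add((pillow, SCHEMA.name, Literal("Pillow")))
    g.add((pillow, SCHEMA.softwareVersion, Literal("3.3.0")))
    # A cross-reference computed by the aggregator, not published by PyPI.
    g.add((PYPI["PIL"], PYPA.supersededBy, pillow))

    # Republish the derived graph for anyone else to consume.
    g.serialize(destination="pypi-sample.ttl", format="turtle")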
Given that we already have services like libraries.io and release-monitoring.org for ecosystem independent tracking of upstream releases, they're more appropriate projects to target for the addition of semantic linking support to project metadata, as having one or two public semantic linking projects like that for the entirety of the open source ecosystem would make a lot more sense than each language community creating their own independent solutions that would still need to be stitched together later.
Cheers, Nick.

On Jul 19, 2016 2:37 AM, "Nick Coghlan" ncoghlan@gmail.com wrote:
On 18 July 2016 at 02:56, Wes Turner wes.turner@gmail.com wrote:
If you have an alternate way to represent a graph with JSON, which is indexable as RDF named graph quads and cryptographically signable irrespective of data ordering or representation format (RDFa, JSONLD) with ld-signatures, I'd be interested to hear how said format solves that problem.
It doesn't, but someone *that isn't PyPI* can still grab the data set, throw it into a graph database like Neo4j, calculate the cross references, and then republish the result as a publicly available data set for the semantic web. That way, the semantic linking won't need to be limited just to the Python ecosystem, it will be able to span ecosystems, as happens with cases like npm build dependencies (where node-gyp is the de facto C extension build toolchain for Node.js, and that's written in Python, so NPM dependency analysis needs to be able to cross the gap into the Python packaging world) and with frontend asset pipelines in Python (where applications often want to bring in additional JavaScript dependencies via npm rather than vendoring them).
Given that we already have services like libraries.io and release-monitoring.org for ecosystem independent tracking of upstream releases, they're more appropriate projects to target for the addition of semantic linking support to project metadata, as having one or two public semantic linking projects like that for the entirety of the open source ecosystem would make a lot more sense than each language community creating their own independent solutions that would still need to be stitched together later.
so, language/packaging-specific subclasses of e.g. http://schema.org/SoftwareApplication and native linked data would reduce the need for post-hoc parsing and batch-processing.
there are many benefits to being able to JOIN on URIs and version strings here.
I'll stop now because OT; the relevant concern here was/is that, if there are PyPI-maintainer redirects to other packages, that metadata should probably be signed (and might as well be JSONLD, because this is a graph of packages and metadata). And there should be a disclaimer regarding auto-following said redirects.
Also, --find-links makes it dangerous to include comments with links.
#PEP426JSONLD

On 19 July 2016 at 17:25, Wes Turner wes.turner@gmail.com wrote:
On Jul 19, 2016 2:37 AM, "Nick Coghlan" ncoghlan@gmail.com wrote:
Given that we already have services like libraries.io and release-monitoring.org for ecosystem independent tracking of upstream releases, they're more appropriate projects to target for the addition of semantic linking support to project metadata, as having one or two public semantic linking projects like that for the entirety of the open source ecosystem would make a lot more sense than each language community creating their own independent solutions that would still need to be stitched together later.
so, language/packaging-specific subclasses of e.g. http://schema.org/SoftwareApplication and native linked data would reduce the need for post-hoc parsing and batch-processing.
Anyone sufficiently interested in the large scale open source dependency management problem to fund work on it is going to want a language independent solution, rather than a language specific one. Folks only care about unambiguous software identification systems like CPE and SWID when managing large infrastructure installations, and any system of infrastructure that large is going to be old enough and sprawling enough to include multiple language stacks.
At the same time, nobody cares about this kind of problem when all they want to do is publish their hobby project or experimental proof of concept somewhere that their friends and peers can easily get to it, which means it doesn't make sense to expect all software publishers to provide the information themselves, and as a language ecosystem with a strong focus on inclusive education, we *certainly* don't want to make it a barrier to engagement with Python's default publishing toolchain.
there are many benefits to being able to JOIN on URIs and version strings here.
I'll stop now because OT; the relevant concern here was/is that, if there are PyPI-maintainer redirects to other packages, that metadata should probably be signed
Metadata signing support is a different problem, and one we want to pursue for a range of reasons.
(and might as well be JSONLD, because this is a graph of packages and metadata)
There is no "might as well" here. At the language level, there's a relevant analogy with Guido's work on gradual typing - talk to someone for whom a 20 person team is small, and a 10k line project is barely worth mentioning and their reaction is going to be "of course you want to support static type analysis", while someone that thinks a 5 person team is unthinkably large and a 1k line utility is terribly bloated isn't going to see any value in it whatsoever.
In the context of packaging metadata, supporting JSON-LD and RDFa is akin to providing PEP 484 type information for Python APIs - are they potentially useful? Absolutely. Are there going to be folks that see the value in them, and invest the time in designing a way to use them to describe Python packages? Absolutely (and depending on how a few other things work out, one of them may even eventually be me in a release-monitoring.org context).
But it doesn't follow that it then makes sense to make them a *dependency* of our interoperability specifications, rather than an optional add-on - we want folks just doing relatively simple things (like writing web services in Python) to be able to remain blissfully unaware that there's a world of large scale open source software supply chain management out there that benefits from having ways of doing things that are standardised across language ecosystems.
Regards, Nick.

so, there's a need for specifying the {PyPI} package URI in setup.py
and then generating meta.jsonld from setup.py
and then generating JSONLD in a warehouse/pypa view; because that's where they keep the actual metadata (package platform versions, checksums, potentially supersededBy redirects)
and then a signing key for a) package maintainer-supplied metadata and b) package repository metadata (which is/would be redundant but comforting)
and then third-party services like NVD, CVEdetails, and stack metadata aggregation services
- "PEP 426: Define a JSON-LD context as part of the proposal" https://github.com/pypa/interoperability-peps/issues/31 - "Expressing dependencies (between data, software, content...)" https://github.com/schemaorg/schemaorg/issues/975
sorry to hijack the thread; I hear "more links and metadata in an auxiliary schema" and think 'RDF is the semantic web solution for this graph problem'

On 19 July 2016 at 18:13, Wes Turner wes.turner@gmail.com wrote:
so, there's a need for specifying the {PyPI} package URI in setup.py
Not really - tools can make a reasonable guess about the source PyPI URL based purely on the name and version. For non-PyPI hosted packages, the extra piece of info needed is the index server URL.
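For example, a tool might derive candidate URLs along these lines (a sketch only, assuming the pypi.org/Warehouse URL layout; the legacy PyPI layout and other index servers differ, and the helper name is just illustrative):

    # Sketch: guess project page and JSON API URLs from a name and version.
    # Names may need PEP 503 normalisation before being used in the URL.
    def guess_package_urls(name, version, index_url="https://pypi.org"):
        project_url = "{0}/project/{1}/{2}/".format(index_url, name, version)
        json_url = "{0}/pypi/{1}/{2}/json".format(index_url, name, version)
        return project_url, json_url

    print(guess_package_urls("Pillow", "3.3.0"))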
and then generating meta.jsonld from setup.py
No, a JSON-LD generator would start with a rendered metadata format, not the raw setup.py.
and then generating JSONLD in a warehouse/pypa view; because that's where they keep the actual metadata (package platform versions, checksums, potentially supersededBy redirects)
No, there is no requirement for this to be a PyPI feature. Absolutely none.
and then a signing key for a) package maintainer-supplied metadata and b) package repository metadata (which is/would be redundant but comforting)
This is already covered (thoroughly) in PEPs 458 and 480, and has nothing to do with metadata linking.
and then third-party services like NVD, CVEdetails, and stack metadata aggregation services
And this is the other reason why it doesn't make sense to do this on PyPI itself - the publisher provided metadata from PyPI is only one piece of the project metadata puzzle (issue trackers and source code repositories are another one, as are the communication metrics collected by the likes of Bitergia).
For a data aggregator, supporting multiple language ecosystems, and multiple issue trackers, and multiple code hosting sites is an M+N+O scale problem (where M is the number of language ecosystems supported, etc). By contrast, if you try to solve this problem in the package publication services for each individual language, you turn it into an M*(N+O) scale problem, where you need to give each language-specific service the ability to collect metadata from all those other sources.
This means that since we don't have a vested interest in adding more functionality to PyPI that doesn't specifically *need* to be there (and in fact actively want to avoid doing so), we can say "Conformance to semantic web standards is a problem for aggregation services like libraries.io and release-monitoring.org to solve, not for us to incorporate directly into PyPI".
sorry to hijack the thread; I hear "more links and metadata in an auxiliary schema" and think 'RDF is the semantic web solution for this graph problem'
I know, and you're not wrong about that. Where you're running into trouble is that you're trying to insist that it is the responsibility of the initial data *publishers* to conform to the semantic web standards, and it *isn't* - that job is one for the data aggregators that have an interest in making it easier for people to work across multiple data sets managed by different groups of people.
For publication platforms managing a single subgraph, native support for JSON-LD and RDFa introduces unwanted complexity by expanding the data model to incorporate all of the relational concepts defined in those standards. Well funded platforms may have the development capacity to spare to spend time on such activities, but PyPI isn't such a platform.
By contrast, for aggregators managing a graph-of-graphs problem, JSON-LD and RDFa introduce normalisation across data sets that *reduces* overall complexity, since most of the details of the subgraphs can be ignored, as you focus instead on the links between the entities they contain.
Cheers, Nick.

On Jul 19, 2016 8:44 AM, "Nick Coghlan" ncoghlan@gmail.com wrote:
On 19 July 2016 at 18:13, Wes Turner wes.turner@gmail.com wrote:
so, there's a need for specifying the {PyPI} package URI in setup.py
Not really - tools can make a reasonable guess about the source PyPI URL based purely on the name and version. For non-PyPI hosted packages, the extra piece of info needed is the index server URL.
So, the index server URL is in pip.conf or .pydistutils.cfg or setup.cfg, OR specified on the command line?
and then generating meta.jsonld from setup.py
No, a JSON-LD generator would start with a rendered metadata format, not the raw setup.py.
"pydist.json", my mistake
https://github.com/pypa/interoperability-peps/issues/31#issuecomment-1396572...
- pydist.json
- metadata.json (wheel)
- pydist.jsonld
and then generating JSONLD in a warehouse/pypa view; because that's where they keep the actual metadata (package platform versions, checksums, potentially supersededBy redirects)
No, there is no requirement for this to be a PyPI feature. Absolutely none.
and then a signing key for a) package maintainer-supplied metadata and b) package repository metadata (which is/would be redundant but comforting)
This is already covered (thoroughly) in PEPs 458 and 480, and has nothing to do with metadata linking.
ld-signatures can be used to sign {RDF, JSONLD, RDFa} and attach the signature to the document.
https://web-payments.org/specs/source/ld-signatures/
- JWS only works with JSON formats (and not RDF)
https://www.python.org/dev/peps/pep-0480/
- Does this yet include signing the potentially cached JSON metadata used by actual tools like e.g. pip?
- How do you feel about redirects for superseded packages, where nobody can convince the maintainer to update the long_description?
and then third-party services like NVD, CVEdetails, and stack metadata aggregation services
And this is the other reason why it doesn't make sense to do this on PyPI itself - the publisher provided metadata from PyPI is only one piece of the project metadata puzzle (issue trackers and source code repositories are another one, as are the communication metrics collected by the likes of Bitergia).
AFAIU, the extra load of fielding vulnerability reports for responsibly PyPI-hosted packages is beyond the scope of the PyPI and Warehouse packages.
For a data aggregator, supporting multiple language ecosystems, and multiple issue trackers, and multiple code hosting sites is an M+N+O scale problem (where M is the number of language ecosystems supported, etc). By contrast, if you try to solve this problem in the package publication services for each individual language, you turn it into an M*(N+O) scale problem, where you need to give each language-specific service the ability to collect metadata from all those other sources.
Are you saying that, for release-monitoring.org (a service you are somehow financially associated with), you have already invested the time to read the existing PyPI metadata, but not e.g. the 'python' or 'python-dev' OS package metadata?
Debian has an RDF endpoint.
- https://packages.qa.debian.org/p/python-defaults.html
- https://packages.qa.debian.org/p/python-defaults.ttl
- But there's yet no easy way to JOIN metadata down the graph of downstream OS packages to PyPI archives to source repository changesets; not without RDF and not without writing unnecessary language/packaging-community-specific {INI, JSON, TOML, YAMLLD} parsers.
O-estimations aside, when a data publisher publishes web standard data, everyone can benefit, because the upper bound on network effects is N**2 (Metcalfe's Law).
This means that since we don't have a vested interest in adding more functionality to PyPI that doesn't specifically *need* to be there (and in fact actively want to avoid doing so), we can say "Conformance to semantic web standards is a problem for aggregation services like libraries.io and release-monitoring.org to solve, not for us to incorporate directly into PyPI".
A view producing JSONLD.
Probably right about here: https://github.com/pypa/warehouse/blob/master/warehouse/packaging/views.py
Because there are a few (possibly backwards compatible) changes that could be made here so that we could just add @context to the existing JSON record (thus making it JSONLD, which anyone can read and index without a domain-specific parser): https://github.com/pypa/warehouse/blob/master/warehouse/legacy/api/json.py
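Roughly (a sketch only, not tested against Warehouse; the @context URL is a placeholder, and the JSON endpoint assumed here is the pypi.org one):

    import json
    import urllib.request

    # Fetch the existing JSON record for one project and graft an @context onto
    # it: the minimal change needed for a JSON-LD processor to read the same
    # data as linked data.
    with urllib.request.urlopen("https://pypi.org/pypi/Pillow/json") as resp:
        record = json.loads(resp.read().decode("utf-8"))

    record["@context"] = "https://www.pypa.io/ns/pydist.jsonld"   # placeholder
    record["@type"] = "http://schema.org/SoftwareApplication"

    print(record["@context"], record["info"]["name"], record["info"]["version"])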
IIRC: https://github.com/pypa/interoperability-peps/issues/31#issuecomment-2331955...
sorry to hijack the thread; I hear "more links and metadata in an auxiliary schema" and think 'RDF is the semantic web solution for this graph problem'
I know, and you're not wrong about that. Where you're running into trouble is that you're trying to insist that it is the responsibility of the initial data *publishers* to conform to the semantic web standards, and it *isn't* - that job is one for the data aggregators that have an interest in making it easier for people to work across multiple data sets managed by different groups of people.
No, after-the-fact transformation is wasteful and late.
A bit of advice for data publishers: http://5stardata.info/en/
For publication platforms managing a single subgraph, native support for JSON-LD and RDFa introduces unwanted complexity by expanding the data model to incorporate all of the relational concepts defined in those standards. Well funded platforms may have the development capacity to spare to spend time on such activities, but PyPI isn't such a platform.
This is Warehouse: https://github.com/pypa/warehouse
It is maintainable.
https://www.pypa.io/en/latest/help/

On 20 July 2016 at 01:41, Wes Turner wes.turner@gmail.com wrote:
A view producing JSONLD.
Probably right about here: https://github.com/pypa/warehouse/blob/master/warehouse/packaging/views.py
Then stop trying to guilt other people into implementing JSON-LD support for you, and submit a patch to implement it yourself.
Requirements:
- zero additional learning overhead for newcomers to Python packaging
- near-zero additional maintenance overhead for tooling maintainers that don't care about the semantic web
If you can meet those requirements, then your rationale of "package dependencies are a linked graph represented as JSON, so we might as well support expressing them as JSON-LD" applies. Your best bet for that would likely be to make it an optional Warehouse feature (e.g. an alternate endpoint that adds the JSON-LD metadata), rather than a formal part of the interoperability specifications.
If you find you can't make it unobtrusive and optional, then you'd be proving my point that introducing JSON-LD adds further cognitive overhead to an already complicated system for zero practical gain to the vast majority of users of that system.
Regards, Nick.

I think you're right that we should identify the stakeholders here.
Which clients consume PyPI JSON?
@dstufft Is there a User-Agent report for the PyPI and Warehouse legacy JSON views?
... https://code.activestate.com/lists/python-distutils-sig/25457/
Are there still pending metadata PEPs that would also need to be JSONLD-ified?
On Jul 19, 2016 10:45 PM, "Nick Coghlan" ncoghlan@gmail.com wrote:
On 20 July 2016 at 01:41, Wes Turner wes.turner@gmail.com wrote:
A view producing JSONLD.
Probably right about here: https://github.com/pypa/warehouse/blob/master/warehouse/packaging/views.py
Then stop trying to guilt other people into implementing JSON-LD support for you, and submit a patch to implement it yourself.
Requirements:
- zero additional learning overhead for newcomers to Python packaging
Should be transparent to the average bear.
- near-zero additional maintenance overhead for tooling maintainers that don't care about the semantic web
Is it of value to link CVE reports with the package metadata?
If you can meet those requirements, then your rationale of "package dependencies are a linked graph represented as JSON, so we might as well support expressing them as JSON-LD" applies. Your best bet for that would likely be to make it an optional Warehouse feature (e.g. an alternate endpoint that adds the JSON-LD metadata), rather than a formal part of the interoperability specifications.
- Another cached view
If you find you can't make it unobtrusive and optional, then you'd be proving my point that introducing JSON-LD adds further cognitive overhead to an already complicated system for zero practical gain to the vast majority of users of that system.
There are a number of additional value propositions and use cases here: https://github.com/pypa/interoperability-peps/issues/31
When I find the time

On 20 July 2016 at 14:13, Wes Turner wes.turner@gmail.com wrote:
- near-zero additional maintenance overhead for tooling maintainers that don't care about the semantic web
Is it of value to link CVE reports with the package metadata?
On PyPI, the main value would be in publisher notification (i.e. if folks maintaining projects on PyPI aren't tracking CVE reports directly, it would be nice if they could opt in to having PyPI do it for them rather than having to learn how to navigate the CVE ecosystem themselves - "Maintainers are actively monitoring CVE notifications" would then become a piece of metadata PyPI could potentially publish to help distinguish people's personal side projects from projects with funded developers supporting them). Similarly, given suitable investment in Warehouse development, PyPI could be enhanced to provide a front-end to the experimental Distributed Weakness Filing system, where folks can request assignment of CVE numbers in a more automated way than the traditional process.
However, for clients, the problem with relying on PyPI for CVE notifications is that what you actually want as a developer is a situation where your security notifications are independent of the particular ecosystem providing the components, and also where a compromise of your connection to the software publication platform doesn't inhibit your ability to be alerted to security concerns.
While there *are* ecosystem specific services already operating in that domain (e.g. requires.io for Python), the cross-language ones like VersionEye.com and dependencyci.com are more valuable when you're running complex infrastructure, since they abstract away the ecosystem-specific differences. While the previously mentioned libraries.io is the release notification component for dependencyci.com, release-monitoring.org is a service written in Python by the Fedora Infrastructure team to provide upstream release notifications to Linux distributions that's been around for a while longer, hence why that tends to be my own main point of interest.
Cheers, Nick.
P.S. For anyone that isn't aware, understanding and helping to manage the sustainability of Red Hat's software supply chain makes up a respectable portion of my day job. In the Python ecosystem, that just happens to include pointing out things that volunteers probably shouldn't invest their own time in implementing, since they're good candidates for funded commercial development ;)
participants (5)
- Donald Stufft
- Nick Coghlan
- Steve Dower
- Thomas Kluyver
- Wes Turner