so, there's a need for specifying the {PyPI} package URI in setup.py

and then generating meta.jsonld from setup.py

and then generating JSONLD in a warehouse/pypa view; because that's where they keep the actual metadata (package platform versions, checksums, potentially supersededBy redirects)

and then a signing key for a) package maintainer-supplied metadata and b) package repository metadata (which is/would be redundant but comforting)

and then third-party services like NVD, CVE Details, and stack metadata aggregation services could consume the same linked data

- "PEP 426: Define a JSON-LD context as part of the proposal"
  https://github.com/pypa/interoperability-peps/issues/31
- "Expressing dependencies (between data, software, content...)"
  https://github.com/schemaorg/schemaorg/issues/975
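The pipeline sketched above (package URI in setup.py → meta.jsonld → a repository-side view) could look roughly like this. The function name, the term mapping, and the pypi.org URI pattern are illustrative assumptions, not an agreed spec:

```python
# Hedged sketch: derive a schema.org JSON-LD document from the kind of
# metadata a setup.py already declares. "meta.jsonld" and the exact
# @context/term choices here are assumptions, not a standardized mapping.
import json

def package_metadata_to_jsonld(meta):
    """Map distutils-style metadata keys onto schema.org/SoftwareApplication terms."""
    return {
        "@context": "https://schema.org",
        "@type": "SoftwareApplication",
        # A canonical PyPI package URI serves as the node's @id, which is
        # what makes cross-dataset JOINs on URIs possible.
        "@id": "https://pypi.org/project/%s/" % meta["name"],
        "name": meta["name"],
        "softwareVersion": meta["version"],
        "description": meta.get("description", ""),
        "url": meta.get("url", ""),
    }

meta = {
    "name": "examplepkg",  # hypothetical package
    "version": "1.0.2",
    "description": "An example package",
    "url": "https://example.org/examplepkg",
}
doc = package_metadata_to_jsonld(meta)
print(json.dumps(doc, indent=2))
```

A warehouse/pypa view would presumably extend the same document with repository-held facts (checksums, platform builds, supersededBy links) rather than trusting the maintainer-supplied copy alone.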

sorry to hijack the thread; I hear "more links and metadata in an auxiliary schema" and think 'RDF is the semantic web solution for this graph problem'
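On the signing point: a minimal sketch of a detached signature over canonically serialized JSON metadata. HMAC with a shared key is only a stand-in here; an actual maintainer/repository signing scheme (e.g. TUF, as proposed for PyPI in PEP 458) would use asymmetric keys and role delegation:

```python
# Hedged sketch: detached signature over canonical JSON metadata.
# HMAC-SHA256 with a shared key is a placeholder for a real asymmetric
# signature scheme; the key and metadata values are hypothetical.
import hashlib
import hmac
import json

def canonical_json(obj):
    # Stable serialization so signer and verifier hash identical bytes.
    return json.dumps(obj, sort_keys=True, separators=(",", ":")).encode("utf-8")

def sign_metadata(metadata, key):
    return hmac.new(key, canonical_json(metadata), hashlib.sha256).hexdigest()

def verify_metadata(metadata, key, signature):
    # compare_digest avoids timing side channels on the comparison.
    return hmac.compare_digest(sign_metadata(metadata, key), signature)

metadata = {
    "name": "examplepkg",
    "version": "1.0.2",
    "supersededBy": "https://pypi.org/project/examplepkg2/",
}
key = b"maintainer-key"  # placeholder for a real private key
sig = sign_metadata(metadata, key)
```

The point being: a supersededBy redirect is exactly the kind of statement where it matters whether the maintainer or the repository asserted it, so both would plausibly sign.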


On Jul 19, 2016 3:59 AM, "Nick Coghlan" <ncoghlan@gmail.com> wrote:
On 19 July 2016 at 17:25, Wes Turner <wes.turner@gmail.com> wrote:
> On Jul 19, 2016 2:37 AM, "Nick Coghlan" <ncoghlan@gmail.com> wrote:
>> Given that we already have services like libraries.io and
>> release-monitoring.org for ecosystem independent tracking of upstream
>> releases, they're more appropriate projects to target for the addition
>> of semantic linking support to project metadata, as having one or two
>> public semantic linking projects like that for the entirety of the
>> open source ecosystem would make a lot more sense than each language
>> community creating their own independent solutions that would still
>> need to be stitched together later.
>
> so, language/packaging-specific subclasses of e.g
> http://schema.org/SoftwareApplication and native linked data would reduce
> the need for post-hoc parsing and batch-processing.

Anyone sufficiently interested in the large scale open source
dependency management problem to fund work on it is going to want a
language independent solution, rather than a language specific one.
Folks only care about unambiguous software identification systems like
CPE and SWID when managing large infrastructure installations, and any
system of infrastructure that large is going to be old enough and
sprawling enough to include multiple language stacks.

At the same time, nobody cares about this kind of problem when all
they want to do is publish their hobby project or experimental proof
of concept somewhere that their friends and peers can easily get to
it, which means it doesn't make sense to expect all software
publishers to provide the information themselves, and as a language
ecosystem with a strong focus on inclusive education, we *certainly*
don't want to make it a barrier to engagement with Python's default
publishing toolchain.

> there are many benefits to being able to JOIN on URIs and version strings
> here.
>
> I'll stop now because OT;  the relevant concern here was/is that, if there
> are PyPI-maintainer redirects to other packages, that metadata should
> probably be signed

Metadata signing support is a different problem, and one we want to
pursue for a range of reasons.

>  (and might as well be JSONLD, because this is a graph of
> packages and metadata)

There is no "might as well" here. At the language level, there's a
relevant analogy with Guido's work on gradual typing - talk to someone
for whom a 20 person team is small, and a 10k line project is barely
worth mentioning and their reaction is going to be "of course you want
to support static type analysis", while someone that thinks a 5 person
team is unthinkably large and a 1k line utility is terribly bloated
isn't going to see any value in it whatsoever.

In the context of packaging metadata, supporting JSON-LD and RDFa is
akin to providing PEP 484 type information for Python APIs - are they
potentially useful? Absolutely. Are there going to be folks that see
the value in them, and invest the time in designing a way to use them
to describe Python packages? Absolutely (and depending on how a few
other things work out, one of them may even eventually be me in a
release-monitoring.org context).

But it doesn't follow that it then makes sense to make them a
*dependency* of our interoperability specifications, rather than an
optional add-on - we want folks just doing relatively simple things
(like writing web services in Python) to be able to remain blissfully
unaware that there's a world of large scale open source software
supply chain management out there that benefits from having ways of
doing things that are standardised across language ecosystems.

Regards,
Nick.

--
Nick Coghlan   |   ncoghlan@gmail.com   |   Brisbane, Australia