[Distutils] Maintaining a curated set of Python packages

Thu Dec 15 22:48:59 EST 2016

On 16 December 2016 at 05:50, Paul Moore <p.f.moore at gmail.com> wrote:
> On 15 December 2016 at 19:13, Wes Turner <wes.turner at gmail.com> wrote:
>>> Just to add my POV, I also find your posts unhelpful, Wes. There's not
>>> enough information for me to evaluate what you say, and you offer no
>>> actual solutions to what's being discussed.
>>
>>
>> I could quote myself suggesting solutions in this thread, if you like?
>
> You offer lots of pointers to information. But that's different.

Exactly. There are *lots* of information processing standards out
there, and lots of things we *could* provide natively that simply
aren't worth the hassle since folks that care can provide them as
"after market addons" for the audiences that consider them relevant.

For example, a few things that can matter to different audiences are:

- SPDX (Software Package Data Exchange) identifiers for licenses
- CPE (Common Platform Enumeration) and SWID (Software Identification)
tags for published software
- DOI (Digital Object Identifier) tags for citation purposes
- Common Criteria certification for software supply chains

I don't push for these upstream in distutils-sig not because I don't
think they're important in general, but because I *don't think they're
a priority for distutils-sig*. If you're teaching Python to school
students, or teaching engineers and scientists how to better analyse
their own data, or building a web service for yourself or your
employer, these kinds of things simply don't matter.

The end users that care about them are well-positioned to tackle them
on their own (or pay other organisations to do it for them), and
because they span arbitrary publishing communities anyway, it doesn't
really matter all that much if any given publishing community
participates directly in the process (the only real beneficiaries are
the intermediaries that actively blur the distinctions between the
cooperative communities and the recalcitrant ones).

> Anyway, let's just agree to differ - I can skip your mails if they
> aren't helpful to me, and you don't need to bother about the fact that
> you're not getting your points across to me.

I consider it fairly important that we have a reasonably common
understanding of the target userbase for direct consumption of PyPI
data, and what we expect to be supplied as third party services. It's
also important that we have a shared understanding of how to
constructively frame proposals for change.

For the former, the Semantic Web, and folks that care about Semantic
Web concepts like "Linked Data" in the abstract sense, are not part of
our primary audience. We don't go out of our way to make their lives
difficult, but "it makes semantic analysis easier" also isn't a
compelling rationale for change.

For the latter, some variants of constructive proposals look like:

- "this kind of user has this kind of problem and this proposed
solution will help mitigate it this way (and, by the way, here's an
existing standard we can use)"
- "this feature exists in <third party tool or service>, it's really
valuable to users for <these reasons>, how about we offer it by
default?"
- "I wrote <thing> for myself, and I think it would also help others
for <these reasons>, can you help me make it more widely known and
available?"

They don't look like "Here's a bunch of technologies and organisations
that exist on the internet that may in some way potentially be
relevant to the management of a software distribution network", and
nor do they look like "This data modeling standard exists, so we
should use it, even though it doesn't actually simplify our lives or
our users' lives in any way, and in fact makes them more complicated".

> Who knows, one day I
> might find the time to look into JSON-LD, at which point I may or may
> not understand why you think it's such a useful tool for solving all
> these problems (in spite of the fact that no-one else seems to think
> the same...)

I *have* looked at JSON-LD (based primarily on Wes's original
suggestions), both from the perspective of the Python packaging
ecosystem specifically and from that of my day job working on
software supply chain management.

My verdict was that for managing a dependency graph implementation, it
ends up in the category of technologies that qualify as "interesting,
but not helpful". In many ways, it's the urllib2 of data linking -
just as urllib2 gives you a URL handling framework which you can
configure to handle HTTP rather than just providing an HTTP-specific
interface the way requests does [1], JSON-LD gives you a data linking
framework, which you can then use to define links between your data,
rather than just linking the data directly in a domain-appropriate
fashion. Using a framework for the sake of using a framework rather
than out of a genuine engineering need doesn't tend to lead to good
software systems.
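
To make the contrast concrete, here is a minimal sketch of the same
small dependency graph linked both ways. Everything in it is an
assumption made for illustration: the project names are invented, the
example.invalid IRIs and the "requires" term are hypothetical, and the
helper functions are not part of any packaging tool.

```python
# Direct, domain-appropriate linking: project name -> names it depends on.
# The project names are made up for this sketch.
DIRECT = {
    "webapp": ["framework", "httpclient"],
    "framework": ["templating"],
    "httpclient": [],
    "templating": [],
}

# The same links in JSON-LD style. The "@context" first has to define what
# "requires" means; the vocabulary and project IRIs are hypothetical.
BASE = "https://example.invalid/project/"
LINKED = {
    "@context": {
        "requires": {
            "@id": "https://example.invalid/vocab#requires",
            "@type": "@id",
        },
    },
    "@graph": [
        {"@id": BASE + "webapp",
         "requires": [BASE + "framework", BASE + "httpclient"]},
        {"@id": BASE + "framework", "requires": [BASE + "templating"]},
        {"@id": BASE + "httpclient", "requires": []},
        {"@id": BASE + "templating", "requires": []},
    ],
}


def transitive_deps(graph, name):
    """Return every project reachable from `name` in the direct graph."""
    seen = set()
    stack = list(graph[name])
    while stack:
        dep = stack.pop()
        if dep not in seen:
            seen.add(dep)
            stack.extend(graph[dep])
    return seen


def linked_to_direct(doc):
    """Flatten the JSON-LD style document back into the direct form."""
    def short(iri):
        return iri.rsplit("/", 1)[1]

    return {
        short(node["@id"]): [short(ref) for ref in node.get("requires", [])]
        for node in doc["@graph"]
    }
```

Both forms carry the same edges, but the second one needs a translation
step (here linked_to_direct) before the dependency walk can run.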

Wes seems to think that my perspective on this is born out of
ignorance, so repeatedly bringing it up may make me change my point of
view. However, our problems haven't changed, and the nature and
purpose of JSON-LD haven't changed, so it really won't - the one thing
that will change my mind is demonstrated popularity and utility of a
service that integrates raw PyPI data with JSON-LD and schema.org.

Hence my suggestions (made with varying degrees of politeness) to go
build a dependency analysis service that extracts the dependency trees
from libraries.io, maps them to schema.org concepts in a way that
makes sense, and then demonstrate what that makes possible that can't
be done with the libraries.io data directly.
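
As a rough starting point, the mapping step of such a service might
look like the sketch below. The input record shape (name, version,
dependencies) is an assumption for illustration and is not the
libraries.io API format, and to_schema_org is a hypothetical helper.
The schema.org type SoftwareApplication and its softwareVersion and
softwareRequirements properties are real, but whether this is a mapping
"that makes sense" is exactly the open question.

```python
import json


def to_schema_org(record):
    """Map one dependency record to a schema.org SoftwareApplication node.

    The input shape ({"name", "version", "dependencies"}) is assumed for
    this sketch; a real service would adapt it to its actual data source.
    """
    return {
        "@context": "https://schema.org",
        "@type": "SoftwareApplication",
        "name": record["name"],
        "softwareVersion": record["version"],
        # softwareRequirements accepts text, so each dependency becomes
        # a "name specifier" string; an empty specifier is dropped.
        "softwareRequirements": [
            f'{dep["name"]} {dep["requirement"]}'.strip()
            for dep in record["dependencies"]
        ],
    }


# Invented sample record, not real package metadata.
sample = {
    "name": "exampleproject",
    "version": "1.0",
    "dependencies": [
        {"name": "requests", "requirement": ">=2.0"},
        {"name": "six", "requirement": ""},
    ],
}

node = to_schema_org(sample)
print(json.dumps(node, indent=2))
```

The interesting part of the exercise is what comes after this step:
showing an analysis that the mapped data enables and the original data
does not.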

Neither Wes nor anyone else needs anyone's permission to go do that,
and it will be far more enjoyable for all concerned than the status
quo. At the moment, Wes is refusing to take "No, we've already looked
at JSON-LD, and we believe it adds needless complexity for no benefit
that we care about" for an answer: he continues to post about it
*here* rather than either venting his frustrations about our
collective lack of interest somewhere else, or channeling that
frustration into building the system he wishes existed.

Cheers,
Nick.

[1] http://www.curiousefficiency.org/posts/2016/08/what-problem-does-it-solve.html

-- 
Nick Coghlan   |   ncoghlan at gmail.com   |   Brisbane, Australia