[Distutils] Idea: Using Trove classifiers for platform compatibility warnings

Sat Apr 8 05:29:24 EDT 2017

On 8 April 2017 at 03:17, Nick Coghlan <ncoghlan at gmail.com> wrote:
> PyPI already has a reasonably extensive component tagging system in
> https://pypi.python.org/pypi?%3Aaction=list_classifiers but we don't
> really *use* it all that much for programmatic purposes.
>
> That means the incentives for setting tags correctly are weak, since
> there isn't much pay-off in the form of tooling-intermediated
> communication of constraints and testing limitations to end users.
>
> What I'm starting to wonder is whether or not it may make sense to
> start recommending that installation tools emit warnings in the
> following cases:
>
> 1. At least one "Operating System" tag is set, but the tags don't
> include any that cover the *current* operating system
> 2. At least one "Programming Language :: Python" tag is set, but the
> tags don't include any that cover the *current* Python version
>
> The "at least one relevant tag is set" pre-requisite would be to avoid
> emitting false positives for projects that don't provide any platform
> compatibility guidance at all.
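
For concreteness, the two quoted rules could be sketched roughly as
below. This is only an illustration, not anything an installer actually
implements: the function name, the OS_PREFIXES mapping and the choice
to let a bare major-version tag ("Python :: 3") cover every 3.x release
are all assumptions made for the example.

```python
import platform
import re
import sys

OS_INDEPENDENT = "Operating System :: OS Independent"
PY_PREFIX = "Programming Language :: Python :: "
VERSION_RE = re.compile(r"^(\d+)(?:\.(\d+))?$")

# Hypothetical, deliberately crude mapping from platform.system() values
# to classifier prefixes treated as "covering" that OS. Not exhaustive.
OS_PREFIXES = {
    "Linux": ("Operating System :: POSIX", "Operating System :: Unix"),
    "Darwin": ("Operating System :: MacOS", "Operating System :: POSIX",
               "Operating System :: Unix"),
    "Windows": ("Operating System :: Microsoft",),
}


def compatibility_warnings(classifiers, system=None, version=None):
    """Return warning strings for the two proposed rules (may be empty)."""
    system = system or platform.system()
    major, minor = version or sys.version_info[:2]
    warnings = []

    # Rule 1: only applies if at least one OS tag is set, and is skipped
    # for systems this sketch does not know how to map.
    os_tags = [c for c in classifiers if c.startswith("Operating System :: ")]
    prefixes = OS_PREFIXES.get(system)
    if os_tags and prefixes and OS_INDEPENDENT not in os_tags:
        if not any(tag.startswith(prefixes) for tag in os_tags):
            warnings.append("no OS classifier covers %s" % system)

    # Rule 2: only applies if at least one Python *version* tag is set.
    # Tags such as "3 :: Only" or "Implementation :: CPython" are ignored.
    declared = []
    for c in classifiers:
        if c.startswith(PY_PREFIX):
            m = VERSION_RE.match(c[len(PY_PREFIX):])
            if m:
                tag_minor = int(m.group(2)) if m.group(2) else None
                declared.append((int(m.group(1)), tag_minor))
    # Assumption: a major-only tag (minor is None) covers any minor version.
    if declared and not any(
        maj == major and mnr in (None, minor) for maj, mnr in declared
    ):
        warnings.append(
            "no Python classifier covers %d.%d" % (major, minor)
        )
    return warnings
```

Called with no arguments beyond the classifier list, it checks against
the running interpreter and OS; passing system and version explicitly
makes it easy to try against other targets.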

I agree that there's little incentive at the moment to get classifiers
right. My concern with this proposal is that it issues the warnings to
end users, who don't have any direct means of resolving the issue
(though they can of course raise bugs against the projects they find
with incorrect classifiers). Furthermore, there's a risk that projects
might see classifiers as implying a level of support they are not
happy with, and so be reluctant to add classifiers "just" to suppress
the warning.

But without data, the above is just FUD, so I'd suggest we do some
analysis. I did some spot checks, and it seems that projects typically
don't set the OS classifier, which alleviates my biggest concern
(projects stating "POSIX" because that's what they develop on, when
they actually work fine on Windows) - but proper data would be
better. Two things I'd like to see:

1. A breakdown of how many projects actually use the various OS and
Language classifiers.
2. Where projects ship wheels, do the wheels they ship match the
classifiers they declare?

That should give a good idea of the immediate impact of this proposal.
(There's not much we can say about source-only distributions, but
that's OK.) The data needed to answer those questions should be
available. The only way I have of getting it is via the JSON
interface to PyPI; I can write a script to collect the information,
but it might be some time before I can collate it. Is this something
the BigQuery data we have (which I haven't even looked at myself)
could answer?
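
A per-project collection step over the JSON interface might look
something like the sketch below. It is untested against the full index
and rests on some assumptions: the URL pattern in fetch_project, the
"info"/"classifiers" and "urls"/"filename"/"packagetype" keys of the
response, and the helper names themselves, which are made up for the
example. Wheel tags are taken from the filename, following the PEP 427
naming convention.

```python
import json
import urllib.request


def fetch_project(name):
    """Fetch a project's JSON metadata (URL pattern is an assumption)."""
    url = "https://pypi.org/pypi/%s/json" % name
    with urllib.request.urlopen(url) as resp:
        return json.load(resp)


def wheel_tags(filename):
    """Split a wheel filename into (python, abi, platform) tags.

    PEP 427 names wheels {dist}-{version}(-{build})?-{py}-{abi}-{plat}.whl,
    so the last three dash-separated fields are the compatibility tags.
    """
    stem = filename[:-len(".whl")]
    python_tag, abi_tag, platform_tag = stem.split("-")[-3:]
    return python_tag, abi_tag, platform_tag


def summarise_project(data):
    """Pull out the OS/Python classifiers and the tags of shipped wheels."""
    classifiers = data["info"].get("classifiers") or []
    wheels = [
        f["filename"]
        for f in data.get("urls", [])
        if f.get("packagetype") == "bdist_wheel"
    ]
    tags = [wheel_tags(w) for w in wheels]
    return {
        "os_classifiers": [
            c for c in classifiers if c.startswith("Operating System :: ")
        ],
        "python_classifiers": [
            c for c in classifiers
            if c.startswith("Programming Language :: Python")
        ],
        # Compressed tag sets such as "py2.py3" are split on "."
        "wheel_pythons": sorted({p for t in tags for p in t[0].split(".")}),
        "wheel_platforms": sorted({p for t in tags for p in t[2].split(".")}),
    }
```

Running summarise_project(fetch_project(name)) over a list of project
names and tallying the results would give the breakdown for question 1;
comparing "os_classifiers" with "wheel_platforms" for each project is a
starting point for question 2.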

Paul