On Fri, 31 Aug 2018 at 01:41 Paul Moore <p.f.moore@gmail.com> wrote:
On Fri, 31 Aug 2018 at 07:15, Nathaniel Smith <njs@pobox.com> wrote:
>
> On Thu, Aug 30, 2018 at 6:52 PM, Brett Cannon <brett@python.org> wrote:
> >
> >
> > On Thu, 30 Aug 2018 at 11:21 Nathaniel Smith <njs@pobox.com> wrote:

> >> * Make the 3 tag categories totally independent. Compute a separate set
> >> for each, and then take the full cross product.
> >
> >
> > I think Paul was hinting at this as part of his "wildcard" idea (and I
> > honestly thought of this initially as well as it greatly simplifies things).
> > So what would cp36-cp36m-plat expand to?
>
> I don't know what it means to "expand" a wheel tag.

To me, the confusion really lies in the interaction of "what tags do
producers set" and "what tags do consumers accept".

Yes, it's the fact that some tags seem to have very strict requirements for compatibility while others are fairly lenient (e.g. cp36m versus abi3).
 
Nearly all of the
odd corner cases of tags on wheels that we see in the wild don't come
from bdist_wheel without manual project intervention, and there's no
validation on what they do in that case. So when we're trying to say
"what should we accept" we're faced with a dilemma, because we don't
really know what the options are.

As a starting point, I think it would be useful to focus on what tools
are allowed to *generate*.

The current situation seems to me to be as follows (mostly from the PEP):

1. Python tag - can be absolutely anything (--python-tag doesn't
appear to validate the value at all). In practice, py2, py3, and cpXY
are the most common values AFAIK. I'd consider most other values as
having had "limited testing" :-)

Let's find out how common. :)
Using the following query (and adjusting as appropriate):

```
SELECT
  COUNT(DISTINCT file.project) as projects
FROM
  [the-psf:pypi.downloads20180824]
WHERE
  file.filename LIKE '%-py3-%.whl'
```

You get the following results:
  • *: 120,079
  • cp36: 1,088
  • cp3: 2
  • py36: 79
  • py3: 15,639
So yes, there is a definite skew towards cp36 and py3 for at least Python 3.6. :)

I also talked with Barry Warsaw about this and he mentioned how Debian shares pure Python installs across environments. While there was initial worry about e.g. py36 versus py35 wheels needing to be taken into account, the problem never really came up.
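
On the tag counts above: for anyone who wants to run the same kind of tally locally instead of through the LIKE pattern, here is a minimal sketch in Python. The helper name `parse_wheel_tags` and the filenames are made up for illustration, and it counts files rather than distinct projects, so it is not equivalent to the query.

```python
from collections import Counter


def parse_wheel_tags(filename):
    """Return the (python, abi, platform) tags of a wheel filename.

    Hypothetical helper: relies only on the
    name-version[-build]-python-abi-platform.whl layout.
    """
    if not filename.endswith(".whl"):
        raise ValueError("not a wheel filename: {!r}".format(filename))
    fields = filename[:-len(".whl")].split("-")
    # 5 fields without a build tag, 6 with one.
    if len(fields) not in (5, 6):
        raise ValueError("unexpected wheel filename: {!r}".format(filename))
    python_tag, abi_tag, platform_tag = fields[-3:]
    return python_tag, abi_tag, platform_tag


# Made-up filenames, only here to exercise the helper.
filenames = [
    "example-1.0-py3-none-any.whl",
    "example-1.0-py2.py3-none-any.whl",
    "fast_example-2.1-cp36-cp36m-manylinux1_x86_64.whl",
]

# A compressed tag set such as "py2.py3" counts once per individual tag.
python_tag_counts = Counter(
    tag
    for name in filenames
    for tag in parse_wheel_tags(name)[0].split(".")
)
```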
 
2. ABI tag - I don't know. But I'm pretty sure there are variations in
how this works between Windows and Unix, that should be considered
(IIRC, for C extensions older versions of Python on Windows produced
wheels with a "none" ABI tag, but newer ones produce "cpXYm" which was
a change to match Linux behaviour).

I believe this is correct. The other tricky bit is abi3 as that compatibility is tied to the python tag.

I think this is where Nathaniel's idea of moving the Python tag to just be py3 and then being tighter on the ABI tag comes in.
 
3. Platform tag - distutils.util.get_platform() (but what about
manylinux, how does that end up in there?)

If you mean how does 'wheel' decide to put it in there, I don't know. For checking if a wheel works it comes from a check of system details.
 
Is there a list of valid
values that get_platform() can produce?

Not that I'm specifically aware of. The tricky bit is really when a platform is compatible with more than a single platform tag like Linux and macOS are.
 

IMO, before worrying about what consumers should match, we should tie
down the valid values *producers* are allowed to generate, and from
that consider whether PyPI should enforce a particular set of allowed
tags. Obviously, projects can do what they like in terms of randomly
renaming wheels, and users can do "pip install any-old-junk.whl", but
we need a baseline of what counts as "sensible" tagging schemes if
we're going to enumerate acceptable tags in a consumer.

Essentially this is a variation on the principle of "be strict in what
you produce, and lenient in what you consume", focused on the
"produce" side of the equation.

My comment about wildcarding is basically a forlorn hope that if we
can't be sure what we're getting, maybe the best we can do is say
"well, py3-*-* sounds like it might work, and there's nothing better,
so let's give it a go". But if we can tie down a definition of what
constitutes a "reasonable" set of tags, enumerating what we'll accept
seems more likely to succeed.
 
OK, so let's look at what we're trying to support. If we have pure Python code there's very likely going to be a bottom Python version that's supported and then forward-compatibility is assumed. This is specified today through python-requires, so having a specific Python version in the wheel itself isn't totally critical. So 'py3-none-any' combined with python-requires takes care of this the vast majority of the time.

Then you have extension modules. At a CPython level there's something like cp36m today (and its variants which are not compatible with each other), and then there's abi3 whose compatibility is determined by the Python tag as well. That currently means you can't read the ABI tag in isolation but have to treat the Python-ABI tag pair as a single unit. For other interpreters like PyPy3 I don't think there's been any need or thought about this so it's just a single tag like 'pypy3_60'.
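
To illustrate why the Python and ABI tags have to be read as a pair today, here is a rough sketch of the check for CPython. The function names, the `running` parameter, and the default `abi_flags` value are my own assumptions for illustration, not anything pip or wheel actually exposes.

```python
def split_cp_tag(python_tag):
    """Turn 'cp36' into (3, 6); return None for anything else."""
    digits = python_tag[2:]
    if not python_tag.startswith("cp") or len(digits) < 2 or not digits.isdigit():
        return None
    return int(digits[0]), int(digits[1:])


def is_compatible(python_tag, abi_tag, running=(3, 7), abi_flags="m"):
    """Sketch: can a CPython `running` interpreter load this tag pair?"""
    built_for = split_cp_tag(python_tag)
    if built_for is None:
        return False
    if abi_tag == "abi3":
        # The stable ABI is only meaningful together with the Python tag:
        # a cp36-abi3 wheel works on 3.6 and later, not on anything earlier.
        return built_for[0] == running[0] == 3 and built_for[1] <= running[1]
    # A version-specific ABI such as cp36m needs an exact match.
    expected_abi = "cp{}{}{}".format(running[0], running[1], abi_flags)
    return built_for == running and abi_tag == expected_abi
```

So on a 3.7 interpreter `is_compatible("cp36", "abi3")` is true while `is_compatible("cp36", "cp36m")` is not, even though both share the same Python tag.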

This is where Nathaniel is suggesting the Python tag just be 'py3' or 'py2' and more information gets embedded in the ABI tag, e.g. abi36 (which makes sense since the stable ABI should be forwards-compatible), cp36m, and then 'none' for pure Python.

As for platforms, that's a per-platform thing unfortunately. My Mac, for instance, is compatible with 'macosx_10_13_x86_64', 'macosx_10_13_intel', 'macosx_10_13_fat64', 'macosx_10_13_fat32', and 'macosx_10_13_universal' (according to pip). Linux will have the platform plus maybe manylinux1. I have not dived into what Windows 10 would emit. I'm not sure how people would want to tidy this up.
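
As a small illustration of where a single platform tag comes from, the sketch below normalizes a platform string into tag form by replacing dashes and dots with underscores. I'm using `sysconfig.get_platform()` as a stand-in for `distutils.util.get_platform()`, and `normalize_platform` is a name I made up. It does not attempt the expansion into multiple compatible tags (the macOS list above, or manylinux1), which is the hard part.

```python
import sysconfig


def normalize_platform(raw):
    """Convert e.g. 'macosx-10.13-x86_64' into 'macosx_10_13_x86_64'."""
    return raw.replace("-", "_").replace(".", "_")


# The single tag for the interpreter running this code.
local_platform_tag = normalize_platform(sysconfig.get_platform())
```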

IOW I like Nathaniel's proposal:
  1. Python tag: 'py2', 'py3'
  2. ABI tag: abiXY (with assumed forwards-compatibility, which makes it the one special case that isn't a strict match like the other tags), 'none', and then a wildcard acceptance for things like pypy3_60, cp36m, or whatever any other interpreter might need to specify
  3. Platform tag: I don't know of any specific shift for this and it's probably stuck being platform-specific
If we agree to this I can try to code a library that expects something like this to be the future but have backwards-compatibility for today with what I proposed while we transition.
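
To make that concrete, here is a rough sketch of how a consumer could enumerate accepted tags under this scheme: compute the three sets independently and take the full cross product. Everything here is hypothetical (the function name, the parameters, and especially the choice to count abiXY all the way down to minor version 0); it is not the library I'm offering to write, just the shape of the idea.

```python
import itertools


def accepted_tags(version, interpreter_abi, platforms):
    """Sketch: list (python, abi, platform) triples, most specific first.

    version         -- (major, minor) of the running interpreter
    interpreter_abi -- the interpreter's own ABI tag, e.g. 'cp37m'
    platforms       -- platform tags in priority order, e.g. ending in 'any'
    """
    major, minor = version
    python_tags = ["py{}".format(major)]
    # abiXY is assumed forwards-compatible, so accept every minor version
    # up to and including the running one (lower bound of 0 is an assumption).
    stable_abis = ["abi{}{}".format(major, m) for m in range(minor, -1, -1)]
    abi_tags = [interpreter_abi] + stable_abis + ["none"]
    # The three categories are independent, so take the full cross product.
    return list(itertools.product(python_tags, abi_tags, platforms))


tags = accepted_tags((3, 7), "cp37m", ["linux_x86_64", "any"])
```

Note that a full cross product also accepts odd combinations such as py3-cp37m-any; whether that matters is part of what would need agreeing on.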

What do people think about that?