[Distutils] A possible refactor/streamlining of PEP 517

Paul Moore p.f.moore at gmail.com
Thu Jul 6 14:19:45 EDT 2017


On 6 July 2017 at 17:35, Thomas Kluyver <thomas at kluyver.me.uk> wrote:

> Of course, I also have a vested interest in things not working this way:
> I would get a steady trickle of people asking "why does flit require a
> VCS to install from source?" From my perspective, it doesn't require
> that, but I would be unable to 'fix' it.

That's a good point - and it contrasts nicely with my perspective as a
pip developer, where *pip* gets issues raised that aren't really
pip's problem. I think it's in everyone's best interests to ensure
that the user's experience is as unambiguous as possible in saying
where any given problem lies.

One thought occurs to me in that context - in my view, we should be
clearly presenting to the user that it's *pip's* role to do the
install, and flit's responsibility is to build wheels. I know that
flit includes an install command, but I view that as a temporary
workaround for the fact that PEP 517 isn't implemented yet. I'd be
interested to know if you agree with that. See below as to my view on
how the responsibility for "needing a VCS to install from source"
follows from that. But essentially, we're promoting "pip install
<whatever>" as the canonical install command, and "pip wheel
<whatever>" as the canonical "build a wheel" command - backend
specific commands should be for specialised use only, as I see it.

> My idealised view of the state machine is something like this:
>
> wheel <-- source tree <--> sdist

Personally I wouldn't have a major problem with this, although I don't
think Donald would agree, as he's raised questions around
potential inconsistencies between sdists and wheels built directly from
the source tree that are unanswered in this model. My biggest concern,
though, is that if we take this view, then it's critical that we have
a reliable and efficient means of *copying* source trees.
Specifically:

1. By reliable, I mean that wheels built from the original and the
copy must be identical. And if the original supports building an
sdist, then by implication wheels created via the source tree -> sdist
-> wheel route must be identical to both.
2. By efficient, I mean that naively copying the whole directory isn't
sufficient, because we already know that has unacceptable overheads in
the presence of VCS data and things like .tox directories.
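
As a rough illustration of point 2, here is a minimal Python sketch of a
copy that skips VCS data and tool directories. The function name and the
exclusion list are hypothetical, not anything pip, flit or PEP 517
defines. It also says nothing about point 1: a backend that needs VCS
metadata to build a wheel would not get identical results from such a copy.

```python
import shutil
from pathlib import Path

# Hypothetical exclusion list (an assumption for illustration only);
# a real frontend or backend would have to decide what is safe to skip.
EXCLUDED = shutil.ignore_patterns(
    ".git", ".hg", ".svn", ".tox", "__pycache__", "*.pyc"
)


def copy_source_tree(src, dest):
    """Copy src to dest, skipping VCS metadata and tool caches.

    dest must not already exist; shutil.copytree creates it.
    """
    return Path(shutil.copytree(src, dest, ignore=EXCLUDED))
```

The hard part is not the copy itself but knowing which files the backend
actually needs, which is why the interface has to let the backend have a
say.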

The question of build isolation definitely requires a means to copy a
source tree, but I don't want to get tied up with that debate here - I
simply think that *not* being able to copy a source tree is going to
be a problem at some point, and we should design the interface to
avoid that problem. All the business over the "prepare files for
sdist" hooks, and the "create sdist and unpack it" approaches is
basically trying to address the question of how we duplicate an
arbitrary source tree.

With this arrangement, it's clearly pip's responsibility to do an
install from whatever source the user provides. The only requirement
on backends like flit is that we have a way to copy source trees, and
I don't think you have an issue with that. The copy is only required
to be sufficient to build a wheel, not an sdist. (At least for now, as
we don't currently promote a canonical command to build sdists.)

Tox may have more stringent requirements - currently it requires the
ability to build an sdist to install from, and I'm inclined to think
that this is a deliberate design choice rather than merely a
convenience. I'm guessing that no-one has particularly explored the
question of how tox would interact with flit-based projects yet? Would
it be acceptable to say that tox only works on a full checkout with
VCS tools present (i.e., what flit needs to build an sdist) for
flit-based projects? I don't really know.

> I agree that there's a problem with losing important data when you go
> [source tree --> sdist --> source tree] - in fact this is one of the
> pain points I was trying to avoid with flit. But I don't like the idea
> of solving that by saying that all wheels must have passed through an
> sdist; it feels like a redundant there-and-back-again journey.
>
> So how else could we tackle the systematic problem? It's definitely a
> good idea to ensure that [stree --> sdist --> stree --> wheel] doesn't
> miss out anything that [stree --> wheel] includes, but I'd focus on
> doing this in developer tools, e.g.:
>
> 1. Tools such as flit could check it when you're building a release
> 2. Tools running on CI services could build both and compare them
> 3. Bots could scan PyPI for projects with both a .whl and a .tar.gz,
> build a wheel from the tarball, compare them, and notify the maintainer
> if there's a problem.

Ideally, I'd say that the best way of addressing this is not to
duplicate or discard information. But flit can make its own choices
here. There's some overlap with the PEP, in the sense that the interface
we define must not be actively hostile to frontends or other tools that
want to maintain an invariant along the lines of "it doesn't matter what
route is taken to produce the wheel, the result will be the same".
That's why I'm now focusing on ensuring we have some means of enabling
source tree copying.
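
To make the "same result whatever the route" invariant concrete, here is
a minimal Python sketch of how a tool might compare two wheels. It relies
only on the fact that a wheel is a zip archive. The function names are
hypothetical, and comparing member contents rather than raw bytes is an
assumption on my part, chosen because zip timestamps and member order can
differ between otherwise equivalent builds.

```python
import hashlib
import zipfile


def wheel_digest(path):
    """Map each archive member name to the SHA-256 of its contents."""
    with zipfile.ZipFile(path) as zf:
        return {
            info.filename: hashlib.sha256(zf.read(info)).hexdigest()
            for info in zf.infolist()
            if not info.is_dir()
        }


def wheels_equivalent(wheel_a, wheel_b):
    """True if both wheels hold the same files with the same contents."""
    return wheel_digest(wheel_a) == wheel_digest(wheel_b)
```

A frontend or CI tool could run this on the wheel built directly from the
source tree and on the one built via an sdist, and flag any difference.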

Personally, I'm not a fan of after-the-fact checking like you describe
above. My specific concerns are (in reverse order of your points):

3. Reporting problems on PyPI is basically too late. There's already a
broken release published.
2. Tools on CI are OK, but we can't guarantee that projects would run
them - there's an education and publicity issue around making people
aware of the need.
1. Having backends check is not bad, but I'm concerned about mandating
a particular release process.

But I don't mind deferring the question of how we validate (after all,
we don't currently have any such tools) as long as it's understood
that backends shouldn't lose data needed to build *wheels* (I think we
can live with needing a specific setup - what I referred to as a
"publishing tree" previously - to build sdists).

Paul

