[Distutils] A possible refactor/streamlining of PEP 517

Donald Stufft donald at stufft.io
Thu Jul 6 12:08:31 EDT 2017


> On Jul 6, 2017, at 11:36 AM, Paul Moore <p.f.moore at gmail.com> wrote:
> 
> On 6 July 2017 at 15:54, Donald Stufft <donald at stufft.io> wrote:
>> The fundamental problem here is that sdists *are* a key part of the build
>> pipeline and are always going to be unless pip stops supporting sdists
>> altogether. I think it is a complete non-starter to suggest removing
>> installation from sdist support from pip (particularly since it would
>> immediately lose support for every platform but Windows, MacOS and many
>> common Linux distributions, but not all of them!).
> 
> I wonder how true this is. Certainly the route "acquire sdist ->
> unpack -> build wheel -> install" is a fundamental route, as is
> "acquire wheel -> install". But as Nick pointed out, the awkward cases
> are all in the *other* area, which is "get a random source tree -> ???
> -> install". That's where all the debate about isolated builds,
> incremental compiles, etc, occur. We've been focusing on the sdist as
> a means of copying trees, and maybe that makes it feel like sdists are
> more fundamental than maybe they need to be. The fundamental operation
> is really "copy this arbitrary source tree safely".

When I say they are a key part, I mean that we can’t ever (reasonably) stop supporting that route. So our options for those “other” areas are either to push them onto that same route, or to accept that multiple routes exist and that we need to support all of them.

My rationale for reusing sdists for “copy this arbitrary source tree safely” is that if you have separate hooks for “make this sdist, which we will eventually build a wheel from” and “copy this arbitrary source tree, which we will eventually build a wheel from”, then you are *going* to end up with variations where VCS -> Sdist -> Wheel -> Installed gives a different result than VCS -> Wheel -> Installed. The best we can hope for in that hypothetical is that the variations are minor enough that they don’t generally cause problems.

The most common source of problems in this area is going to be a disparity between the list of files that get added to a sdist and the list of files that get installed. We can see this today with setuptools/distutils, where MANIFEST.in controls whether a file is added to the sdist, but packages, py_modules, package_data, etc. control what gets installed. This is *NOT*, however, a problem unique to setuptools/distutils. For instance, it appears the same issue can occur with enscons if the list of files that you pass into env.Whl includes files that you accidentally left out of env.SDist. When I looked at flit, it suffered from the same problem: if you forgot to commit a file to the VCS repository, it wouldn’t get added to the sdist but would be included in any wheels created from that directory.
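
To make the sdist/wheel mismatch concrete, here is a minimal sketch of a check that lists files a wheel would install but that the corresponding sdist does not carry. The function names are hypothetical and not part of any backend or of pip. It assumes a flat layout in which the sdist unpacks to ``name-version/`` and package paths inside it match the paths inside the wheel; a ``src/`` layout would need an extra prefix mapping.

```python
import tarfile
import zipfile
from pathlib import PurePosixPath


def sdist_files(sdist_path):
    """File paths inside an sdist, with the top-level 'name-version/' removed."""
    with tarfile.open(sdist_path) as tf:
        names = [m.name for m in tf.getmembers() if m.isfile()]
    stripped = set()
    for name in names:
        parts = PurePosixPath(name).parts
        if len(parts) > 1:
            stripped.add("/".join(parts[1:]))
    return stripped


def wheel_files(wheel_path):
    """File paths inside a wheel, ignoring the .dist-info metadata directory."""
    with zipfile.ZipFile(wheel_path) as zf:
        return {
            name for name in zf.namelist()
            if not name.endswith("/") and ".dist-info/" not in name
        }


def installed_but_not_in_sdist(sdist_path, wheel_path):
    """Files the wheel would install that the sdist does not carry."""
    return wheel_files(wheel_path) - sdist_files(sdist_path)
```

A non-empty result for a wheel built straight from a VCS checkout is exactly the kind of variation described above: the direct route installs something the sdist route never could.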

Those aren’t the only cases, though. Even if you get a build backend that is absolutely perfect about ensuring that a wheel created from a VCS directory is close enough to a wheel created from a sdist that was itself created from that VCS directory, you can still have extra files sitting in those directories that aren’t getting included. This shows up when you’re using a ``-e .`` install (which, to be fair, this PEP doesn’t touch, but it’s still something to keep in mind), or in the common case where you try to run your project in a virtual environment but ``.`` is first on sys.path, so you end up importing the copy that is sitting in your VCS checkout (and thus has that extra file).
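
One way to spot the ``.``-on-sys.path case is to ask the import system where a module would be loaded from and compare that with the checkout directory. This is only an illustrative sketch; ``imported_from_tree`` is a hypothetical helper, not something pip or any backend provides.

```python
import importlib.util
import os


def imported_from_tree(module_name, tree="."):
    """Return True if `module_name` would be imported from inside `tree`."""
    spec = importlib.util.find_spec(module_name)
    if spec is None or not spec.has_location:
        # Not importable, or a built-in/frozen module with no file on disk.
        return False
    origin = os.path.realpath(spec.origin)
    root = os.path.realpath(tree)
    try:
        return os.path.commonpath([origin, root]) == root
    except ValueError:
        # Different drives on Windows: cannot be inside the tree.
        return False
```

If this returns True for your package while you believe you are testing the installed copy, you are importing the working tree, extra files included.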

Can each backend strive to implement this correctly and solve the problem that way? Yes, absolutely. In reality, however, good intentions aren’t enough: these issues are going to crop up in each backend and will have to be resolved in each backend. Maybe the number of backends will be so small that this isn’t a big deal. However, using sdist here is a pragmatic, systematic solution that completely sidesteps the entire class of problems for most cases (sans -e ., unfortunately).

> 
> I'm not sure I have a solution here, but as a starting point maybe we
> need to conceptually separate source trees into "publishing trees"
> (ones that the backend is capable of building a sdist from) and "build
> trees" (ones that the backend only supports building wheels from).
> Whether unpacking a sdist gives a publishing tree or a build tree is
> backend-defined (setuptools says yes, flit says no) but frontends need
> to deal with the distinction cleanly.
> 
> Isolated and incremental build questions are answered differently
> depending on whether you have a publishing or a build tree. And some
> of those questions prompt the need for copying trees (or at least
> creating equivalent build trees from whatever you have). For a
> publishing tree, "make sdist and unpack" works, but that's not
> possible for a build tree.

This is similar to something I said above, I think: I would be happy adding an official marker inside a sdist (similar to the .dist-info/WHEEL file) that can be used to generically determine whether something is an unpacked sdist. In this case, if we ran build_sdist inside an unpacked sdist and it returned a NotImplemented marker, we could fall back to just copying the tree (or building in place, if that’s what a front end wanted to do).
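
The fallback described here could look roughly like the following on the front-end side. This is a sketch under stated assumptions, not pip's behaviour: ``prepare_build_tree`` is a hypothetical helper, ``backend`` is any object exposing a ``build_sdist(sdist_directory)`` hook that returns the sdist's file name, and "cannot build a sdist" is signalled by returning ``NotImplemented`` as proposed in this message.

```python
import os
import shutil
import tarfile


def prepare_build_tree(backend, source_dir, work_dir):
    """Return a directory to build a wheel from (hypothetical front-end helper)."""
    source_dir = os.path.abspath(source_dir)
    work_dir = os.path.abspath(work_dir)
    sdist_dir = os.path.join(work_dir, "sdist")
    build_dir = os.path.join(work_dir, "build")
    os.makedirs(sdist_dir)

    old_cwd = os.getcwd()
    os.chdir(source_dir)  # run the hook with the source tree as cwd
    try:
        result = backend.build_sdist(sdist_dir)
    finally:
        os.chdir(old_cwd)

    if result is NotImplemented:
        # No sdist possible from this tree: fall back to a plain copy.
        shutil.copytree(source_dir, build_dir)
        return build_dir

    # Otherwise go through the sdist, so only files it carries reach the build.
    os.makedirs(build_dir)
    with tarfile.open(os.path.join(sdist_dir, result)) as tf:
        tf.extractall(build_dir)
    # Assumption: the sdist unpacks into a single top-level directory.
    (top,) = os.listdir(build_dir)
    return os.path.join(build_dir, top)
```

A front end that prefers building in place could return ``source_dir`` in the fallback branch instead of copying.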

> 
> There's also the fact that tox uses sdists to populate its
> environments. But bluntly, that's tox's problem, not distutils-sig's.
> How tox handles flit-based projects is a different question, that we
> don't really have the relevant experts present here to answer. The
> same is true of any *other* potential consumers of PEP 517 backends
> such as hypothetical "unified sdist builders". I'm inclined to say
> that we shouldn't even try to consider these, but should limit PEP 517
> to the pip (or equivalent) <-> backend interface. Future PEPs can
> expand the interface as needed.

I mean, I think they are our problems too. The ecosystem is made better by the fact that tox exists, and considering our impact there is important. That doesn’t mean that we should bend over backwards to contort the PEP to fit tox, but I also don’t think we should dismiss it out of hand as someone else’s problem.

> 
> I don't know if any of this helps. If not, that's fine (it at least
> helped me to clarify my thinking about source trees and sdists). But
> I'm posting it in case it prompts any new insights.
> 
> Paul


—
Donald Stufft


