Provisionally accepting PEP 517's declarative build system interface

Hi folks,
I am hereby provisionally accepting PEP 517 as our CLI-independent interface for build backends, allowing interested parties to move forward with implementing it as part of pip and other tools.
As with other provisionally accepted PyPA interoperability specifications, it won't be considered completely settled until it has been implemented and released in pip and we have at least one iteration of end user feedback.
Thanks to Nathaniel & Thomas for their work on the PEP, and to everyone that participated in the related discussions!
Regards, Nick.

On May 28, 2017, at 11:40 PM, Nick Coghlan ncoghlan@gmail.com wrote:
Hi folks,
I am hereby provisionally accepting PEP 517 as our CLI-independent interface for build backends, allowing interested parties to move forward with implementing it as part of pip and other tools.
As with other provisionally accepted PyPA interoperability specifications, it won't be considered completely settled until it has been implemented and released in pip and we have at least one iteration of end user feedback.
Thanks to Nathaniel & Thomas for their work on the PEP, and to everyone that participated in the related discussions!
Bleh, I had it on my stack to respond to PEP 517, but life has been super hectic so I hadn’t gotten around to it.
Here are a few thoughts FWIW:
1. Using the {package-name}-{package-version}.dist-info in the get_wheel_metadata() metadata is a mistake, I think. In pip currently we have a bug we have not yet been able to track down because there is nothing systematically preventing both foobar-1.0.dist-info and foobar-2.0.dist-info from existing side by side in a build directory (or inside a wheel, for that matter). Thus I think this naming scheme is a nuisance and we shouldn’t propagate it any further. I would just use something like DIST-INFO/, which will completely sidestep this issue. The only reason I can think of to use the current scheme is to make it easier to shutil.copytree it into the wheel, but handling that case is trivial.
2. As I mentioned in previous discussions on this, I think that this interface *needs* some mechanism to ask it to build a sdist. Ideally this would either fail or recreate the sdist when being run from something that is already an sdist. In pip, when we’re doing ``pip install .``, we currently copytree the entire thing into a temporary directory and run the build out of there. For a variety of reasons we’re going to keep build isolation, but the current mechanism is painful because it also grabs things like .tox, .git, etc. At one point we tried to simply filter those out, but it broke some packages’ expectations. The planned (which is on my long list of things to do…) mechanism to fix this is to create a sdist from ``.`` and then install from that (building an intermediate wheel as well). This also solves another class of bugs that people run into where ``pip install .`` and ``python setup.py sdist && pip install dist/*`` give different results. As written, this PEP prevents that from happening (and thus, when I implement it, I’ll only be able to implement it for old-style sdists, and will need to tell people to continue to use old style if they don’t want pip to grab random directories from ``.``).
Other than that, it looks fine, and #2 is the one that I think is going to be the bigger issue in pip.
— Donald Stufft

On Sun, May 28, 2017 at 10:37 PM, Donald Stufft donald@stufft.io wrote:
Bleh, I had it on my stack to respond to PEP 517, but life has been super hectic so I hadn’t gotten around to it.
Here are a few thoughts FWIW:
- Using the {package-name}-{package-version}.dist-info in the get_wheel_metadata() metadata is a mistake, I think. In pip currently we have a bug we have not yet been able to track down because there is nothing systematically preventing both foobar-1.0.dist-info and foobar-2.0.dist-info from existing side by side in a build directory (or inside a wheel, for that matter). Thus I think this naming scheme is a nuisance and we shouldn’t propagate it any further. I would just use something like DIST-INFO/, which will completely sidestep this issue. The only reason I can think of to use the current scheme is to make it easier to shutil.copytree it into the wheel, but handling that case is trivial.
The rationale for this is to leave the door open to allowing, in the future, the same sdist to build multiple wheels. Obviously that would require a whole 'nother PEP, but I keep running into cases where this is a blocker so I think it will happen eventually, and in any case I don't want to create new barriers...
For get_wheel_metadata() in particular there are several options though... we could call it DIST-INFO/ and then later declare that DIST-INFO2/, DIST-INFO3/, etc. are also valid and pip will look at all of them.
{package-name}.dist-info might also be reasonable, both here and in actual installs...
In general get_wheel_metadata is an optimization for the backtracking resolver (that doesn't exist yet) to more efficiently handle cases where there are complex constraints and no built wheels (both of which we hope will be rare). Robert thinks it's very important, and he knows more about that bit of the code than I do, but if it becomes an issue we could even defer get_wheel_metadata to a future PEP.
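For concreteness, a backend-side hook under the draft interface might look roughly like this (the signature and the METADATA contents here are illustrative, not normative; a real backend would pull the name and version from its own project configuration):

    import os

    def get_wheel_metadata(metadata_directory, config_settings=None):
        # Write a .dist-info directory into the location the frontend
        # chose; "example" and "1.0" stand in for the backend's real
        # project name and version.
        dist_info = os.path.join(metadata_directory, "example-1.0.dist-info")
        os.makedirs(dist_info)
        with open(os.path.join(dist_info, "METADATA"), "w") as f:
            f.write("Metadata-Version: 1.2\nName: example\nVersion: 1.0\n")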
- As I mentioned in previous discussions on this, I think that this interface *needs* some mechanism to ask it to build a sdist. Ideally this would either fail or recreate the sdist when being run from something that is already an sdist. In pip, when we’re doing ``pip install .``, we currently copytree the entire thing into a temporary directory and run the build out of there. For a variety of reasons we’re going to keep build isolation, but the current mechanism is painful because it also grabs things like .tox, .git, etc. At one point we tried to simply filter those out, but it broke some packages’ expectations. The planned (which is on my long list of things to do…) mechanism to fix this is to create a sdist from ``.`` and then install from that (building an intermediate wheel as well). This also solves another class of bugs that people run into where ``pip install .`` and ``python setup.py sdist && pip install dist/*`` give different results. As written, this PEP prevents that from happening (and thus, when I implement it, I’ll only be able to implement it for old-style sdists, and will need to tell people to continue to use old style if they don’t want pip to grab random directories from ``.``).
Other than that, it looks fine, and #2 is the one that I think is going to be the bigger issue in pip.
I think there's some pip bug somewhere discussing this, where Ralf Gommers and I point out that this is a complete showstopper for projects with complex and expensive builds (like scipy). If 'pip install .' is going to replace 'setup.py install', then it needs to support incremental builds, and the way setup.py-and-almost-every-other-build-tool do this currently is by reusing the working directory across builds.
I think there's some hope for making both of us happy, though, because our concern is mostly about the install-this-potentially-dirty-working-directory case, and your concern is mostly about the build-this-package-for-release case (right)?
Right now pip doesn't really have a good way of expressing the latter. 'pip install directory/' is relatively unambiguously saying that I want a local install of some potentially-locally-modified files, and while it might involve a temporary wheel internally there's no need to expose this in any way (and e.g. it certainly shouldn't be cached), so I think it's OK if this builds in-place and risks giving different results than 'pip install sdist.tar.gz'. (Also note that in the most common case where a naive user might use this accidentally, where they've downloaded an sdist, unpacked it manually, and then run 'pip install .', they *do* give the same results -- the potential for trouble only comes when someone runs 'pip install .' multiple times in the same directory.)
The other option would be to say that 'pip install .' is *not* the preferred way to build-and-install from a working directory, and there's some other command you have to use instead (maybe build-system-specific) if you want efficient builds. This seems unreasonably unfriendly to me, though; 'pip install .' is really the obvious thing, so it should also be the one you want for common usage.
OTOH the closest thing to a do-a-release-build command currently is 'pip wheel', but it actually has the wrong semantics for making release builds, because it downloads and builds wheels for the entire dependency chain. I don't care if 'pip wheel directory/' copies the directory or makes an sdist or what. If you want to build from an sdist here it's fine with me :-). But I'm also uncertain of the value, because 'pip wheel directory/' is a somewhat weird command to be running in the first place, and it's rare and heavy-weight enough that the shutil.copy() thing seems as good as anything.
And what seems like it's really missing is a command to like... generate an sdist, build it into a wheel, and then leave them both somewhere twine can see them. Maybe 'pip release .'? This would definitely require a build-an-sdist hook in the build backend, and to me it seems like the Right way to address your concern. In PEP 517 as currently written this is punted to build-backend-specific tooling, and I agree that that's unpleasant. But given that 'pip release' doesn't even exist yet, maybe it makes more sense to defer the build-an-sdist hook to a new PEP that we can write alongside 'pip release'?
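As a sketch of how the pieces could fit together, the frontend side of a hypothetical 'pip release' might look like this (build_sdist is the assumed, not-yet-specified hook; build_wheel is the PEP 517 hook, invoked with the unpacked sdist as the working directory):

    import os
    import tarfile
    import tempfile

    def release(backend, dist_dir):
        # Build the sdist first, then build the wheel *from the sdist*,
        # so the two published artifacts are guaranteed to be consistent.
        sdist_name = backend.build_sdist(dist_dir)      # assumed new hook
        tmp = tempfile.mkdtemp()
        with tarfile.open(os.path.join(dist_dir, sdist_name)) as tf:
            tf.extractall(tmp)
        # sdists unpack to a single {name}-{version}/ directory
        tree = os.path.join(tmp, os.listdir(tmp)[0])
        old_cwd = os.getcwd()
        os.chdir(tree)
        try:
            wheel_name = backend.build_wheel(dist_dir)  # PEP 517 hook
        finally:
            os.chdir(old_cwd)
        return sdist_name, wheel_name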
It might also make sense to have a 'pip ci .' that makes a release build and installs it for testing...
-n

On Mon, May 29, 2017, at 08:05 AM, Nathaniel Smith wrote:
And what seems like it's really missing is a command to like... generate an sdist, build it into a wheel, and then leave them both somewhere twine can see them. Maybe 'pip release .'?
FWIW, we've just had a discussion over naming for similar functionality in flit. We've ended up going with 'flit build' to build sdist+wheel, and 'flit publish' to also upload them to PyPI.
Thomas

On 29 May 2017 at 08:05, Nathaniel Smith njs@pobox.com wrote:
Right now pip doesn't really have a good way of expressing the latter. 'pip install directory/' is relatively unambiguously saying that I want a local install of some potentially-locally-modified files, and while it might involve a temporary wheel internally there's no need to expose this in any way (and e.g. it certainly shouldn't be cached), so I think it's OK if this builds in-place and risks giving different results than 'pip install sdist.tar.gz'. (Also note that in the most common case where a naive user might use this accidentally, where they've downloaded an sdist, unpacked it manually, and then run 'pip install .', they *do* give the same results -- the potential for trouble only comes when someone runs 'pip install .' multiple times in the same directory.)
I think that the key thing here is that as things stand, pip needs a means to copy an existing "source tree", as efficiently as possible. For local directories (source checkouts, typically) there's a lot of clutter that isn't needed to replicate the "source tree" aspect of the directory - but we can't reliably determine what is clutter and what isn't.
Whether that copying is a good idea, in the face of the need to do incremental builds, is something of an open question - clearly we can't do something that closes the door on incremental builds, but equally the overhead of copying unwanted data is huge at the moment, and we can't ignore that.
Talking about a "build a sdist" operation brings a whole load of questions about whether there should be a sdist format, what about sdist 2.0, etc. into the mix. So maybe we should avoid all that, and say that pip[1] needs a "copy a source tree" operation. Backends SHOULD implement that by skipping any unneeded files in the source tree, but can fall back to a simple copy if they wish. In fact, we could make the operation optional and have *pip* fall back to copying if necessary. It would then be a backend quality of implementation issue if builds are slow because multi-megabyte git trees get copied unnecessarily.
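A minimal sketch of that fallback, from the frontend side (export_source_tree is a hypothetical name for the optional backend operation):

    import shutil

    def copy_source_tree(backend, source_dir, target_dir):
        # Prefer the backend's knowledge of which files actually matter;
        # fall back to a dumb full copy if the (hypothetical) optional
        # operation isn't provided.
        hook = getattr(backend, "export_source_tree", None)
        if hook is not None:
            hook(source_dir, target_dir)
        else:
            shutil.copytree(source_dir, target_dir)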
(This operation might also help tools like setuptools-scm that need git information to work - the backend could extract that information on a "copy" operation and put it somewhere static for the build).
Paul
[1] As pip is currently the only frontend, "what pip needs right now" is the only non-theoretical indication we have of what frontends might need to do, so we should be cautious about dismissing this as "pip shouldn't work like this", IMO.

I think there's some pip bug somewhere discussing this ....
https://github.com/pypa/pip/issues/2195 https://github.com/pypa/pip/pull/3219 plus some long mailing list threads IIRC
On Mon, May 29, 2017 at 9:19 PM, Paul Moore p.f.moore@gmail.com wrote:
On 29 May 2017 at 08:05, Nathaniel Smith njs@pobox.com wrote:
Right now pip doesn't really have a good way of expressing the latter. 'pip install directory/' is relatively unambiguously saying that I want a local install of some potentially-locally-modified files, and while it might involve a temporary wheel internally there's no need to expose this in any way (and e.g. it certainly shouldn't be cached), so I think it's OK if this builds in-place and risks giving different results than 'pip install sdist.tar.gz'. (Also note that in the most common case where a naive user might use this accidentally, where they've downloaded an sdist, unpacked it manually, and then run 'pip install .', they *do* give the same results -- the potential for trouble only comes when someone runs 'pip install .' multiple times in the same directory.)
I think that the key thing here is that as things stand, pip needs a means to copy an existing "source tree", as efficiently as possible. For local directories (source checkouts, typically) there's a lot of clutter that isn't needed to replicate the "source tree" aspect of the directory - but we can't reliably determine what is clutter and what isn't.
Whether that copying is a good idea, in the face of the need to do incremental builds, is something of an open question - clearly we can't do something that closes the door on incremental builds, but equally the overhead of copying unwanted data is huge at the moment, and we can't ignore that.
Talking about a "build a sdist" operation brings a whole load of questions about whether there should be a sdist format, what about sdist 2.0, etc. into the mix. So maybe we should avoid all that, and say that pip[1] needs a "copy a source tree" operation. Backends SHOULD implement that by skipping any unneeded files in the source tree, but can fall back to a simple copy if they wish. In fact, we could make the operation optional and have *pip* fall back to copying if necessary. It would then be a backend quality of implementation issue if builds are slow because multi-megabyte git trees get copied unnecessarily.
Doesn't that just move the problem from pip to backends? It's still a choice between:
(1) making no copy (good for in-place builds and also fine for pbr & co, but needs education or a "pip release" type command)
(2) making a full copy like now, including .git, .vagrant, etc. (super inefficient)
(3) making an efficient copy (will likely still break pbr and setuptools-scm, *and* break in-place builds)
(This operation might also help tools like setuptools-scm that need git information to work - the backend could extract that information on a "copy" operation and put it somewhere static for the build).
If the backend can do it, so can pip right?
Ralf
Paul
[1] As pip is currently the only frontend, "what pip needs right now" is the only non-theoretical indication we have of what frontends might need to do, so we should be cautious about dismissing this as "pip shouldn't work like this", IMO.

On 29 May 2017 at 11:46, Ralf Gommers ralf.gommers@gmail.com wrote:
Talking about a "build a sdist" operation brings a whole load of questions about whether there should be a sdist format, what about sdist 2.0, etc. into the mix. So maybe we should avoid all that, and say that pip[1] needs a "copy a source tree" operation. Backends SHOULD implement that by skipping any unneeded files in the source tree, but can fall back to a simple copy if they wish. In fact, we could make the operation optional and have *pip* fall back to copying if necessary. It would then be a backend quality of implementation issue if builds are slow because multi-megabyte git trees get copied unnecessarily.
Doesn't that just move the problem from pip to backends? It's still a choice between:
(1) making no copy (good for in-place builds and also fine for pbr & co, but needs education or a "pip release" type command)
(2) making a full copy like now, including .git, .vagrant, etc. (super inefficient)
(3) making an efficient copy (will likely still break pbr and setuptools-scm, *and* break in-place builds)
Yes, that's precisely what it does. The point being that a backend is the only code that can know what files are needed for the build, and which ones can safely not be copied.
(This operation might also help tools like setuptools-scm that need git information to work - the backend could extract that information on a "copy" operation and put it somewhere static for the build).
If the backend can do it, so can pip right?
No (see above). pip has to make the safe "copy everything" assumption, and that has a significant cost.
Paul

On Mon, May 29, 2017, at 10:19 AM, Paul Moore wrote:
I think that the key thing here is that as things stand, pip needs a means to copy an existing "source tree", as efficiently as possible. For local directories (source checkouts, typically) there's a lot of clutter that isn't needed to replicate the "source tree" aspect of the directory - but we can't reliably determine what is clutter and what isn't.
This sounds similar to the question we had in flit about making sdists - we don't want to reinvent MANIFEST.in, but we also don't want to include random clutter that might be present in the source tree.
The compromise we've gone for in flit is to ask the VCS which files it is tracking and only include those, on the grounds that it should be possible to treat a fresh clone/checkout of the repository as the source. But this probably wouldn't work for what pip wants, because there are two big limitations:
1. Flit can only build an sdist from a VCS checkout
2. Flit needs to know about the VCS you're using
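For the git case, the tracked-files approach is essentially the following (a sketch only; flit's real implementation differs in detail):

    import subprocess

    def tracked_files(directory):
        # Ask git for the files it is tracking; a fresh clone would
        # contain exactly these, so they define the source. This raises
        # CalledProcessError outside a git checkout (limitation 1) and
        # is obviously git-specific (limitation 2).
        out = subprocess.check_output(["git", "ls-files"], cwd=directory)
        return out.decode("utf-8").splitlines()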

On May 29, 2017, at 3:05 AM, Nathaniel Smith njs@pobox.com wrote:
I think there's some pip bug somewhere discussing this, where Ralf Gommers and I point out that this is a complete showstopper for projects with complex and expensive builds (like scipy). If 'pip install .' is going to replace 'setup.py install', then it needs to support incremental builds, and the way setup.py-and-almost-every-other-build-tool do this currently is by reusing the working directory across builds.
Wouldn’t supporting incremental builds the way ccache does work just fine? Have a per-build-tool cache directory somewhere that stores cached build output for each individual file, keyed off a hash or something? (For that matter, if someone wants incremental rebuilds, couldn’t they just *use* ccache as their CC?).
— Donald Stufft

On Mon, May 29, 2017 at 7:26 AM, Donald Stufft donald@stufft.io wrote:
On May 29, 2017, at 3:05 AM, Nathaniel Smith njs@pobox.com wrote:
I think there's some pip bug somewhere discussing this, where Ralf Gommers and I point out that this is a complete showstopper for projects with complex and expensive builds (like scipy). If 'pip install .' is going to replace 'setup.py install', then it needs to support incremental builds, and the way setup.py-and-almost-every-other-build-tool do this currently is by reusing the working directory across builds.
Wouldn’t supporting incremental builds the way ccache does work just fine? Have a per-build-tool cache directory somewhere that stores cached build output for each individual file, keyed off a hash or something? (For that matter, if someone wants incremental rebuilds, couldn’t they just *use* ccache as their CC?).
With a random numpy checkout on my laptop and a fully-primed ccache, some wall-clock timings:
no-op incremental build (python setup.py build): 1.186 seconds
python setup.py sdist: 3.213 seconds
unpack resulting tarball: 0.136 seconds
python setup.py build in unpacked tree: 7.696 seconds
So ccache makes the sdist-and-build a mere 10x slower than an in-place incremental build.
ccache is great, but it isn't magic. It can't make copying files faster (notice we're already 3x slower before we even start building!), it doesn't speed up linking, and you still need to spawn all those processes and hash all that source code instead of just making some stat() calls.
Also, this is on Linux. The numbers would look much worse on Windows, given that it generally has much higher overhead for unpacking tarballs and spawning lots of processes, and also given that ccache doesn't support MSVC!
Also also, notice elsewhere in the thread where Thomas notes that flit can't build an sdist from an unpacked sdist. It seems like 'pip install unpacked-sdist/' is an important use case to support...
-n

On May 29, 2017, at 3:09 PM, Nathaniel Smith njs@pobox.com wrote:
On Mon, May 29, 2017 at 7:26 AM, Donald Stufft donald@stufft.io wrote:
On May 29, 2017, at 3:05 AM, Nathaniel Smith njs@pobox.com wrote:
I think there's some pip bug somewhere discussing this, where Ralf Gommers and I point out that this is a complete showstopper for projects with complex and expensive builds (like scipy). If 'pip install .' is going to replace 'setup.py install', then it needs to support incremental builds, and the way setup.py-and-almost-every-other-build-tool do this currently is by reusing the working directory across builds.
Wouldn’t supporting incremental builds the way ccache does work just fine? Have a per-build-tool cache directory somewhere that stores cached build output for each individual file, keyed off a hash or something? (For that matter, if someone wants incremental rebuilds, couldn’t they just *use* ccache as their CC?).
With a random numpy checkout on my laptop and a fully-primed ccache, some wall-clock timings:
no-op incremental build (python setup.py build): 1.186 seconds
python setup.py sdist: 3.213 seconds
unpack resulting tarball: 0.136 seconds
python setup.py build in unpacked tree: 7.696 seconds
So ccache makes the sdist-and-build a mere 10x slower than an in-place incremental build.
ccache is great, but it isn't magic. It can't make copying files faster (notice we're already 3x slower before we even start building!), it doesn't speed up linking, and you still need to spawn all those processes and hash all that source code instead of just making some stat() calls.
Also, this is on Linux. The numbers would look much worse on Windows, given that it generally has much higher overhead for unpacking tarballs and spawning lots of processes, and also given that ccache doesn't support MSVC!
To be honest, I’m hardly going to feel particularly bad if one of the most compilation-heavy packages in existence takes a whole 10 seconds to install from a VCS checkout. Particularly when I assume that the build tool can be even smarter here than ccache is able to be to reduce the setup.py build step back down to the no-op incremental build case.
I mean, unless numpy is doing something different, the default distutils incremental build stuff is incredibly dumb: it just stores the build output in a directory (by default it’s located in ./build/) and compares the mtime of a list of source files with the mtime of the target file, and if the source files are newer, it recompiles it. If you replace mtime with blake2 (or similar) then you can trivially support the exact same thing, just storing the built target files in some user directory cache instead. Hell, we *might* even be able to preserve mtime (if we’re not already… we might be! But I’d need to dig into it) so literally the only thing that would need to change is that instead of storing the built artifacts in ./build/ you store them in ~/.cache/my-cool-build-tool/{project-name}. Bonus points: this means you get incremental speeds even when building from a sdist from PyPI that doesn’t have wheels and hasn’t changed those files either.
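A minimal sketch of that scheme, assuming Python 3.6+ for hashlib.blake2b (the cache layout and the compile_one stand-in are invented for this example):

    import hashlib
    import os
    import shutil

    CACHE_DIR = os.path.expanduser("~/.cache/my-cool-build-tool/example")

    def build_object(source_path, compile_one):
        # Key the cached object file on the source file's *contents*
        # rather than its mtime, so the cache survives fresh checkouts
        # and unpacked sdists. compile_one(source) stands in for the
        # real compiler invocation and returns the object file's path.
        with open(source_path, "rb") as f:
            key = hashlib.blake2b(f.read()).hexdigest()
        cached = os.path.join(CACHE_DIR, key + ".o")
        if not os.path.exists(cached):
            os.makedirs(CACHE_DIR, exist_ok=True)
            shutil.copy(compile_one(source_path), cached)
        return cached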
I’m of the opinion that first you need to make it *correct*, then you can try to make it *fast*. It is my opinion that an installer that shits random debris into your current directory is not correct. It’s kind of silly that we have to have a “random pip/distutils/setuptools” crap chunk of stuff to add to .gitignore to basically every Python package in existence. Never mind the random stuff that doesn’t currently get written there, but will if we stop copying files out of the path and into a temporary location (I’m sure everyone wants a pip-egg-info directory in their current directory).
I’m also of the opinion that avoiding foot guns is more important than shooting for the fastest operation possible. I regularly (sometimes multiple times a week, but often every week or two) see people tripping up on the fact that ``git clone … && pip install .`` does something different than ``git clone … && python setup.py sdist && pip install dist/*``. Files suddenly go missing and they have no idea why. If they’re lucky, they’ll figure out they need to modify some combination of package_data, data_files, and MANIFEST.in to make it work; if they’re not lucky they just sit there dumbfounded at it.
Also also, notice elsewhere in the thread where Thomas notes that flit can't build an sdist from an unpacked sdist. It seems like 'pip install unpacked-sdist/' is an important use case to support…
If the build tool gives us a mechanism to determine whether something is an unpacked sdist or not, so we can fall back to just copying in that case, that is fine with me. The bad case is generally only going to be hit on VCS checkouts and other non-sdist kinds of source trees.
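For setuptools-era sdists there is already a workable heuristic, since ``setup.py sdist`` writes a PKG-INFO file into the archive root that VCS checkouts normally lack (a sketch, not something the PEP specifies):

    import os

    def looks_like_unpacked_sdist(directory):
        # setup.py sdist writes PKG-INFO into the archive root, and a
        # VCS checkout usually has no reason to contain one.
        return os.path.isfile(os.path.join(directory, "PKG-INFO"))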
— Donald Stufft

On Mon, May 29, 2017 at 12:50 PM, Donald Stufft donald@stufft.io wrote:
To be honest, I’m hardly going to feel particularly bad if one of the most compilation-heavy packages in existence takes a whole 10 seconds to install from a VCS checkout.
Rebuild latency is *really* important. People get really cranky at me when I argue that we should get rid of "editable installs", which create much greater problems for . I think I'm entitled to be cranky
Particularly when I assume that the build tool can be even smarter here than ccache is able to be to reduce the setup.py build step back down to the no-op incremental build case.
I mean, unless numpy is doing something different, the default distutils incremental build stuff is incredibly dumb: it just stores the build output in a directory (by default it’s located in ./build/) and compares the mtime of a list of source files with the mtime of the target file, and if the source files are newer, it recompiles it. If you replace mtime with blake2 (or similar) then you can trivially support the exact same thing, just storing the built target files in some user directory cache instead.
Hell, we *might* even be able to preserve mtime (if we’re not already… we might be! But I’d need to dig into it) so literally the only thing that would need to change is that instead of storing the built artifacts in ./build/ you store them in ~/.cache/my-cool-build-tool/{project-name}. Bonus points: this means you get incremental speeds even when building from a sdist from PyPI that doesn’t have wheels and hasn’t changed those files either.
I’m of the opinion that first you need to make it *correct*, then you can try to make it *fast*. It is my opinion that an installer that shits random debris into your current directory is not correct. It’s kind of silly that we have to have a “random pip/distutils/setuptools” crap chunk of stuff to add to .gitignore to basically every Python package in existence. Never mind the random stuff that doesn’t currently get written there, but will if we stop copying files out of the path and into a temporary location (I’m sure everyone wants a pip-egg-info directory in their current directory).
I’m also of the opinion that avoiding foot guns is more important than shooting for the fastest operation possible. I regularly (sometimes multiple times a week, but often every week or two) see people tripping up on the fact that ``git clone … && pip install .`` does something different than ``git clone … && python setup.py sdist && pip install dist/*``. Files suddenly go missing and they have no idea why. If they’re lucky, they’ll figure out they need to modify some combination of package_data, data_files, and MANIFEST.in to make it work; if they’re not lucky they just sit there dumbfounded at it.
Also also, notice elsewhere in the thread where Thomas notes that flit can't build an sdist from an unpacked sdist. It seems like 'pip install unpacked-sdist/' is an important use case to support…
If the build tool gives us a mechanism to determine whether something is an unpacked sdist or not, so we can fall back to just copying in that case, that is fine with me. The bad case is generally only going to be hit on VCS checkouts and other non-sdist kinds of source trees.
— Donald Stufft

Ugh, sorry, fat-fingered that. Actual reply below...
On Mon, May 29, 2017 at 12:56 PM, Nathaniel Smith njs@pobox.com wrote:
On Mon, May 29, 2017 at 12:50 PM, Donald Stufft donald@stufft.io wrote:
To be honest, I’m hardly going to feel particularly bad if one of the most compilation-heavy packages in existence takes a whole 10 seconds to install from a VCS checkout.
Rebuild latency is *really* important. People get really cranky at me when I argue that we should get rid of "editable installs", which create much greater problems for maintaining consistent environments, and that's only saving like 1 second of latency. I think I'm entitled to be cranky if your response is "well suck it up and maybe rewrite all your build tools".
NumPy really isn't that compilation heavy either... it's all C, which is pretty quick. SciPy is *much* slower, for example, as is pretty much any project using C++.
Particularly when I assume that the build tool can be even smarter here than ccache is able to be to reduce the setup.py build step back down to the no-op incremental build case.
I mean, unless numpy is doing something different, the default distutils incremental build stuff is incredibly dumb: it just stores the build output in a directory (by default it’s located in ./build/) and compares the mtime of a list of source files with the mtime of the target file, and if the source files are newer, it recompiles it. If you replace mtime with blake2 (or similar) then you can trivially support the exact same thing, just storing the built target files in some user directory cache instead.
Cache management is not a trivial problem.
And it actually doesn't matter, because we definitely can't silently dump stuff into some user directory. An important feature of storing temporary artifacts in the source tree is that it means that if someone downloads the source, plays around with it a bit, and deletes it, then it's actually gone. We can't squirrel away a few hundred megabytes of data in some hidden directory that will hang around for years after the user stops using numpy.
Hell, we *might* even be able to preserve mtime (if we’re not already… we might be! But I’d need to dig into it) so literally the only thing that would need to change is that instead of storing the built artifacts in ./build/ you store them in ~/.cache/my-cool-build-tool/{project-name}. Bonus points: this means you get incremental speeds even when building from a sdist from PyPI that doesn’t have wheels and hasn’t changed those files either.
I’m of the opinion that first you need to make it *correct*, then you can try to make it *fast*. It is my opinion that an installer that shits random debris into your current directory is not correct. It’s kind of silly that we have to have a “random pip/distutils/setuptools” crap chunk of stuff to add to .gitignore to basically every Python package in existence. Never mind the random stuff that doesn’t currently get written there, but will if we stop copying files out of the path and into a temporary location (I’m sure everyone wants a pip-egg-info directory in their current directory).
I’m also of the opinion that avoiding foot guns is more important than shooting for the fastest operation possible. I regularly (sometimes multiple times a week, but often every week or two) see people tripping up on the fact that ``git clone … && pip install .`` does something different than ``git clone … && python setup.py sdist && pip install dist/*``. Files suddenly go missing and they have no idea why. If they’re lucky, they’ll figure out they need to modify some combination of package_data, data_files, and MANIFEST.in to make it work; if they’re not lucky they just sit there dumbfounded at it.
Yeah, setuptools is kinda sucky this way. But this is fixable with better build systems. And before we can get better build systems, we need buy-in from devs. And saying "sorry, we're unilaterally screwing up your recompile times because we don't care" is not a good way to get there :-(
Also also, notice elsewhere in the thread where Thomas notes that flit can't build an sdist from an unpacked sdist. It seems like 'pip install unpacked-sdist/' is an important use case to support…
If the build tool gives us a mechanism to determine whether something is an unpacked sdist or not, so we can fall back to just copying in that case, that is fine with me. The bad case is generally only going to be hit on VCS checkouts and other non-sdist kinds of source trees.
I guess numpy could just claim that all VCS checkouts are actually unpacked sdists...?
-n

On 29 May 2017 at 21:04, Nathaniel Smith njs@pobox.com wrote:
I guess numpy could just claim that all VCS checkouts are actually unpacked sdists...?
Well, my proposal was to allow backends to decide what needs to be copied. So yes, you could do that under what I suggested (with the co-operation of the build backend). Legacy setuptools builds wouldn't provide a "this is what to copy" API, so we'd fall back to copying everything (or maybe "build sdist", but we could add a get-out clause in one of the config files that said we're not allowed to do that on a per-project basis).
Paul

On May 29, 2017, at 4:04 PM, Nathaniel Smith njs@pobox.com wrote:
Ugh, sorry, fat-fingered that. Actual reply below...
On Mon, May 29, 2017 at 12:56 PM, Nathaniel Smith njs@pobox.com wrote:
On Mon, May 29, 2017 at 12:50 PM, Donald Stufft donald@stufft.io wrote:
To be honest, I’m hardly going to feel particularly bad if one of the most compilation-heavy packages in existence takes a whole 10 seconds to install from a VCS checkout.
Rebuild latency is *really* important. People get really cranky at me when I argue that we should get rid of "editable installs", which create much greater problems for maintaining consistent environments, and that's only saving like 1 second of latency. I think I'm entitled to be cranky if your response is "well suck it up and maybe rewrite all your build tools”.
Well, distutils literally already has support for storing the “cache” someplace other than the current directory; the current directory is just the default. So “rewrite all your build tools” is fairly hyperbolic, it’s really just “change the default of your build tools”. See for example: https://gist.github.com/dstufft/a577c3c9d54a3bb3b88e9b20ba86c625 which shows that NumPy etc. are already capable of this.
Hell, the build backend could create an unpacked sdist in the target directory instead of an actual sdist that is already packed into a tarball; tools like twine could add a ``twine sdist`` command that just called the “create unpacked sdist” API and then tar’d up the directory into the sdist. A quick rudimentary test on my machine (using ``python setup.py sdist --formats=`` in a numpy checkout [1]) suggests that this entire process takes ~0.7s, while the copy operation on that same checkout (shutil.copytree) also takes ~0.7s. That also eliminates the need to untar, so unless someone is doing something in their sdist creation step that takes a significant amount of time, generating an unpacked sdist is really not any more time consuming than copying the files.
NumPy really isn't that compilation heavy either... it's all C, which is pretty quick. SciPy is *much* slower, for example, as is pretty much any project using C++.
Particularly when I assume that the build tool can be even smarter here than ccache is able to be to reduce the setup.py build step back down to the no-op incremental build case.
I mean, unless numpy is doing something different, the default distutils incremental build stuff is incredibly dumb: it just stores the build output in a directory (by default it’s located in ./build/) and compares the mtime of a list of source files with the mtime of the target file, and if the source files are newer, it recompiles it. If you replace mtime with blake2 (or similar) then you can trivially support the exact same thing, just storing the built target files in some user directory cache instead.
Cache management is not a trivial problem.
And it actually doesn't matter, because we definitely can't silently dump stuff into some user directory. An important feature of storing temporary artifacts in the source tree is that it means that if someone downloads the source, plays around with it a bit, and deletes it, then it's actually gone. We can't squirrel away a few hundred megabytes of data in some hidden directory that will hang around for years after the user stops using numpy.
I mean, you absolutely can do that. We store temporary wheels and HTTP responses silently in pip and have for years. I don’t think *anyone* has *ever* complained about it. I think macOS will even explicitly clean up stuff from ~/Library/Caches when it hasn’t been used in a while. If you use the standard cache locations on Linux then IIRC similar cleanup systems exist there too. In exchange for “I can delete the directory and it’s just all gone”, you get “faster builds in more scenarios, including straight from PyPI’s sdists”. If I were a user I’d care a lot more about the second than the first.
But even if I grant you that you can’t just do that silently, then go ahead and make it opt in. For people who need it, a simple boolean in a config file seems to be pretty low cost to me.
Combine this user cache with generating an unpacked sdist instead of copying the directory tree, and you get:
1) Safety from the weirdness that comes from ``pip install`` from a sdist versus a VCS.
2) Not crapping up ./ with random debris from the installation process.
3) Fast incremental builds that even help speed up installs from PyPI etc. (assuming we use something like blake2 to compute hashes for the files).
And you lose:
1) Deleting a clone doesn’t delete the cache directory, but your OS might already be managing this directory anyways.
Seems like an obvious trade off to me.
Hell, we *might* even be able to preserve mtime (if we’re not already… we might be! But I’d need to dig into it) so literally the only thing that would need to change is that instead of storing the built artifacts in ./build/ you store them in ~/.cache/my-cool-build-tool/{project-name}. Bonus points: this means you get incremental speeds even when building from a sdist from PyPI that doesn’t have wheels and hasn’t changed those files either.
I’m of the opinion that first you need to make it *correct*, then you can try to make it *fast*. It is my opinion that an installer that shits random debris into your current directory is not correct. It’s kind of silly that we have to have a “random pip/distutils/setuptools” crap chunk of stuff to add to .gitignore to basically every Python package in existence. Never mind the random stuff that doesn’t currently get written there, but will if we stop copying files out of the path and into a temporary location (I’m sure everyone wants a pip-egg-info directory in their current directory).
I’m also of the opinion that avoiding foot guns is more important than shooting for the fastest operation possible. I regularly (sometimes multiple times a week, but often every week or two) see people tripping up on the fact that ``git clone … && pip install .`` does something different than ``git clone … && python setup.py sdist && pip install dist/*``. Files suddenly go missing and they have no idea why. If they’re lucky, they’ll figure out they need to modify some combination of package_data, data_files, and MANIFEST.in to make it work; if they’re not lucky they just sit there dumbfounded at it.
Yeah, setuptools is kinda sucky this way. But this is fixable with better build systems. And before we can get better build systems, we need buy-in from devs. And saying "sorry, we're unilaterally screwing up your recompile times because we don't care" is not a good way to get there :-(
I don’t think it has anything to do with setuptools TBH, other than the fact that its interface for declaring what does and doesn’t go into a sdist is kind of crummy. This problem is going to exist as long as you have any mechanism for having some files not be included inside of a sdist.
Also also, notice elsewhere in the thread where Thomas notes that flit can't build an sdist from an unpacked sdist. It seems like 'pip install unpacked-sdist/' is an important use case to support…
If the build tool gives us a mechanism to determine whether something is an unpacked sdist or not, so we can fall back to just copying in that case, that is fine with me. The bad case is generally only going to be hit on VCS checkouts and other non-sdist kinds of source trees.
I guess numpy could just claim that all VCS checkouts are actually unpacked sdists…?
I mean, ``pip install .`` is still going to ``cp -r`` that VCS checkout into a temporary location if you do that, and maintaining the invariant that ``python setup.py build && pip install .`` doesn’t trigger a recompile isn’t something that I would want pip to start doing. So would it _work_ for this use case? Possibly? Is it supported? Nope, if it breaks you get to keep both pieces.
— Donald Stufft

On May 29, 2017, at 4:48 PM, Donald Stufft donald@stufft.io wrote:
A quick rudimentary test on my machine (using ``python setup.py sdist --formats=`` in a numpy checkout [1]) suggests that this entire process takes ~0.7s, while the copy operation on that same checkout (shutil.copytree) also takes ~0.7s.
I forgot to add the [1] here, but it’s basically that ``python setup.py sdist --formats=`` actually errors out, because an empty str is not a valid format, but it errors out *after* distutils has prepped an unpacked directory and the only things left are (A) tarring up that unpacked directory into the desired formats and (B) deleting that unpacked directory. Neither of which we want to do in this case.
— Donald Stufft

On Mon, May 29, 2017 at 1:04 PM, Nathaniel Smith njs@pobox.com wrote:
Ugh, sorry, fat-fingered that. Actual reply below...
On Mon, May 29, 2017 at 12:56 PM, Nathaniel Smith njs@pobox.com wrote:
On Mon, May 29, 2017 at 12:50 PM, Donald Stufft donald@stufft.io wrote:
To be honest, I’m hardly going to feel particularly bad if one of the most compilation-heavy packages in existence takes a whole 10 seconds to install from a VCS checkout.
Rebuild latency is *really* important. People get really cranky at me when I argue that we should get rid of "editable installs", which create much greater problems for maintaining consistent environments, and that's only saving like 1 second of latency.
I don't recall this being discussed. Is support for editable installs being considered for removal? Either way, what is the argument out of curiosity?
--Chris

On 29 May 2017 at 20:09, Nathaniel Smith njs@pobox.com wrote:
With a random numpy checkout on my laptop and a fully-primed ccache, some wall-clock timings:
no-op incremental build (python setup.py build): 1.186 seconds
python setup.py sdist: 3.213 seconds
unpack resulting tarball: 0.136 seconds
python setup.py build in unpacked tree: 7.696 seconds
So ccache makes the sdist-and-build a mere 10x slower than an in-place incremental build.
These numbers are useful to know, but ignore the fact that the two operations could give completely different results (because files in the current directory that don't get put into the sdist could affect the build). I have no particular opinion on whether "do an in-place build" is something pip should offer (although it seems like something a developer would need, and I don't see why "setup.py bdist_wheel" followed by "pip install the_wheel" isn't sufficient in that case[1]). But for end user installs, and for building wheels for distribution, I'd argue that isolated builds are essential.
Paul
[1] Of course, this ignores the case of editable builds, but again, I don't think that they should dictate the behaviour of the normal case.

On May 29, 2017, at 3:05 AM, Nathaniel Smith njs@pobox.com wrote:
{package-name}.dist-info might also be reasonable, both here and in actual installs...
In general get_wheel_metadata is an optimization for the backtracking resolver (that doesn't exist yet) to more efficiently handle cases where there are complex constraints and no built wheels (both of which we hope will be rare). Robert thinks it's very important, and he knows more about that bit of the code than I do, but if it becomes an issue we could even defer get_wheel_metadata to a future PEP.
I’d have to think about just {package-name}.dist-info. That would work in the install case (though that would require a whole other PEP, so it’s not super relevant to this PEP). The ultimate question is: when pip goes looking for this directory, does it know what it’s looking for or not? If it doesn’t know what it’s looking for, then it has to resort to looking for everything that matches a pattern, and if more than one thing matches said pattern, it has no idea which one it is supposed to use, and it just has to error out.
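In code, the frontend’s predicament looks something like this (a hypothetical sketch):

    import os

    def find_dist_info(metadata_directory):
        # Without knowing the expected name up front, pip can only glob,
        # and stale debris makes the result ambiguous.
        found = [name for name in os.listdir(metadata_directory)
                 if name.endswith(".dist-info")]
        if len(found) != 1:
            raise RuntimeError("expected exactly one .dist-info directory, "
                               "got %r" % (found,))
        return os.path.join(metadata_directory, found[0])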
Perhaps we either just use DIST-INFO today, and if we ever support multiple wheels from a single sdist, expose a new API that can handle multiple directories and returns their names. The other option would be to modify get_wheel_metadata() so it returns the name of the dist-info directory that was created, so pip knows what directory it needs to look for.
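The second option is a small change to the hook’s contract, something like (names illustrative):

    import os

    def get_wheel_metadata(metadata_directory, config_settings=None):
        # The backend knows its own name and version, so it can just
        # tell the frontend which directory it created; no globbing
        # is needed at all.
        dist_info = "example-1.0.dist-info"
        os.makedirs(os.path.join(metadata_directory, dist_info))
        # ... write METADATA etc. into it ...
        return dist_info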
We’re hopefully going to get a resolver over this summer, we have a GSoC student who is working on just that.
— Donald Stufft

Hi,
On 29 May 2017 at 04:05, Nathaniel Smith njs@pobox.com wrote:
On Sun, May 28, 2017 at 10:37 PM, Donald Stufft donald@stufft.io wrote:
[...]
- Using the {package-name}-{package-version}.dist-info in the get_wheel_metadata() metadata is a mistake, I think. In pip currently we have a bug we have not yet been able to track down because there is nothing systematically preventing both foobar-1.0.dist-info and foobar-2.0.dist-info from existing side by side in a build directory (or inside a wheel, for that matter). Thus I think this naming scheme is a nuisance and we shouldn’t propagate it any further. I would just use something like DIST-INFO/, which will completely sidestep this issue. The only reason I can think of to use the current scheme is to make it easier to shutil.copytree it into the wheel, but handling that case is trivial.
The rationale for this is to leave the door open to allowing, in the future, the same sdist to build multiple wheels. [...]
For get_wheel_metadata() in particular there are several options though... we could call it DIST-INFO/ and then later declare that DIST-INFO2/, DIST-INFO3/, etc. are also valid and pip will look at all of them.
{package-name}.dist-info might also be reasonable, both here and in actual installs... [...]
Wasn't there a thread here some time ago about switching from:
- `{package-name}-{package-version}.dist-info`
to:
- `{package-name}.dist-info`
in all tools?
(while accepting the old format, for old, already built or already installed wheels, of course)
If I remember correctly, there weren't any complaints about doing this (multi-version installs were a setuptools/egg-info thing anyway and wouldn't be affected by this), and it's perfectly compatible with building multiple wheels from a single source package.
What was needed to move forward with this change?
Regards,
Leo

On 29 May 2017 at 15:37, Donald Stufft donald@stufft.io wrote:
On May 28, 2017, at 11:40 PM, Nick Coghlan ncoghlan@gmail.com wrote:
Hi folks,
I am hereby provisionally accepting PEP 517 as our CLI-independent interface for build backends, allowing interested parties to move forward with implementing it as part of pip and other tools.
As with other provisionally accepted PyPA interoperability specifications, it won't be considered completely settled until it has been implemented and released in pip and we have at least one iteration of end user feedback.
Thanks to Nathaniel & Thomas for their work on the PEP, and to everyone that participated in the related discussions!
Bleh, I had it on my stack to respond to PEP 517, but life has been super hectic so I hadn’t gotten around to it.
Here are a few thoughts FWIW:
- Using the {package-name}-{package-version}.dist-info in the get_wheel_metadata() metadata is a mistake, I think. In pip currently we have a bug we have not yet been able to track down because there is nothing systematically preventing both foobar-1.0.dist-info and foobar-2.0.dist-info from existing side by side in a build directory (or inside a wheel, for that matter). Thus I think this naming scheme is a nuisance and we shouldn’t propagate it any further. I would just use something like DIST-INFO/, which will completely sidestep this issue. The only reason I can think of to use the current scheme is to make it easier to shutil.copytree it into the wheel, but handling that case is trivial.
I think this is an ambiguity in the current spec, since the name of the metadata directory is wholly controlled by the front-end, but I now see that "This directory MUST be a valid .dist-info directory as defined in the wheel specification, except that it need not contain RECORD or signatures." can be taken as suggesting it is required to follow the directory naming scheme and hence backends can parse it accordingly to find out the name of the wheel being built.
That's not the way I actually read it when accepting the PEP, so I think it should be clarified to say:
* the *name* of the directory is arbitrary, and entirely up to the build frontend. Build backends MUST NOT attempt to parse it for information.
* the *contents* of the directory, as produced by the build backend, MUST be as defined in the wheel specification (only omitting RECORD and signatures)
If we want to later introduce an RPM style "multiple wheels from a single sdist" model, that should be its own PEP, with its own additions to pyproject.toml to define the available alternate wheel files (so that tools like pyp2rpm and conda skeleton can read it and adjust their output accordingly) and to the build backend API (to request building an alternate wheel rather than the default one that's named after the sdist).
- As I mentioned in previous discussions on this, I think that this interface *needs* some mechanism to ask it to build a sdist. Ideally this would either fail or recreate the sdist when being run from something that is already an sdist. In pip, when we’re doing ``pip install .``, we currently copytree the entire thing into a temporary directory and run the build out of there. For a variety of reasons we’re going to keep build isolation, but the current mechanism is painful because it also grabs things like .tox, .git, etc. At one point we tried to simply filter those out, but it broke some packages’ expectations. The planned (which is on my long list of things to do…) mechanism to fix this is to create a sdist from ``.`` and then install from that (building an intermediate wheel as well). This also solves another class of bugs that people run into where ``pip install .`` and ``python setup.py sdist && pip install dist/*`` give different results. As written, this PEP prevents that from happening (and thus, when I implement it, I’ll only be able to implement it for old-style sdists, and will need to tell people to continue to use old style if they don’t want pip to grab random directories from ``.``).
As with multiplexed wheel creation, I think "give me a filtered source tree to use as an out-of-tree build root" can be a separate follow-up PEP (I'll even propose a name for the API: "export_sdist_source_tree").
My rationale for seeing it that way, is that while PEP 517 requires that the current working directory correspond to the root of the source tree, it *doesn't* require that the source tree be the unfiltered contents of a VCS checkout. It can't, since the source tree might have come from an sdist, and MANIFEST.in and friends already allow that to be a filtered subset of the full VCS contents.
And when it comes to *publishing* an sdist, folks are free to call their build backend directly for that, rather than necessarily going through pip to do it.
That said, I *do* think it makes sense for pip to offer an option to use out-of-tree builds when running `pip install` (even if it isn't the default), and that means extending the build backend API to cover creating an exported sdist tree directly from a VCS checkout or other local directory without necessarily making the sdist first.
Cheers, Nick.

It can require that it is either unfiltered or an unpacked sdist, since that is how a lot of projects treat it at build time now. They handle a sdist differently from a VCS source, for example pbr. Swapping out a call to setup.py for an internal shim that calls a Python API doesn't change anything here; randomly filtering out some files from a VCS checkout will break random projects. We either only support copying the whole directory or we add support for something that enables this in the PEP. There is basically no middle ground that isn't worse than one of those two options for PEP 517 style packages.
I also don't think that creating an sdist should be an optional part of the build interface, but things added in later PEPs can only be added as optional, not mandatory. There is already automation that relies on handling sdists (for example, the Travis deployment-to-PyPI code path) that will be unable to support this new standard without either ONLY supporting wheels, or needing to add custom code for each individual build tool (unlikely to happen). The effect of which will be that either people simply can't use this spec without it, or we incentivize releasing only wheels to PyPI instead of wheels and a sdist.
I don't really see any way for this PEP to move forward without sdist support that doesn't cause major regressions.
Sent from my iPhone
On May 30, 2017, at 1:36 AM, Nick Coghlan ncoghlan@gmail.com wrote:
My rationale for seeing it that way, is that while PEP 517 requires that the current working directory correspond to the root of the source tree, it *doesn't* require that the source tree be the unfiltered contents of a VCS checkout. It can't, since the source tree might have come from an sdist, and MANIFEST.in and friends already allow that to be a filtered subset of the full VCS contents.

On 30 May 2017 at 17:07, Donald Stufft donald@stufft.io wrote:
It can require that it is either unfiltered or an unpacked sdist, since that is how a lot of projects treat it at build time now. They handle a sdist differently from a VCS source, for example pbr. Swapping out a call to setup.py for an internal shim that calls a Python API doesn't change anything here; randomly filtering out some files from a VCS checkout will break random projects. We either only support copying the whole directory or we add support for something that enables this in the PEP. There is basically no middle ground that isn't worse than one of those two options for PEP 517 style packages.
I also don't think that creating an sdist should be an optional part of the build interface, but things added in later PEPs can only be added as optional, not mandatory. There is already automation that relies on handling sdists (for example, the Travis deployment-to-PyPI code path) that will be unable to support this new standard without either ONLY supporting wheels, or needing to add custom code for each individual build tool (unlikely to happen). The effect of which will be that either people simply can't use this spec without it, or we incentivize releasing only wheels to PyPI instead of wheels and a sdist.
I really don't see any way for this PEP to move forward without sdist support that doesn't cause major regressions.
Is your concern that there's no explicit statement in the PEP saying that build backends MUST NOT assume they will have access to version control metadata directories in the source tree, since that source tree may have come from an sdist rather than VCS checkout?
Aside from that possibility, I otherwise don't follow this chain of reasoning, as I don't see how PEP 517 has any impact whatsoever on the sdist build step.
Status quo:
- pip (et al) can use setup.py to build from an unfiltered source tree
- pip (et al) can use setup.py to build from an sdist
- creation of the sdist is up to the software publisher, so if pip or another frontend wants to do an out of tree build, it copies the entire unfiltered tree
Post PEP 517:
- pip (et al) can use setup.py to build from an unfiltered source tree
- pip (et al) can use setup.py to build from an sdist
- pip (et al) can use the pyproject.toml build-backend setting to build from an unfiltered source tree
- pip (et al) can use the pyproject.toml build-backend setting to build from an sdist
- creation of the sdist is still up to the software publisher, so if pip or another frontend wants to do an out of tree build, it still copies the entire unfiltered tree
Post a to-be-written sdist-source-tree-export PEP:
- pip (et al) can use setup.py to build from an unfiltered source tree
- pip (et al) can use setup.py to build from an sdist
- pip (et al) can use the pyproject.toml build-backend setting to build from an unfiltered source tree
- pip (et al) can use the pyproject.toml build-backend setting to build from an sdist
- pip (et al) can use a new pyproject.toml setting defined in that PEP ("source-filter" perhaps?) to export a filtered version of a source tree, otherwise they fall back on copying the entire unfiltered tree (similar to the way build-backend falls back to setuptools & wheel if not otherwise specified)
That approach would decouple the export backends from the build backends, so we might even end up with a common VCS-aware source exporter that projects could nominate (e.g. by adding this functionality to twine), rather than every build backend having to define its own source export logic.
Note that I'm also fine with pip as a project saying that it will only ship support for the build-backend interface once the source filtering interface is also defined and implemented.
I'm just saying that I don't see a close enough link between "here is how to build this component from source" and "here is how to export a filtered source tree for this component from an unfiltered VCS checkout" for it to make sense to define them as part of the same backend API. The only connection I'm aware of is that it makes sense for projects to ensure that their source filtering when creating an sdist isn't leaving out any files needed by their build process, but that's no different from making sure that your build process produces a wheel file that actually works when installed.
Cheers, Nick.

On May 30, 2017, at 6:34 AM, Nick Coghlan ncoghlan@gmail.com wrote:
On 30 May 2017 at 17:07, Donald Stufft donald@stufft.io wrote:
It can require that it is either unfiltered or an unpacked sdist, since that is how a lot of build-time projects treat it now. They handle a sdist differently from a VCS source, for example pbr. Swapping out a call to setup.py for an internal shim that calls a Python API doesn't change anything here; randomly filtering out some files from a VCS will break random projects. We either only support copying the whole directory or we add support for something that enables this in the PEP. There is basically no middle ground that isn't worse than one of those two options for PEP 517 style packages.
I also don't think that creating an sdist should be an optional part of the build interface, but things added in later PEPs can only be added as optional, not mandatory. There is already automation that relies on handling sdists (for example the Travis deployment-to-PyPI code path) that will be unable to support this new standard without either ONLY supporting wheels, or needing to add custom code for each individual build tool (unlikely to happen). The effect will be that people either simply can't use this spec, or we incentivize releasing only wheels to PyPI instead of wheels and a sdist.
I really don't see any way for this PEP to move forward without sdist support that doesn't cause major regressions.
Is your concern that there's no explicit statement in the PEP saying that build backends MUST NOT assume they will have access to version control metadata directories in the source tree, since that source tree may have come from an sdist rather than VCS checkout?
Aside from that possibility, I otherwise don't follow this chain of reasoning, as I don't see how PEP 517 has any impact whatsoever on the sdist build step.
My concern is that it’s literally impossible for the most common tooling outside of setuptools that acts at the build stage to function if we just randomly start filtering out files. For example, let’s take a look at setuptools_scm; it assumes that one of two cases is true:
1) I am in a VCS checkout, where I can run ``git`` commands to compute my version number, as well as what files should be added to the sdist, installed, etc.
2) I am in a sdist, where the above information was “baked” into the sdist at sdist creation time and thus no longer requires access to the .git/ directory.
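(For illustration, a minimal sketch of that two-case branching; this is not setuptools_scm's actual code, and the PKG-INFO lookup and the ``git describe`` call are assumptions about how such a tool might work:)

    import os
    import subprocess

    def guess_version(source_dir):
        """Sketch of the two cases a VCS-aware tool has to handle."""
        # Case 2: an unpacked sdist, where the version was "baked" in at
        # sdist-creation time (assumed here to live in PKG-INFO).
        pkg_info = os.path.join(source_dir, "PKG-INFO")
        if os.path.exists(pkg_info):
            with open(pkg_info) as f:
                for line in f:
                    if line.startswith("Version:"):
                        return line.split(":", 1)[1].strip()

        # Case 1: a VCS checkout, where the version is computed via git.
        if os.path.isdir(os.path.join(source_dir, ".git")):
            out = subprocess.check_output(["git", "describe", "--tags"],
                                          cwd=source_dir)
            return out.decode().strip()

        raise RuntimeError("neither a sdist nor a VCS checkout; "
                           "no way to determine a version")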
Those are the only two situations where it works. The “bad” case, performance-wise, comes from the fact that a VCS checkout oftentimes has A LOT of files that don’t need to be copied over, but that we copy anyway because when we attempted to filter things out previously it broke. These files can make the entire ``pip install .`` take over a minute on a slow hard drive. One of the biggest offenders is .tox/, but another big offender is .git. Another common offender is large chunks of demo data that doesn’t get added to the sdist.
Status quo:
- pip (et al) can use setup.py to build from an unfiltered source tree
- pip (et al) can use setup.py to build from an sdist
- creation of the sdist is up to the software publisher, so if pip or
another frontend wants to do an out of tree build, it copies the entire unfiltered tree
Post PEP 517:
- pip (et al) can use setup.py to build from an unfiltered source tree
- pip (et al) can use setup.py to build from an sdist
- pip (et al) can use the pyproject.toml build-backend setting to
build from an unfiltered source tree
- pip (et al) can use the pyproject.toml build-backend setting to
build from an sdist
- creation of the sdist is still up to the software publisher, so if
pip or another frontend wants to do an out of tree build, it still copies the entire unfiltered tree
Status quo is also that Travis CI, Gem Fury, etc can produce and upload a sdist using ``setup.py sdist``. Pip is not the only consumer of setup.py that needs to be able to operate on a VCS and do things with it. Ignoring this just means that we solve the problem of standardizing access for pip’s current use case, and tell these other use cases to go pound sand.
Post a to-be-written sdist-source-tree-export PEP:
- pip (et al) can use setup.py to build from an unfiltered source tree
- pip (et al) can use setup.py to build from an sdist
- pip (et al) can use the pyproject.toml build-backend setting to
build from an unfiltered source tree
- pip (et al) can use the pyproject.toml build-backend setting to
build from an sdist
- pip (et al) can use a new pyproject.toml setting defined in that PEP
("source-filter" perhaps?) to export a filtered version of a source tree, otherwise they fall back on copying the entire unfiltered tree (similar to the way build-backend falls back to setuptools & wheel if not otherwise specified)
That approach would decouple the export backends from the build backends, so we might even end up with a common VCS-aware source exporter that projects could nominate (e.g. by adding this functionality to twine), rather than every build backend having to define its own source export logic.
Note that I'm also fine with pip as a project saying that it will only ship support for the build-backend interface once the source filtering interface is also defined and implemented.
I'm just saying that I don't see a close enough link between "here is how to build this component from source" and "here is how to export a filtered source tree for this component from an unfiltered VCS checkout" for it to make sense to define them as part of the same backend API. The only connection I'm aware of is that it makes sense for projects to ensure that their source filtering when creating an sdist isn't leaving out any files needed by their build process, but that's no different from making sure that your build process produces a wheel file that actually works when installed.
I don’t think there is any value in decoupling the generation of what goes into an sdist from the tool that builds them. If we did that, I suspect that 100% of the time the exact same tool is going to be used to handle both anyways (or people just won’t bother to put in the extra effort to produce sdists). I think trying to split them up serves only to make the entire toolchain harder and more complicated for people who aren’t steeped in packaging lore to figure out what goes where and what does what. The fact that we have different mechanisms just to control what goes into a sdist (MANIFEST.in) vs what gets installed (package_data) already confuses people; further splitting these two steps apart is only going to make that worse.
Keeping the two together also completely sidesteps the problems around “well what if only the sdist tool is defined but the build tool isn’t?” Or “what if only the build tool is defined but the sdist tool isn’t?”.
The only value I can even think of, is that some of the code is going to be re-usable, but we already have a perfectly serviceable way of allowing code re-use: publish a library and have end tools consume it.
— Donald Stufft

On May 30, 2017, at 7:37 AM, Donald Stufft donald@stufft.io wrote:
Post a to-be-written sdist-source-tree-export PEP:
- pip (et al) can use setup.py to build from an unfiltered source tree
- pip (et al) can use setup.py to build from an sdist
- pip (et al) can use the pyproject.toml build-backend setting to
build from an unfiltered source tree
- pip (et al) can use the pyproject.toml build-backend setting to
build from an sdist
- pip (et al) can use a new pyproject.toml setting defined in that PEP
("source-filter" perhaps?) to export a filtered version of a source tree, otherwise they fall back on copying the entire unfiltered tree (similar to the way build-backend falls back to setuptools & wheel if not otherwise specified)
That approach would decouple the export backends from the build backends, so we might even end up with a common VCS-aware source exporter that projects could nominate (e.g. by adding this functionality to twine), rather than every build backend having to define its own source export logic.
Note that I'm also fine with pip as a project saying that it will only ship support for the build-backend interface once the source filtering interface is also defined and implemented.
I'm just saying that I don't see a close enough link between "here is how to build this component from source" and "here is how to export a filtered source tree for this component from an unfiltered VCS checkout" for it to make sense to define them as part of the same backend API. The only connection I'm aware of is that it makes sense for projects to ensure that their source filtering when creating an sdist isn't leaving out any files needed by their build process, but that's no different from making sure that your build process produces a wheel file that actually works when installed.
I don’t think there is any value in decoupling the generation of what goes into an sdist from the tool that builds them. If we did that, I suspect that 100% of the time the exact same tool is going to be used to handle both anyways (or people just won’t bother to put in the extra effort to produce sdists). I think trying to split them up serves only to make the entire toolchain harder and more complicated for people who aren’t steeped in packaging lore to figure out what goes where and what does what. The fact that we have different mechanisms just to control what goes into a sdist (MANIFEST.in) vs what gets installed (package_data) already confuses people; further splitting these two steps apart is only going to make that worse.
Keeping the two together also completely sidesteps the problems around “well what if only the sdist tool is defined but the build tool isn’t?” Or “what if only the build tool is defined but the sdist tool isn’t?”.
The only value I can even think of, is that some of the code is going to be re-usable, but we already have a perfectly serviceable way of allowing code re-use: publish a library and have end tools consume it.
I think my other thing here is that I don’t even think as written you *can* separate the two concepts. The reason being they’re both going to need the same metadata (like name, version, etc). If you want to have two separate tools handling them, then the build-wheel step should accept *only* an sdist (unpacked or otherwise) as the “input” to the build tool, and it should not accept a VCS input at all. The wheel build tool can then read metadata like name, version, etc from whatever metadata exists in the hypothetical sdist format. Otherwise you’ll need two different tools to both implement the same scheme for determining the version (since you’d be able to run both operations from a VCS checkout).
Now I’m perfectly fine mandating that builds go VCS -> sdist -> wheel -> install [1], but that’s not what the current PEP does and making it do that would be significantly more effort since you’d have to spend more time hammering out what metadata is “sdist metadata” and what metadata is “Wheel metadata” and I really don’t think that particular rabbit hole is an effective use of our time.
[1] Indeed I think this being the only path (besides of course editable) is an overall good thing and I’d support it, but even if we did, I still would think that separate tools for this are a waste of time; they can always be added after the fact with a simple shim package if people do end up finding them very useful.
— Donald Stufft

I'm struggling to understand the objections here. As I understand the PEP, the input to building a wheel is a source tree. That may come from an unpacked sdist or a VCS checkout; hopefully those contain basically the same files, give or take some unimportant generated files in an sdist. This seems to work for building wheels with setup.py (as pip already does), and it's not a problem for flit. So why does pip need to know how to make an sdist?
I understand the concern about copying many unnecessary files if the source tree is copied into a build directory, though I'm not entirely sure why we'd need to do that. I don't think the source tree is often so big as to be problematic, though. My clone of the IPython repository, with many years of active git history and a build of the Sphinx docs, is still under 50 MB. Most repos I work with are much smaller. If we define optional mechanisms to filter which files are copied, a fallback to copying the whole directory sounds acceptable.
Thomas

On May 30, 2017, at 11:36 AM, Thomas Kluyver thomas@kluyver.me.uk wrote:
I'm struggling to understand the objections here. As I understand the PEP, the input to building a wheel is a source tree. That may come from an unpacked sdist or a VCS checkout; hopefully those contain basically the same files, give or take some unimportant generated files in an sdist.
I’m struggling to understand the objection to adding a mechanism for creating an unpacked sdist. Presumably all of the build tools are planning on supporting sdist creation, so the only real additional effort is to expose that interface which should be fairly minimal. The only argument I can think of against adding support for generating sdists is if the build tools don’t plan on implementing sdist creation, in which case I personally have no interest in supporting that build tool.
They quite often do *not* contain the same files, like the example I keep going back to: a .git directory that can be used to compute the version in a VCS checkout, but which no longer exists in a sdist (where a pre-baked version number exists instead). Another example is pyca/cryptography (https://github.com/pyca/cryptography), where the root of the package actually contains a whole other package inside of it (under the vectors directory) which does not exist in the sdist. The behavior of the setup.py changes based on whether the vectors directory exists or not (e.g. inside a tarball or a VCS directory).
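(Very roughly, and purely as a hedged sketch rather than pyca/cryptography's actual setup.py, that kind of setup script does something like:)

    import os

    # Sketch: behave differently depending on whether the in-tree
    # "vectors" package is present (VCS checkout) or absent (sdist).
    base_dir = os.path.dirname(os.path.abspath(__file__))

    if os.path.isdir(os.path.join(base_dir, "vectors")):
        # VCS checkout: point test dependencies at the in-tree copy.
        vectors_requirement = os.path.join(base_dir, "vectors")
    else:
        # Unpacked sdist: the vectors directory was never shipped, so
        # fall back to the separately released package.
        vectors_requirement = "cryptography_vectors"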
This seems to work for building wheels with setup.py (as pip already does), and it's not a problem for flit. So why does pip need to know how to make an sdist?
Because it’s the most reasonable way to avoid fairly large slowdowns on big repositories. You can see:
https://github.com/pypa/pip/issues/2195
https://github.com/pypa/pip/pull/2196
https://github.com/pypa/pip/pull/2535 (the last comment on that one is nice— current behavior is taking over 5 minutes for pip to copy the data)
https://github.com/pypa/pip/pull/3176
https://github.com/pypa/pip/pull/3615
This is a long-standing issue with pip that people hit with semi-regularity— refusing to fix it is user hostile. Personally I don’t really have much interest in seeing something land in pip that prevents fixing issues that we’re currently seeing— the other pip devs may disagree with me, but as it stands I would be -1 on implementing this PEP without additional work (either in a standalone PEP, or as part of this PEP, though I prefer as part of this PEP).
It’s also not just about pip; as I’ve mentioned, there is other tooling that relies on the ability to create a sdist, and refusing to accommodate it is something I am not thrilled about us doing. If every build tool is going to implement its own command to build a sdist, how is something like TravisCI supposed to handle building sdists automatically on push, as they do currently? How is Gem Fury supposed to continue to build packages to host when you ``git push fury``? “Sorry, you don’t get to, or you have to handle every build tool directly” is not an answer I can support.
— Donald Stufft

What about saying that the copying step, if necessary, is part of the build backend's responsibilities? I.e. pip doesn't copy the whole directory to a temporary build location, but the build backend may decide to do that at its discretion when it's asked to build a wheel. pip would continue to handle this for setup.py builds.

On May 30, 2017, at 12:29 PM, Thomas Kluyver thomas@kluyver.me.uk wrote:
What about saying that the copying step, if necessary, is part of the build backend's responsibilities? I.e. pip doesn't copy the whole directory to a temporary build location, but the build backend may decide to do that at its discretion when it's asked to build a wheel. pip would continue to handle this for setup.py builds.
That still leaves the other use cases for building sdists unsatisfied. In addition, it’ll likely be pip that gets the bug reports when some backend inevitably doesn’t copy those files and then leaves random debris lying about (including things like read-only file systems where random crap *can’t* be written to the ``.`` directory, or mounting the same package in multiple docker containers that would cause different things to be pooped into the ``.`` directory).
— Donald Stufft

Just to make sure I'm following this correctly, Donald is asking for:
- A way for pip to ask back-ends what files should be in an sdist from a source checkout, or to make an actual sdist
- Because sdists are a thing and so we should support them properly
- To make it so building wheels interacts with just the files from an sdist instead of skipping that and going straight from a source checkout
- Have this be a part of PEP 517, or at least a requirement for back-ends to support, so it doesn't get left out
Am I missing anything?
On Tue, 30 May 2017 at 09:48 Donald Stufft donald@stufft.io wrote:
On May 30, 2017, at 12:29 PM, Thomas Kluyver thomas@kluyver.me.uk wrote:
What about saying that the copying step, if necessary, is part of the build backend's responsibilities? I.e. pip doesn't copy the whole directory to a temporary build location, but the build backend may decide to do that at its discretion when it's asked to build a wheel. pip would continue to handle this for setup.py builds.
That still leaves the other use cases for building sdists unsatisfied. In addition, it’ll likely be pip that gets the bug reports when some backend inevitably doesn’t copy those files and then leaves random debris lying about (including things like read-only file systems where random crap *can’t* be written to the ``.`` directory, or mounting the same package in multiple docker containers that would cause different things to be pooped into the ``.`` directory).
—
Donald Stufft

On 30 May 2017 at 19:17, Brett Cannon brett@python.org wrote:
Just to make sure I'm following this correctly, Donald is asking for:
A way for pip to ask back-ends what files should be in an sdist from a source checkout or to make an actual sdist
Because sdists are a thing and so we should support them properly
To make it so building wheels interact with just the files from an sdist instead of skipping that and going straight from a source checkout
Have this be a part of PEP 517 or at least a requirement for back-ends to support so it doesn't get left out
Am I missing anything?
That's it basically. The only nuance is that there's a debate over the idea of a "sdist" - the term carries some baggage in terms of implying a need for a standard, static metadata, etc.
In this context, all we need is the list of files to include, so backends that don't want to buy into the idea of a "sdist" format yet (IIRC, flit may be in this position) can just say "these are the files" and not worry about the whole question of a standard sdist format.
Paul

On May 30, 2017, at 2:17 PM, Brett Cannon brett@python.org wrote:
Just to make sure I'm following this correctly, Donald is asking for:
- A way for pip to ask back-ends what files should be in an sdist from a source checkout or to make an actual sdist
- Because sdists are a thing and so we should support them properly
- To make it so building wheels interact with just the files from an sdist instead of skipping that and going straight from a source checkout
- Have this be a part of PEP 517 or at least a requirement for back-ends to support so it doesn't get left out
Am I missing anything?
More or less. If I had my druthers we’d just add another mandatory method to the API, something like:
def build_sdist_tree(sdist_directory, config_settings) -> None: …
All this does is assemble the would-be sdist tree into sdist_directory, handling any sdist-creation-time steps WITHOUT actually tar+gz’ing up this tree. My tests show that this is basically as fast as the copytree option in the simple case (which makes sense; that’s basically all it is) and is way faster than actually assembling the entire sdist tarball.
To handle creation, we can either have ``twine sdist`` or something like that which just calls the above API and then runs ``shutil.make_archive()`` on the final directory.
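(As a sketch of how trivial that front-end piece could be, assuming the proposed ``build_sdist_tree`` hook exists on the backend object; ``make_sdist`` is a hypothetical helper, not a real twine API:)

    import os
    import shutil
    import tempfile

    def make_sdist(backend, name, version, dest="dist"):
        # Ask the backend to assemble the would-be sdist tree, then do
        # the boring tar+gz step in the front-end itself.
        os.makedirs(dest, exist_ok=True)
        with tempfile.TemporaryDirectory() as tmp:
            backend.build_sdist_tree(tmp, config_settings={})
            base = os.path.join(dest, "%s-%s" % (name, version))
            return shutil.make_archive(base, "gztar", root_dir=tmp)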
When building a wheel, pip can then, inside of this directory, call into the wheel metadata/wheel build API and ultimately install that wheel. It may even make sense for the build_wheel API to produce an “unpacked wheel” as well, and again let something like ``twine wheel`` handle running ``shutil.make_archive()`` on it. Similar to the sdist case, this would further reduce the install time by avoiding the need to zip and then immediately unzip inside of pip.
One unrelated thing I just noticed: I don’t think the PEP states how pip is supposed to communicate to this API what it’s actually building. Is it just assumed to be ``os.curdir``? Ideally I think we’d add another parameter to all of these functions that is the directory containing the “input” to each of these.
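(Concretely, the suggestion amounts to signatures along these lines; these are hypothetical variants, since the draft PEP leaves the input location implicit in the current working directory:)

    # Hypothetical hook signatures with the "input" directory made
    # explicit, rather than assumed to be the process's working directory:

    def get_wheel_metadata(source_directory, metadata_directory,
                           config_settings):
        ...

    def build_wheel(source_directory, wheel_directory, config_settings):
        ...

    def build_sdist_tree(source_directory, sdist_directory,
                         config_settings):
        ...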
— Donald Stufft

On Tue, 30 May 2017 at 11:40 Donald Stufft donald@stufft.io wrote:
On May 30, 2017, at 2:17 PM, Brett Cannon brett@python.org wrote:
Just to make sure I'm following this correctly, Donald is asking for:
- A way for pip to ask back-ends what files should be in an sdist from a source checkout or to make an actual sdist
- Because sdists are a thing and so we should support them properly
- To make it so building wheels interact with just the files from an sdist instead of skipping that and going straight from a source checkout
- Have this be a part of PEP 517 or at least a requirement for back-ends to support so it doesn't get left out
Am I missing anything?
More or less. If I had my druthers we’d just add another mandatory method to the API, something like:
def build_sdist_tree(sdist_directory, config_settings) -> None: …
All this does is assemble the would-be sdist tree into sdist_directory, handling any sdist-creation-time steps WITHOUT actually tar+gz’ing up this tree. My tests show that this is basically as fast as the copytree option in the simple case (which makes sense; that’s basically all it is) and is way faster than actually assembling the entire sdist tarball.
So the back-ends then need to provide a way for users to specify what files go into the sdist (e.g. the MANIFEST.in case). Obviously a fallback is to just copy everything or just what is used to build the wheel (overly broad or overly selective, respectively), but I doubt tools will want to do either in general, and so they will need to come up with some solution to specify extra files like READMEs and stuff.
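(For instance, a hedged sketch of what a backend-level include mechanism might look like; the function name and patterns are invented for illustration:)

    import glob

    def select_sdist_files(include_patterns):
        # Expand user-declared include patterns into a concrete file list.
        files = set()
        for pattern in include_patterns:
            files.update(glob.glob(pattern, recursive=True))
        return sorted(files)

    # e.g. patterns a user might declare in their build tool's config:
    print(select_sdist_files(["README*", "docs/**/*.rst", "mypkg/**/*.py"]))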
To handle creation, we can either have ``twine sdist`` or something like that which just calls the above API and then runs ``shutil.make_archive()`` on the final directory.
Do we want to say that sdist creation is the front-end's job? If so that is different from PEP 517 where the back-end packages zip up the wheel versus simply providing all the data and files that make up a wheel as seems to be suggested in the sdist case here.
When building a wheel, pip can then, inside of this directory, call into the wheel metadata/wheel build API and ultimately install that wheel. It may even make sense for the build_wheel API to produce an “unpacked wheel” as well, and again let something like ``twine wheel`` handle running ``shutil.make_archive()`` on it. Similar to the sdist case, this would further reduce the install time by avoiding the need to zip and then immediately unzip inside of pip.
If we're going to go down a route where back-ends simply provide the files and the front-ends do the packaging then it makes sense to do that for wheels as well in PEP 517, separating the back-ends to create the components of the final artifact while the front-ends handle the bundling up for all of it into its final form. This makes the back-ends purely a compiler-and-metadata thing and the front-ends a manage-final-files thing.
-Brett
One unrelated thing I just noticed: I don’t think the PEP states how pip is supposed to communicate to this API what it’s actually building. Is it just assumed to be ``os.curdir``? Ideally I think we’d add another parameter to all of these functions that is the directory containing the “input” to each of these.
—
Donald Stufft

On 31 May 2017 at 04:17, Brett Cannon brett@python.org wrote:
Just to make sure I'm following this correctly, Donald is asking for:
A way for pip to ask back-ends what files should be in an sdist from a source checkout or to make an actual sdist
Because sdists are a thing and so we should support them properly
To make it so building wheels interact with just the files from an sdist instead of skipping that and going straight from a source checkout
Have this be a part of PEP 517 or at least a requirement for back-ends to support so it doesn't get left out
Am I missing anything?
Yes: there are apparently additional design considerations here that Thomas and I *aren't aware of* because they relate to internal details of how pip works that we've never personally needed to worry about.
Rather than trying to wedge the explanation of those considerations and their consequences into the existing structure of PEP 517 (which I consider to be a clear and complete discussion of a particular problem and an appropriate solution), I'd prefer to see them laid out clearly in their own PEP that describes the status quo of how a pip driven build works in practice, how PEP 517 alters that status quo, and the necessary enhancements to the build backend API needed to deal with it effectively.
Cheers, Nick.

On 30 May 2017 at 17:17, Donald Stufft donald@stufft.io wrote:
This is a long standing issue with pip that people hit with semi regularity— refusing to fix it is user hostile. Personally I don’t really have much interest in seeing something land in pip that prevents fixing issues that we’re currently seeing— the other pip devs may disagree with me, but as it stands I would be -1 on implementing this PEP as it stands without additional work (either in a stand alone PEP, or as part of this PEP, though I prefer as part of this PEP).
Just to chime in as "another pip developer" I agree that this is something we need to solve, and I'm -1 on anything that makes doing so harder.
I assume that no-one is trying to insist that pip shouldn't do the build in a temporary directory? That's existing pip behaviour and writing a PEP that doesn't support it isn't going to get very far. So the question is, what does the PEP need to do? Not saying anything means that pip can't implement the PEP without abandoning any hope of improving copy times (at least until a follow-up PEP is agreed). So PEP 517 languishes unused until we resolve that issue and write that follow-up PEP. That seems pointless. Surely it's better to cover the issue now.
Maybe all we need to do is to make it the backend's job, and say "the backend MUST copy the source tree to an isolated temporary directory when building". But as Donald says, that means that tools to build sdists have to replicate that logic[1]. So surely it's better to factor out the "define the set of files needed for a build" into a build backend API that both tools can use?
Paul
[1] There's an implied assumption here that we need tools to build sdists - either backends do it themselves, or something else does. I think both Donald and I take that as a given, because if you don't have source bundles (call them that if you don't like the term sdist) on PyPI, then people whose systems don't match the supplied wheels are out of luck. Also, from a policy point of view, I'd be bothered by PyPI being used to distribute binary only packages - we're an open source community, after all.

On 30 May 2017 at 21:37, Donald Stufft donald@stufft.io wrote:
I don’t think there is any value in decoupling the generation of what goes into an sdist from the tool that builds them. If we did that, I suspect that 100% of the time the exact same tool is going to be used to handle both anyways (or people just won’t bother to put in the extra effort to produce sdists). I think trying to split them up serves only to make the entire toolchain harder and more complicated for people who aren’t steeped in packaging lore to figure out what goes where and what does what. The fact that we have different mechanisms just to control what goes into a sdist (MANIFEST.in) vs what gets installed (package_data) already confuses people; further splitting these two steps apart is only going to make that worse.
Keeping the two together also completely sidesteps the problems around “well what if only the sdist tool is defined but the build tool isn’t?” Or “what if only the build tool is defined but the sdist tool isn’t?”.
I don't have a strong opinion either way, so I'd also be fine if the second PEP defined a new optional method for build backends to implement rather than a new pyproject.toml setting.
So I'd say go ahead and write a new PEP that depends on PEP 517, and defines the source tree export API you'd like build backends to provide.
PEP 517 is already fairly complex just in covering the build step, and trying to also explain the considerations involved in handling the sdist source export step would likely make it unreadable.
The only constraints I'd place on that proposal up front are:
- it needs to be consistent with the accepted PEP 517 interface
- if the build backend doesn't provide the source tree export method, then the new PEP should require frontends to fall back to the current "copy everything" behaviour
Beyond that, since you know what you're looking for, and neither Thomas nor I fully understand that yet, it makes far more sense you to write it, and for us to review it as a separate PEP, rather than trying to incorporate both the additional proposal and its rationale into PEP 517.
Then one PR to pip can implement support for both PEPs, and everyone will be happy.
Cheers, Nick.

On May 31, 2017, at 4:17 AM, Nick Coghlan ncoghlan@gmail.com wrote:
On 30 May 2017 at 21:37, Donald Stufft donald@stufft.io wrote:
I don’t think there is any value in decoupling the generation of what goes into an sdist from the tool that builds them. If we did that, I suspect that 100% of the time the exact same tool is going to be used to handle both anyways (or people just won’t bother to put in the extra effort to produce sdists). I think trying to split them up serves only to make the entire toolchain harder and more complicated for people who aren’t steeped in packaging lore to figure out what goes where and what does what. The fact that we have different mechanisms just to control what goes into a sdist (MANIFEST.in) vs what gets installed (package_data) already confuses people; further splitting these two steps apart is only going to make that worse.
Keeping the two together also completely sidesteps the problems around “well what if only the sdist tool is defined but the build tool isn’t?” Or “what if only the build tool is defined but the sdist tool isn’t?”.
I don't have a strong opinion either way, so I'd also be fine if the second PEP defined a new optional method for build backends to implement rather than a new pyproject.toml setting.
So I'd say go ahead and write a new PEP that depends on PEP 517, and defines the source tree export API you'd like build backends to provide.
PEP 517 is already fairly complex just in covering the build step, and trying to also explain the considerations involved in handling the sdist source export step would likely make it unreadable.
The only constraints I'd place on that proposal up front are:
- it needs to be consistent with the accepted PEP 517 interface
- if the build backend doesn't provide the source tree export method,
then the new PEP should require frontends to fall back to the current "copy everything" behaviour
Beyond that, since you know what you're looking for, and neither Thomas nor I fully understand that yet, it makes far more sense you to write it, and for us to review it as a separate PEP, rather than trying to incorporate both the additional proposal and its rationale into PEP 517.
Then one PR to pip can implement support for both PEPs, and everyone will be happy.
I don’t think an *optional* interface is the correct way to implement this. I’m not entirely sure why folks seem to be ignoring the fact that this is not just for pip, but well, this isn’t just for pip. There are a wide range of popular ecosystem projects depending on the ability to produce a sdist from a given VCS tree, and the current proposal leaves those projects out in the cold.
I also think that the current interface needs some tweaks, some to make it clearer what is being operated on, and some to reduce installation time (basically to avoid a round trip through zip needlessly).
Given that, instead of a separate PEP I’ve just implemented the changes I want to see as a PR, which is quite short (a +55 -22 diff) and can be viewed at https://github.com/python/peps/pull/275 .
— Donald Stufft

On 31 May 2017 at 21:56, Donald Stufft donald@stufft.io wrote:
I don’t think an *optional* interface is the correct way to implement this. I’m not entirely sure why folks seem to be ignoring the fact that this is not just for pip, but well, this isn’t just for pip. There are a wide range of popular ecosystem projects depending on the ability to produce a sdist from a given VCS tree, and the current proposal leaves those projects out in the cold.
I also think that the current interface needs some tweaks, some to make it clearer what is being operated on, and some to reduce installation time (basically to avoid a round trip through zip needlessly).
Given that, instead of a separate PEP I’ve just implemented the changes I want to see as a PR which is quite short, it’s a +55 -22 diff and can be viewed at https://github.com/python/peps/pull/275 .
Cool, thanks for doing that - I'll let you thrash out the details with Thomas as the PEP author, and then take a look once you're in agreement on the amendments you want to make.
Cheers, Nick.

On Wed, May 31, 2017, at 02:24 PM, Nick Coghlan wrote:
Cool, thanks for doing that - I'll let you thrash out the details with Thomas as the PEP author, and then take a look once you're in agreement on the amendments you want to make.
I've had a quick look over the PR, and my main thought is that if we're going to specify a 'build sdist' step, we need to specify more precisely what an sdist actually means. The PEP as currently written expects a 'source tree' as the input for building a wheel, but does not really need to specify sdists, beyond being a possible source of a source tree. If the backend needs to produce an sdist from a source tree, I think we need a clearer picture of what the expected result looks like.
Thomas

On 31 May 2017 at 14:39, Thomas Kluyver thomas@kluyver.me.uk wrote:
On Wed, May 31, 2017, at 02:24 PM, Nick Coghlan wrote:
Cool, thanks for doing that - I'll let you thrash out the details with Thomas as the PEP author, and then take a look once you're in agreement on the amendments you want to make.
I've had a quick look over the PR, and my main thought is that if we're going to specify a 'build sdist' step, we need to specify more precisely what an sdist actually means. The PEP as currently written expects a 'source tree' as the input for building a wheel, but does not really need to specify sdists, beyond being a possible source of a source tree. If the backend needs to produce an sdist from a source tree, I think we need a clearer picture of what the expected result looks like.
My feeling is that in the current context we're talking about "minimal set of files needed to build the wheel". That's not precisely a sdist (because it ignores additional files that the user might want to include like README, COPYRIGHT, or tests) so doesn't satisfy the use case for something like "twine sdist" - but I don't see how we can do that without getting sucked into the full "sdist 2.0" debate, which really is off-topic for this PEP.
I'll take a proper look at the PR and add any further comments there.
Paul

On Wed, May 31, 2017, at 02:57 PM, Paul Moore wrote:
My feeling is that in the current context we're talking about "minimal set of files needed to build the wheel". That's not precisely a sdist (because it ignores additional files that the user might want to include like README, COPYRIGHT, or tests) so doesn't satisfy the use case for something like "twine sdist" - but I don't see how we can do that without getting sucked into the full "sdist 2.0" debate, which really is off-topic for this PEP.
I would be fine with specifying a hook to copy only the files needed to build the wheel, but if we do that, let's not call it 'sdist' or anything that suggests that.

On Wed, May 31, 2017, at 03:00 PM, Thomas Kluyver wrote:
I would be fine with specifying a hook to copy only the files needed to build the wheel, but if we do that, let's not call it 'sdist' or anything that suggests that.
Also, if this is all we're after, I'd like to push again for making it optional - for flit, the resulting wheel will always be the same whether you copy the files somewhere else first or just build from the original source tree. It's less to implement and more performant if flit can just build the wheel directly, and skip the copying step.
Thomas

On 31 May 2017 at 15:05, Thomas Kluyver thomas@kluyver.me.uk wrote:
On Wed, May 31, 2017, at 03:00 PM, Thomas Kluyver wrote:
I would be fine with specifying a hook to copy only the files needed to build the wheel, but if we do that, let's not call it 'sdist' or anything that suggests that.
I agree - I've been trying to push away from using the term "sdist" myself, as there's too much history with it, most of which isn't (IMO) relevant in this context.
Also, if this is all we're after, I'd like to push again for making it optional - for flit, the resulting wheel will always be the same whether you copy the files somewhere else first or just build from the original source tree. It's less to implement and more performant if flit can just build the wheel directly, and skip the copying step.
Hmm. The proposed API (whether it's "create a sdist" or "tell me what files to copy") is intended so that a frontend can use the following process:
1. Create a temp directory
2. Call a hook to get the "source"
3. Put the source in the temp directory (unpack, copy, ask the hook to put the files in here directly, whatever)
4. Call the "build wheel" hook on the temp directory
Pip basically does this currently, and it's probably out of scope for this PEP to look at changing that. The current discussion is really only about whether we need a hook at step 2, or whether "copy the whole source tree" is sufficient. Clearly, in one sense, it is sufficient, because that's what pip has to do at the moment, but we have outstanding bug reports flagging that our copy step is sometimes severely inefficient, and without the proposed hook, we have no way of improving things. The benefit of a formal hook is that it gives the backend control of the process, but that only helps if the tools that cause us issues currently, like setuptools_scm, work with the backend, specifically to control what gets copied so that it's sufficient.
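(Spelled out as code, the four-step flow above looks roughly like this; a sketch only, where ``prepare_build_files`` stands in for whatever step-2 hook gets agreed, with whole-tree copying as the fallback:)

    import os
    import shutil
    import tempfile

    def build_in_isolation(backend, source_dir, wheel_dir):
        wheel_dir = os.path.abspath(wheel_dir)
        with tempfile.TemporaryDirectory() as tmp:  # 1. temp directory
            build_dir = os.path.join(tmp, "src")
            hook = getattr(backend, "prepare_build_files", None)
            if hook is not None:                    # 2./3. get the source
                os.makedirs(build_dir)
                hook(build_dir, config_settings={})
            else:
                # Fallback: copy the whole unfiltered tree.
                shutil.copytree(source_dir, build_dir)
            old_cwd = os.getcwd()
            os.chdir(build_dir)    # PEP 517 hooks run in the source tree
            try:                   # 4. call the "build wheel" hook
                return backend.build_wheel(wheel_dir, config_settings={})
            finally:
                os.chdir(old_cwd)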
Backends like flit probably have no need for this flexibility, but should probably follow some form of standard rule like "exclude VCS and tox control directories". It may make sense to implement that rule in the frontend (on the usual "one frontend, multiple backends" basis), but I'm inclined to think that choice should be opt-in, not the default when a backend doesn't implement the hook. I'm prepared to be convinced otherwise, though.
By the way, in theory with this workflow, there are some pretty bizarre misbehaviours that backends could implement. Consider a backend that does "build sdist" by preprocessing the current directory and writing a sdist that uses a completely different backend (e.g., a "setuptools_scm" backend that writes sdists that freeze the version data and use flit to build). Then step (4) needs to re-read the backend before running the "build wheel" hook. That's actually a not-unreasonable way to handle tools like setuptools_scm, so we should probably take a view on whether it's allowable. Specifically, there's a choice the spec needs to make between:
1. Frontends MAY determine the backend once and assume that it won't change while processing a project
1a. Hooks MUST NOT modify the pyproject.toml metadata specifying the backend details, and MUST copy that file unchanged if asked to copy it
vs
2. Frontends MUST allow for the possibility that pyproject.toml could change after a hook is called
Ever get the feeling that you shouldn't have looked too closely at something? :-)
Paul

The hook is also so a tool like tox or TravisCI or twine can produce a sdist that can be uploaded to PyPI or similar.
Sent from my iPhone
On May 31, 2017, at 11:16 AM, Paul Moore p.f.moore@gmail.com wrote:
Hmm. The proposed API (whether it's "create a sdist" or "tell me what files to copy") is intended so that a frontend can use the following process:
1. Create a temp directory
2. Call a hook to get the "source"
3. Put the source in the temp directory (unpack, copy, ask the hook to put the files in here directly, whatever)
4. Call the "build wheel" hook on the temp directory

On Wed, May 31, 2017, at 04:31 PM, Donald Stufft wrote:
The hook is also so a tool like tox or TravisCI or twine can produce a sdist that can be uploaded to PyPI or similar.
This seems like a distinct operation from 'prepare the files needed to build a wheel', although they are related. For instance, sdists typically contain docs (in source form), but you don't need these to build a wheel. It would be fairly easy for Flit to identify the files needed to build a wheel, but it can only identify the files needed for an sdist with the help of a VCS.
Would you be happy with a compromise where PEP 517 defines a (possibly optional) hook like 'prepare_build_files', and we leave a standardised interface for preparing an sdist to be hashed out in a later PR?
Thomas
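(For what it's worth, a backend-side sketch of the compromise Thomas describes might be as small as this; hedged: the hook name is the one floated above, and the file list is a placeholder for whatever a flit-style backend already knows it needs:)

    import os
    import shutil

    def prepare_build_files(build_directory, config_settings):
        """Copy only the files needed to build a wheel (sketch)."""
        # A real backend would derive this list from its own metadata;
        # these names are placeholders.
        needed = ["pyproject.toml", "mypackage/__init__.py"]
        for rel in needed:
            dest = os.path.join(build_directory, rel)
            os.makedirs(os.path.dirname(dest), exist_ok=True)
            shutil.copy2(rel, dest)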

Sounds like we have bad support for out of 🌲 builds
On Wed, May 31, 2017, 11:53 Thomas Kluyver thomas@kluyver.me.uk wrote:
On Wed, May 31, 2017, at 04:31 PM, Donald Stufft wrote:
The hook is also so a tool like tox or TravisCI or twine can produce a sdist that can be uploaded to PyPI or similar.
This seems like a distinct operation from 'prepare the files needed to build a wheel', although they are related. For instance, sdists typically contain docs (in source form), but you don't need these to build a wheel. It would be fairly easy for Flit to identify the files needed to build a wheel, but it can only identify the files needed for an sdist with the help of a VCS.
Would you be happy with a compromise where PEP 517 defines a (possibly optional) hook like 'prepare_build_files', and we leave a standardised interface for preparing an sdist to be hashed out in a later PR?
Thomas

On 31 May 2017 at 16:31, Donald Stufft donald@stufft.io wrote:
The hook is also so a tool like tox or TravisCI or twine can produce a sdist that can be uploaded to PyPI or similar.
Understood. The part that the backend can do (these are the files needed for the build process) is done via the hook. The rest (add files requested by the user, generate metadata, ...) is backend independent (or covered by a separate hook/not specified yet in the case of metadata) and so should be done by the frontend tool.
Paul
PS When did Travis become a frontend? I'd have assumed that to produce a sdist, Travis would *invoke* a frontend tool, such as "twine sdist". Or am I unaware of some capability of Travis (highly likely)?

On May 31, 2017, at 12:33 PM, Paul Moore p.f.moore@gmail.com wrote:
On 31 May 2017 at 16:31, Donald Stufft donald@stufft.io wrote:
The hook is also so a tool like tox or TravisCI or twine can produce a sdist that can be uploaded to PyPI or similar.
Understood. The part that the backend can do (these are the files needed for the build process) is done via the hook. The rest (add files requested by the user, generate metadata, ...) is backend independent (or covered by a separate hook/not specified yet in the case of metadata) and so should be done by the frontend tool.
I don’t think it’s backend independent though. You’re going to have different mechanisms for handling these things in different backends, for example one piece of the metadata is the version. Some projects will be fine with a static version, some projects are going to want to automatically deduce it using a VCS. Trying to cram all of these into a single tool falls into the same problem that PEP 517 is trying to solve.
Paul
PS When did Travis become a frontend? I'd have assumed that to produce a sdist, Travis would *invoke* a frontend tool, such as "twine sdist". Or am I unaware of some capability of Travis (highly likely)?
No you’re correct, it currently just invokes ``setup.py sdist bdist_wheel``. The hook is needed so that Travis can have a singular tool to invoke (likely twine?) instead of needing to determine if it needs to invoke flit or setuptools or mytotallyradbuildthing. The thing I’m trying to express (and doing poorly it seems :( ) is that generating a sdist is an important thing to have be possible, and it needs to be done in a way that it can be invoked generically.
— Donald Stufft

On Wed, May 31, 2017, at 06:03 PM, Donald Stufft wrote:
generating a sdist is an important thing to have be possible, and it needs to be done in a way that it can be invoked generically.
I agree that it needs to be possible to make an sdist, but can we leave the generic interface for it to be standardised later? At the moment, I don't see an urgent need for a build-system independent 'make an sdist' command, and it's wrapped up in this whole question of what an sdist is and what you make it from (e.g. flit can make an sdist from a VCS, but not from another sdist). For pip, we seem to be using sdist as a proxy for 'get the files we need to build this thing'. That's much easier to specify and causes less controversy.
Thomas

On 31 May 2017 at 18:03, Donald Stufft donald@stufft.io wrote:
No you’re correct, it currently just invokes ``setup.py sdist bdist_wheel``. The hook is needed so that Travis can have a singular tool to invoke (likely twine?) instead of needing to determine if it needs to invoke flit or setuptools or mytotallyradbuildthing. The thing I’m trying to express (and doing poorly it seems :( ) is that generating a sdist is an important thing to have be possible, and it needs to be done in a way that it can be invoked generically.
I don't think that's either unclear or in dispute. The question here is whether "produce a sdist" is in scope for this particular PEP.
The problem is that you're proposing using a "build a sdist" hook as the means for pip to do its "copy the source to an isolated build directory" step. Currently, doing that fails because tools like setuptools_scm work differently when run from a VCS checkout instead of a sdist. The long term plan for pip is something we've always described in terms of going via a sdist, but there's lots of details we haven't thrashed out yet. I don't think we're at a point where we can insist that in a post-PEP 517 world, we can switch straight to building via a sdist. However, I agree with you that we want PEP 517 to assist us with moving in that direction, and we *definitely* don't want to get into a situation like we're in now, where a PEP 517 compliant backend can leave us with no option better than "copy the whole source tree".
That's why I'm focusing at the moment on asking that PEP 517 require backends to specify the minimal set of files needed to do the build. With that we can create isolated build directories efficiently, and those files are going to be an essential component of building a sdist. They aren't *sufficient* to build a sdist by themselves, but specifying in a PEP how we do build a sdist is a perfectly reasonable next step, and can be done independently of PEP 517 (and indeed of pip - there's no "pip sdist" command, so pip doesn't even have a stake in that debate).
One other point that I think we're glossing over here. The only reason that I know of why current setuptools builds can't be isolated via a "build sdist" step is because of helpers like setuptools_scm. I don't know much about such tools, and I don't know where they'd fit in a landscape of PEP 517 backends. Would setuptools_scm be a backend itself, or would it be a plugin for other backends (would backends want to *have* a plugin system for things like this)? The only concrete backends I know of are flit and something-that-will-do-what-setuptools-does-involving-compilers. Neither of those need a "tell me what files constitute a build" hook - they'd be fine with pip (or a library like packaging) implementing a "copy everything except VCS directories, tox directories, etc" heuristic[1]. So the cases where we *need* a hook are those where an as-yet undefined "use VCS information to generate wheel metadata" backend (or backend plugin) is involved - and nobody knows what shape that will take, so we're guessing. My feeling is that such a tool should be a backend that wraps another backend, and modifies the generate-metadata hook. On that basis, it's perfectly capable of also modifying the "specify the build files" hook. But that's nothing more than a guess at this point.
Thoughts?
Paul
[1] They can do a better job than that, certainly, which is why the hook is a good thing, but it's not a showstopper. The key point is that they need *less* than the obvious heuristic would give them, not more.
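(To picture the "backend that wraps another backend" guess above in code; entirely speculative, where ``some_inner_backend`` is a placeholder and the hook names follow the draft PEP:)

    import importlib

    # Speculative sketch: a VCS-aware backend that wraps a real backend
    # and overrides only the metadata step, leaving the build itself
    # alone.
    _inner = importlib.import_module("some_inner_backend")  # placeholder

    def build_wheel(wheel_directory, config_settings=None):
        # Delegate the actual wheel build unchanged.
        return _inner.build_wheel(wheel_directory, config_settings)

    def get_wheel_metadata(metadata_directory, config_settings=None):
        # A real wrapper would compute the version from the VCS here
        # (e.g. via ``git describe``) and patch it into whatever
        # metadata the inner backend writes out.
        return _inner.get_wheel_metadata(metadata_directory,
                                         config_settings)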

I've made an alternative PR against PEP 517 which defines a prepare_build_files hook, defined in terms of copying the files needed to build a package rather than making an sdist.

On May 31, 2017, at 2:01 PM, Paul Moore p.f.moore@gmail.com wrote:
On 31 May 2017 at 18:03, Donald Stufft donald@stufft.io wrote:
No you’re correct, it currently just invokes ``setup.py sdist bdist_wheel``. The hook is needed so that Travis can have a singular tool to invoke (likely twine?) instead of needing to determine if it needs to invoke flit or setuptools or mytotallyradbuildthing. The thing I’m trying to express (and doing poorly it seems :( ) is that generating a sdist is an important thing to have be possible, and it needs to be done in a way that it can be invoked generically.
I don't think that's either unclear or in dispute. The question here is whether "produce a sdist" is in scope for this particular PEP.
The problem is that you're proposing using a "build a sdist" hook as the means for pip to do its "copy the source to an isolated build directory" step. Currently, doing that fails because tools like setuptools_scm work differently when run from a VCS checkout instead of a sdist. The long term plan for pip is something we've always described in terms of going via a sdist, but there's lots of details we haven't thrashed out yet. I don't think we're at a point where we can insist that in a post-PEP 517 world, we can switch straight to building via a sdist. However, I agree with you that we want PEP 517 to assist us with moving in that direction, and we *definitely* don't want to get into a situation like we're in now, where a PEP 517 compliant backend can leave us with no option better than "copy the whole source tree”.
I don’t think we can start telling projects that if they’re using PEP 517 they can delete their ``setup.py`` and live in the brave new world (assuming all of their tools have been updated to PEP 517) when doing so removes a “standard” interface for producing a sdist. Either a replacement for setup.py should support the things we want to keep from setup.py and explicitly unsupport the things we don’t want, or that thing is not actually a replacement for setup.py and I don’t think we should support it.
Taking pip completely off the table a second, let’s take a look at tox. Tox’s default mode of operation is to produce a sdist. Now let’s say I’m writing a project that I want to use PEP 517 and get rid of setup.py, except now tox is broken with no path forward because PEP 517 doesn’t define how to produce a sdist.
The same story is true for TravisCI’s PyPI deployment pipeline, as soon as any project starts depending on PEP 517, we completely break that feature for them without a path for them to fix it (besides writing a PEP of course).
The same is true for Gem Fury’s private PyPI repositories where you can ``git push fury`` and have them build a sdist automatically for you.
This same pattern repeats over and over and over again: projects depend on the ability to produce a sdist for any Python project. PEP 517 says that people can delete their setup.py but doesn’t provide the mechanism for producing a sdist, thus breaking parts of the ecosystem. Simply changing the PEP to say “ok, you can’t delete your setup.py yet” isn’t acceptable either, because then you have two competing build systems, both of which think they should be in charge, which makes the entire process *more* confusing for end users than just baking the concept of sdist generation into PEP 517.
Now, independently of that, pip needs a way to take an arbitrary directory that might contain a git clone with a bunch of extraneous files in it, or might just be a sdist that was already unpacked. For a variety of reasons we want to copy this directory into a temporary location, but doing a blind copy of everything can trigger a bad path where a simple ``pip install .`` can take a long time (up to minutes, as has been reported in the wild) trying to copy the entire directory, including files that we don’t even need or want. We need some mechanism for copying these files over, and it just so happens that the exact same process needs to occur when computing what files go into a sdist. Since I believe that, for completely unrelated reasons, computing a sdist *must* be a part of any attempt to replace setup.py, reusing that simplifies the process of creating a PEP 517 backend (since having to only implement build_sdist is simpler than having to implement build_sdist AND copy_files_for_build).
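To make the intended flow concrete, here is a rough sketch of the frontend side of what Donald describes, assuming a build_sdist(sdist_directory) hook that returns the basename of the archive it created (the exact signature is still under discussion in this thread):

    import os
    import tarfile
    import tempfile

    def isolated_build_tree(backend):
        """Build a sdist of the current source tree, then unpack it to build from."""
        tmp = tempfile.mkdtemp()
        # instead of copytree'ing .git, .tox, etc., ask the backend for a sdist
        sdist_name = backend.build_sdist(tmp)
        with tarfile.open(os.path.join(tmp, sdist_name)) as tf:
            tf.extractall(path=tmp)
        # "{name}-{version}.tar.gz" unpacks into "{name}-{version}/"
        return os.path.join(tmp, sdist_name[:-len(".tar.gz")])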
In addition to all of the above, we currently have six different “paths” installation can go through in the process of going from a VCS checkout/developer copy to an installed distribution:
1) VCS Checkout -> Installed
2) VCS Checkout -> Sdist -> Installed
3) VCS Checkout -> Wheel -> Installed
4) VCS Checkout -> Sdist -> Wheel -> Installed
5) VCS Checkout -> Editable Install
6) VCS Checkout -> Sdist -> Editable Install
Unless you’re careful to get your packaging done exactly right, each of those six can end up having different (and oftentimes surprising) behavior that regular end users who are new to packaging (or hell, even old hands) hit with some regularity. One of my long term goals is to try to reduce the number of those paths, which will make it less likely that people are surprised by edge cases in how their particular ``pip install`` invocations behave, and will ultimately provide a more enjoyable experience using pip. We obviously cannot reduce the number of supported methods down to 1, but we can reduce them down to:
A) VCS Checkout -> Sdist -> Wheel -> Installed
B) VCS Checkout -> Editable Install
Implementing build_sdist in PEP 517 and using that to handle copying the files from what could either be a VCS checkout OR an unpacked sdist, means that we eliminate (1) and (3) from the first list. Ensuring that we only ever install a PEP 517 style project by always using build_wheel after having used build_sdist then eliminates (2). We’re not dealing with editable installs here (and it kind of pains me we aren’t, but they’re a much bigger topic so I think it’s unavoidable) but preventing an editable install of an sdist would eliminate (6) from above, leaving us with just two paths (and the second path requiring an explicit flag to opt into, rather than being implicit by nature of what you’re passing into pip and what other libraries you have installed).
In addition to all of the above, any part of building a sdist that is more complicated than “copy some files” is something these build backends are already going to have to support, by nature of the fact that we’re expecting them to generate wheel metadata. The wheel metadata has to include the version number, so if someone wants to dynamically compute the version number from git, a PEP 517 backend must already handle that or it simply won’t work.
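As an illustration of that point: a backend that computes its version from git already has the one piece of sdist-specific logic that isn't just "copy some files". A minimal sketch, with invented helper names:

    import subprocess

    def _version_from_git():
        # derive the version from the most recent tag, setuptools_scm-style
        tag = subprocess.check_output(["git", "describe", "--tags", "--abbrev=0"])
        return tag.decode().strip().lstrip("v")

    def _metadata_lines(name="mypackage"):
        # the same computed version feeds both the wheel metadata and a sdist
        return ["Metadata-Version: 1.1",
                "Name: %s" % name,
                "Version: %s" % _version_from_git()]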
Finally, we should be (and are) assuming that these build tools are going to be capable of producing sdists. Thus they already have to implement 99% of build_sdist anyway, and the only additional effort on their part is the glue code that wires up their internal mechanism for producing sdists to an API that provides a standard way of calling it. Hopefully it is not controversial that a build tool *must* be capable of producing a sdist, since otherwise we’re throwing away support for any non-Windows/macOS/Linux platform. Implementing a custom “copy these files” hook is *more* effort than exposing the mechanism they should already have.
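The glue code in question could plausibly be as thin as this sketch, where _make_sdist stands in for whatever internal sdist machinery the tool already has:

    def _make_sdist(output_dir):
        ...  # stand-in for the tool's existing, internal sdist machinery

    def build_sdist(sdist_directory, config_settings=None):
        # expose the existing machinery through the standard hook and
        # return the name of the file that was created
        return _make_sdist(output_dir=sdist_directory)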
So yes, one of the things I want to do with this hook is copy the source files to an isolated directory, but that’s not the *only* thing I want to do with it, and when I see a single solution that can solve multiple problems, I vastly prefer that over a solution that only solves a single problem.
— Donald Stufft

In my experience tools think an archive is an sdist if it is named properly, contains PKG-INFO with a minimum number of fields
Metadata-Version: 1.1
Name: bob
Version: 1.0
and setup.py. setuptools sdists also contain .egg-info but it is unnecessary.
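A rough sketch of the de facto check Daniel describes, purely illustrative:

    import tarfile

    REQUIRED_FIELDS = ("Metadata-Version", "Name", "Version")

    def looks_like_sdist(path):
        """Heuristic: a properly named archive containing a minimal PKG-INFO."""
        with tarfile.open(path) as tf:
            member = next((n for n in tf.getnames() if n.endswith("PKG-INFO")), None)
            if member is None:
                return False
            text = tf.extractfile(member).read().decode()
        return all(field + ":" in text for field in REQUIRED_FIELDS)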

On Wed, May 31, 2017, at 07:40 PM, Donald Stufft wrote:
Taking pip completely off the table for a second, let’s take a look at tox. Tox’s default mode of operation is to produce a sdist. Now let’s say I’m writing a project where I want to use PEP 517 and get rid of setup.py, except now tox is broken with no path forward, because PEP 517 doesn’t define how to produce a sdist.
The same story is true for TravisCI’s PyPI deployment pipeline: as soon as any project starts depending on PEP 517, we completely break that feature for them without a path for them to fix it (besides writing a PEP, of course).
The same is true for Gem Fury’s private PyPI repositories where you can ``git push fury`` and have them build a sdist automatically for you.
These tools are all things that the developers of the project choose to use, however. I don't use them, so I'm happy enough to get rid of setup.py and not have a standard interface to building sdists. Developers who do use them will want to stick with setup.py until there's a standard way to build an sdist - or a tool like tox may add support for going via wheels instead of via sdist. So PEP 517 may not be useful to *everyone* without standardising a way to build sdists, but it is still useful for many projects, and I don't think it prevents a later PEP from standardising a way to build sdists.

On May 31, 2017, at 3:12 PM, Thomas Kluyver thomas@kluyver.me.uk wrote:
On Wed, May 31, 2017, at 07:40 PM, Donald Stufft wrote:
Taking pip completely off the table for a second, let’s take a look at tox. Tox’s default mode of operation is to produce a sdist. Now let’s say I’m writing a project where I want to use PEP 517 and get rid of setup.py, except now tox is broken with no path forward, because PEP 517 doesn’t define how to produce a sdist.
The same story is true for TravisCI’s PyPI deployment pipeline: as soon as any project starts depending on PEP 517, we completely break that feature for them without a path for them to fix it (besides writing a PEP, of course).
The same is true for Gem Fury’s private PyPI repositories where you can ``git push fury`` and have them build a sdist automatically for you.
These tools are all things that the developers of the project choose to use, however. I don't use them, so I'm happy enough to get rid of setup.py and not have a standard interface to building sdists. Developers who do use them will want to stick with setup.py until there's a standard way to build an sdist - or a tool like tox may add support for going via wheels instead of via sdist.
So PEP 517 may not be useful to *everyone* without standardising a way to build sdists, but it is still useful for many projects, and I don't think it prevents a later PEP from standardising a way to build sdists.
The most likely outcome if PEP 517 is implemented as defined, and people who aren’t steeped in packaging lore hear about it, is that they get excited about being able to kill setup.py, they implement it, they find out some tool they depend on doesn’t work and can’t work with it, they get discouraged, and they start filing issues. Ideally those issues are filed on the tool that implemented PEP 517, but most likely they will be filed on tox, Travis, GemFury, etc.
I am struggling to figure out why there is opposition to simply exposing, in a standard way, something that you were already planning on implementing anyway.
— Donald Stufft

On 31 May 2017 at 20:20, Donald Stufft donald@stufft.io wrote:
The most likely outcome if PEP 517 is implemented as defined, and people who aren’t steeped in packaging lore hear about it, is that they get excited about being able to kill setup.py, they implement it, they find out some tool they depend on doesn’t work and can’t work with it, they get discouraged, and they start filing issues. Ideally those issues are filed on the tool that implemented PEP 517, but most likely they will be filed on tox, Travis, GemFury, etc.
I am struggling to figure out why there is opposition to simply exposing, in a standard way, something that you were already planning on implementing anyway.
There's a lot of baggage associated with the term sdist.
As a suggestion - if backends supplied a prepare_build_files hook, someone could write a pretty trivial tool that called that hook. Then call the build_wheel_metadata hook to get some details to put into PKG-INFO, zip the result up and call it a sdist. You could dump a setup.py replacement in there that used PEP 517 hooks to implement the setup.py interface, if you wanted.
Given how vaguely defined a sdist is, it would be hard to argue that the result is *not* a sdist. I'm not sure how much further you're going to insist on going. You no longer create a sdist using "setup.py sdist", sure. But at some point the tools have to deal with setup.py going away, so I don't see how that's a requirement forever.
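A rough sketch of the trivial wrapper Paul suggests, with the hook names taken from this thread's proposals (here the metadata is passed in as arguments rather than pulled from a metadata hook):

    import os
    import zipfile

    def make_sdist_from_hooks(backend, name, version, outdir="."):
        stage = "%s-%s" % (name, version)
        os.makedirs(stage, exist_ok=True)
        backend.prepare_build_files(stage)  # the files needed for a build
        with open(os.path.join(stage, "PKG-INFO"), "w") as f:
            f.write("Metadata-Version: 1.1\nName: %s\nVersion: %s\n"
                    % (name, version))
        archive = os.path.join(outdir, stage + ".zip")
        with zipfile.ZipFile(archive, "w") as zf:
            for root, _, files in os.walk(stage):
                for fn in files:
                    # paths keep the "{name}-{version}/" prefix, matching
                    # the conventional sdist layout
                    zf.write(os.path.join(root, fn))
        return archive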
If you really think we need to cover these use cases solidly (and you have a point, I'm not saying they are irrelevant at all) then maybe we need to get input from the tox/travis/gemfury devs, to see what they actually think, rather than trying to guess their requirements?
Paul
PS None of this means I am in any way in favour of making it seem like we're OK with projects not providing sdists (in some form or other). We're an open source community, and I think publishing sources alongside your binaries is key to that. A link to an external source code repository isn't sufficient, IMO (the repo could die, like Google Code did).

On May 31, 2017, at 3:38 PM, Paul Moore p.f.moore@gmail.com wrote:
On 31 May 2017 at 20:20, Donald Stufft donald@stufft.io wrote:
The most likely outcome if PEP 517 is implemented as defined, and people who aren’t steeped in packaging lore hear about it, is that they get excited about being able to kill setup.py, they implement it, they find out some tool they depend on doesn’t work and can’t work with it, they get discouraged, and they start filing issues. Ideally those issues are filed on the tool that implemented PEP 517, but most likely they will be filed on tox, Travis, GemFury, etc.
I am struggling to figure out why there is opposition to simply exposing, in a standard way, something that you were already planning on implementing anyway.
There's a lot of baggage associated with the term sdist.
I mean, PEP 517 explicitly redefines “sdist” to mean “a tarball that includes a pyproject.toml, setup.py not required”, so by accepting PEP 517 we’re accepting that a sdist is either the thing that we conventionally call a sdist, or a thing that is similar to it, but instead of a setup.py it contains a pyproject.toml. That’s from https://www.python.org/dev/peps/pep-0517/#source-distributions which states:
For now, we continue with the legacy sdist format which is mostly undefined, but basically comes down to: a file named ``{NAME}-{VERSION}.{EXT}``, which unpacks into a buildable source tree called ``{NAME}-{VERSION}/``. Traditionally these have always contained ``setup.py``-style source trees; we now allow them to also contain ``pyproject.toml``-style source trees.
Integration frontends require that an sdist named ``{NAME}-{VERSION}.{EXT}`` will generate a wheel named ``{NAME}-{VERSION}-{COMPAT-INFO}.whl``.
If we want to more rigorously define a sdist, that’s fine, we can go down that rabbit hole. If we want to remove that and say that for something to be a sdist it still has to have a setup.py that supports the expected commands, and oh by the way you can use this new thing in PEP 517 to declare statically an alternative build tool that some setup_requires tool will use to replace bdist_wheel, then that’s fine too. Hell, if it wants to get rid of the sdist terminology completely and make a new format called a source wheel or a bagofiles or whatever, that’s fine too.
The PEP right now seems to want it both ways: it wants to declare that this thing without the conventional interfaces is a sdist, while ignoring the fact that people are using those conventional interfaces. For me, if the PEP wants its new thing to be a sdist, then it needs to handle that case, and anything else doesn’t sit right with me.
As a suggestion - if backends supplied a prepare_build_files hook, someone could write a pretty trivial tool that called that hook. Then call the build_wheel_metadata hook to get some details to put into PKG-INFO, zip the result up and call it a sdist. You could dump a setup.py replacement in there that used PEP 517 hooks to implement the setup.py interface, if you wanted.
Given how vaguely defined a sdist is, it would be hard to argue that the result is *not* a sdist. I'm not sure how much further you're going to insist on going. You no longer create a sdist using "setup.py sdist", sure. But at some point the tools have to deal with setup.py going away, so I don't see how that's a requirement forever.
That isn’t really the same though; the prepare_build_files hook is presumably not going to include things like the LICENSE file or documentation, which are things you’d want in a sdist but which would be non-obvious to include in said hook, and its output is likely going to be vastly different from a sdist produced by ``myhypotheticalbuildtool sdist``, unless prepare_build_files is basically exactly the same as build_sdist in all but name.
I’m not stating that we need to support ``setup.py sdist`` forever, I’m saying we need to support a generic way to build a sdist to replace ``setup.py sdist``.
If you really think we need to cover these use cases solidly (and you have a point, I'm not saying they are irrelevant at all) then maybe we need to get input from the tox/travis/gemfury devs, to see what they actually think, rather than trying to guess their requirements?
I do think we need to cover them solidly yes.
I’m happy to try and bring them in, but as far as Travis/GemFury goes, I think their use case is pretty simple: given a Python project using the hypothetical PEP 517, they want to produce a sdist and either serve it directly (GemFury) or publish it to PyPI (Travis). If we don’t add build_sdist, and every project implements its own mechanism for generating a sdist, then either they, or some wrapper tool, has to know and understand every possible build tool a project might use and how to get it to build a sdist (OR they have to start adding configuration for people to instruct them how to build a sdist for their specific project using whatever tool they’re using).
As far as tox goes, I will poke them, but I’m pretty sure the answer is wanting to test the thing that will be installed as it would be installed from PyPI, to try and mitigate packaging errors where you forget a file in your MANIFEST.in or the like.
Paul
PS None of this means I am in any way in favour of making it seem like we're OK with projects not providing sdists (in some form or other). We're an open source community, and I think publishing sources alongside your binaries is key to that. A link to an external source code repository isn't sufficient, IMO (the repo could die, like Google Code did).
Unfortunately, that is exactly how I think PEP 517 will end up: since it only requires building wheels, people are going to implement backends that only build wheels, and we have nothing to indicate to them that they generally shouldn’t do that (without a specific reason). On the contrary, the indications point to sdists not being important enough to even be given an option to build them at all, requiring tool authors to go out of their way to decide to add it.
— Donald Stufft

On Wed, May 31, 2017, at 08:20 PM, Donald Stufft wrote:
I am struggling to figure out why there is opposition to simply exposing, in a standard way, something that you were already planning on implementing anyway.
I have issues with it because:
1. Building a *release-quality* sdist is a complicated topic in its own right, and I'd like to move forwards with what we've already defined for building wheels without getting mired in that debate.
2. I think it's a mistake to conflate "get the files we need to build this project" with "make a source distribution", and I don't want one hook to be used for both operations. Flit can do the former very easily, even in situations where it cannot do the latter.
a) If pip uses the hook for the former purpose, and I implement it with that in mind, it will give poor results if other tools use it to release an sdist.
b) If the hook properly makes an sdist, it will fail in situations where it could do what pip needs, and it will be unnecessarily slow even where it succeeds.
Thomas

On May 31, 2017, at 3:45 PM, Thomas Kluyver thomas@kluyver.me.uk wrote:
On Wed, May 31, 2017, at 08:20 PM, Donald Stufft wrote:
I am struggling to figure out why there is opposition to simply exposing, in a standard way, something that you were already planning on implementing anyway.
I have issues with it because:
- Building a *release-quality* sdist is a complicated topic in its own right, and I'd like to move forwards with what we've already defined for building wheels without getting mired in that debate.
How you build the release-quality sdist isn’t really a concern of PEP 517, any more than building a release-quality wheel is; it’s up to the build tool to implement that as it makes sense for them.
- I think it's a mistake to conflate "get the files we need to build this project" with "make a source distribution", and I don't want one hook to be used for both operations. Flit can do the former very easily, even in situations where it cannot do the latter.
a) If pip uses the hook for the former purpose, and I implement it with that in mind, it will give poor results if other tools use it to release an sdist.
b) If the hook properly makes an sdist, it will fail in situations where it could do what pip needs, and it will be unnecessarily slow even where it succeeds.
I could see this as an argument that the PEP should have *both* a build_sdist and a prepare_build_files hook, if you don’t think that the build_sdist hook is suitable on its own. I’m not sure how I feel about that off the top of my head, but I *think* I would be okay adding the mandatory build_sdist command and an optional prepare_build_files hook, with the semantics being that if prepare_build_files is NOT defined, then it is acceptable for a tool like pip to use build_sdist for this purpose, and if prepare_build_files IS defined, then the resulting wheel from build_wheel should not meaningfully differ from what would have been produced by build_sdist + build_wheel.
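The proposed semantics boil down to a small piece of frontend logic; a sketch, using the hook names as proposed above:

    def copy_for_build(backend, target_dir):
        hook = getattr(backend, "prepare_build_files", None)
        if hook is not None:
            # optional fast path: the backend copies just the needed files
            hook(target_dir)
        else:
            # mandatory fallback: build a real sdist (unpack it before building)
            backend.build_sdist(target_dir)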
— Donald Stufft

On Wed, May 31, 2017, at 09:16 PM, Donald Stufft wrote:
How you build the release-quality sdist isn’t really a concern of PEP 517, any more than building a release-quality wheel is; it’s up to the build tool to implement that as it makes sense for them.
But if we have a hook for building something called an sdist, we need to define what an sdist is. The definition you're referring to is a functional definition sufficient for a tool that only wants to install it, but it doesn't cover what an sdist means or how it fits into workflows.
I could see this as an argument that the PEP should have *both* a build_sdist and a prepare_build_files hook, if you don’t think that the build_sdist hook is suitable on its own.
I would prefer that to the current status of one hook that tries to cover both use cases. I'd still rather specify prepare_build_files in this PEP, and leave the sdist hook to a later PEP. I don't see a great need for a standard interface to building an sdist: the developer doing that by calling their build tool directly seems adequate for the majority of use cases.
Thomas

On 31 May 2017 at 22:13, Thomas Kluyver thomas@kluyver.me.uk wrote:
But if we have a hook for building something called an sdist, we need to define what an sdist is.
OK, so can we do that?
At the moment, we have a de facto definition of a sdist - it's something with a setup.py, some metadata defined by the metadata PEPs (but implemented pretty loosely, so people don't actually rely on it) and the files needed to build the project. Plus other stuff like LICENSE that's important, but defined by the project saying it should be included. Consumers of sdists are allowed to unpack them and run any of the setup.py commands on them. They can in theory inspect the metadata, but in practice don't. Call that the legacy sdist format.
What do consumers of the sdist format want to do? I don't actually know, but my guess is that they just want to be able to install the sdist. We presumably don't want to preserve the setup.py interface so they need to allow for a new interface. What's wrong with "pip install <file>"? They also want to publish the sdist to PyPI - so they need to name it according to the current convention. Anything else?
Call this the post-PEP 517 sdist. It's still not fully standardised, it's an underspecified de facto standard, but "something that follows current naming conventions and can be installed via pip install filename" could be something that will do for now, until we want to fully standardise sdist 2.0, with static metadata and all that stuff. And as an additional benefit, all legacy sdists already conform to this "spec".
I 100% agree that the current vagueness around what a sdist is, and what tools can expect to do with them, is horribly unsatisfactory. But to make any progress we have to discard the "exposes a setup.py interface" rule. That's all we need for now. Longer term, we need a formal spec. But *for now*, can we manage by replacing the setup.py interface with an "installable by pip" interface? Or does anyone have an alternative "good enough for now" definition of a sdist we can agree on?
If we can do this, we can move forward. Otherwise, I fear this discussion is going to stall with another "try to solve all the problems at once" deadlock.
Paul

On Wed, May 31, 2017 at 6:48 PM, Paul Moore p.f.moore@gmail.com wrote:
What do consumers of the sdist format want to do? I don't actually know,
...
They also want to publish the sdist to PyPI - so they need to name it according to the current convention.
I think we can rule this out for consumers of the sdist; only the provider cares about this. If you started from an sdist, you don't need to publish it to PyPI (though you may want to "publish" to some repository within your own organization).
-Fred

On May 31, 2017, at 6:48 PM, Paul Moore p.f.moore@gmail.com wrote:
On 31 May 2017 at 22:13, Thomas Kluyver thomas@kluyver.me.uk wrote:
But if we have a hook for building something called an sdist, we need to define what an sdist is.
OK, so can we do that?
At the moment, we have a de facto definition of a sdist - it's something with a setup.py, some metadata defined by the metadata PEPs (but implemented pretty loosely, so people don't actually rely on it) and the files needed to build the project. Plus other stuff like LICENSE that's important, but defined by the project saying it should be included. Consumers of sdists are allowed to unpack them and run any of the setup.py commands on them. They can in theory inspect the metadata, but in practice don't. Call that the legacy sdist format.
What do consumers of the sdist format want to do? I don't actually know, but my guess is that they just want to be able to install the sdist. We presumably don't want to preserve the setup.py interface so they need to allow for a new interface. What's wrong with "pip install <file>"? They also want to publish the sdist to PyPI - so they need to name it according to the current convention. Anything else?
Call this the post-PEP 517 sdist. It's still not fully standardised, it's an underspecified de facto standard, but "something that follows current naming conventions and can be installed via pip install filename" could be something that will do for now, until we want to fully standardise sdist 2.0, with static metadata and all that stuff. And as an additional benefit, all legacy sdists already conform to this "spec".
I 100% agree that the current vagueness around what a sdist is, and what tools can expect to do with them, is horribly unsatisfactory. But to make any progress we have to discard the "exposes a setup.py interface" rule. That's all we need for now. Longer term, we need a formal spec. But *for now*, can we manage by replacing the setup.py interface with an "installable by pip" interface? Or does anyone have an alternative "good enough for now" definition of a sdist we can agree on?
If we can do this, we can move forward. Otherwise, I fear this discussion is going to stall with another "try to solve all the problems at once" deadlock.
Paul
I think I’m -0 on spelling it out as “it’s whatever pip can install” rather than just codifying what the de facto rules for that are, something like:
—
A sdist is a .tar.gz or a .zip file with a directory structure like (along with whatever additional files the project needs in the sdist):
.
└── {name}-{version}
    ├── PKG-INFO
    └── setup.py OR pyproject.toml
If a sdist contains a pyproject.toml file that contains a build-system.build-backend key, then it is a PEP 517 style sdist and MUST be processed using the API as defined in PEP 517. Otherwise it is a legacy distutils/setuptools style sdist and MUST be processed by calling setup.py. PEP 517 sdists MAY contain a setup.py for compatibility with tooling that does not yet understand PEP 517.
PKG-INFO should loosely be a PEP 345 style METADATA file, as amended by the errata located at https://packaging.python.org/specifications/#package-distribution-metadata.
A sdist MUST follow the {name}-{version}.{ext} naming scheme, where {ext} MUST be either .tar.gz or .zip, matching the respective container/compression format being used. Both {name} and {version} MUST have any - characters escaped to a _ to match the escaping done by Wheel. Thus a sdist for a project named foo-bar with version 1.0-2 which is using a .tar.gz container for the sdist would produce a file named foo_bar-1.0_2.tar.gz.
—
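The dispatch rule in that draft is mechanical enough to sketch; this assumes the pytoml library for parsing, and is illustrative only:

    import os
    import pytoml

    def sdist_style(tree):
        pyproject = os.path.join(tree, "pyproject.toml")
        if os.path.exists(pyproject):
            with open(pyproject) as f:
                data = pytoml.load(f)
            if data.get("build-system", {}).get("build-backend"):
                return "pep517"  # process via the PEP 517 API
        return "legacy"          # process by calling setup.py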
I think this should cover the case of actually making the project pip installable (assuming of course the setup.py or build backend doesn’t do something silly like always sys.exit(1) instead of producing the expected outcome) as well as allow twine to upload a sdist produced by this (since it reads the PKG-INFO file). The delta from the de facto standard today is basically swapping the setup.py for pyproject.toml and the escaping of the filename [1]. This should not require much more of the backends than producing a wheel does, since the PKG-INFO file is essentially just the METADATA file from within a wheel (although if people are dynamically generating dependencies or something they may want to omit them rather than give misleading information in PKG-INFO).
A future PEP can solve the problem of a sdist 2.0 that has a nicer interface than that or better metadata or whatever. This just represents a fairly minimal evolution of what currently exists today to support the changes needed for PEP 517.
[1] We don’t _need_ to do this, but currently you can’t tell if foo-1-1.tar.gz is (foo-1, 1) or (foo, 1-1), and moving to mandate escaped names can solve that problem going forward, using the heuristic that if there is more than one dash character in the filename, then it was not escaped and we have to fall back to the somewhat error prone, context sensitive parsing of the filename. Certainly we could say that it’s out of scope for PEP 517 and leave it at that, but since it’s such a minor change I felt it wouldn’t be a big deal to add it here.
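A sketch of the escaping rule and the footnote's fallback heuristic (illustrative only):

    def escape(segment):
        # every "-" becomes "_", matching the escaping done by Wheel
        return segment.replace("-", "_")

    def sdist_filename(name, version, ext=".tar.gz"):
        return "%s-%s%s" % (escape(name), escape(version), ext)

    def split_escaped(stem):
        # an escaped stem has exactly one dash; more means a legacy name
        if stem.count("-") == 1:
            name, version = stem.split("-")
            return name, version
        return None  # fall back to context-sensitive parsing

    # sdist_filename("foo-bar", "1.0-2") -> "foo_bar-1.0_2.tar.gz"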
— Donald Stufft

On 1 June 2017 at 01:08, Donald Stufft donald@stufft.io wrote:
A sdist is a .tar.gz or a .zip file with a directory structure like (along with whatever additional files the project needs in the sdist):
[...]
I'm confused. Isn't this basically what PEP 517 says already? You've added some details and clarification, but that could just as easily be done in a separate document/PEP. The details aren't needed for PEP 517 itself.
In practical terms, tools that want to leave the API details to someone else can call out to pip, as I suggested. And I suspect many tools probably will, simply because it's easier than dealing with the two APIs directly.
Paul

On Jun 1, 2017, at 3:44 AM, Paul Moore p.f.moore@gmail.com wrote:
On 1 June 2017 at 01:08, Donald Stufft donald@stufft.io wrote:
A sdist is a .tar.gz or a .zip file with a directory structure like (along with whatever additional files the project needs in the sdist):
[...]
I'm confused. Isn't this basically what PEP 517 says already? You've added some details and clarification, but that could just as easily be done in a separate document/PEP. The details aren't needed for PEP 517 itself.
Yes, it’s basically what PEP 517 says already, just more specific and detailed. I don’t know what more people want from “defining what an sdist is”, because that’s basically all an sdist is. I’ve always been of the opinion that PEP 517 is already defining (and then modifying) what an sdist is and I don’t know what more people would want.
PEP 517 needs to do it because PEP 517 wants to change the definition of what a sdist is, and you can’t really change the definition without in fact defining the new thing. I mean, we could make a new PEP that just defines sdist (minus the pyproject.toml part), then make PEP 517 extend that PEP and add the pyproject.toml… but that seems kind of silly to me? Splitting it out into its own PEP gains us nothing and, to me, feels like extra process for process’s sake.
— Donald Stufft

On Thu, Jun 1, 2017 at 5:34 AM, Donald Stufft donald@stufft.io wrote:
On Jun 1, 2017, at 3:44 AM, Paul Moore p.f.moore@gmail.com wrote:
On 1 June 2017 at 01:08, Donald Stufft donald@stufft.io wrote:
A sdist is a .tar.gz or a .zip file with a directory structure like (along with whatever additional files the project needs in the sdist):
[...]
I'm confused. Isn't this basically what PEP 517 says already? You've added some details and clarification, but that could just as easily be done in a separate document/PEP. The details aren't needed for PEP 517 itself.
Yes, it’s basically what PEP 517 says already, just more specific and detailed. I don’t know what more people want from “defining what an sdist is”, because that’s basically all an sdist is. I’ve always been of the opinion that PEP 517 is already defining (and then modifying) what an sdist is and I don’t know what more people would want.
PEP 517 needs to do it because PEP 517 wants to change the definition of what a sdist is, and you can’t really change the definition without in fact defining the new thing. I mean, we could make a new PEP that just defines sdist (minus the pyproject.toml part), then make PEP 517 extend that PEP and add the pyproject.toml… but that seems kind of silly to me? Splitting it out into its own PEP gains us nothing and, to me, feels like extra process for process’s sake.
PEP 518's pyproject.toml only specifies a single table, `build-system`, that matters. Can we just add a blurb to PEP 517 that says something to the effect of "If the following sub table exists, its location key can be used to pre-populate the metadata_directory of `get_wheel_metadata` automatically":
[build-system.metadata]
directory = "some_dist_info_directory/"
(pulled from the spec in 517 about what get_wheel_metadata is supposed to do)
Then we could default that directory to something obvious, like the aforementioned ./DIST-INFO or ./.dist-info, or whatever, because isn't such a directory expected to contain enough information to create a wheel anyway? Like {package-name} and {version} via METADATA? And typically included in sdists already? If it has a SOURCE-RECORD file [new], then pip and friends can use that to know what files are needed for the build, and can use pyproject.toml (if it exists) for creating and/or updating it for later sdist generation. In the simple case, every normal file in a wheel is also in an sdist, verbatim, with no additional artifacts of any kind (pure python) and only additional metadata. The build doesn't care if things like LICENCE are in the tree. If there is no static SOURCE-RECORD, pip and friends fall back to a wholesale copy operation of the input source. The build backend's `get_wheel_metadata` (if defined) can update or backfill missing information within the METADATA file, and create the WHEEL file (or save that for `build_wheel`), if it finds the `metadata.directory` seeded from the static location referenced in pyproject.toml is incomplete.
In the end, the build frontend logic would look something like:
(also seems like `get_wheel_metadata` should maybe return the final .dist-info directory it decided on, or just settle on DIST-INFO and enough of this name-version.dist-info nonsense already... should possibly be a required build api function with the understanding `build_wheel` might update it)
* Is build-system.metadata.directory defined?
  YES: copy to {metadata_directory}/DIST-INFO
  NO: mkdir {metadata_directory}/DIST-INFO
* Does {metadata_directory}/DIST-INFO/SOURCE-RECORD exist?
  YES: use that to isolate/prune/copy source tree for initial build, if desired, and also confirm hashes, if any
  NO: do nothing
(we have something that might look like an sdist, but possibly incomplete [eg. still no METADATA])
* Is build-backend.MODULE.get_build_requires defined?
  YES: make sure those things exist
  NO: do nothing
* Is build-backend.MODULE.get_wheel_metadata defined?
  YES: call it like PEP 517 says, DIST-INFO is ready for updating
  NO: do nothing
(we have something that might look like an sdist, but possibly incomplete [eg. still no METADATA])
* Is build-backend.MODULE.build_wheel defined?
  YES: call it like PEP 517 says, replace RECORD with the final record from build?
  NO: do nothing
* Is {metadata_directory}/DIST-INFO/* valid, and the resultant whl as well?
  YES: YAY! \o/
  NO: BLOW UUUUUP
* Does {metadata_directory}/DIST-INFO/SOURCE-RECORD exist [must reference pyproject.toml too!]?
  YES: use that to prune files when creating a proper sdist AFTER the build
  NO: sdist is original source tree + {metadata_directory}/DIST-INFO - RECORD(?)
(we have enough information to produce a complete sdist that could be used to generate a valid wheel again)
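The first step of that flow, sketched in code; DIST-INFO and build-system.metadata are Anthony's hypothetical proposals, not anything defined by PEP 517 or 518:

    import os
    import shutil

    def seed_dist_info(source_tree, metadata_directory, static_dir=None):
        dist_info = os.path.join(metadata_directory, "DIST-INFO")
        if static_dir is not None:
            # build-system.metadata.directory was defined: start from it
            shutil.copytree(os.path.join(source_tree, static_dir), dist_info)
        else:
            os.makedirs(dist_info)
        return dist_info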
Because the build itself can output additional source files that may be desirable to include in an sdist later, I honestly don't think you can pass through a "proper" sdist before a wheel. I think you can 99% of the time do that, but some builds using Cython and friends could actually have a custom initial build that generates standard .h/.c/.py, and even outputs an alternative pyproject.toml that *no longer needs* a custom build backend. Or just straight deletes it from SOURCE-RECORD once the custom build is done, because some artifacts are enough to rebuild a wheel next time. It seems to me the only possibly correct order is:
1. VCS checkout
2. partial sdist, but still likely an sdist, no promises!
3. wheel
4. proper sdist from generated SOURCE-RECORD, or updated static SOURCE-RECORD, or just original source tree + DIST-INFO
I don't see a way to get a 100% valid sdist without first building the project and effectively asking the build backend (via its SOURCE-RECORD, if any) "Well golly, you did a build! What, *from both the source tree and build artifacts*, is important for wrapping up into a redistributable?"
Maybe I'm overlooking something yuge (I've tried to follow this discussion, and have sort of checked out of python lately, but I'm fairly well-versed in packaging lore and code), but in general I think we really are making sdists way way... way scarier than need be. They're pretty much whatever the build tells you is important for redistribution, at the end, with as much static metadata as possible, to the point of possibly obviating their need for pyproject.toml in the first place... maybe this aspect is what is hanging everyone up? A redistributable source does not need to be as flexible as the original VCS input. An sdist is pinned to a specific version of a project, whereas VCS represents all possible versions (albeit only one is checked out), and sdists *are not* wheels! The same expectations need not apply. Two sdists of the same version might not be identical; one might request the custom build backend via pyproject.toml, and the other might have already done some of the steps for whatever reason. Authors must decide which is more appropriate for sharing.
This ended up longer than I meant, but hopefully it's not all noise.
Thanks,

On Jun 1, 2017, at 2:12 PM, C Anthony Risinger c@anthonyrisinger.com wrote:
Because the build itself can output additional source files that may be desirable to include in an sdist later, I honestly don't think you can pass through a "proper" sdist before a wheel. I think you can 99% of the time do that, but some builds using Cython and friends could actually have a custom initial build that generates standard .h/.c/.py, and even outputs an alternative pyproject.toml that *no longer needs* a custom build backend. Or just straight deletes it from SOURCE-RECORD once the custom build is done, because some artifacts are enough to rebuild a wheel next time. It seems to me the only possibly correct order is:
- VCS checkout
- partial sdist, but still likely an sdist, no promises!
- wheel
- proper sdist from generated SOURCE-RECORD, or updated static SOURCE-RECORD, or just original source tree + DIST-INFO
I don't see a way to get a 100% valid sdist without first building the project and effectively asking the build backend (via its SOURCE-RECORD, if any) "Well golly, you did a build! What, *from both the source tree and build artifacts*, is important for wrapping up into a redistributable?"
Maybe I'm overlooking something yuge (I've tried to follow this discussion, and have sort of checked out of python lately, but I'm fairly well-versed in packaging lore and code), but in general I think we really are making sdists way way... way scarier than need be. They're pretty much whatever the build tells you is important for redistribution, at the end, with as much static metadata as possible, to the point of possibly obviating their need for pyproject.toml in the first place... maybe this aspect is what is hanging everyone up? A redistributable source does not need to be as flexible as the original VCS input. An sdist is pinned to a specific version of a project, whereas VCS represents all possible versions (albeit only one is checked out), and sdists *are not* wheels! The same expectations need not apply. Two sdists of the same version might not be identical; one might request the custom build backend via pyproject.toml, and the other might have already done some of the steps for whatever reason. Authors must decide which is more appropriate for sharing.
Do any projects build a copy of the library and use that to influence what gets copied into the sdist today? As far as I am aware there aren’t any that do that. I think the standard thing to do in Cython is to produce the .c files as part of the sdist process, which is perfectly fine to do. With the newer PEPs it doesn’t _need_ to do that, since you can depend on Cython in your build steps and just ship the pyx files (although you’re still free to compute the .c files AOT and include them in the sdist).
— Donald Stufft

On Thu, Jun 1, 2017 at 1:22 PM, Donald Stufft donald@stufft.io wrote:
On Jun 1, 2017, at 2:12 PM, C Anthony Risinger c@anthonyrisinger.com wrote:
Because the build itself can output additional source files that may be desirable to include in an sdist later, I honestly don't think you can pass through a "proper" sdist before a wheel. I think you can 99% of the time do that, but some builds using Cython and friends could actually have a custom initial build that generates standard .h/.c/.py, and even outputs an alternative pyproject.toml that *no longer needs* a custom build backend. Or just straight deletes it from SOURCE-RECORD once the custom build is done, because some artifacts are enough to rebuild a wheel next time. It seems to me the only possibly correct order is:
- VCS checkout
- partial sdist, but still likely an sdist, no promises!
- wheel
- proper sdist from generated SOURCE-RECORD, or updated static SOURCE-RECORD, or just original source tree + DIST-INFO
I don't see a way to get a 100% valid sdist without first building the project and effectively asking the build backend (via its SOURCE-RECORD, if any) "Well golly, you did a build! What, *from both the source tree and build artifacts*, is important for wrapping up into a redistributable?"
Maybe I'm overlooking something yuge (I've tried to follow this discussion, and have sort of checked out of python lately, but I'm fairly well-versed in packaging lore and code), but in general I think we really are making sdists way way... way scarier than need be. They're pretty much whatever the build tells you is important for redistribution, at the end, with as much static metadata as possible, to the point of possibly obviating their need for pyproject.toml in the first place... maybe this aspect is what is hanging everyone up? A redistributable source does not need to be as flexible as the original VCS input. An sdist is pinned to a specific version of a project, whereas VCS represents all possible versions (albeit only one is checked out), and sdists *are not* wheels! The same expectations need not apply. Two sdists of the same version might not be identical; one might request the custom build backend via pyproject.toml, and the other might have already done some of the steps for whatever reason. Authors must decide which is more appropriate for sharing.
Do any projects build a copy of the library and use that to influence what gets copied into the sdist today? As far as I am aware there aren’t any that do that. I think the standard thing to do in Cython is to produce the .c files as part of the sdist process, which is perfectly fine to do. With the newer PEPs it doesn’t _need_ to do that, since you can depend on Cython in your build steps and just ship the pyx files (although you’re still free to compute the .c files AOT and include them in the sdist).
I admit I don't know of any either. Nor did I know the standard expectation in Cython was to generate things in the sdist phase. I have not personally used it for a project and was only using it as an example of a process that produces potentially redistributable artifacts.
What you are saying, though, I think only emphasizes previous comments about needing to pin down, once and for all, what it means to "be an sdist" (which I agree with), right now, and decide who is responsible for constructing that. We don't need to go on a rampage pronouncing new formats or file requirements like LICENCE... just state the status quo and accept it.
Maybe we can start with simply defining an sdist as "some tree with a DIST-INFO". I'll avoid the package-name.dist-info for now and the problems therein, unless there is a simple consensus there.
From that, there seems to only be a small gap in the build api hooks, and a missing "[sdist-system]" phase (maybe that doesn't sound as nice as build-system) that I believe would be a small PEP or addition to 517. In all honesty, I think probably *both* the sdist-system and the build-system need to be consulted to fully populate DIST-INFO... this can be the `build_sdist` hook in both. In all honesty, as you have clarified, Cython is *not* a build-system! It's an sdist-system. The C compiler is the expected build-system. For many build-systems, `build_sdist` might even be a noop, but it might still want the opportunity to make adjustments before starting.
It seems reasonable to me that both systems simply begin from a static DIST-INFO, if any, then work together to populate and update it. Something like setuptools_scm, as a sdist-system, might only generate a barebones METADATA with name and version info, then the selected build-system comes in and fills the rest. Or something like the Cython sdist-system might generate more source files and not even touch DIST-INFO, then the selected build-system comes in and fills the rest. Neither have anything to do with "the build".
Something like Cython is effectively doing a partial build before producing a redistributable source tree, and if we skip that step and go straight to a build via the build-system, then the only real option for sdists at that point is to ask the same backend, post-build, for the important redistributable parts, which may or may not reflect a stable/reproducible input to the same system.
If I select "Cython" as the build-system, but I need to use a different compiler for currently-unsupported-platform-X, I'm going to have a bad day.

On Wed, May 31, 2017 at 08:08:51PM -0400, Donald Stufft wrote:
I think this should cover the case of actually making the project pip installable (assuming of course the setup.py or build backend doesn’t do something silly like always sys.exit(1) instead of producing the expected outcome)
My personal favorite was PyGame doing raw_input() from its setup.py.
Marius Gedminas

On 2017-05-31 20:08:51 -0400 (-0400), Donald Stufft wrote: [...]
Both {name} and {version} MUST have any - characters escaped to a _ to match the escaping done by Wheel. Thus a sdist for a project named foo-bar with version 1.0-2 which is using a .tar.gz container for the sdist would produce a file named foo_bar-1.0_2.tar.gz.
[...]
While I agree with the reasoning, if this is going to end up enforced by common tooling in the near future, a warning period would be appreciated, as it implies a fairly significant behavior change which will require quite a lot of adjustments to bespoke automation (at least for some I maintain, and I'm pretty sure there are plenty more out there too).

On Wed, 31 May 2017 at 14:14 Thomas Kluyver thomas@kluyver.me.uk wrote:
On Wed, May 31, 2017, at 09:16 PM, Donald Stufft wrote:
How you build the release-quality sdist isn’t really a concern of PEP 517, any more than building a release-quality wheel is; it’s up to the build tool to implement that as it makes sense for them.
But if we have a hook for building something called an sdist, we need to define what an sdist is. The definition you're referring to is a functional definition sufficient for a tool that only wants to install it, but it doesn't cover what an sdist means or how it fits into workflows.
I could see this as an argument that the PEP should have *both* a build_sdist and a prepare_build_files hook, if you don’t think that the build_sdist hook is suitable on its own.
I would prefer that to the current status of one hook that tries to cover both use cases.
I'd still rather specify prepare_build_files in this PEP, and leave the sdist hook to a later PEP. I don't see a great need for a standard interface to building an sdist: the developer doing that by calling their build tool directly seems adequate for the majority of use cases.
So it sounds like the list_build_files() part of the API is still useful for isolated builds versus in-place builds, correct?
Thomas' idea that "calling a build tool directly seems adequate" is somewhat interesting and not something I had thought about. Let's look at Donald's list of current ways to get from something to installation (which I know we want to scale back):
1) VCS Checkout -> Installed
2) VCS Checkout -> Sdist -> Installed
3) VCS Checkout -> Wheel -> Installed
4) VCS Checkout -> Sdist -> Wheel -> Installed
5) VCS Checkout -> Editable Install
6) VCS Checkout -> Sdist -> Editable Install
OK, so what Thomas is suggesting is this isn't all necessarily directly under pip's purview. So if you take that list and break out those steps into "back-end responsibility" and "front-end responsibility" then you end up with:
Back-end (e.g. flit):
1) VCS Checkout -> wheel (driven by pip)
2) VCS Checkout -> sdist
3) sdist -> wheel (driven by pip)
Front-end (e.g. pip):
1) wheel -> installed
You can then generate the non-editable install steps above by combining these roles. And I think the point Thomas is trying to make is if you look at back-end#2 you can simply leave pip out of it in a post-PEP 517 world if you view an sdist as a trimmed down VCS checkout that pip just needs to know how to unpack.
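A toy composition of that role split; every object and method here is a hypothetical stand-in rather than a real pip or flit API:

    def install_from_checkout(checkout, backend, frontend):
        sdist = backend.build_sdist(checkout)  # back-end: checkout -> sdist
        tree = frontend.unpack(sdist)          # front-end unpacks the sdist
        wheel = backend.build_wheel(tree)      # back-end: sdist tree -> wheel
        frontend.install_wheel(wheel)          # front-end: wheel -> installed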
But thinking from a more fundamental perspective, why does pip even need to be involved with any part of PEP 517? If pip is meant to be viewed as a package manager then isn't its key focus on fetching and installing appropriate things?
Well, it's because pip may have to build a wheel from a post-517 sdist that it downloaded from PyPI when a project doesn't provide e.g. a Windows wheel. That's why pip needs to be PEP 517-aware at all. Otherwise pip could just work in a wheel-only world and not even care about building wheels to begin with.
But where does building sdists come into play? You need to be able to make them, and you need to be able to get them up to PyPI. But beyond pip needing to know how to unpack an sdist to make the wheel it wants to work with, pip doesn't actually *need* to produce an sdist.
But I think *twine* is the tool that needs a way to specify how to produce an sdist. If we want to view twine as the tool to upload artifacts to PyPI then we need twine to know how to produce sdists and wheels in a PEP 517 world, not pip.
Maybe I'm missing something fundamental here about pip and all of this is wrong, but from where I'm sitting it seems the key thing pip needs from PEP 517 is how to go from a bunch of source to a wheel so that it can install that wheel. Now pip needs to know how to *consume* an sdist that it gets from PyPI that has a pyproject.toml, but it technically doesn't need to know how to *produce* one if in the end it really just wants a wheel. Twine, OTOH, does need a way to produce an sdist as well as a wheel so it can upload those to PyPI.
And so I think in a very wordy way, I just said we need to stop saying "pip needs a standardized way to produce an sdist" and instead start saying "twine needs a way to produce an sdist". And that leads to the question about whether PEP 517 must cover sdist *production* for twine because we want to have that solved before we have the *consumption* side in pip in place. Or put another way, are we okay with pip consuming things that twine simply can't make from a pyproject.toml file (yet)? A "yes" means PEP 517 shouldn't be held up, while a "no" means we need a solution for twine.

On 1 June 2017 at 21:45, Brett Cannon brett@python.org wrote:
And so I think in a very wordy way, I just said we need to stop saying "pip needs a standardized way to produce an sdist" and instead start saying "twine needs a way to produce an sdist". And that leads to the question about whether PEP 517 must cover sdist production for twine because we want to have that solved before we have the consumption side in pip in place. Or put another way, are we okay with pip consuming things that twine simply can't make from a pyproject.toml file (yet)? A "yes" means PEP 517 shouldn't be held up, while a "no" means we need a solution for twine.
The question is more, are we okay with pip consuming things (pyproject.toml source trees) for which we don't define any means of uploading to a package index (which is basically what a sdist is - a way of publishing sources on PyPI). My answer to that is "no".
pip doesn't need to be able to provide a user interface that builds a sdist. It *could* provide one, as an enhancement, but it's not *necessary*. Whether the canonical "build a sdist" command should be pip or (say) twine is a side issue here (although the ability to *have* a canonical command, and not rely on each backend having its own incompatible way, *is* important IMO - but that's not the point here). However, there are a number of places where pip indirectly needs the ability to build a sdist (or do something closely equivalent).
pip needs a way to deal with "pip install foo". There are 2 scenarios here:
1. The index (PyPI) contains an appropriate wheel. Then pip just installs it (using the wheel library). PEP 517 isn't involved, no build, easy.
2. There is no appropriate wheel. There are 2 subcases here:
   2a. There is no sdist. We're stuck. See below.
   2b. There is a sdist (whatever that means). As you say, pip needs to be able to consume that sdist. PEP 517 covers this as it stands.
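To make that decision tree concrete, here is a rough Python sketch (every helper named here is hypothetical; this is not pip's actual code):

    def install(requirement, index):
        wheel = find_wheel(index, requirement)  # hypothetical index lookup
        if wheel is not None:
            return install_wheel(wheel)  # case 1: wheel available, no build
        sdist = find_sdist(index, requirement)  # hypothetical index lookup
        if sdist is None:
            # case 2a: no wheel, no sdist -- we're stuck
            raise RuntimeError("cannot install {}".format(requirement))
        # case 2b: unpack the sdist and build a wheel via the PEP 517 backend
        source_tree = unpack_sdist(sdist)  # hypothetical
        return install_wheel(backend_build_wheel(source_tree))  # hypothetical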
pip also needs a way to deal with "pip install <local directory>". In this case, pip (under its current model) copies that directory to a working area. In that area, it runs the build command to create a wheel, and proceeds from there. In principle, there's little change in a PEP 517 world. But again, see below.
There are other things here, but they are largely marginal, or similar to the above cases (the big remaining case is editable installs, but that's a whole other question).
Notes:
- The case 2a in the "pip install foo" example is of concern. At the moment, building sdists is trivial (python setup.py sdist) and there's effectively no barrier to publishing sdists. In a PEP 517 world, backends may or may not provide a "build a sdist" command (flit doesn't, and for a long time I believe didn't propose to do so - partially because what constituted a sdist at the time was inappropriate for them, but they were happy to be wheel-only). That means that users of that backend are basically unable to upload source, and the likelihood of scenario 2a goes *way* up. Making sure that PEP 517 mandates that backends at least *allow* tools like twine to build a sdist from a source tree significantly alleviates this risk.
- Copying a local directory to a temporary area is demonstrably a serious performance issue for pip. We have a number of bugs raised over this. The problem is that we have to blindly copy *everything*, including arbitrarily large amounts of irrelevant data. We wanted to switch to "build a sdist from the local directory, then proceed from the 'we have a sdist' case". But some uses of setup.py break with that approach. One of the hopes for PEP 517 is that we avoid that problem, because we make it the backend's problem (by asking the backend to tell us how to know what to copy, in one form or another). If there's no support in the PEP for this, we end up having to accept that going forward we'll *still* have this problem (unless we add an extra mandatory backend hook in a new PEP, but backends can still claim support for only PEP 517 and not the new PEP). This is admittedly an enhancement to pip, not essential functionality, but the point of this whole process is to enable the packaging ecosystem (including pip!) to move forward, rather than being paralyzed by the constraints of the old setup.py system. So leaving the same constraints in the new system isn't helpful.
- The local directory -> sdist -> wheel -> install model has its issues, but it's a much cleaner model for pip (if we can get to it without breaking use cases) - for all the same reasons that switching from direct installs to installs via wheels was a huge win for us. To implement that model internally, pip would need a means of building a sdist.
I hope this helps. If nothing else, it makes your comments look less wordy by comparison :-)
Paul

On Thu, Jun 1, 2017, at 10:49 PM, Paul Moore wrote:
pip also needs a way to deal with "pip install <local directory>". In this case, pip (under its current model) copies that directory to a working area. In that area, it runs the build command to create a wheel, and proceeds from there. In principle, there's little change in a PEP 517 world. But again, see below.
I still question whether the copying step is necessary for the frontend. Pip does it for setup.py builds (AIUI) because they might modify or create files in the working directory, and it wants to keep the source directory clean of that. Flit can create a wheel without modifying/creating any files in the working directory.
So, should PEP 517 specify that the backend building a wheel has to do it without modifying/creating any files in the working directory? If the backend can't be sure it will do that, it should copy whatever it needs to a temporary directory and build from there. Tools falling back to setup.py would copy as part of the fallback build step.
This seems to me neater than insisting that the backend copy all its files even if there's no need.
As I mentioned on the PR, though, I don't feel strongly about this issue. It's simple enough to copy all the necessary files to another directory if that's what the build API requires.
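If the spec did go that way, a backend unsure of its own build hygiene could satisfy the requirement along these lines; this is a sketch only, using the hook signature under discussion and a hypothetical internal helper:

    import shutil
    import tempfile

    def build_wheel(wheel_directory, config_settings=None):
        # Can't promise a clean in-place build, so build from a private copy
        # of the source tree instead.
        with tempfile.TemporaryDirectory() as tmp:
            build_src = shutil.copytree(".", tmp + "/src")
            return _build_wheel_in(build_src, wheel_directory)  # hypothetical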
In a PEP 517 world, backends may or may not provide a "build a sdist" command (flit doesn't, and for a long time I believe didn't propose to do so -
Flit does as of a few days ago. But it's intended for developers releasing a package, and so it relies on the code being in a VCS repository and the appropriate VCS CLI being available. I don't want this to be required when pip builds a package from source to install.
After quite a lot of discussion, I concluded that I want downloading and unpacking an sdist to get me something very close to a fresh VCS checkout of the corresponding release tag. Historically, we've often put the results of things like code-generation into sdists, but I think that was primarily because there was no good way to publish built distributions, and so we hacked sdists into serving a bdist-like function. Now that wheels are widely supported, I'm inclined to discourage doing things like that with sdists.
Thomas

On 1 June 2017 at 23:14, Thomas Kluyver thomas@kluyver.me.uk wrote:
On Thu, Jun 1, 2017, at 10:49 PM, Paul Moore wrote:
pip also needs a way to deal with "pip install <local directory>". In this case, pip (under its current model) copies that directory to a working area. In that area, it runs the build command to create a wheel, and proceeds from there. In principle, there's little change in a PEP 517 world. But again, see below.
I still question whether the copying step is necessary for the frontend. Pip does it for setup.py builds (AIUI) because they might modify or create files in the working directory, and it wants to keep the source directory clean of that. Flit can create a wheel without modifying/creating any files in the working directory.
That's a very fair comment, and I honestly don't know how critical the copy step is - in the sense that I know we do it to prevent certain classes of issue, but I don't know what they are, or how serious they are. Perhaps Donald does?
It's certainly true that setup.py based builds are particularly unpleasant for the obvious "running arbitrary code" reasons. But I'm not sure how happy I am simply saying "backends must ..." what? How would we word this precisely? It's not just about keeping the sources clean, it's also about not being affected by unexpected files in the source directory. Consider that a build using a compiler will have object files somewhere. Should a backend use existing object files in preference to sources? What about a backend based on a tool designed to do precisely that, like waf or make? What if the files came from a build with different compiler flags? Sure, it's user error or a backend bug, but it'll be reported to pip as "I tried to install foo and my program failed when I imported it". We get that sort of bug report routinely (users reporting bugs in build scripts as pip problems) and we'll never have a technical solution to all the ways they can occur, but preventative code like copying the build files to a clean location can minimise them. (As I say, I'm speculating about whether that's actually why we build in a temp location, but it's certainly the sort of thinking that goes into our design).
Paul

On Jun 1, 2017, at 6:28 PM, Paul Moore p.f.moore@gmail.com wrote:
On 1 June 2017 at 23:14, Thomas Kluyver thomas@kluyver.me.uk wrote:
On Thu, Jun 1, 2017, at 10:49 PM, Paul Moore wrote:
pip also needs a way to deal with "pip install <local directory>". In this case, pip (under its current model) copies that directory to a working area. In that area, it runs the build command to create a wheel, and proceeds from there. In principle, there's little change in a PEP 517 world. But again, see below.
I still question whether the copying step is necessary for the frontend. Pip does it for setup.py builds (AIUI) because they might modify or create files in the working directory, and it wants to keep the source directory clean of that. Flit can create a wheel without modifying/creating any files in the working directory.
That's a very fair comment, and I honestly don't know how critical the copy step is - in the sense that I know we do it to prevent certain classes of issue, but I don't know what they are, or how serious they are. Perhaps Donald does?
It's certainly true that setup.py based builds are particularly unpleasant for the obvious "running arbitrary code" reasons. But I'm not sure how happy I am simply saying "backends must ..." what? How would we word this precisely? It's not just about keeping the sources clean, it's also about not being affected by unexpected files in the source directory. Consider that a build using a compiler will have object files somewhere. Should a backend use existing object files in preference to sources? What about a backend based on a tool designed to do precisely that, like waf or make? What if the files came from a build with different compiler flags? Sure, it's user error or a backend bug, but it'll be reported to pip as "I tried to install foo and my program failed when I imported it". We get that sort of bug report routinely (users reporting bugs in build scripts as pip problems) and we'll never have a technical solution to all the ways they can occur, but preventative code like copying the build files to a clean location can minimise them. (As I say, I'm speculating about whether that's actually why we build in a temp location, but it's certainly the sort of thinking that goes into our design).
Paul
I suspect the original reasoning behind copying to a temporary location has been lost to the sands of time. We’ve been doing that in pip for as long as I’ve worked on pip, maybe Jannis or someone remembers why I dunno!
From my end, copying the entire directory alleviates a few problems:
* In the current environment, it prevents random debris from cluttering up and being written to the current directory, including build files.
* This is important, because not only is it unhygienic to allow random bits of crap to crap all over the local directory, but in the current system the build directories are not sufficiently platform dependent (e.g. a Linux build only gets identified as a Linux build, even if it links against two different ABIs because it was mounted inside of a Debian and a CentOS Docker container).
* It reduces errors caused by people/tooling editing files while a build is being processed. This can’t ever be fully removed, but by copying to a temporary location we narrow the window down considerably where someone can inadvertently muck up their build mid progress.
* It prevents some issues with two builds running at the same time.
Narrowing that down to producing a sdist (or some other mechanism for doing a “copy what you would need” hook) in addition prevents:
* Unexpected files changing the behavior of the build.
* Misconfigured build tools appearing to “work” in development but failing when the sdist is released to PyPI or having the sdist and wheels be different because the wheel was produced from a VCS checkout but a build from a sdist wasn’t.
Ultimately you’re right, we could just encode this into PEP 517 and say that projects need to *either* give us a way to copy the files they need OR they need hygienic builds that do not modify the current directory at all. I greatly prefer *not* to do that though, because everyone is only human, and there are likely to be build backends that don’t do that— either purposely or accidentally— and it’ll likely be pip that fields those support issues (because they’ll see it as they invoked pip, so it must be pip’s fault).
In my mind the cost of *requiring* some mechanism of doing this is pretty low; the project obviously needs to know what files are important to it, or else how is it going to know what it’s going to build in the first place? For most projects the amount of data that *needs* to be copied (versus is just stuff that is sitting there taking up space) is pretty small, so even on a really slow HDD the copy operation should not take a significant amount of time. It’s also not a particularly hard thing to implement I think— certainly it’s much easier than actually building a project in the first place.
There’s a principle here at Amazon that goes, “Good intentions don’t matter”. Which essentially means that simply saying you’re going to do something good doesn’t count because you’re inevitably going to forget or mess up and that instead of just having the intention to do something, you should have a process in place that ensures it is going to happen. Saying that we’re going to make the copying optional and hope that the build tools correctly build in place without an issue feels like a “good intention” to me, whereas adding the API and step that *mandates* (through technical means) they do it correctly is putting a process in place that ensures it is going to happen.
— Donald Stufft

On 2017-06-01 20:45:53 +0000 (+0000), Brett Cannon wrote: [...]
I think *twine* is the tool that needs a way to specify how to produce an sdist. If we want to view twine as the tool to upload artifacts to PyPI then we need twine to know how to produce sdists and wheels in a PEP 517 world, not pip.
[...]
Why do you think that? Because traditionally you could call setup.py to upload an sdist as well as build it?
One thing I really like about twine, as the tool I trust with my PyPI creds, is that it's a very simple tool unencumbered by unrelated features. While I agree that the tool which retrieves and installs packages doesn't necessarily also need to be the tool which builds packages, I don't see why the tool which securely uploads packages should take on that function either. In the UNIX sense of doing one thing well, I'd much rather see a separate tool for each of these roles.

On Jun 1, 2017, at 7:53 PM, Jeremy Stanley fungi@yuggoth.org wrote:
On 2017-06-01 20:45:53 +0000 (+0000), Brett Cannon wrote: [...]
I think *twine* is the tool that needs a way to specify how to produce an sdist. If we want to view twine as the tool to upload artifacts to PyPI then we need twine to know how to produce sdists and wheels in a PEP 517 world, not pip.
[...]
Why do you think that? Because traditionally you could call setup.py to upload an sdist as well as build it?
One thing I really like about twine, as the tool I trust with my PyPI creds, is that it's a very simple tool unencumbered by unrelated features. While I agree that the tool which retrieves and installs packages doesn't necessarily also need to be the tool which builds packages, I don't see why the tool which securely uploads packages should take on that function either. In the UNIX sense of doing one thing well, I'd much rather see a separate tool for each of these roles.
I think a separate tool for each of these roles is somewhat user unfriendly TBH.
Splitting things across multiple projects tends to confuse users and increases the conceptual overhead. I sometimes wonder if we should be folding twine into pip itself, although keeping the split between twine == package authoring tool and pip == package installing tool seems like a reasonable enough divide.
— Donald Stufft

On Jun 1, 2017, at 6:09 PM, Donald Stufft donald@stufft.io wrote:
I sometimes wonder if we should be folding twine into pip itself
Yes please. WTB `pip upload`.
-g

On 2017-06-01 21:09:57 -0400 (-0400), Donald Stufft wrote: [...]
I think a separate tool for each of these roles is somewhat user unfriendly TBH.
[...]
I'll do my best not to be offended that you don't consider me a user (or representative of some broader class of users). ;)
At any rate, I think it depends on your definition of users. Some users want one shiny kitchen-sink tool that does everything for them, others want composable tools with well-considered bounds of operation. It's possible a modular approach could satisfy both, but then again if twine grows too many features I'm just as likely to write a new lightweight API client instead so I can have something auditable I can trust my credentials to which only knows how to upload.

On Jun 1, 2017, at 9:40 PM, Jeremy Stanley fungi@yuggoth.org wrote:
On 2017-06-01 21:09:57 -0400 (-0400), Donald Stufft wrote: [...]
I think a separate tool for each of these roles is somewhat user unfriendly TBH.
[...]
I'll do my best not to be offended that you don't consider me a user (or representative of some broader class of users). ;)
I probably should have written out the long form: unfriendly to users who aren’t steeped in packaging lore ;)
At any rate, I think it depends on your definition of users. Some users want one shiny kitchen-sink tool that does everything for them, others want composable tools with well-considered bounds of operation. It's possible a modular approach could satisfy both, but then again if twine grows too many features I'm just as likely to write a new lightweight API client instead so I can have something auditable I can trust my credentials to which only knows how to upload.
Largely to me it’s about not throwing a ton of different things at people that they have to both find and learn. It’s easier to keep things consistent in a single code base (lol Unix which has -R and -r for recursive depending on your tool!) and also easier for people to discover the different commands they need to fully manage a project. This can get particularly difficult when the multitude of different tools evolve at different paces (we see this today where pip will support something but setuptools won’t yet, etc) which requires people to have to care about the versions of more different tools.
I also think it’s perfectly fine to have another tool that competes with twine (or part of twine) that takes a different set of trade offs. Part of the goals of documenting standards around these things instead of just going “well setuptools is the thing you need to use and that’s just it” you can go ahead and write your thing that scratches your particular itch better, and they can have a friendly competition ;).
At the end of the day though, this is a bit of a tangent since it doesn’t matter whether it’s ``pip sdist``, ``twine sdist``, or ``make-me-an-sdist-plz``, the underlying point of having a command to handle that stands.
— Donald Stufft

On 2 June 2017 at 06:45, Brett Cannon brett@python.org wrote:
And so I think in a very wordy way, I just said we need to stop saying "pip needs a standardized way to produce an sdist" and instead start saying "twine needs a way to produce an sdist". And that leads to the question about whether PEP 517 must cover sdist production for twine because we want to have that solved before we have the consumption side in pip in place. Or put another way, are we okay with pip consuming things that twine simply can't make from a pyproject.toml file (yet)? A "yes" means PEP 517 shouldn't be held up, while a "no" means we need a solution for twine.
I largely agree with this phrasing of the problem, but think it's oversimplifying the underlying capabilities needed a bit:
1. sdist creation tools, whether tox, twine, a future "pip publish" command, or something else, need to know how to:
   1a. Generate sdist metadata (aka PKG-INFO)
   1b. Generate the actual sdist contents
   1c. Determine the dependencies needed to handle the above steps
2. Install tools, including pip, need to know how to:
   2a. Generate a clean out-of-tree build directory
   2b. Generate wheel metadata (including the METADATA file)
   2c. Generate the actual wheel contents
   2d. Determine the dependencies needed to handle the above steps
The agreed-to-be-missing piece that has been identified in PEP 517 is "2a": the part that will let pip (and any other tools doing out-of-tree builds) avoid copying entire VCS checkouts into the build directory.
Donald's perspective is that instead of being an independent step, generating an out-of-tree build directory should instead be defined in terms of sdist generation on the following grounds:
* it better accounts for local publish-and-install tools like `tox`, which relies on sdist generation to push the code under development into the test venv
* defining the "export a build tree" and "generate an sdist tree" steps independently creates the opportunity for future inconsistencies where "VCS -> sdist -> build tree -> wheel" and "VCS -> build tree -> wheel" give different results (just as "pip install project-dir/" and "pip install project-sdist" can give different results today)
* `PKG-INFO` in an sdist is essentially the same file as PEP 427's `METADATA`, so any wheel building backend is already required to be able to generate it
While the first point doesn't really bother me (I'm OK with folks needing to keep a "setup.py sdist" shim around for now), I have to agree that I find the second point compelling, as it means that any PEP 517 based project will inherently test the correctness of its sdist generation *just by doing a local install*. That means we won't be relying on opt-in pre-release testing with tox, or post-release bug reports from users any more: if the sdist definition is broken, even local installations from the VCS checkout into an active virtual environment won't work.
The last point means that requiring an sdist export command shouldn't impose an unreasonable burden on backend developers, as the only differences between "generate an sdist tree" and "export a build tree" would be:
- `pyproject.toml` MUST be included unmodified at the root of the sdist tree - `PKG-INFO` MUST be generated at the root of the sdist tree - a `setup.py` shim MAY be generated for backwards compatibility with installation tools that are unaware of PEP 517
Beyond that, both approaches would have the same requirement of "include everything needed to subsequently build a wheel file from the sdist".
The mapping back to the originally identified activities would then be:
1. sdist creation tools:
   1a. Generate sdist metadata: export the sdist and read PKG-INFO
   1b. Generate the actual sdist contents: export the sdist
   1c. Determine the dependencies needed to handle the above steps: look at build-system.requires
2. Install tools:
   2a. Generate a clean out-of-tree build directory: export the sdist
   2b. Generate wheel metadata: export the wheel metadata directory
   2c. Generate the actual wheel contents: build the wheel file
   2d. Determine the dependencies needed to handle the above steps: look at build-system.requires & get_build_requires()
So even though 1a, 1b, and 2a are conceptually different activities, backends would only need to implement one operation to handle all 3: sdist export.
Cheers, Nick.

On 2 June 2017 at 08:49, Nick Coghlan ncoghlan@gmail.com wrote:
The last point means that requiring an sdist export command shouldn't impose an unreasonable burden on backend developers, as the only differences between "generate an sdist tree" and "export a build tree" would be:
- `pyproject.toml` MUST be included unmodified at the root of the sdist tree
- `PKG-INFO` MUST be generated at the root of the sdist tree
- a `setup.py` shim MAY be generated for backwards compatibility with
installation tools that are unaware of PEP 517
Note that a full "build an sdist" process should include a means for authors to add extra files (such as README, LICENSE, ...). But that can be either something that backends deal with themselves or (better) gets standardised in a separate PEP (probably defining a new set of `pyproject.toml` fields for it). It's not inherently something we need right now in PEP 517.
Paul

On 2 June 2017 at 18:37, Paul Moore p.f.moore@gmail.com wrote:
On 2 June 2017 at 08:49, Nick Coghlan ncoghlan@gmail.com wrote:
The last point means that requiring an sdist export command shouldn't impose an unreasonable burden on backend developers, as the only differences between "generate an sdist tree" and "export a build tree" would be:
- `pyproject.toml` MUST be included unmodified at the root of the sdist tree
- `PKG-INFO` MUST be generated at the root of the sdist tree
- a `setup.py` shim MAY be generated for backwards compatibility with
installation tools that are unaware of PEP 517
Note that a full "build an sdist" process should include a means for authors to add extra files (such as README, LICENSE, ...). But that can be either something that backends deal with themselves or (better) gets standardised in a separate PEP (probably defining a new set of `pyproject.toml` fields for it). It's not inherently something we need right now in PEP 517.
I think we can leave this as a backend level thing, as it really is entirely up to the backend. From a frontend's perspective, "Recursively copy everything that doesn't start with '.' and then generate PKG-INFO" would be an entirely acceptable sdist export implementation, as would "Export all files that are under source control and then generate PKG-INFO". More complex backends may want more sophisticated options than that, but they can still go in a backend dependent configuration file (ala MANIFEST.in for distutils/setuptools) rather than needing to be standardised.
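As a sketch of how low that bar is (write_pkg_info below is a hypothetical stand-in for the backend's metadata machinery, not a real API):

    import os
    import shutil

    def export_sdist_tree(source_dir, target_dir):
        # "Recursively copy everything that doesn't start with '.'" --
        # this skips .git, .hg, .tox and friends.
        shutil.copytree(source_dir, target_dir,
                        ignore=shutil.ignore_patterns(".*"))
        # ...then generate PKG-INFO at the root of the exported tree.
        write_pkg_info(os.path.join(target_dir, "PKG-INFO"))  # hypothetical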
Cheers, Nick.

As was suggested at some point, I have added a build_sdist hook to my PR, with the following details:
- A brief definition of the minimal requirements of an sdist.
- I have limited the definition to gzipped tarballs. Zip files also work as sdists, but we're moving towards standardising on tarballs, so I think it's simplest to require that of PEP-517 compliant tools.
- The build_sdist hook must be defined, but may not always work (e.g. it may depend on a VCS)
- The prepare_build_files hook is optional, and in its absence, frontends can use build_sdist and extract the files to create a build directory.
- Backends (like flit) where building an sdist has extra requirements should define prepare_build_files.

On 2 June 2017 at 14:42, Thomas Kluyver thomas@kluyver.me.uk wrote:
- Backends (like flit) where building an sdist has extra requirements
should define prepare_build_files.
I'm struggling to understand why building a sdist in flit should need a VCS. It bothers me that I'm missing something important about how backends will work, that explains why (for example) you can't create a sdist from an export of a VCS branch (i.e., without the VCS metadata).
Can you provide a pointer to the docs on flit's "build a sdist" command, that explains the limitations? (I gather that this is in development, so a pointer to the doc files in VCS is fine).
Paul

On Fri, Jun 2, 2017, at 03:12 PM, Paul Moore wrote:
I'm struggling to understand why building a sdist in flit should need a VCS. It bothers me that I'm missing something important about how backends will work, that explains why (for example) you can't create a sdist from an export of a VCS branch (i.e., without the VCS metadata).
It's the best way I've found to answer the question of which files should go in an sdist. The other things that we don't want to do include:
1. Get only the files needed to install and use the library, as we do for a wheel. Bad because people expect sdists to include things like docs and tests (if you put tests outside the package).
2. Tar up the whole directory containing flit.ini (/pyproject.toml). Bad because this will include things like built docs, VCS metadata, and random files you've made, so the sdist will be much bigger than necessary.
3. Hard-coded blacklist/whitelist. Not flexible enough - we can't cover all the different ways you might do docs, for instance.
4. Configurable blacklist/whitelist. This is what MANIFEST.in provides. I think we could come up with a more memorable syntax than MANIFEST.in - probably something like gitignore - but I'm not keen on adding back another boilerplate file. And the big problem I have with MANIFEST.in is that it's easy to forget to update it when you add some files that need to be in the sdist.
I think the key realisation for me was that the files I want in an sdist are the same set of files in a fresh checkout of the VCS repo. I want it to be a static snapshot of what was in my VCS when I released (plus, for the sake of other tools, a couple of generated files). So the necessary information to make the sdist is there in the VCS.
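For illustration, getting that fresh-checkout file list from git can be as simple as the following sketch (an assumption about the general approach, not flit's actual code):

    import subprocess

    def files_from_vcs(repo_dir):
        # Ask git's index for exactly the files a fresh checkout would contain.
        output = subprocess.check_output(["git", "ls-files"], cwd=repo_dir)
        return output.decode("utf-8").splitlines()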
Can you provide a pointer to the docs on flit's "build a sdist" command, that explains the limitations? (I gather that this is in development, so a pointer to the doc files in VCS is fine).
I appreciate your optimism about my docs. ;-)

On 2017-06-02 15:12:19 +0100 (+0100), Paul Moore wrote: [...]
I'm struggling to understand why building a sdist in flit should need a VCS. It bothers me that I'm missing something important about how backends will work, that explains why (for example) you can't create a sdist from an export of a VCS branch (i.e., without the VCS metadata).
[...]
Unrelated to flit, but I have similar needs to be able to make sure my sdist version has a 1:1 correspondence to the name of a release tag in my VCS. Making a commit to embed a version number in a file in the VCS and then tagging that commit with the same version number is 1. racy and 2. duplication of information which can (and frequently has) led to confusing mistakes. As a result, I either need the VCS metadata present to be able to construct the version number which will get included in PKG-INFO _or_ I need a complex build wrapper which extracts the metadata prior to the filtered tree copy happening and plumbs it through (with an envvar or maybe spontaneously generating a file on disk) so that the sdist builder will know what version to embed.
Present day, this works fine as a setuptools entrypoint which can inspect the VCS metadata at sdist creation time. It would be unfortunate to lose such flexibility in whatever hook implementation we end up with.
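A sketch of that approach, assuming git and release tags of the form "v1.2.3" (not taken from any particular tool):

    import subprocess

    def version_from_vcs(repo_dir):
        # Derive the version from the nearest release tag, so the tag and the
        # version embedded in PKG-INFO can never disagree.
        tag = subprocess.check_output(
            ["git", "describe", "--tags", "--abbrev=0"], cwd=repo_dir,
        ).decode("utf-8").strip()
        return tag.lstrip("v")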

On 2 June 2017 at 23:42, Thomas Kluyver thomas@kluyver.me.uk wrote:
As was suggested at some point, I have added a build_sdist hook to my PR, with the following details:
- A brief definition of the minimal requirements of an sdist.
- I have limited the definition to gzipped tarballs. Zip files also work as sdists, but we're moving towards standardising on tarballs, so I think it's simplest to require that of PEP-517 compliant tools.
For the sdist case, I'd prefer to leave the actual archive creation in the hands of the frontend as far as the plugin API is concerned. That lets us completely duck the fact that the sdist naming scheme and exact archive format aren't formally defined anywhere, and for pip's local build use case, we want the unpacked tree anyway.
In a lot of ways, it's closer in spirit to the wheel metadata generation hook than it is to the wheel building hook.
- The build_sdist hook must be defined, but may not always work (e.g. it
may depend on a VCS)
I was going to object to this aspect, but I realised there's a clear marker file that frontends can use to determine if they're working with an already exported sdist tree: PKG-INFO
That means the invocation protocol for the additional hook can be:
- if PKG-INFO is present, then just copy the full contents of the directory without invoking the backend's sdist export hook
- if PKG-INFO is *not* present, then invoke the backend's sdist export hook to do a filtered export that at least omits any VCS bookkeeping files
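A sketch of that invocation protocol from the frontend side (export_sdist_tree is a placeholder hook name, not settled API):

    import os
    import shutil

    def prepare_build_tree(backend, source_dir, build_dir):
        if os.path.exists(os.path.join(source_dir, "PKG-INFO")):
            # Already an unpacked sdist tree: copy it verbatim.
            shutil.copytree(source_dir, build_dir)
        else:
            # A raw checkout: let the backend do the filtered export.
            backend.export_sdist_tree(source_dir, build_dir)  # placeholder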
- The prepare_build_files hook is optional, and in its absence,
frontends can use build_sdist and extract the files to create a build directory.
- Backends (like flit) where building an sdist has extra requirements
should define prepare_build_files.
Having two hooks still leaves us open to "VCS -> sdist -> build tree -> wheel" and "VCS -> build tree -> wheel" giving different answers, and that's specifically the loophole we're aiming to close by including this in PEP 517 rather than leaving it until later.
Instead, the flow that I think makes sense is "VCS -> sdist tree [-> sdist tree -> sdist tree -> ...] -> wheel", and the above model where the export filtering is only used when PKG-INFO doesn't exist yet will give us that.
Cheers, Nick.

On Fri, Jun 2, 2017, at 03:41 PM, Nick Coghlan wrote:
Instead, the flow that I think makes sense is "VCS -> sdist tree [-> sdist tree -> sdist tree -> ...] -> wheel", and the above model where the export filtering is only used when PKG-INFO doesn't exist yet will give us that.
I still object to conflating 'filter the files needed to build a wheel' with 'build an sdist' - these are different tasks which I would implement differently. And flit cannot do (sdist tree -> sdist tree).
The options as I see them:
1. Make it the responsibility of the backend, not the frontend, to build cleanly (except for setup.py builds). Then there's no need for a hook to filter a build tree before building a wheel.
2. Define a hook to filter the files into a build tree, as a separate notion from building sdists.
Thomas

[Note: I've reverted the PEP to Draft status while this discussion is ongoing: https://github.com/python/peps/blob/master/pep-0517.txt]
On 3 June 2017 at 00:56, Thomas Kluyver thomas@kluyver.me.uk wrote:
On Fri, Jun 2, 2017, at 03:41 PM, Nick Coghlan wrote:
Instead, the flow that I think makes sense is "VCS -> sdist tree [-> sdist tree -> sdist tree -> ...] -> wheel", and the above model where the export filtering is only used when PKG-INFO doesn't exist yet will give us that.
I still object to conflating 'filter the files needed to build a wheel' with 'build an sdist' - these are different tasks which I would implement differently.
This concerns me somewhat, as if a backend implements the two differently, then it means building from an sdist and building from a VCS checkout may give different results (since they may contain different files).
Could you provide a little more detail as to what you would do differently in exporting the contents of an sdist that wouldn't apply to export a build tree? (aside from skipping emitting PKG-INFO)
And flit cannot do (sdist tree -> sdist tree).
It wouldn't be required to - since a PKG-INFO would be present in that case, the front end would just copy the directory without bothering the backend about it.
The options as I see them:
- Make it the responsibility of the backend, not the frontend, to build
cleanly (except for setup.py builds). Then there's no need for a hook to filter a build tree before building a wheel.
No, we're not going to do that - build isolation will be the frontend's responsibility.
- Define a hook to filter the files into a build tree, as a separate
notion from building sdists.
While Donald seems more amenable to this now, I still don't understand the difference you see between the two (aside from PKG-INFO potentially being unneeded in the build tree case, depending on how the backend handles creation of the METADATA file)
Cheers, Nick.

On Fri, Jun 2, 2017, at 05:26 PM, Nick Coghlan wrote:
Could you provide a little more detail as to what you would do differently in exporting the contents of an sdist that wouldn't apply to export a build tree? (aside from skipping emitting PKG-INFO)
When creating an sdist, I query the VCS to work out what files to put in it. This is brittle - it depends on the directory being a VCS checkout, and the relevant VCS being available to call. And it's relatively slow, because we have to shell out to another process. I've decided those are acceptable trade-offs for the project maintainer making the release.
When exporting a build tree, I would copy only the files that are needed to make the wheel. This is simple, robust and fast.
I can even generate PKG-INFO when exporting a build tree if that helps. But I want to keep the idea of a build tree used as an intermediate to generating a wheel separate from that of an sdist, which is a release artifact.
Thomas

On 3 June 2017 at 02:50, Thomas Kluyver thomas@kluyver.me.uk wrote:
On Fri, Jun 2, 2017, at 05:26 PM, Nick Coghlan wrote:
Could you provide a little more detail as to what you would do differently in exporting the contents of an sdist that wouldn't apply to export a build tree? (aside from skipping emitting PKG-INFO)
When creating an sdist, I query the VCS to work out what files to put in it. This is brittle - it depends on the directory being a VCS checkout, and the relevant VCS being available to call. And it's relatively slow, because we have to shell out to another process. I've decided those are acceptable trade-offs for the project maintainer making the release.
When exporting a build tree, I would copy only the files that are needed to make the wheel. This is simple, robust and fast.
Oh, I get it - for a build tree, you *know* which files you really need (since it's specified as part of flit's settings), whereas for the sdist, you also need to get all the *other* files that the build doesn't need, but the sdist should contain anyway.
And this distinction will become even more important if Nathaniel gets his wish and we some day extend the backend API definition to support generating multiple wheel files from the same sdist.
I can even generate PKG-INFO when exporting a build tree if that helps.
Now that I understand the point you were making, I don't think that's necessary.
But I want to keep the idea of a build tree used as an intermediate to generating a wheel separate from that of an sdist, which is a release artifact.
Right, that makes sense to me now. It also means that even when building from a VCS checkout, you may not need the tools to *query* that VCS if things like your version number are specified in a static file.
Cheers, Nick.

On Jun 2, 2017, at 10:41 AM, Nick Coghlan ncoghlan@gmail.com wrote:
On 2 June 2017 at 23:42, Thomas Kluyver thomas@kluyver.me.uk wrote:
As was suggested at some point, I have added a build_sdist hook to my PR, with the following details:
- A brief definition of the minimal requirements of an sdist.
- I have limited the definition to gzipped tarballs. Zip files also
work as sdists, but we're moving towards standardising on tarballs, so I think it's simplest to require that of PEP-517 compliant tools.
For the sdist case, I'd prefer to leave the actual archive creation in the hands of the frontend as far as the plugin API is concerned. That lets us completely duck the fact that the sdist naming scheme and exact archive format aren't formally defined anywhere, and for pip's local build use case, we want the unpacked tree anyway.
In a lot of ways, it's closer in spirit to the wheel metadata generation hook than it is to the wheel building hook.
I’d prefer to leave the actual creation of the archive up to the front end in both the sdist and the wheel case. In some cases it will allow pip to avoid having to round trip through a compression -> decompression cycle and instead just use the tree directly. In other cases it won’t allow that, but then it doesn’t really add any more time or complexity, it just shifts the time spent compressing from the backend to the frontend.
- The build_sdist hook must be defined, but may not always work (e.g. it
may depend on a VCS)
I was going to object to this aspect, but I realised there's a clear marker file that frontends can use to determine if they're working with an already exported sdist tree: PKG-INFO
That means the invocation protocol for the additional hook can be:
- if PKG-INFO is present, then just copy the full contents of the
directory without invoking the backend's sdist export hook
- if PKG-INFO is *not* present, then invoke the backend's sdist export
hook to do a filtered export that at least omits any VCS bookkeeping files
- The prepare_build_files hook is optional, and in its absence,
frontends can use build_sdist and extract the files to create a build directory.
- Backends (like flit) where building an sdist has extra requirements
should define prepare_build_files.
Having two hooks still leaves us open to "VCS -> sdist -> build tree -> wheel" and "VCS -> build tree -> wheel" giving different answers, and that's specifically the loophole we're aiming to close by including this in PEP 517 rather than leaving it until later.
Instead, the flow that I think makes sense is "VCS -> sdist tree [-> sdist tree -> sdist tree -> ...] -> wheel", and the above model where the export filtering is only used when PKG-INFO doesn't exist yet will give us that.
So my preference is that everything goes through the sdist step as I think that is most likely to provide consistent builds everywhere both from a VCS checkout and from a sdist that was released to PyPI. That being said, I am somewhat sympathetic to the idea that generating a sdist might be a slow process for reasons that are unrelated to actually building a wheel (for example, documentation might get “compiled” from some kind of source format to a man page, html docs, etc) so I think I am not against the idea of having an optional hook whose job is to just do the copying needed. The requirements would be:
* The build_sdist hook is mandatory, but may fail (as any of these commands may fail tbh) if some invariant required by the build backend isn’t satisfied.
* The copy_the_files hook is optional, if it exists it SHOULD produce a tree that when the build_wheel hook is called in it, will produce a wheel that is equivalent to one that would have been built had the build_sdist hook been called instead.
* If the copy_the_files hook is not defined, then the build frontend is free to just directly call the build_sdist command instead.
I think that represents a pretty reasonable trade off: the path of least resistance for a build backend is to just define build_sdist and build_wheel and leave the two optional hooks omitted. I suspect for a lot of pure python packages (although Thomas has said not flit) those two hooks will be fast enough that that’s all they’ll need to implement. However, in cases where they’re not, we provide both the copy_the_files and the wheel_metadata hooks to allow short circuiting a possibly more complex build process to provide a better UX to end users. That kind of goes against my “good intentions don’t matter” statement from before, but I also think that practicality beats purity ;)
— Donald Stufft

On 2 June 2017 at 16:27, Donald Stufft donald@stufft.io wrote:
So my preference is that everything goes through the sdist step as I think that is most likely to provide consistent builds everywhere both from a VCS checkout and from a sdist that was released to PyPI.
Agreed. That's the ideal workflow. The only reason we don't do it now is because... well, I'm not quite sure. I think it's to do with things like setuptools_scm not generating suitable "temporary version numbers" to allow us to work properly with installs that assume that name/version uniquely identifies the code.
But regardless, I'd like to say that under PEP 517, we will go source tree -> sdist -> wheel -> install, for everything but editable installs. If projects like setuptools_scm will have an issue with this, they need to feed their requirements into the process of agreeing PEP 517 - the pip developers can't really argue their case for them.
That being said, I am somewhat sympathetic to the idea that generating a sdist might be a slow process for reasons that are unrelated to actually building a wheel (for example, documentation might get “compiled” from some kind of source format to a man page, html docs, etc) so I think I am not against the idea of having an optional hook whose job is to just do the copying needed. The requirements would be:
Agreed. Optimising the process is perfectly OK with me, but I do think we should treat it as just that, an optimisation, and require that backends implementing the optimisation route must ensure that it gives the same results as going via a sdist.
Note that there's an implication here - if we define the build process in terms of the effect of "going via a sdist", then we need to at least have an intuitive understanding of what that means in practice. I don't think it's a contentious point (even if the specific term "sdist" is open to debate), as I think repeatable builds are a well-understood idea. (It's at this point that the concerns of people who want incremental builds come in - we should support incremental builds in a way that preserves the "just like going via a sdist" principle. But again, they need to raise their concerns if they think we're missing something key to their use case).
- The build_sdist hook is mandatory, but may fail (as any of these commands
may fail tbh) if some invariant required by the build backend isn’t satisfied.
Agreed. Other tools (or future additions to pip), that want to provide a common interface to the "build a sdist" functionality would use this hook too. They may not be able to fall back in the same way as pip can, but that's their issue to address.
- The copy_the_files hook is optional, if it exists it SHOULD produce a tree
that when the build_wheel hook is called in it, will produce a wheel that is equivalent to one that would have been built had the build_sdist hook been called instead.
This is precisely the "should look like we built a sdist" principle, so I'm a solid +1 on this, too. It might be worth stating that copy_the_files is only intended to be called after a failed call to build_sdist. I don't know if backends would care, but I don't think we should worry about having to support use of copy_the_files as anything other than a build_sdist fallback.
- If the copy_the_files hook is not defined, then the build frontend is free
to just directly call the build_sdist command instead.
Sorry? I assume here that you mean "directly call the build_wheel hook in the original source tree"? That's OK, but I think we should be clear that if this happens, it is the backend's responsibility to ensure that the build is equivalent to building from a sdist. It might even be appropriate for the front end to warn if this happens - "Unable to build out of tree - results may differ from a clean build" (The intent is to remind people that they aren't testing the actual sdist they will be deploying, the issue that Nick pointed out).
One thing that is worth flagging here, if only as a note for backends considering this approach, is that the source tree could have arbitrary out of date build artifacts in it (from previous build_wheel calls, possibly with different settings, or from the source tree being installed editable, or even from the user doing something like a manual debug build) and the backend must take responsibility for ensuring that those artifacts don't affect the result. (In case it's not obvious, my personal feeling is that this is a pretty risky option, and I'd strongly prefer backends that implement at least one of the hooks allowing out-of-tree builds).
I think that represents a pretty reasonable trade off: the path of least resistance for a build backend is to just define build_sdist and build_wheel and leave the two optional hooks omitted. I suspect for a lot of pure python packages (although Thomas has said not flit) those two hooks will be fast enough that that’s all they’ll need to implement. However, in cases where they’re not, we provide both the copy_the_files and the wheel_metadata hooks to allow short circuiting a possibly more complex build process to provide a better UX to end users. That kind of goes against my “good intentions don’t matter” statement from before, but I also think that practicality beats purity ;)
Agreed, this is looking reasonable to me. I think it covers pip's requirements, and I hope it addresses Thomas' needs for flit. I don't honestly have a feel for what other backends might look like, so I'll leave other people to comment on those.
Paul

On Jun 2, 2017, at 12:39 PM, Paul Moore p.f.moore@gmail.com wrote:
On 2 June 2017 at 16:27, Donald Stufft donald@stufft.io wrote:
So my preference is that everything goes through the sdist step as I think that is most likely to provide consistent builds everywhere both from a VCS checkout and from a sdist that was released to PyPI.
Agreed. That's the ideal workflow. The only reason we don't do it now is because... well, I'm not quite sure. I think it's to do with things like setuptools_scm not generating suitable "temporary version numbers" to allow us to work properly with installs that assume that name/version uniquely identifies the code.
I’m pretty sure the only reason we don’t do it now is because nobody has had the time to make it happen yet. The problems before weren’t from going via sdist, they were from trying to modify our copy tree implementation to filter out .tox, .git, etc. I don’t think we’ve ever tried going via sdist (other than there is an open PR for it, but it ended up stalling: https://github.com/pypa/pip/pull/3722). Essentially, volunteer time is finite :(
— Donald Stufft

On 2 June 2017 at 18:02, Donald Stufft donald@stufft.io wrote:
I’m pretty sure the only reason we don’t do it now is because nobody has had the time to make it happen yet. The problems before weren’t from going via sdist, they were from trying to modify our copy tree implementation to filter out .tox, .git, etc. I don’t think we’ve ever tried going via sdist (other than there is an open PR for it, but it ended up stalling https://github.com/pypa/pip/pull/3722).
Oh, yes - it's the filtering stuff out work that I remembered, plus the fact that we'd discussed going via sdist.
Essentially, volunteer time is finite :(
In my case, coding time is very limited, but time to write long email responses seems to be freely available. Sigh :-(
Paul

On Jun 2, 2017, at 12:39 PM, Paul Moore p.f.moore@gmail.com wrote:
- The copy_the_files hook is optional, if it exists it SHOULD produce a tree
that when the build_wheel hook is called in it, will produce a wheel that is equivalent to one that would have been built had the build_sdist hook been called instead.
This is precisely the "should look like we built a sdist" principle, so I'm a solid +1 on this, too. It might be worth stating that copy_the_files is only intended to be called after a failed call to build_sdist. I don't know if backends would care, but I don't think we should worry about having to support use of copy_the_files as anything other than a build_sdist fallback.
- If the copy_the_files hook is not defined, then the build frontend is free
to just directly call the build_sdist command instead.
Sorry? I assume here that you mean "directly call the build_wheel hook in the original source tree"? That's OK, but I think we should be clear that if this happens, it is the backend's responsibility to ensure that the build is equivalent to building from a sdist. It might even be appropriate for the front end to warn if this happens - "Unable to build out of tree - results may differ from a clean build" (The intent is to remind people that they aren't testing the actual sdist they will be deploying, the issue that Nick pointed out).
Should have kept reading before sending my email, sorry!
The steps here would basically be (for building from something that isn’t already a .tar.gz or a .whl):
    # Get our backend using the PEP 517 resolving methods
    backend = get_the_backend()

    # Get a copied source tree that is acceptable for using to build a wheel.
    # We allow copy_files to be used in place of build_sdist to provide an
    # optimization in cases where build_sdist would be very slow. However the
    # build backend must ensure that the resulting wheel would not be different
    # than if we had built it from the sdist instead.
    if hasattr(backend, "copy_files"):
        try:
            backend.copy_files(...)
        except Exception:
            backend.build_sdist(...)
    else:
        backend.build_sdist(...)

    # Determine what dependencies we need to install the wheel file. We allow
    # the build tool to optionally give us the deps without actually invoking
    # the wheel build, as an optimization since building the wheel might take
    # awhile. However the build backend must ensure that the metadata returned
    # here matches the final wheel built.
    if hasattr(backend, "get_wheel_metadata"):
        backend.get_wheel_metadata(...)
        has_already_built_wheel = False
    else:
        backend.build_wheel(...)
        has_already_built_wheel = True

    # Resolve dependencies, etc.

    # Go on to build the wheel if we haven't already built it.
    if not has_already_built_wheel:
        backend.build_wheel(...)

    # Install the wheel
— Donald Stufft

On Fri, Jun 2, 2017, at 06:14 PM, Donald Stufft wrote:
The steps here would basically be (for building from something that isn’t already a .tar.gz or a .whl):
That sounds OK to me. I think the only remaining point of contention is whether the build_sdist hook should make an archive or an unpacked directory. I'm not entirely sold, but Nick's point about not having to specify the archive format is enough that I've changed my PR to specify an unpacked sdist.
Thomas

On Fri, Jun 2, 2017 at 3:58 PM, Thomas Kluyver thomas@kluyver.me.uk wrote:
On Fri, Jun 2, 2017, at 06:14 PM, Donald Stufft wrote:
The steps here would basically be (for building from something that isn’t already a .tar.gz or a .whl):
That sounds OK to me. I think the only remaining point of contention is whether the build_sdist hook should make an archive or an unpacked directory. I'm not entirely sold, but Nick's point about not having to specify the archive format is enough that I've changed my PR to specify an unpacked sdist.
This isn't a question directed at what I've quoted here, but this seems as good a place as any.
I want to make sure I understand what I'd need to do, as a user, in a post PEP 517 world. Say I wanted to accomplish the following three things:
* Generate version info from my VCS
* Generate .h and .c from .pyx or cffi's out-of-line API mode
* Use an alternative build process for a platform distutils isn't behaving with (ignore this requirement if it leads to problems answering, we can just as well assume distutils here)
Since I can only define one entrypoint for build-system.build-backend, I can't point directly at setuptools_scm (or equiv) as it only achieves the first, I can't point directly at cython or cffi (or equiv) as it only achieves the second, and I can't point directly at my-custom-backend or distutils as it only achieves the third... right? If I added setuptools_scm and cffi to build-system.requires, will it have an opportunity to somehow do something useful before my-custom-backend begins its work?
Does this mean, for many situations, unless a single backend supports all the things I want to do automatically to my source tree prior to building, I need to write a local wrapper build-system backend, in my project, that farms out build API calls to each of the backends above, myself? Can I call it setup.py?
I'm not suggesting the above is good or bad, or even suggesting it wouldn't be an improvement over what we have now, I'm just trying to think through how it works and how I'd end up doing things.
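As a thought experiment, such a local wrapper backend might be nothing more than delegation. In the sketch below the hook names follow the PEP drafts, but the real_backend module and the pre-build helpers are invented for illustration:

    # local_backend.py -- hypothetical, referenced from pyproject.toml as
    # build-backend = "local_backend"
    import real_backend  # hypothetical: the backend that actually builds

    def get_build_requires(config_settings=None):
        # Everything the pre-build steps and the real backend need.
        return ["setuptools_scm", "cffi"] + real_backend.get_build_requires(config_settings)

    def build_wheel(wheel_directory, config_settings=None):
        write_version_from_vcs()  # hypothetical: wraps setuptools_scm
        generate_c_sources()      # hypothetical: wraps cython/cffi
        return real_backend.build_wheel(wheel_directory, config_settings)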

On 3 June 2017 at 04:53, C Anthony Risinger c@anthonyrisinger.com wrote:
I want to make sure I understand what I'd need to do, as a user, in a post PEP 517 world. Say I wanted to accomplish the following three things:
- Generate version info from my VCS
- Generate .h and .c from .pyx or cffi's out-of-line API mode
- Use an alternative build process for a platform distutils isn't behaving
with (ignore this requirement if it leads to problems answering, we can just as well assume distutils here)
Since I can only define one entrypoint for build-system.build-backend, I can't point directly at setuptools_scm (or equiv) as it only achieves the first, I can't point directly at cython or cffi (or equiv) as it only achieves the second, and I can't point directly at my-custom-backend or distutils as it only achieves the third... right? If I added setuptools_scm and cffi to build-system.requires, will it have an opportunity to somehow do something useful before my-custom-backend begins its work?
I agree, this isn't something that's been clearly set out yet. Obviously setuptools support isn't going away any time soon, so you can just carry on as you do now - but I understand that's not what you're asking.
Let's assume there's a new PEP 517 backend, call it setuptools_tng, that handles building C files using the platform compiler. That's basically the replacement for setup.py. Then it would be down to that backend to provide a means of hooking into setuptools_scm and cython, much like at the moment you specify that you're using them in setup.py.
Obviously, you might want to use setuptools_scm with other backends. In that case, maybe backend developers would need to come up with a unified plugin system. But it could just be a case of saying to the flit (or other backend) developers, "please add support for setuptools_scm".
So, basically the answer is that this is an open question for the new backend development ecosystem. But it's not something PEP 517 needs to take a view on.
Hope that helps, Paul

On Jun 3, 2017 4:47 AM, "Paul Moore" p.f.moore@gmail.com wrote:
On 3 June 2017 at 04:53, C Anthony Risinger c@anthonyrisinger.com wrote:
I want to make sure I understand what I'd need to do, as a user, in a post PEP 517 world. Say I wanted to accomplish the following three things:
- Generate version info from my VCS
- Generate .h and .c from .pyx or cffi's out-of-line API mode
- Use an alternative build process for a platform distutils isn't behaving
with (ignore this requirement if it leads to problems answering, we can
just
as well assume distutils here)
[...]
So, basically the answer is that this is an open question for the new backend development ecosystem. But it's not something PEP 517 needs to take a view on.
Fair enough. It seems like there will almost certainly emerge some way of chaining small "source tree mutators" (leading to an sdist) with truly custom build backends (that may ultimately terminate on either setuptools/distutils like you mention, or a completely separate toolchain [I want to say something like waf could be this alternate]).
This wrapper/pipeline layer could be baked into pip/flit/whatever as a plugin system, but ideally it would just be a small and blessed pypa tool I'd think... then I suppose to make use of multiple transformations, a user would pass a list of actual build-system backends via tool.BLESSED-CHAINER-APP.build-backends in pyproject.toml or something.
Is it unreasonable to request right now that build-system.build-backend be a repeatable key in pyproject.toml? Then I could just list them in order. Might be easy to add later without breakage though too.
As long as all backends understand the hard separation between build_sdist "prepare the redistributable source tree" and build_wheel "construct an installable" they can be called in proper order and in phases.

On Sat, Jun 3, 2017 at 5:09 PM, C Anthony Risinger c@anthonyrisinger.com wrote:
Fair enough. It seems like there will almost certainly emerge some way of chaining small "source tree mutators" (leading to an sdist) with truly custom build backends (that may ultimately terminate on either setuptools/distutils like you mention, or a completely separate toolchain [I want to say something like waf could be this alternate]).
This wrapper/pipeline layer could be baked into pip/flit/whatever as a plugin system, but ideally it would just be a small and blessed pypa tool I'd think... then I suppose to make use of multiple transformations, a user would pass a list of actual build-system backends via tool.BLESSED-CHAINER-APP.build-backends in pyproject.toml or something.
Is it unreasonable to request right now that build-system.build-backend be a repeatable key in pyproject.toml? Then I could just list them in order. Might be easy to add later without breakage though too.
As long as all backends understand the hard separation between build_sdist "prepare the redistributable source tree" and build_wheel "construct an installable" they can be called in proper order and in phases.
That's a neat idea, but I'd prefer that this be handled by someone (you?) writing a metabuild system that implements whatever layering logic they prefer and putting it up on pypi, so it can be used like:
[build-system]
build-backend = "metabuild"

[tool.metabuild]
backends = ["backend1", "backend2", "backend3"]
As you can see, standards are hard; it takes hundreds of emails to standardize the API of a function that just takes a source tree and outputs a wheel :-). So the idea of PEP 517 is to push as much actual semantics as possible into the build backend, where they can be easily changed and evolved and experimented with. It's not at all obvious to me right now whether monolithic or plugin-style build backends will predominate, and it's also not at all obvious to me how *exactly* you're imagining the layering/phases would work, but the great thing is that PEP 517 is already flexible enough to let you do what you want without having to convince me first :-)
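A minimal sketch of what such a metabuild shim could look like, assuming a TOML parser is available in the build environment and that the chained backends agree on some tree-mutating hook (the prepare_source_tree name is invented; only build_wheel comes from the PEP):

    # metabuild.py (hypothetical)
    import importlib
    import toml  # assumption: declared in build-system.requires

    def _backends():
        cfg = toml.load("pyproject.toml")
        return [importlib.import_module(name)
                for name in cfg["tool"]["metabuild"]["backends"]]

    def build_wheel(wheel_directory, config_settings=None):
        *mutators, final = _backends()
        for backend in mutators:
            # hypothetical chaining hook: each backend mutates the tree
            backend.prepare_source_tree(config_settings)
        return final.build_wheel(wheel_directory, config_settings)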
-n

On Fri, Jun 2, 2017 at 9:39 AM, Paul Moore p.f.moore@gmail.com wrote:
Note that there's an implication here - if we define the build process in terms of the effect of "going via a sdist", then we need to at least have an intuitive understanding of what that means in practice. I don't think it's a contentious point (even if the specific term "sdist" is open to debate), as I think repeatable builds are a well-understood idea. (It's at this point that the concerns of people who want incremental builds come in - we should support incremental builds in a way that preserves the "just like going via a sdist" principle. But again, they need to raise their concerns if they think we're missing something key to their use case).
So far my belief is that packages with expensive build processes are going to ignore you and implement, ship, document, and recommend the direct source-tree->wheel path for developer builds. You can force the make-a-wheel-from-a-directory-without-copying-and-then-install-it command to have a name that doesn't start with "pip", but it's still going to exist and be used. Why wouldn't it? It's trivial to implement and it works, and I haven't heard any alternative proposals that have either of those properties. [1]
Relatedly, the idea of a copy_files hook doesn't make sense to me. The only reason pip wants to force builds through the sdist phase is because it doesn't trust the backend to make clean wheels, and it's willing to make its local directory builds much slower to get that guarantee. When you add copy_files, you lose that guarantee *and* you're still making local directory builds much slower, so what's the point? If the always-via-sdist plan doesn't work for either the simplest cases (flit) or the most complex (incremental builds), then is it really a good plan?
So it seems clear that we should do:
- add get_build_sdist_requires and build_sdist hooks to PEP 517 or a new PEP, whichever (and for clarity rename the current get_build_requires -> get_build_wheel_requires)
Beyond that, my preference is:
- 'pip install local-directory/' switches to building in-place
If the pip devs don't trust build systems in general, but (as suggested by the copy_files discussion) are ok with trusting them if they promise to be super trustworthy, alternate proposal:
- add an 'in_place_build_safe = True' hook, which indicates that the build system has been carefully written so that this will generate the same result as building an sdist and then building that; pip checks for this to decide whether to build in place or to build an sdist first.
In principle this is a little silly (who doesn't think themselves trustworthy?), but it would let us continue to do build isolation for setuptools builds and any build system that hasn't put some thought into this, makes it clear where the responsibility lies if someone screws up, and backends that don't want to deal with building sdists from sdists could make this False for VCS checkouts and True for unpacked sdists.
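In frontend terms that check would be something like the following (the hook name is the one proposed above; the fallback helper is hypothetical):

    if getattr(backend, "in_place_build_safe", False):
        # Trust the backend: build straight out of the source tree.
        backend.build_wheel(...)
    else:
        # Current behaviour: copy / go via sdist first.
        build_via_sdist(backend)  # hypothetical helper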
If the pip devs are set on 'pip install local-directory/' always making some sort of copy, then I suggest:
- 'pip install local-directory/' makes an sdist and then builds it
- we also add something like 'pip devinstall local-directory/' that does an in-place build and then installs it, so that I don't have to make a separate 'devinstall' script and ship it separately
- 'flit' adds code to make sdist-from-sdist work. (One way: when building an sdist from a VCS checkout, make a list of all the ancillary files and include it in the resulting sdist. Or possibly just a list of all files + hashes. When asked to make an sdist from an arbitrary directory, check for this file, and if present use it as the list of ancillary files to include, and possibly check if any hashes have changed, and if so change the version number of the resulting sdist by appending "+dirty" or something; otherwise, use the current VCS-based system.)
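A sketch of that record-and-check idea (the record file name, format, and helpers here are assumptions, not anything flit actually does):

    import hashlib
    import json
    import os

    RECORD = "SOURCE_FILES.json"  # hypothetical record file name

    def record_sdist_files(root, relpaths):
        # When building an sdist from a VCS checkout, store each file's hash.
        entries = {}
        for rel in relpaths:
            with open(os.path.join(root, rel), "rb") as f:
                entries[rel] = hashlib.sha256(f.read()).hexdigest()
        with open(os.path.join(root, RECORD), "w") as f:
            json.dump(entries, f, indent=2, sort_keys=True)

    def recorded_files_and_dirty(root):
        # In an arbitrary directory, reuse the recorded list; report any
        # changed files so the version can be marked "+dirty".
        with open(os.path.join(root, RECORD)) as f:
            entries = json.load(f)
        dirty = []
        for rel, digest in entries.items():
            with open(os.path.join(root, rel), "rb") as f:
                if hashlib.sha256(f.read()).hexdigest() != digest:
                    dirty.append(rel)
        return list(entries), dirty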
One thing that's not clear to me: a crucial use case for sdists is (1) download, (2) unpack, (3) patch the source, possibly adding new files, (4) build and install. (After all, the whole reason we insist on distributing sdists is that open source software should be modifiable by the recipient.) Does flit currently support this, given the reliance on VCS metadata?
Other unresolved issues:
- Donald had some concerns about get_wheel_metadata and they've led to several suggestions, none of which has made everyone go "oh yeah obviously that's the solution". To me this suggests we should go ahead and drop it from PEP 517 and add it back later if/when the need is more obvious. It's optional anyway, so adding it later doesn't hurt anything.
- It sounds like there's some real question about how exactly a build frontend should handle the output from build_wheel; in particular, the PEP should say what happens if there are multiple files deposited into the output dir. My original idea when writing the PEP was that the build frontend would know the name/version of the wheel it was looking for, and so it would ignore any other files found in the output dir, which would be forward compatible with a future PEP allowing build_wheel to drop multiple wheels into the output dir (i.e., old pip's would just ignore them). It's clear from the discussion that this isn't how others were imagining it. Which is fine, I don't think this is a huge problem, but we should nail it down so we're not surprised later.
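The originally-imagined frontend behaviour would be roughly this (a hypothetical sketch; wheel filename normalization is simplified):

    import os

    def find_expected_wheel(output_dir, name, version):
        # Pick out the one wheel we asked for; any other files in the output
        # dir are ignored, which stays forward compatible with a future
        # multi-wheel PEP.
        prefix = "%s-%s-" % (name.replace("-", "_"), version)
        for fname in sorted(os.listdir(output_dir)):
            if fname.startswith(prefix) and fname.endswith(".whl"):
                return os.path.join(output_dir, fname)
        raise RuntimeError("backend did not produce the expected wheel")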
-n
[1] Donald's suggestion of silently caching intermediate files in some global cache dir is unreasonably difficult to implement in a user-friendly way – cache management is Hard, and frankly I still don't think users will accept individual packages' build systems leaving hundreds of megabytes of random gunk inside hidden directories. We could debate the details here, but basically, if this were a great idea to do by default, then surely one of cmake/autoconf/... would already do it? Also, my understanding is the main reason pip wants to copy files in the first place is to avoid accidental pollution between different builds using the same local tree; but if a build system implements a global cache like this then surprise, now you can get pollution between arbitrary builds using different trees, or between builds that don't even use a local tree at all (e.g. running 'pip install numpy==1.12.0' can potentially cause a later run of 'pip install numpy==1.12.1' to be corrupted). And, it assumes that all build systems can easily support out-of-tree incremental builds, which is often true but not guaranteed when your wheel build has to wrap some random third party C library's build system.

On Jun 2, 2017, at 10:14 PM, Nathaniel Smith njs@pobox.com wrote:
On Fri, Jun 2, 2017 at 9:39 AM, Paul Moore p.f.moore@gmail.com wrote:
Note that there's an implication here - if we define the build process in terms of the effect of "going via a sdist", then we need to at least have an intuitive understanding of what that means in practice. I don't think it's a contentious point (even if the specific term "sdist" is open to debate), as I think repeatable builds are a well-understood idea. (It's at this point that the concerns of people who want incremental builds come in - we should support incremental builds in a way that preserves the "just like going via a sdist" principle. But again, they need to raise their concerns if they think we're missing something key to their use case).
So far my belief is that packages with expensive build processes are going to ignore you and implement, ship, document, and recommend the direct source-tree->wheel path for developer builds. You can force the make-a-wheel-from-a-directory-without-copying-and-then-install-it command to have a name that doesn't start with "pip", but it's still going to exist and be used. Why wouldn't it? It's trivial to implement and it works, and I haven't heard any alternative proposals that have either of those properties. [1]
If someone wants to implement a direct-to-wheel build tool and have it compete with ``pip install .`` they’re more than welcome to. Competition is healthy, and at the very worst it gives us data: either it validates the idea that direct-to-wheel is important enough that people will gladly overcome the relatively small barrier of having to install another tool (in which case maybe we need to rethink things), or it validates the idea that it’s not important enough and we leave things as they are.
I went and looked through all 105 pages of pip’s issues (open and closed) and made several searches using any keyword I could think of looking for any issue where someone asked for this. The only times I can find anyone asking for this were you and Ralf Gommers as part of the extended discussion around this set of PEPs and I’ve not been able to find a single other person asking for it or complaining about it.
However, what I was able to find was what appears to be the original reason pip started copying the directory to begin with, https://github.com/pypa/pip/issues/178 which was caused by the build system reusing the build directory between two different virtual environments and causing an invalid installation to happen. The ticket is old enough that I can’t get at the specifics of it because it was migrated over from bitbucket. However the fact that we *used* to do exactly what you want and it caused exactly one of the problems I was worried about seems to suggest to me that pip is absolutely correct in keeping this behavior.
Relatedly, the idea of a copy_files hook doesn't make sense to me. The only reason pip wants to force builds through the sdist phase is because it doesn't trust the backend to make clean wheels, and it's willing to make its local directory builds much slower to get that guarantee. When you add copy_files, you lose that guarantee *and* you're still making local directory builds much slower, so what's the point? If the always-via-sdist plan doesn't work for either the simplest cases (flit) or the most complex (incremental builds), then is it really a good plan?
It’s not that I don’t trust the backend, it’s that I believe in putting in systems that make it harder to do the wrong thing than the right thing. As it is now, building in place correctly requires the build backend to do extra work to ensure that some file that wouldn’t be included in the sdist doesn’t influence the build in some way. Given that I’m pretty sure literally every build tool in existence for Python currently fails this test, I think it’s pretty reasonable to say that it might continue to be a problem into the future.
Copying the files makes that harder to do (but still easier than always going through the sdist). If you want to argue that we should always go through the sdist and we shouldn’t have a copy_files hook, I’m ok with that. I’m only partially in favor of it as a performance trade off because I think it passes a high enough bar that it’s unlikely enough for mistakes to be made (and when they do, they’ll be more obvious).
… <snip> ...
- 'flit' adds code to make sdist-from-sdist work. (One way: when
building an sdist from a VCS checkout, make a list of all the ancillary files and include it in the resulting sdist. Or possibly just a list of all files + hashes. When asked to make an sdist from an arbitrary directory, check for this file, and if present use it as the list of ancillary files to include, and possibly check if any hashes have changed, and if so change the version number of the resulting sdist by appending "+dirty" or something; otherwise, use the current VCS-based system.)
This seems reasonable to me.
One thing that's not clear to me: a crucial use case for sdists is (1) download, (2) unpack, (3) patch the source, possibly adding new files, (4) build and install. (After all, the whole reason we insist on distributing sdists is that open source software should be modifiable by the recipient.) Does flit currently support this, given the reliance on VCS metadata?
Other unresolved issues:
- Donald had some concerns about get_wheel_metadata and they've led to
several suggestions, none of which has made everyone go "oh yeah obviously that's the solution". To me this suggests we should go ahead and drop it from PEP 517 and add it back later if/when the need is more obvious. It's optional anyway, so adding it later doesn't hurt anything.
My main concern is the metadata diverging between the get_wheel_metadata and the building wheel phase. The current PEP solves that in a reasonable enough way (and in a way I can assert against). My other concerns are mostly just little API niggles to make it harder to mess up.
I think this one is important to support because we do need to be able to get at the dependencies, and invoking the entire build chain to do that seems like it will be extraordinarily slow.
- It sounds like there's some real question about how exactly a build
frontend should handle the output from build_wheel; in particular, the PEP should say what happens if there are multiple files deposited into the output dir. My original idea when writing the PEP was that the build frontend would know the name/version of the wheel it was looking for, and so it would ignore any other files found in the output dir, which would be forward compatible with a future PEP allowing build_wheel to drop multiple wheels into the output dir (i.e., old pip's would just ignore them). It's clear from the discussion that this isn't how others were imagining it. Which is fine, I don't think this is a huge problem, but we should nail it down so we're not surprised later.
How do you determine the name/version for ``pip install .`` except by running get_wheel_metadata or build_wheel or build_sdist?
-n
[1] Donald's suggestion of silently caching intermediate files in some global cache dir is unreasonably difficult to implement in a user-friendly way – cache management is Hard, and frankly I still don't think users will accept individual packages' build systems leaving hundreds of megabytes of random gunk inside hidden directories. We could debate the details here, but basically, if this were a great idea to do by default, then surely one of cmake/autoconf/... would already do it? Also, my understanding is the main reason pip wants to copy files in the first place is to avoid accidental pollution between different builds using the same local tree; but if a build system implements a global cache like this then surprise, now you can get pollution between arbitrary builds using different trees, or between builds that don't even use a local tree at all (e.g. running 'pip install numpy==1.12.0' can potentially cause a later run of 'pip install numpy==1.12.1' to be corrupted). And, it assumes that all build systems can easily support out-of-tree incremental builds, which is often true but not guaranteed when your wheel build has to wrap some random third party C library's build system.
Make it opt-in, and build a hash of the directory into the cache key so different file contents mean different cache objects. I’m not really sold on the idea that because some developers haven’t decided to do it, it must be a bad idea. Perhaps those build systems are operating under different constraints than we are (I’m almost certain this is the case).
— Donald Stufft

On 3 June 2017 at 13:38, Donald Stufft donald@stufft.io wrote:
However, what I was able to find was what appears to be the original reason pip started copying the directory to begin with, https://github.com/pypa/pip/issues/178 which was caused by the build system reusing the build directory between two different virtual environments and causing an invalid installation to happen. The ticket is old enough that I can’t get at the specifics of it because it was migrated over from bitbucket. However the fact that we *used* to do exactly what you want and it caused exactly one of the problems I was worried about seems to suggest to me that pip is absolutely correct in keeping this behavior.
FWIW, I'll also note that in-place builds play merry hell with containerised build tools, volume mounts, and SELinux filesystem labels.
In-place builds *can* be made to work, and when you invest the time to make them work, they give you all sorts of useful benefits (incremental builds, etc), but out-of-tree builds inherently avoid a lot of potential problems (especially in a world where virtual environments are a thing).
As far as "out-of-tree caching" is concerned, all the build systems I'm personally familiar with *except* the C/C++ ones use some form of out of band caching location, even if that's dependency caching rather than build artifact caching.
As an example of the utility of that approach, Atlassian recently updated the Alpha version of their Pipelines feature to automatically manage cache directories and allow them to be shared between otherwise independent builds: https://confluence.atlassian.com/bitbucket/caching-dependencies-895552876.ht...
Cheers, Nick.

On Fri, Jun 2, 2017 at 10:21 PM, Nick Coghlan ncoghlan@gmail.com wrote:
On 3 June 2017 at 13:38, Donald Stufft donald@stufft.io wrote:
However, what I was able to find was what appears to be the original reason pip started copying the directory to begin with, https://github.com/pypa/pip/issues/178 which was caused by the build system reusing the build directory between two different virtual environments and causing an invalid installation to happen. The ticket is old enough that I can’t get at the specifics of it because it was migrated over from bitbucket. However the fact that we *used* to do exactly what you want and it caused exactly one of the problems I was worried about seems to suggest to me that pip is absolutely correct in keeping this behavior.
FWIW, I'll also note that in-place builds play merry hell with containerised build tools, volume mounts, and SELinux filesystem labels.
In-place builds *can* be made to work, and when you invest the time to make them work, they give you all sorts of useful benefits (incremental builds, etc), but out-of-tree builds inherently avoid a lot of potential problems (especially in a world where virtual environments are a thing).
As far as "out-of-tree caching" is concerned, all the build systems I'm personally familiar with *except* the C/C++ ones use some form of out of band caching location, even if that's dependency caching rather than build artifact caching.
As an example of the utility of that approach, Atlassian recently updated the Alpha version of their Pipelines feature to automatically manage cache directories and allow them to be shared between otherwise independent builds: https://confluence.atlassian.com/bitbucket/caching-dependencies-895552876.ht...
Oh sure, if you have a piece of build *infrastructure*, then all kinds of things make sense. Set up ccache, distcc, cache dependencies, go wild. Mozilla's got a cute version of ccache that puts the cache in s3 so it can be shared among ephemeral build VMs.
That's not what I'm talking about. The case I'm talking about is, like, a baby dev taking their first steps, or someone trying to get a build of a package working on an unusual system:
git clone ..../numpy.git
cd numpy
# edit some file, maybe a config file saying which fortran compiler this weird machine uses
# build and run tests
In this case it would be extremely rude to silently dump all our intermediate build artifacts into ~/.something, but I also don't want to require every new dev to opt in to some special infrastructure and learn new commands – I want there to be a gentle onramp from blindly installing packages as a user to hacking on them. Making 'pip install' automatically do incremental builds when run repeatedly on the same working directory accomplishes this better than anything else.
It's not clear to me what cases you're concerned about breaking with "containerised build tools, ...". Are you thinking about, like, 'docker run -v $PWD:/io some-image pip install /io'? Surely for anything involving containers there should be an explicit wheel built somewhere in there?
-n

On 3 June 2017 at 15:53, Nathaniel Smith njs@pobox.com wrote:
That's not what I'm talking about. The case I'm talking about is, like, a baby dev taking their first steps, or someone trying to get a build of a package working on an unusual system:
git clone ..../numpy.git
cd numpy
# edit some file, maybe a config file saying which fortran compiler this weird machine uses
# build and run tests
It's come up a couple of times before, but this example makes me realise that we should be *explicitly* using "tox" as our reference implementation for "local developer experience", to avoid letting ourselves fall into the trap of optimising too much for pip specifically as the reference installer.
The reason I say that is I actually looked at the tox docs yesterday, and *completely missed* the relevance of one of their config settings to PEP 517 (the one that lets you skip the sdist creation step when it's too slow): https://tox.readthedocs.io/en/latest/example/general.html#avoiding-expensive...
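From memory, the tox recipe in question amounts to something like the following tox.ini; skipsdist and usedevelop are real tox options, but the exact recipe on the linked page may differ:

    # tox.ini
    [tox]
    skipsdist = True

    [testenv]
    usedevelop = True
    commands = pytest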
However, I'm not sure doing that leads to the conclusion that we need to support in-place builds in PEP 517, as tox's approach to skipping the sdist step in general is to require the user to specify a custom build command. The only "in-place" option it supports directly is editable installs.
So from that perspective, the PEP 517 answer to "How do I do an in-place build?" would be "Use whatever command your backend provides for that purpose".
This makes sense, as this particular abstraction layer isn't meant to hide the build backend from the people *working on* a project - it's only meant to hide it from the people *using* the project.
So as a NumPy or SciPy developer, it's entirely reasonable to have to know that the command for an in-place build is "python setup.py ...".
In this case it would be extremely rude to silently dump all our intermediate build artifacts into ~/.something, but I also don't want to require every new dev to opt in to some special infrastructure and learn new commands – I want there to be a gentle onramp from blindly installing packages as a user to hacking on them. Making 'pip install' automatically do incremental builds when run repeatedly on the same working directory accomplishes this better than anything else.
"Build this static snapshot of a thing someone else published so I can use it" and "Build this thing I'm working on so I can test it" are closely related, but not the same, so I think the latter concerns are likely to be better handled through an effort to replace the setuptools specific "pip install -e" with a more general "pip devinstall". That would then map directly to the existing logic in tox, allowing that to migrate from running "pip install -e" when usedevelop=True to instead running "pip devinstall".
It's not clear to me what cases you're concerned about breaking with "containerised build tools, ...". Are you thinking about, like, 'docker run some-volume -v $PWD:/io pip install /io'? Surely for anything involving containers there should be an explicit wheel built somewhere in there?
With live re-loading support in web servers, it's really handy to volume mount your working directory into the container.
Cheers, Nick.

On Fri, Jun 2, 2017 at 8:38 PM, Donald Stufft donald@stufft.io wrote:
On Jun 2, 2017, at 10:14 PM, Nathaniel Smith njs@pobox.com wrote:
So far my belief is that packages with expensive build processes are going to ignore you and implement, ship, document, and recommend the direct source-tree->wheel path for developer builds. You can force the make-a-wheel-from-a-directory-without-copying-and-then-install-it command to have a name that doesn't start with "pip", but it's still going to exist and be used. Why wouldn't it? It's trivial to implement and it works, and I haven't heard any alternative proposals that have either of those properties. [1]
If someone wants to implement a direct-to-wheel build tool and have it compete with ``pip install .`` they’re more than welcome to. Competition is healthy, and at the very worst it gives us data: either it validates the idea that direct-to-wheel is important enough that people will gladly overcome the relatively small barrier of having to install another tool (in which case maybe we need to rethink things), or it validates the idea that it’s not important enough and we leave things as they are.
I went and looked through all 105 pages of pip’s issues (open and closed) and made several searches using any keyword I could think of looking for any issue where someone asked for this. The only times I can find anyone asking for this were you and Ralf Gommers as part of the extended discussion around this set of PEPs and I’ve not been able to find a single other person asking for it or complaining about it.
That's because until now, the message that everyone has received over and over is that the way you install a package from a directory on disk is:
cd directory
python setup.py install
and this does incremental builds. (My experience is that even today, most people are surprised to learn that 'pip install' accepts directory paths.)
In our glorious PEP 517 future, we have to teach everyone to stop using 'setup.py install' and instead use 'pip install .'. This switch enables a glorious new future of non-distutils-based build systems and fixes a bunch of other brokenness at the same time, hooray, BUT currently switching to 'pip install' also causes a regression for everyone who's used to incremental builds working.
Ralf and I noticed this because we were looking at getting a head start on the glorious future by making 'pip install' mandatory for numpy and scipy. The reason no-one else has noticed is that we're among the few people that have tried using 'pip install' as their standard install-from-working-tree command. But soon there will be more.
However, what I was able to find was what appears to be the original reason pip started copying the directory to begin with, https://github.com/pypa/pip/issues/178 which was caused by the build system reusing the build directory between two different virtual environments and causing an invalid installation to happen. The ticket is old enough that I can’t get at the specifics of it because it was migrated over from bitbucket. However the fact that we *used* to do exactly what you want and it caused exactly one of the problems I was worried about seems to suggest to me that pip is absolutely correct in keeping this behavior.
Hmm, it looks to me like that bug is saying that at the time, if you ran 'python setup.py install' *inside the pip source tree*, and then tried to run pip's test suite (possibly via 'setup.py test'), then it broke. I don't think this is related to the behavior of 'pip install .', and I feel like we would know if it were currently true that running 'setup.py install' twice in the same directory produced broken shebang lines. (Again, most people who install from source directories are currently using setup.py install!)
The source tree copying was originally added in:
https://github.com/pypa/pip/commit/57bd8163e4483b7138342da93f5f6bb8460f0e4a
(which is dated ~2 months before that bug you found, and if I'm reading it right tweaks a code path that previously only worked for 'pip install foo.zip' so it also works for 'pip install foo/'). AFAICT the reason it was written this way is that pip started out with the assumption that it was always going to be downloading and unpacking archives, so the logic went:
1) make a temporary directory
2) unpack the sdist into this temporary directory
3) build from this temporary directory
Then, when it came time to add support for building from directories, the structure of the logic meant that by the time pip got to step (2) and realized that it already had a source directory, it was too late -- it was already committed to using the selected temporary directory. So instead of refactoring all this code, they made the minimal change of implementing the "unpack this sdist into this directory" operation for source directories by using shutil.copytree.
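In code terms that minimal change amounts to roughly the following (a reconstruction for illustration, not pip's actual code):

    import os
    import shutil
    import tempfile

    def unpack_source(source, build_dir):
        # "Unpack" a source into the chosen temporary build directory. For a
        # local directory, the minimal change was to implement the unpack
        # step as a straight copy of the whole tree.
        target = os.path.join(build_dir, "package")
        if os.path.isdir(source):
            shutil.copytree(source, target)
        else:
            shutil.unpack_archive(source, target)
        return target

    build_dir = tempfile.mkdtemp(prefix="pip-build-")
    unpack_source("path/to/project", build_dir)  # hypothetical caller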
I think this chain of reasoning will feel very familiar to anyone working with the modern pip source 5 years later...
It's absolutely true that there are cases where incremental builds can screw things up, especially when using distutils/setuptools. But I don't think this is why pip does things this way originally :-).
It’s not that I don’t trust the backend, it’s that I believe in putting in systems that make it harder to do the wrong thing than the right thing. As it is now, building in place correctly requires the build backend to do extra work to ensure that some file that wouldn’t be included in the sdist doesn’t influence the build in some way. Given that I’m pretty sure literally every build tool in existence for Python currently fails this test, I think it’s pretty reasonable to say that it might continue to be a problem into the future.
Copying the files makes that harder to do (but still easier than always going through the sdist). If you want to argue that we should always go through the sdist and we shouldn’t have a copy_files hook, I’m ok with that. I’m only partially in favor of it as a performance trade off because I think it passes a high enough bar that it’s unlikely enough for mistakes to be made (and when they do, they’ll be more obvious).
What do you think of letting build backends opt in to in-place builds?
Other unresolved issues:
- Donald had some concerns about get_wheel_metadata and they've led to
several suggestions, none of which has made everyone go "oh yeah obviously that's the solution". To me this suggests we should go ahead and drop it from PEP 517 and add it back later if/when the need is more obvious. It's optional anyway, so adding it later doesn't hurt anything.
My main concern is the metadata diverging between the get_wheel_metadata and the building wheel phase. The current PEP solves that in a reasonable enough way (and in a way I can assert against). My other concerns are mostly just little API niggles to make it harder to mess up.
I think this one is important to support because we do need to be able to get at the dependencies, and invoking the entire build chain to do that seems like it will be extraordinarily slow.
It's only slow in the case where (a) there's no wheel (obviously), and (b) after getting the dependencies we decide we don't want to install this sdist after all. I imagine numpy for example won't bother implementing get_wheel_metadata because we provide wheels for all the platforms we support and because we have no dependencies, so it is doubly useless AFAICT. But yeah in other cases it could matter. I'm not opposed to including it in general, just thought this might be a way to help get the minimal PEP 517 out the door.
- It sounds like there's some real question about how exactly a build
frontend should handle the output from build_wheel; in particular, the PEP should say what happens if there are multiple files deposited into the output dir. My original idea when writing the PEP was that the build frontend would know the name/version of the wheel it was looking for, and so it would ignore any other files found in the output dir, which would be forward compatible with a future PEP allowing build_wheel to drop multiple wheels into the output dir (i.e., old pip's would just ignore them). It's clear from the discussion that this isn't how others were imagining it. Which is fine, I don't think this is a huge problem, but we should nail it down so we're not surprised later.
How do you determine the name/version for ``pip install .`` except by running get_wheel_metadata or build_wheel or build_sdist?
Well, I was imagining that the semantics of 'pip install .' in a multi-wheel world would be to install all the generated wheels :-). But yeah, it's not really well-specified as currently written.
Possibly the simplest solution is to say that build_wheel has to return a string which names the wheel, and then in the future we could add build_wheel2 which is identical but returns a list of strings, and backwards compatibility would be:
def build_wheel2(...):
    return build_wheel(...)[0]
-n
[1] Donald's suggestion of silently caching intermediate files in some global cache dir is unreasonably difficult to implement in a user-friendly way – cache management is Hard, and frankly I still don't think users will accept individual packages' build systems leaving hundreds of megabytes of random gunk inside hidden directories. We could debate the details here, but basically, if this were a great idea to do by default, then surely one of cmake/autoconf/... would already do it? Also, my understanding is the main reason pip wants to copy files in the first place is to avoid accidental pollution between different builds using the same local tree; but if a build system implements a global cache like this then surprise, now you can get pollution between arbitrary builds using different trees, or between builds that don't even use a local tree at all (e.g. running 'pip install numpy==1.12.0' can potentially cause a later run of 'pip install numpy==1.12.1' to be corrupted). And, it assumes that all build systems can easily support out-of-tree incremental builds, which is often true but not guaranteed when your wheel build has to wrap some random third party C library's build system.
Make it opt-in
If it's opt-in, then I might as well tell people to run 'pip devinstall .' or 'in-place-install .' or whatever instead, and it'll be much easier all around. But instead of making it opt-in, I'd much rather it Just Work. It's frustrating that at the same time we're moving to the glorious simplified future, we're also picking up a new piece of arcane wisdom that devs will need to be taught, and another place where numerical Python devs will roll their eyes at how the standard Python tooling doesn't care about them. (And I totally understand that the motivation on your end is also to make things Just Work, but I feel like in the specific case where someone is *repeatedly* building out of the *same source directory* – which is the one in dispute here – we should optimize for developer experience.)
and build a hash of the directory into the cache key so different file contents mean different cache objects. I’m not really sold on the idea that because some developers haven’t decided to do it, it must be a bad idea. Perhaps those build systems are operating under different constraints than we are (I’m almost certain this is the case).
I think the way the constraints differ is just that they don't have this imposed constraint that they *must* build out of an ephemeral tempdir. If we're operating under that constraint, then your idea is certainly worth considering. My point is that it's much easier to remove that constraint (by switching to using a non-pip tool) than it is to work around it, so that's my prediction for what will happen.
-n

On Jun 3, 2017, at 1:40 AM, Nathaniel Smith njs@pobox.com wrote:
On Fri, Jun 2, 2017 at 8:38 PM, Donald Stufft donald@stufft.io wrote:
On Jun 2, 2017, at 10:14 PM, Nathaniel Smith njs@pobox.com wrote:
So far my belief is that packages with expensive build processes are going to ignore you and implement, ship, document, and recommend the direct source-tree->wheel path for developer builds. You can force the make-a-wheel-from-a-directory-without-copying-and-then-install-it command to have a name that doesn't start with "pip", but it's still going to exist and be used. Why wouldn't it? It's trivial to implement and it works, and I haven't heard any alternative proposals that have either of those properties. [1]
If someone wants to implement a direct-to-wheel build tool and have it compete with ``pip install .`` they’re more than welcome to. Competition is healthy, and at the very worst it gives us data: either it validates the idea that direct-to-wheel is important enough that people will gladly overcome the relatively small barrier of having to install another tool (in which case maybe we need to rethink things), or it validates the idea that it’s not important enough and we leave things as they are.
I went and looked through all 105 pages of pip’s issues (open and closed) and made several searches using any keyword I could think of looking for any issue where someone asked for this. The only times I can find anyone asking for this were you and Ralf Gommers as part of the extended discussion around this set of PEPs and I’ve not been able to find a single other person asking for it or complaining about it.
That's because until now, the message that everyone has received over and over is that the way you install a package from a directory on disk is:
cd directory
python setup.py install
and this does incremental builds. (My experience is that even today, most people are surprised to learn that 'pip install' accepts directory paths.)
I’m not sure this is true? I mean I’m sure it’s true that for *some* people that’s the message they get, but we see a reasonable volume of bug reports and the like from people passing a path to pip, so it’s hardly an un(der)used feature in the slightest. We have zero metrics, so I suspect there is no way to answer which one is used more than the other though. I think writing either one off as “nobody uses that method” is likely to be the wrong answer.
Pip is generally used widely enough in enough different scenarios that it is unusual for any feature it has (even ones that are completely undocumented and require diving into the source code!) to not be used by a decent chunk of people. I’m not saying that zero people exist that would want this (indeed, there are at least two!) but that it has obviously not been a pressing enough need that someone felt the need to open a ticket or ask for it before.
In our glorious PEP 517 future, we have to teach everyone to stop using 'setup.py install' and instead use 'pip install .'. This switch enables a glorious new future of non-distutils-based build systems and fixes a bunch of other brokenness at the same time, hooray, BUT currently switching to 'pip install' also causes a regression for everyone who's used to incremental builds working.
Ralf and I noticed this because we were looking at getting a head start on the glorious future by making 'pip install' mandatory for numpy and scipy. The reason no-one else has noticed is that we're among the few people that have tried using 'pip install' as their standard install-from-working-tree command. But soon there will be more.
One thing I’d note is that as far as I can tell, neither the current copy of the PEP, nor my PR, nor Thomas’ PR, nor any of the discussions here have talked about having the interface itself mandate calling build_sdist prior to calling build_wheel. I’m personally fine with the interface allowing it (and I assumed we would TBH) and with explicitly calling that out.
That means that whether or not you call build_sdist (or some copy files hook) first effectively ends up being an implementation detail of the frontend in question. That would allow pip to build the sdist (or do the copy thing) first but another tool could implement it the other way (and installing a local wheel is so simple that you could pretty easily implement that and just shell out to pip for the rest if you wanted).
That also means that we can adjust our answer to it in the future. If such a tool gets built and a lot of people end up using it and asking for it in pip, we can revisit that decision in a future version of pip. Part of the stand-off here is that the pip developers view it as a regression if we stop building in isolation, and you view it as a regression if incremental/in-place builds are not supported. Both can be true! It’s the opinion of the pip developers who have spoken so far that for us, the risk of our regressions is high enough that we don’t currently feel comfortable changing that behavior.
However, what I was able to find was what appears to be the original reason pip started copying the directory to begin with, https://github.com/pypa/pip/issues/178 which was caused by the build system reusing the build directory between two different virtual environments and causing an invalid installation to happen. The ticket is old enough that I can’t get at the specifics of it because it was migrated over from bitbucket. However the fact that we *used* to do exactly what you want and it caused exactly one of the problems I was worried about seems to suggest to me that pip is absolutely correct in keeping this behavior.
Hmm, it looks to me like that bug is saying that at the time, if you ran 'python setup.py install' *inside the pip source tree*, and then tried to run pip's test suite (possibly via 'setup.py test'), then it broke. I don't think this is related to the behavior of 'pip install .', and I feel like we would know if it were currently true that running 'setup.py install' twice in the same directory produced broken shebang lines. (Again, most people who install from source directories are currently using setup.py install!)
Meh yea, I misread the bug report. Reading 105 pages of reports will do that I guess :P
The source tree copying was originally added in:
https://github.com/pypa/pip/commit/57bd8163e4483b7138342da93f5f6bb8460f0e4a
(which is dated ~2 months before that bug you found, and if I'm reading it right tweaks a code path that previously only worked for 'pip install foo.zip' so it also works for 'pip install foo/'). AFAICT the reason it was written this way is that pip started out with the assumption that it was always going to be downloading and unpacking archives, so the logic went:
- make a temporary directory
- unpack the sdist into this temporary directory
- build from this temporary directory
Then, when it came time to add support for building from directories, the structure of the logic meant that by the time pip got to step (2) and realized that it already had a source directory, it was too late -- it was already committed to using the selected temporary directory. So instead of refactoring all this code, they made the minimal change of implementing the "unpack this sdist into this directory" operation for source directories by using shutil.copytree.
I think this chain of reasoning will feel very familiar to anyone working with the modern pip source 5 years later...
It's absolutely true that there are cases where incremental builds can screw things up, especially when using distutils/setuptools. But I don't think this is why pip does things this way originally :-).
It’s not that I don’t trust the backend, it’s that I believe in putting in systems that make it harder to do the wrong thing than the right thing. As it is now, building in place correctly requires the build backend to do extra work to ensure that some file that wouldn’t be included in the sdist doesn’t influence the build in some way. Given that I’m pretty sure literally every build tool in existence for Python currently fails this test, I think it’s pretty reasonable to say that it might continue to be a problem into the future.
Copying the files makes that harder to do (but still easier than always going through the sdist). If you want to argue that we should always go through the sdist and we shouldn’t have a copy_files hook, I’m ok with that. I’m only partially in favor of it as a performance trade off because I think it passes a high enough bar that it’s unlikely enough for mistakes to be made (and when they do, they’ll be more obvious).
What do you think of letting build backends opt in to in-place builds?
I think the PEP already allows the interface to be used for in-place builds, though it could use some language specifying that and making sure it’s explicit that this is an option the PEP allows. I think I’m neutral on allowing backends to express a preference for in-place builds, but again I don’t think I would be comfortable listening to that to allow a backend to do an in-place build, at least not by default or at first. Could I see us get to a place where we did allow that? Maybe. I wouldn’t promise anything one way or another (and TBH how pip chooses to implement a PEP isn’t really something that the PEP or distutils-sig gets to choose) but I’m fine with leaving the mechanisms in place to allow that in the future.
Other unresolved issues:
- Donald had some concerns about get_wheel_metadata and they've led to
several suggestions, none of which has made everyone go "oh yeah obviously that's the solution". To me this suggests we should go ahead and drop it from PEP 517 and add it back later if/when the need is more obvious. It's optional anyway, so adding it later doesn't hurt anything.
My main concern is the metadata diverging between the get_wheel_metadata and the building wheel phase. The current PEP solves that in a reasonable enough way (and in a way I can assert against). My other concerns are mostly just little API niggles to make it harder to mess up.
I think this one is important to support because we do need to be able to get at the dependencies, and invoking the entire build chain to do that seems like it will be extraordinarily slow.
It's only slow in the case where (a) there's no wheel (obviously), and (b) after getting the dependencies we decide we don't want to install this sdist after all. I imagine numpy for example won't bother implementing get_wheel_metadata because we provide wheels for all the platforms we support and because we have no dependencies, so it is doubly useless AFAICT. But yeah in other cases it could matter. I'm not opposed to including it in general, just thought this might be a way to help get the minimal PEP 517 out the door.
Yea I don’t think this is really a sticking point, I think it’s mostly just around:
1) Do we include a build_sdist hook?
   A) Can pip use this as part of the process of going from a VCS to a wheel?
2) Do we include a copy_the_files hook?
3) Minor decisions like unpacked vs packed wheels/sdists and cwd vs pass in a path.
- It sounds like there's some real question about how exactly a build
frontend should handle the output from build_wheel; in particular, the PEP should say what happens if there are multiple files deposited into the output dir. My original idea when writing the PEP was that the build frontend would know the name/version of the wheel it was looking for, and so it would ignore any other files found in the output dir, which would be forward compatible with a future PEP allowing build_wheel to drop multiple wheels into the output dir (i.e., old pip's would just ignore them). It's clear from the discussion that this isn't how others were imagining it. Which is fine, I don't think this is a huge problem, but we should nail it down so we're not surprised later.
How do you determine the name/version for ``pip install .`` except by running get_wheel_metadata or build_wheel or build_sdist?
Well, I was imagining that the semantics of 'pip install .' in a multi-wheel world would be to install all the generated wheels :-). But yeah, it's not really well-specified as currently written.
Possibly the simplest solution is to say that build_wheel has to return a string which names the wheel, and then in the future we could add build_wheel2 which is identical but returns a list of strings, and backwards compatibility would be:
def build_wheel2(...):
    return build_wheel(...)[0]
That seems reasonable to me.
-n
[1] Donald's suggestion of silently caching intermediate files in some global cache dir is unreasonably difficult to implement in a user-friendly way – cache management is Hard, and frankly I still don't think users will accept individual packages' build systems leaving hundreds of megabytes of random gunk inside hidden directories. We could debate the details here, but basically, if this were a great idea to do by default, then surely one of cmake/autoconf/... would already do it? Also, my understanding is the main reason pip wants to copy files in the first place is to avoid accidental pollution between different builds using the same local tree; but if a build system implements a global cache like this then surprise, now you can get pollution between arbitrary builds using different trees, or between builds that don't even use a local tree at all (e.g. running 'pip install numpy==1.12.0' can potentially cause a later run of 'pip install numpy==1.12.1' to be corrupted). And, it assumes that all build systems can easily support out-of-tree incremental builds, which is often true but not guaranteed when your wheel build has to wrap some random third party C library's build system.
Make it opt-in
If it's opt-in, then I might as well tell people to run 'pip devinstall .' or 'in-place-install .' or whatever instead, and it'll be much easier all around. But instead of making it opt-in, I'd much rather it Just Work. It's frustrating that at the same time we're moving to the glorious simplified future, we're also picking up a new piece of arcane wisdom that devs will need to be taught, and another place where numerical Python devs will roll their eyes at how the standard Python tooling doesn't care about them. (And I totally understand that the motivation on your end is also to make things Just Work, but I feel like in the specific case where someone is *repeatedly* building out of the *same source directory* – which is the one in dispute here – we should optimize for developer experience.)
People repeatedly build out of the same source directory for lots of different reasons. I just closed a ticket the other day from someone who was trying to do a ``-e .`` in a read-only directory because they had it mounted inside a container or a VM and were editing it from another machine, and it didn’t work (because -e obviously builds in place).
One more specific example I’m concerned about with regressions here is a case like:
    docker run -it -v $PWD:/app/ ubuntu:latest pip install /app/
    docker run -it -v $PWD:/app/ centos:latest pip install /app/
I’ve seen people doing similar things in the wild today, and if an in-place build directory just uses the normal platform tuples to differentiate build directories, there is a good chance that this is going to fail miserably, likely with some sort of god-awful linking error or segfault.
— Donald Stufft

On 3 June 2017 at 08:47, Donald Stufft donald@stufft.io wrote:
That also means that we can adjust our answer to it in the future. If such a tool gets built and a lot of people end up using it and asking for it in pip, we can revisit that decision in a future version of pip. Part of the stand-off here is that the pip developers view it as a regression if we stop building in isolation, and you view it as a regression if incremental/in-place builds are not supported. Both can be true! It’s the opinion of the pip developers who have spoken so far that, for us, the risk of our regressions is high enough that we don’t currently feel comfortable changing that behavior.
In summary (for my own benefit as much as anything):
Currently:
1. pip provides out-of-tree builds for directories
2. the backend (setup.py) provides in-place builds
Under PEP 517:
1. pip provides out-of-tree builds for directories (optimised via the build_sdist and/or copy_files hooks)
2. the backend (see below) provides in-place builds
The PEP 517 backend for setup.py builds may be something new, or it may be setup.py plus some "support legacy source trees" code in pip. It largely depends on who wants to write and maintain PEP 517 adapter code for setup.py. That's not clear to me yet. OTOH, numpy retaining setup.py and relying on legacy support for that format will work for some time yet, so there's no rush to move to a new backend.
Future:
1. the numpy developers can make a case for a new pip feature to do in-place builds; PEP 517 has the build_wheel hook that allows this to be implemented
Did I miss anything? Based on this, in-place builds don't seem like they are a big deal (as long as we can agree that "pip doesn't provide in-place builds" isn't a huge issue that's suddenly appeared).
Paul

On 3 June 2017 at 03:14, Nathaniel Smith njs@pobox.com wrote:
So far my belief is that packages with expensive build processes are going to ignore you and implement, ship, document, and recommend the direct source-tree->wheel path for developer builds. You can force the make-a-wheel-from-a-directory-without-copying-and-then-install-it command have a name that doesn't start with "pip", but it's still going to exist and be used. Why wouldn't it? It's trivial to implement and it works, and I haven't heard any alternative proposals that have either of those properties. [1]
I may be misunderstanding you, but it's deeply concerning if you're saying "as a potential backend developer, I'm sitting here listening to the discussion about PEP 517 and I've decided not to raise my concerns but simply to let it be implemented and then ignore it". OTOH, I'm not sure how you plan on ignoring it - are you suggesting that projects like numpy won't support "pip install numpy" except for wheel installs[1]?
One thing that's not clear to me: a crucial use case for sdists is (1) download, (2) unpack, (3) patch the source, possibly adding new files, (4) build and install. (After all, the whole reason we insist on distributing sdists is that open source software should be modifiable by the recipient.) Does flit currently support this, given the reliance on VCS metadata?
That's a reasonable concern, and a reservation I have about flit's sdist support, but as a comment about flit, it probably belongs more on the flit tracker than here.
As a point for PEP 517, I think it's valid though. I'd suggest that the PEP add a few more lines on what constitutes a "source tree", by offering some examples. It seems to me that the two key examples of a "source tree" that the PEP must support are
1. A VCS checkout of a project under development.
2. An unpacked sdist that the user has made edits to before building.
(The second of these being the one we're talking about here).
Paul
[1] By "won't support" consider that in a PEP 517 world, pip issues stating "pip install <numpy source checkout> takes too long" or similar will be passed to the backend developer with the suggestion that they implement the build_sdist or copy_files hook. Saying numpy won't support PEP 517 means that such requests will be denied.

On 3 June 2017 at 09:59, Paul Moore p.f.moore@gmail.com wrote:
Apologies: gmail's web interface split the conversation thread, so I wrote this before reading through to the end, and some of what I say is out of date given subsequent emails. Bad gmail, sorry :-)
Paul

On Sat, Jun 3, 2017 at 8:59 PM, Paul Moore p.f.moore@gmail.com wrote:
On 3 June 2017 at 03:14, Nathaniel Smith njs@pobox.com wrote:
So far my belief is that packages with expensive build processes are going to ignore you and implement, ship, document, and recommend the direct source-tree->wheel path for developer builds. You can force the make-a-wheel-from-a-directory-without-copying-and-then-install-it command have a name that doesn't start with "pip", but it's still going to exist and be used. Why wouldn't it? It's trivial to implement and it works, and I haven't heard any alternative proposals that have either of those properties. [1]
I may be misunderstanding you, but it's deeply concerning if you're saying "as a potential backend developer, I'm sitting here listening to the discussion about PEP 517 and I've decided not to raise my concerns but simply to let it be implemented and then ignore it".
I think you partly misunderstood - "ignore you" should mean "ignore pip" or "ignore the mandatory sdist part of PEP 517" not "ignore all of PEP 517". And concerns have been raised (just rejected as less important than the possibility of more bug reports to pip)?
And I agree with Nathaniel's view in the paragraph above.
OTOH, I'm not sure how you plan on ignoring it - are you suggesting that projects like numpy won't support "pip install numpy" except for wheel installs[1]?
Of course not, that will always be supported. It's just that where the developer/build docs now say "python setup.py ..." we want them to say "pip install . -v", and with sdist generation that won't happen; they will instead say "somenewtool install .", where somenewtool is a utility that does something like:
1. invoke the backend directly to build a wheel (using the PEP 517 build_wheel interface)
2. install the wheel with pip
and probably also:
1. invoke the backend directly for an in-place build
2. make the in-place build visible (which may involve telling pip to uninstall the project if it's installed elsewhere, and messing with PYTHONPATH or pip metadata)
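A rough sketch of that first pair of steps, assuming a backend module importable by name and build_wheel returning the built wheel's filename (somenewtool itself is hypothetical):

    import importlib
    import os
    import subprocess
    import sys
    import tempfile

    def somenewtool_install(backend_module):
        # Invoke the PEP 517 backend in-process to build a wheel from the
        # current source tree, then hand the result to pip to install.
        backend = importlib.import_module(backend_module)
        wheel_dir = tempfile.mkdtemp()
        wheel_name = backend.build_wheel(wheel_dir)
        subprocess.check_call([sys.executable, "-m", "pip", "install",
                               os.path.join(wheel_dir, wheel_name)])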
Ralf

On 4 June 2017 at 10:39, Ralf Gommers ralf.gommers@gmail.com wrote:
Of course not, that will always be supported. It's just that where the developer/build docs now say "python setup.py ..." we want them to say "pip install . -v", and with sdist generation that won't happen; they will instead say "somenewtool install ."
Allowing software consumption frontends like pip to completely replace setup.py as a *development* tool isn't one of the goals of PEP 517 - while we want to enable the use of both simpler (e.g. flit) and more capable (e.g. Scons, meson, waf) alternatives without breaking end user installation processes and other tooling, it's also entirely acceptable for projects that are happy with setuptools/distutils as their build toolchain to keep using it.
The subtitle of the packaging panel at PyCon US 2013 ended up being "'./setup.py install' must die", *not* "./setup.py must die" or "./setup.py sdist must die", and that focus specifically on software consumption wasn't an accident.
Personally, I'm thoroughly opposed to commingling software publishing tasks and software consumption tasks in the same toolchain the way distutils/setuptools do (since you can't optimise for the needs of both software publishers *and* software consumers at the same time outside relatively narrow domains like the Chandler plugin system PJE originally wrote setuptools to handle), so I think the mismatched perspectives here may be due at least in part to my failing to communicate that adequately.
When it appears otherwise, it's because I'm conceding that a "pip publish" front-end to a software publishing tool like twine may be beneficial for discoverability and learning purposes when folks are making the leap from "open source consumer" to "open source publisher", and because "retrieve & unpack sdist, apply downstream patches as needed, build & publish wheel" is a fairly common step in automated build pipelines that inherently blurs the line between software consumption and software publication :)
Cheers, Nick.

On Sat, Jun 3, 2017, at 03:14 AM, Nathaniel Smith wrote:
If the pip devs don't trust build systems in general, but (as suggested by copy_files discussion) are ok with trusting them if they promise to be super trustworthy, alternate proposal:
- add an 'in_place_build_safe = True' hook, which indicates that the build system has been carefully written so that this will generate the same result as building an sdist and then building that; pip checks for this to decide whether to build in place or to build an sdist first.
I would use this for flit if it becomes part of the spec. I can see the rationale for not trusting build systems from the frontend's point of view, but it does feel like all potential build systems are being subjected to the same constraints we need for distutils/setuptools.
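For illustration, the frontend side of that flag might be no more than the following (attribute name per the suggestion above; not part of the PEP as written):

    def should_build_in_place(backend):
        # Build in place only if the backend promises the result matches
        # building an sdist and then building from that sdist.
        return getattr(backend, "in_place_build_safe", False)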
One thing that's not clear to me: a crucial use case for sdists is (1) download, (2) unpack, (3) patch the source, possibly adding new files, (4) build and install. (After all, the whole reason we insist on distributing sdists is that open source software should be modifiable by the recipient.) Does flit currently support this, given the reliance on VCS metadata?
Flit does support that, so long as step 4 never needs to build an sdist. Producing the sdist is the only operation for which flit needs the VCS.
This is why I'm doggedly arguing that building and installing should be possible without invoking any 'sdist' hook. ;-)
Thomas

On 3 June 2017 at 10:45, Thomas Kluyver thomas@kluyver.me.uk wrote:
One thing that's not clear to me: a crucial use case for sdists is (1) download, (2) unpack, (3) patch the source, possibly adding new files, (4) build and install. (After all, the whole reason we insist on distributing sdists is that open source software should be modifiable by the recipient.) Does flit currently support this, given the reliance on VCS metadata?
Flit does support that, so long as step 4 never needs to build an sdist. Producing the sdist is the only operation for which flit needs the VCS.
This is why I'm doggedly arguing that building and installing should be possible without invoking any 'sdist' hook. ;-)
This is getting very off-topic, but what if I wanted to patch the source and then build an sdist to put into my local PyPI index? I presume the answer is that I either have to check out the original sources from VCS or I have to build only wheels and maintain my source patches some other way. I can think of realistic reasons why neither of these two options is practical, but it is of course your prerogative not to support those cases.
Also, I typically have a lot of stuff (working notes, utility scripts, etc) checked into VCS that I don't want to be in the sdist. I don't know if flit has a way to exclude such files - and if it does, why can't it use that method to also allow me to say "exclude everything *except* this list of files" if I want to?
This is basically why I'm persistently unable to see why you won't even consider a fallback for building sdists when the VCS isn't present.
Paul

On Sat, Jun 3, 2017, at 10:55 AM, Paul Moore wrote:
This is getting very off-topic, but what if I wanted to patch the source and then build an sdist to put into my local PyPI index? I presume the answer is that I either have to check out the original sources from VCS or I have to build only wheels and maintain my source patches some other way. I can think of realistic reasons why neither of these two options is practical, but it is of course your prerogative not to support those cases.
Another alternative would be to unpack the sdist, carefully modify the files you want, and tar it back up manually. If this is needed a lot, it shouldn't be hard to write tools to help with this.
I'm not going to claim this is a perfect system, but I prefer it to every alternative I've thought of so far. ;-)
Also, I typically have a lot of stuff (working notes, utility scripts, etc) checked into VCS that I don't want to be in the sdist. I don't know if flit has a way to exclude such files - and if it does, why can't it use that method to also allow me to say "exclude everything *except* this list of files" if I want to?
One thing you could do is to have a subfolder containing the package as you want it to be in an sdist, including the flit.ini/pyproject.toml file. When flit makes an sdist, it will only use the folder where that file is found - so that you can e.g. have multiple packages in one repository.
More generally, though, I'd question why you don't want those files to be in an sdist? Why should an sdist be any different to a snapshot of your VCS at release time, including all of your thoughts and tools used in development? Installation will usually use a wheel, so download size shouldn't be a major concern.
I might consider adding an additional way to exclude files from an sdist, but I'll leave it a while to see if a compelling need emerges before adding more complexity.
Thomas

On 3 June 2017 at 20:09, Thomas Kluyver thomas@kluyver.me.uk wrote:
On Sat, Jun 3, 2017, at 10:55 AM, Paul Moore wrote:
This is getting very off-topic, but what if I wanted to patch the source and then build an sdist to put into my local PyPI index? I presume the answer is that I either have to check out the original sources from VCS or I have to build only wheels and maintain my source patches some other way. I can think of realistic reasons why neither of these two options is practical, but it is of course your prerogative not to support those cases.
Another alternative would be to unpack the sdist, carefully modify the files you want, and tar it back up manually. If this is needed a lot, it shouldn't be hard to write tools to help with this.
I'm not going to claim this is a perfect system, but I prefer it to every alternative I've thought of so far. ;-)
This is why I haven't understood your insistence that flit can't support sdist export from an sdist tree: if PKG-INFO is already present, there's nothing to generate; just copy the entire directory again. If any modifications have been made, they'll get captured automatically.
I like the idea of backends being able to modify PKG-INFO to indicate that the sdist has been modified since the original export (if they want to do so), so it does make sense to me to leave that check up to the backend, rather than telling frontends to only call the sdist export if PKG-INFO is missing.
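As a sketch of what that backend-side fallback could look like (illustrative only: the name/version are placeholders, and this is not something flit currently implements):

    import os
    import tarfile

    def build_sdist(sdist_directory, config_settings=None):
        base = "mypkg-1.0"  # placeholder; a real backend derives this
        if not os.path.exists("PKG-INFO"):
            raise NotImplementedError("VCS-based export goes here")
        # Already an unpacked sdist: re-archive the whole tree, picking up
        # any local modifications automatically. Hooks run with the source
        # tree as the working directory.
        target = os.path.join(sdist_directory, base + ".tar.gz")
        with tarfile.open(target, "w:gz") as tf:
            tf.add(os.curdir, arcname=base)
        return os.path.basename(target)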
Cheers, Nick.

On 3 June 2017 at 11:09, Thomas Kluyver thomas@kluyver.me.uk wrote:
More generally, though, I'd question why you don't want those files to be in an sdist? Why should an sdist be any different to a snapshot of your VCS at release time, including all of your thoughts and tools used in development? Installation will usually use a wheel, so download size shouldn't be a major concern.
Because that's how my project is set up, and because that's how I tend to work. Honestly, I'm not trying to debate what's right or wrong here, just pointing out that flit feels to me pretty opinionated on this matter, and its opinions don't match mine. Sadly, that means that I don't use flit - even though as a tool for building wheels, it's ideal for me. Maybe one day I'll make a PR, or fork flit for my own use - it's so close that doing so seems easier than writing a competing tool.
In the context of *this* discussion, it does mean that your arguments about not wanting to support a general "build a sdist" hook come across to me as "because I don't want to work that way" rather than as objective constraints.
But it's not that important now, as we seem to have come to a compromise on what the PEP requires anyway.
Paul

On 3 June 2017 at 19:45, Thomas Kluyver thomas@kluyver.me.uk wrote:
On Sat, Jun 3, 2017, at 03:14 AM, Nathaniel Smith wrote:
If the pip devs don't trust build systems in general, but (as suggested by copy_files discussion) are ok with trusting them if they promise to be super trustworthy, alternate proposal:
- add an 'in_place_build_safe = True' hook, which indicates that the build system has been carefully written so that this will generate the same result as building an sdist and then building that; pip checks for this to decide whether to build in place or to build an sdist first.
I would use this for flit if it becomes part of the spec. I can see the rationale for not trusting build systems from the frontend's point of view, but it does feel like all potential build systems are being subjected to the same constraints we need for distutils/setuptools.
Think of it in terms of fire doors in a building: those exist not because we expect all buildings to always be on fire, but because if there *is* a fire, we want the building to be set up to limit the ability of smoke and the fire itself to spread.
Breaking up a build pipeline is a similar notion: every stage of the process is a potential location for bugs, but if you clearly delineate the steps, you get better natural debugging support simply based on the ability to identify which step in the pipeline is failing.
If there's only one "build it" hook (e.g. "python setup.py install"), then folks debugging an installation failure are completely dependent on the build backend for their ability to diagnose what's going on.
By contrast, if there's a certain process breakdown that backend developers are obliged to support, then it raises the minimum required level of debuggability, since it allows a frontend to execute the build step-by-step and go:
1. Did the sdist generation or build tree extraction work? Yes, good.
2. Did the wheel metadata generation work? Yes, good.
3. Did the actual wheel build work? Yes, good.
If any one of those steps fails, the frontend (and hence the frontend's users) naturally have more information than if the only command available was "Build a wheel directly from this VCS checkout". That kind of thing is less important for publishers and application integrators (the majority of whom are dealing with dozens or maybe hundreds of projects at most) than it is for platform integrators (who are more likely to be dealing with thousands or tens of thousands of different projects, and hence benefit more from systematic improvements in fault isolation support).
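Sketched as frontend pseudocode (hook names as used in this thread; directories, config settings and error handling elided):

    def debuggable_build(backend, sdist_dir, metadata_dir, wheel_dir):
        # Running each hook separately means a failure immediately
        # identifies which stage of the backend broke.
        sdist = backend.build_sdist(sdist_dir)               # step 1
        metadata = backend.get_wheel_metadata(metadata_dir)  # step 2
        wheel = backend.build_wheel(wheel_dir)               # step 3
        return sdist, metadata, wheel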
Cheers, Nick.

On May 30, 2017, at 6:34 AM, Nick Coghlan ncoghlan@gmail.com wrote:
Note that I'm also fine with pip as a project saying that it will only ship support for the build-backend interface once the source filtering interface is also defined and implemented.
As an addendum, this seems entirely silly to me as well. The packaging landscape is already littered with unimplemented PEPs that confuse people; adding another one seems like a waste of everyone’s time, and if this one is dependent on another one, it seems silly not to define them together.
— Donald Stufft
participants (15)
- Brett Cannon
- C Anthony Risinger
- Chris Jerdonek
- Daniel Holth
- Donald Stufft
- Fred Drake
- Glyph
- Jeremy Stanley
- Leonardo Rochael Almeida
- Marius Gedminas
- Nathaniel Smith
- Nick Coghlan
- Paul Moore
- Ralf Gommers
- Thomas Kluyver