[Distutils] Provisionally accepting PEP 517's declarative build system interface

Donald Stufft donald at stufft.io
Mon May 29 15:50:28 EDT 2017


> On May 29, 2017, at 3:09 PM, Nathaniel Smith <njs at pobox.com> wrote:
> 
> On Mon, May 29, 2017 at 7:26 AM, Donald Stufft <donald at stufft.io <mailto:donald at stufft.io>> wrote:
>> 
>> On May 29, 2017, at 3:05 AM, Nathaniel Smith <njs at pobox.com> wrote:
>> 
>>> I think there's some pip bug somewhere discussing this, where Ralf
>>> Gommers and I point out that this is a complete showstopper for
>>> projects with complex and expensive builds (like scipy). If 'pip
>>> install .' is going to replace 'setup.py install', then it needs to
>>> support incremental builds, and the way
>>> setup.py-and-almost-every-other-build-tool do this currently is by
>>> reusing the working directory across builds.
>> 
>> 
>> 
>> Wouldn’t supporting incremental builds the way ccache does work just fine?
>> Have a per build tool cache directory somewhere that stores cached build
>> output for each individual file keyed off a hash or something? (For that
>> matter, if someone wants incremental rebuilds, couldn’t they just *use*
>> ccache as their CC?).
> 
> With a random numpy checkout on my laptop and a fully-primed ccache,
> some wall-clock timings:
> 
> no-op incremental build (python setup.py build): 1.186 seconds
> 
> python setup.py sdist: 3.213 seconds
> unpack resulting tarball: 0.136 seconds
> python setup.py build in unpacked tree: 7.696 seconds
> 
> So ccache makes the sdist-and-build a mere 10x slower than an in-place
> incremental build.
> 
> ccache is great, but it isn't magic. It can't make copying files
> faster (notice we're already 3x slower before we even start
> building!), it doesn't speed up linking, and you still need to spawn
> all those processes and hash all that source code instead of just
> making some stat() calls.
> 
> Also, this is on Linux. The numbers would look much worse on Windows,
> given that it generally has much higher overhead for unpacking
> tarballs and spawning lots of processes, and also given that ccache
> doesn't support MSVC!


To be honest, I’m hardly going to feel bad if one of the most compilation-heavy packages in existence takes a whole 10 seconds to install from a VCS checkout. Particularly since I assume the build tool can be even smarter here than ccache is able to be, and reduce the setup.py build step back down to the no-op incremental build case.

I mean, unless numpy is doing something different, the default distutils incremental build logic is incredibly dumb: it just stores the build output in a directory (by default ./build/) and compares the mtimes of a list of source files against the mtime of the target file, and if the source files are newer, it recompiles. If you replace mtime with blake2 (or similar) then you can trivially support the exact same thing while storing the built target files in some per-user cache directory instead. Hell, we *might* even be able to preserve mtime (if we’re not already… we might be! But I’d need to dig into it), so literally the only thing that would need to change is that instead of storing the built artifacts in ./build/ you store them in ~/.cache/my-cool-build-tool/{project-name}. Bonus points: this means you get incremental speeds even when building from an sdist from PyPI that doesn’t have wheels and hasn’t changed those files either.
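To make the idea concrete, here’s a minimal sketch of the hash-based variant of that check. The cache layout (a per-project stamp directory, standing in for something like ~/.cache/my-cool-build-tool/{project-name}) and all function names are hypothetical illustrations, not any real tool’s API:

```python
# Sketch: replace distutils' mtime comparison with content hashes (blake2),
# so cached build output stays valid across fresh checkouts/unpacked sdists.
import hashlib
from pathlib import Path

def file_digest(path: Path) -> str:
    """Content hash of a source file."""
    return hashlib.blake2b(path.read_bytes()).hexdigest()

def needs_rebuild(sources: list[Path], target: Path, stamp_dir: Path) -> bool:
    """Rebuild if the target is missing or any source's *content* changed
    since the digests recorded at the last successful build."""
    if not target.exists():
        return True
    for src in sources:
        stamp = stamp_dir / (src.name + ".blake2")
        if not stamp.exists() or stamp.read_text() != file_digest(src):
            return True
    return False

def record_build(sources: list[Path], stamp_dir: Path) -> None:
    """After a successful compile, persist each source's digest."""
    stamp_dir.mkdir(parents=True, exist_ok=True)
    for src in sources:
        (stamp_dir / (src.name + ".blake2")).write_text(file_digest(src))
```

Unlike mtime, the digests don’t care that unpacking a tarball resets timestamps, which is exactly why this survives the sdist round-trip.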

I’m of the opinion that first you need to make it *correct*, then you can try to make it *fast*. It is my opinion that an installer that shits random debris into your current directory is not correct. It’s kind of silly that we have to add a chunk of “random pip/distutils/setuptools crap” to the .gitignore of basically every Python package in existence. Never mind the random stuff that doesn’t currently get written there, but will if we stop copying files out of the path and into a temporary location (I’m sure everyone wants a pip-egg-info directory in their current directory).

I’m also of the opinion that avoiding foot guns is more important than shooting for the fastest operation possible. I regularly (sometimes multiple times a week, but often every week or two) see people tripping over the fact that ``git clone … && pip install .`` does something different than ``git clone … && python setup.py sdist && pip install dist/*``. Files suddenly go missing and they have no idea why. If they’re lucky, they figure out they need to modify some combination of package_data, data_files, and MANIFEST.in to make it work; if they’re not lucky, they just sit there dumbfounded by it.
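For anyone who has hit that trap: the usual fix is telling setuptools about the extra files explicitly. A hypothetical MANIFEST.in (the package and file names here are made up for illustration):

```
include README.rst
recursive-include mypkg/data *.json
```

MANIFEST.in controls what goes into the sdist, while package_data/data_files control what gets installed, which is why the two install paths can diverge when they disagree.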


> 
> Also also, notice elsewhere in the thread where Thomas notes that flit
> can't build an sdist from an unpacked sdist. It seems like 'pip
> install unpacked-sdist/' is an important use case to support…
> 

If the build tool gives us a mechanism to determine whether something is an unpacked sdist, so that we can fall back to just copying in that case, that’s fine with me. The bad case is generally only going to be hit on VCS checkouts and other non-sdist source trees.


—
Donald Stufft




