[Distutils] PEP 517: Open questions around artifact export directories

Nathaniel Smith njs at pobox.com
Mon Jun 12 19:18:18 EDT 2017


On Mon, Jun 12, 2017 at 3:49 PM, Donald Stufft <donald at stufft.io> wrote:

>
> On Jun 12, 2017, at 6:36 PM, Nathaniel Smith <njs at pobox.com> wrote:
>
> Another point is that tools that you might have in your build pipeline
> -- like auditwheel -- currently use wheel files as their interchange
> format, so you might end up having to zip, run auditwheel, unzip for
> pip, and then pip zips again to cache the wheel…
>
>
> How is that different from today? In the hypothetical build_wheel
> producing a zip file… you produce a zip file, run auditwheel which unzips
> it, which presumably has to zip it back up again for pip, and then pip
> unzips it again on every single install.
>
> If auditwheel doesn’t start to accept unzipped wheels, then nothing
> changes, if it does then suddenly we skip some round trips through
> zip/unzip and things get faster for everyone.
>

I would strongly prefer that auditwheel not have to accept or generate
unzipped wheels, because that just multiplies the number of cases
that need to be supported, and as you've pointed out many times, more
potential paths = more chances for bugs. So if you have auditwheel as the
last step in your pipeline, that means that at the end of the build what
you have is a zipped wheel. If pip accepts zipped wheels, then we can just
hand this over and pip drops it in its cache and unzips it into the final
location. If pip requires unpacked wheels, then first the backend has to
unzip it, and then pip has to do something with the unpacked directory
(either copy it file-by-file, or possibly even zip it up again to cache it).
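
(For concreteness, a rough sketch of what the zipped-wheel handoff looks
like from the frontend side -- this isn't pip's real code, and the
function and path names are made up -- but it shows why the zipped case
is the simple one: caching is a single file copy and installing is a
single unzip pass:

    import shutil
    import zipfile
    from pathlib import Path

    def install_zipped_wheel(wheel_path, cache_dir, target_dir):
        # Hypothetical frontend-side handling of a zipped wheel.
        cache_dir = Path(cache_dir)
        cache_dir.mkdir(parents=True, exist_ok=True)
        # Caching the wheel is one file copy -- no per-file traversal.
        cached = cache_dir / Path(wheel_path).name
        shutil.copy2(wheel_path, cached)
        # Installing is one pass over the archive.
        with zipfile.ZipFile(cached) as zf:
            zf.extractall(target_dir)

The unpacked-directory version replaces that single extractall() with a
file-by-file copy, or a re-zip just to populate the cache.)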



>
>
> The whole conversation feels a bit like we're falling into the
> developer trap of "oo there's a thing that might be optimizable
> therefore we MUST optimize it" without any real assessment of the
> benefits (I'm as guilty as anyone!). It's not even clear to me that
> copying a tree twice *is* faster than packing and then unpacking a
> wheel in general – if your tree consists of lots of small files and
> you're IO-bound, then the wheel version might well be faster. (E.g. on
> an underprovisioned virtual server, especially if using spinning media
> - while of course we're all benchmarking on laptops with fast SSD and
> everything in cache :-).) And in any case, I'm generally very
> skeptical of moving away from the well-specified wheel format that
> already has lots of tooling and consensus around it towards anything
> ad hoc, when AFAICT no-one has even identified this as an important
> bottleneck.
>
>
> I’ve measured that 50%-75% of the time taken by ``python setup.py
> bdist_wheel`` + unzipping the resulting wheel can be eliminated for ``pip
> install ./pip``.
>

Sure, but no-one noticed or cared about this until we started talking about
unpacked wheels for other reasons, and then we went hunting for benchmarks
to justify the idea :-). And even so, your benchmark is a bit cherry-picked
-- that percentage will go down if you include the 'setup.py sdist' / 'setup.py
unpacked_sdist' step that you want 'pip install ./pip' to do, and even more
so if you test on some system with a less robust IO layer than your fancy
developer laptop.
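
(If anyone wants to poke at this on their own hardware, here's a toy
sketch of the kind of comparison I mean -- the source tree path is a
placeholder you'd point at an unpacked wheel's contents, and it times
plain filesystem copies against a zip round trip rather than anything
pip actually does:

    import shutil
    import tempfile
    import time
    import zipfile
    from pathlib import Path

    def time_copy_twice(src, workdir):
        # The "unpacked" case: copy the tree, then copy the copy,
        # as in the "copying a tree twice" scenario above.
        start = time.perf_counter()
        shutil.copytree(src, workdir / "copy1")
        shutil.copytree(workdir / "copy1", workdir / "copy2")
        return time.perf_counter() - start

    def time_zip_roundtrip(src, workdir):
        # The "zipped wheel" case: pack once, unpack once.
        start = time.perf_counter()
        archive = shutil.make_archive(str(workdir / "tree"), "zip", src)
        with zipfile.ZipFile(archive) as zf:
            zf.extractall(workdir / "unpacked")
        return time.perf_counter() - start

    if __name__ == "__main__":
        src = Path("some/unpacked/tree")  # placeholder
        with tempfile.TemporaryDirectory() as d:
            print("copy twice:   ", time_copy_twice(src, Path(d)))
        with tempfile.TemporaryDirectory() as d:
            print("zip roundtrip:", time_zip_roundtrip(src, Path(d)))

The relative numbers swing a lot depending on how many small files the
tree contains and how much the IO layer charges you per file.)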

(I heard a rumor recently that the reason Travis-CI's MacOS builds are so
terribly behind all the time is that their hosting provider has plenty of
CPU but their SAN is at its absolute limit in terms of IOPS, so they can't
add any more capacity. Packed wheels are much friendlier than unpacked ones
when it comes to IOPS...)

-n

-- 
Nathaniel J. Smith -- https://vorpus.org