>> Another point is that tools that you might have in your build pipeline
>> -- like auditwheel -- currently use wheel files as their interchange
>> format, so you might end up having to zip, run auditwheel, unzip for
>> pip, and then pip zips again to cache the wheel…
> How is that different from today? In the hypothetical build_wheel producing a zip file… you produce a zip file, run auditwheel which unzips it, which presumably has to zip it back up again for pip, and then pip unzips it again on every single install.
> If auditwheel doesn’t start to accept unzipped wheels, then nothing changes; if it does, then suddenly we skip some round trips through zip/unzip and things get faster for everyone.
> I would strongly prefer auditwheel not have to accept unzipped wheels or generate unzipped wheels, because that just multiplies the number of cases that need to be supported, and as you've pointed out many times, more potential paths = more chances for bugs. So if you have auditwheel as the last step in your pipeline, that means that at the end of the build what you have is a zipped wheel. If pip accepts zipped wheels, then we can just hand this over and pip drops it in its cache and unzips it into the final location. If pip requires unpacked wheels, then first the backend has to unzip it, and then pip has to do something with the unpacked directory (either copy it file-by-file, or possibly even zip it up again to cache it).

Unless auditwheel is calling this backend directly, or is trying to implement this API to be called by pip, it never has to do that. This isn’t really meant to be an end-user-facing UX; it is strictly for two tools to talk to each other. Thus auditwheel is free to continue working as it does today, and it can ignore this spec entirely by continuing to expect someone to invoke a command that builds a wheel first.

>> The whole conversation feels a bit like we're falling into the
>> developer trap of "oo there's a thing that might be optimizable
>> therefore we MUST optimize it" without any real assessment of the
>> benefits (I'm as guilty as anyone!). It's not even clear to me that
>> copying a tree twice *is* faster than packing and then unpacking a
>> wheel in general – if your tree consists of lots of small files and
>> you're IO-bound, then the wheel version might well be faster. (E.g. on
>> an underprovisioned virtual server, especially if using spinning media
>> - while of course we're all benchmarking on laptops with fast SSD and
>> everything in cache :-).) And in any case, I'm generally very
>> skeptical of moving away from the well-specified wheel format that
>> already has lots of tooling and consensus around it towards anything
>> ad hoc, when AFAICT no-one has even identified this as an important
>> bottleneck.
> I’ve measured that 50%-75% of the time taken by ``python setup.py bdist_wheel`` + unzipping the resulting wheel can be eliminated for ``pip install ./pip``.
> Sure, but no-one noticed or cared about this until we started talking about unpacked wheels for other reasons, and then we went hunting for benchmarks to justify the idea :-). And even so your benchmark is a bit cherry-picked -- that percentage will go down if you include the 'setup.py sdist' / 'setup.py unpacked_sdist' step that you want 'pip install ./pip' to do, and even more so if you test on some system with a less robust IO layer than your fancy developer laptop.

Well, generating an unpacked sdist rather than a packed sdist saves roughly 50% of the time there too. I wouldn’t specifically say nobody cared; rather, the nature of things meant nobody was in a position to do anything about it until now. None of the tooling provided a native way to do it, so it wasn’t worth the cost of monkey patching.
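For anyone who wants to check the "copy a tree twice vs. pack and unpack a wheel" question on their own hardware (including the slow-IO case raised above), here is a rough sketch. It does not invoke setup.py, pip, or auditwheel; it only models one zip round trip against two directory copies, using a synthetic tree of many small files. The file count, file size, directory layout, and archive name are arbitrary assumptions, and the timings will depend heavily on the filesystem and on whether everything is already in the page cache.

```python
import os
import shutil
import tempfile
import time
import zipfile


def make_tree(root, n_files=2000):
    """Create a synthetic (hypothetical) source tree of many small text files."""
    for i in range(n_files):
        sub = os.path.join(root, f"pkg{i % 20}")
        os.makedirs(sub, exist_ok=True)
        with open(os.path.join(sub, f"mod{i}.py"), "wb") as f:
            f.write((f"# module {i}\n" * 32).encode())


def zip_then_unzip(src, work):
    """Pack src into a wheel-style zip archive, then extract it: one round trip."""
    archive = os.path.join(work, "example-0.0-py3-none-any.whl")  # hypothetical name
    with zipfile.ZipFile(archive, "w", zipfile.ZIP_DEFLATED) as zf:
        for dirpath, _, filenames in os.walk(src):
            for name in filenames:
                full = os.path.join(dirpath, name)
                zf.write(full, os.path.relpath(full, src))
    dest = os.path.join(work, "unzipped")
    with zipfile.ZipFile(archive) as zf:
        zf.extractall(dest)
    return dest


def copy_twice(src, work):
    """Copy the tree twice, as an unpacked-tree handoff between tools might."""
    first = os.path.join(work, "copy1")
    second = os.path.join(work, "copy2")
    shutil.copytree(src, first)
    shutil.copytree(first, second)
    return second


def timed(fn, *args):
    """Return (result, elapsed wall-clock seconds) for fn(*args)."""
    start = time.perf_counter()
    result = fn(*args)
    return result, time.perf_counter() - start


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        src = os.path.join(tmp, "src")
        make_tree(src)
        for fn in (zip_then_unzip, copy_twice):
            work = os.path.join(tmp, fn.__name__)
            os.makedirs(work)
            _, secs = timed(fn, src, work)
            print(f"{fn.__name__}: {secs:.3f}s")
```

Running it once on a warm cache and once after dropping caches (or on a spinning disk or small VM) would give a more honest picture than a single laptop number.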

