On 4 Sep 2018, at 01:51, Nick Coghlan <ncoghlan@gmail.com> wrote:

On Mon., 3 Sep. 2018, 5:48 am Ronald Oussoren, <ronaldoussoren@mac.com> wrote:

What’s the problem with including GPU and non-GPU variants of code in a binary wheel other than the size of the wheel? I tend to prefer binaries that work “everywhere”, even if that requires some extra work when building them (such as including multiple variants of an extension with code optimised for different CPUs, like the SSE and non-SSE builds in the past).

As far as I'm aware, binary artifact size *is* the problem. It's just that once you're automatically building and pushing an artifact (or an image containing that artifact) to thousands or tens of thousands of managed systems, the wasted bandwidth from pushing redundant implementations of the same functionality becomes more of a concern than the convenience of being able to use the same artifact across multiple platforms.

Ok. I’m more used to much smaller deployments where I don’t always know up front what the capabilities are of the system the code will run on.
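For those cases a single “fat” wheel with a small runtime dispatch shim works well enough. A rough sketch of what I mean, assuming the wheel ships two hypothetical compiled extensions, _mylib_gpu and _mylib_cpu (the names and the cuda_device_count() helper are made up for the example):

    # Pick the GPU build when a usable CUDA device is present,
    # otherwise fall back to the CPU-only build.
    import importlib

    def _load_backend():
        try:
            gpu = importlib.import_module("_mylib_gpu")
            if gpu.cuda_device_count() > 0:  # hypothetical helper in the extension
                return gpu
        except (ImportError, OSError):
            # No CUDA runtime or driver on this machine.
            pass
        return importlib.import_module("_mylib_cpu")

    _backend = _load_backend()
    # Re-export the selected implementation under one public name.
    matmul = _backend.matmul

The wheel still has to carry both variants, of course, which is exactly the size/bandwidth trade-off you describe.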

And looking at tensorflow specifically, the size difference is indeed significant: the GPU variant is almost five times as large as the non-GPU variant (255MB vs 55MB). That’s a good reason for not wanting to unconditionally ship both variants.

Ronald