On Tue, Sep 4, 2018, 07:42 Nick Coghlan <ncoghlan@gmail.com> wrote:
> On Tue, 4 Sep 2018 at 20:30, Nathaniel Smith <njs@pobox.com> wrote:
> >
> > On Mon, Sep 3, 2018 at 4:51 PM, Nick Coghlan <ncoghlan@gmail.com> wrote:
> > > On Mon., 3 Sep. 2018, 5:48 am Ronald Oussoren, <ronaldoussoren@mac.com>
> > > wrote:
> > >>
> > >>
> > >> What’s the problem with including GPU and non-GPU variants of code in a
> > >> binary wheel other than the size of the wheel? I tend to prefer binaries
> > >> that work “everywhere”, even if that requires some more work in building
> > >> binaries (such as including multiple variants of extensions to have
> > >> optimised code for different CPU variants, such as SSE and non-SSE variants
> > >> in the past).
> > >
> > >
> > > As far as I'm aware, binary artifact size *is* the problem. It's just that
> > > once you're automatically building and pushing an artifact (or an image
> > > containing that artifact) to thousands or tens of thousands of managed
> > > systems, the wasted bandwidth from pushing redundant implementations of the
> > > same functionality becomes more of a concern than the convenience of being
> > > able to use the same artifact across multiple platforms.
> >
> > None of the links that Dustin gave at the top of the thread are about
> > managed systems though.

> When you're only managing a few systems, or only saving a few MB per
> download, "install both and pick at runtime" is an entirely viable
> option.

Sure, this is true, and obviously size is a major reason for splitting up these packages, but this doesn't have anything in particular to do with managed systems AFAICT.
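
For concreteness, "install both and pick at runtime" is usually just a
guarded import. A minimal sketch, assuming a hypothetical package that
ships both builds of its compiled core under made-up module names:

    # Hypothetical names: mypkg ships both a GPU and a CPU build of its
    # core extension module; prefer the GPU build, fall back to the CPU one.
    try:
        from mypkg import _core_gpu as _core
    except ImportError:
        from mypkg import _core_cpu as _core

It works fine; you just pay for downloading both variants on every
install, which is exactly the size problem at issue.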


> However, since tensorflow is the example, neither of those cases is true:
>
> 1. It's a Google project, so they have tens of thousands of instances
> to worry about (as do other cloud providers)

They do have those instances, but they handle them via totally different methods that don't involve PyPI package names or pip's dependency tracking. (Specifically, a giant internal monorepo where they check in every piece of code they use, and then they build everything from source through their internal version of Bazel.)
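
For illustration only (made-up target names, not Google's actual
config), a build in that style declares its dependencies as other
checked-in targets rather than PyPI names:

    # Hypothetical Bazel BUILD file: deps point at vendored, checked-in
    # source that gets built from scratch; pip and PyPI names never
    # enter the picture.
    py_library(
        name = "my_service",
        srcs = ["my_service.py"],
        deps = ["//third_party/py/numpy"],
    )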

This is about how they, and other projects, are distributed to the general public on PyPI, and how to manage that public, shared dependency graph.

-n