[Numpy-discussion] how much does binary size matter?

Ralf Gommers ralf.gommers at gmail.com
Tue Apr 30 05:10:19 EDT 2019

On Sat, Apr 27, 2019 at 8:04 PM Éric Depagne <eric at depagne.org> wrote:

> On Friday, 26 April 2019, 21:13:22 SAST, Julian Taylor wrote:
> Hi all,
> It seems that my message was misinterpreted, so let me clarify a few
> things.
> I'm not saying that increasing the size of the binary is a bad thing,
> especially if there are lots of improvements that caused this increase.
> My message was just a note to be sure that bandwidth availability is not
> forgotten, as it's fairly easy (I'm guilty of that myself) to take for
> granted that downloads will always be fast and hassle-free.

Thanks Eric, your point is clear and we definitely won't forget to consider
users on older hardware or behind slow connections.

> Concerning the environments where it matters, I currently live in South
> Africa, and even if things are improving fast in terms of bandwidth
> availability, there is still a long way to go before people get fast
> access at home for an affordable fee. So I'd say that environments where
> the size of binaries has no impact are the clear minority here.
> That said, as I've raised the issue I wanted to, and you are aware of it,
> I do not see a reason to increase the size of this thread.
> Cheers,
> Éric.
> > Hi,
> > We understand that it can be a burden. Of course a larger binary is bad,
> > but that bad usually also comes with good, like better performance or
> > more features.
> >
> > How much of a burden is it, and where is the line between "I need twice
> > as long to download it", which is just annoying, and "I cannot use it
> > anymore", because for example it no longer fits onto my device?
> >
> > Are there actual environments, or do you know of any environments, where
> > the size of the numpy binary has an impact on whether it can be deployed
> > or not, or where it is preferable for numpy to be small rather than
> > fast or full of features?

Here is my take on it:
The case of this PR is borderline. If we were to write down a hard criterion,
this likely would not meet it. Rationale: if we get 100 PRs like this, the
average performance of numpy for a user would not change all that much,
however we would by then have blown up the size NumPy takes up
(disk/RAM/download/etc) by a factor of 2.4. *However*, we won't get 100 PRs
like this. So judging this based on such a criterion isn't quite right. We
have this PR now, and it's good to go. Presumably it helps @qwhelan
significantly. So I'm +0.5 for merging it.
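
As a rough check of that estimate, the sketch below works out what a factor
of 2.4 over 100 PRs implies per PR. It assumes every PR adds the same amount,
either in absolute terms (additive) or relative to the current size
(compound). Neither model is taken from the PR itself, and the function names
are made up for illustration.

```python
def additive_growth_per_pr(total_factor, n_prs):
    """Fraction of the ORIGINAL size added by each PR, assuming every PR
    adds the same absolute amount (hypothetical model)."""
    return (total_factor - 1.0) / n_prs


def compound_growth_per_pr(total_factor, n_prs):
    """Fraction of the CURRENT size added by each PR, assuming every PR
    grows the binary by the same percentage (hypothetical model)."""
    return total_factor ** (1.0 / n_prs) - 1.0


if __name__ == "__main__":
    factor, n_prs = 2.4, 100  # numbers quoted in the paragraph above
    print(f"additive: {additive_growth_per_pr(factor, n_prs):.2%} per PR")
    print(f"compound: {compound_growth_per_pr(factor, n_prs):.2%} per PR")
```

Under the additive assumption each such PR costs about 1.4% of the original
size, and under the compound assumption about 0.9% of the current size.
Either way a single PR is small; the cost only becomes visible if changes
like this become routine.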

Also note that Cython has the same problem: taking one function and putting
it in a .pyx file gives a huge amount of bloat (example:
`scipy.ndimage.label`). We had the same discussion there, but it never
became a practical issue because there were not many other PRs like that.

tl;dr let's merge this, and let's try not to make these kinds of changes a


> > This is interesting to us just to judge how to handle marginal
> > improvements which come with relatively large increases in binary size.
> > With some use case information we can better estimate where it is
> > worthwhile to think about alternatives or to spend more benchmarking
> > work to determine the most important cases, and where it is not.
> >
> > If there are such environments, there are other options than blocking or
> > complicating future enhancements, for example adding a compile-time
> > option to make it smaller again by e.g. stripping out hardware-specific
> > code or avoiding size-expensive optimizations.
> > But without concrete use cases this does not appear to be something worth
> > spending time on.
> >
> > On 26.04.19 11:47, Éric Depagne wrote:
> > > On Friday, 26 April 2019, 11:10:56 SAST, Ralf Gommers wrote:
> > > Hi Ralf,
> > >
> > >> Right now a wheel is 16 MB. If we increase that by 10%/50%/100% - are
> > >> we causing a real problem for someone?
> > >
> > > Access to large bandwidth is not universal at all, and in many
> > > countries (I'd even say in most of the countries around the world),
> > > 16 MB is a significant amount of data, so increasing it is a burden.
> >
> > _______________________________________________
> > NumPy-Discussion mailing list
> > NumPy-Discussion at python.org
> > https://mail.python.org/mailman/listinfo/numpy-discussion
> --
> Un clavier azerty en vaut deux
> ----------------------------------------------------------
> Éric Depagne