How much does binary size matter?
Hi all,

In https://github.com/numpy/numpy/pull/13207 a discussion started about the trade-off between the performance gain for one function and increasing the size of a NumPy build by a couple of percent. We also discussed this in the community call on Wednesday and concluded that it may be useful to ask here for some more feedback.

Beyond disk/memory usage and download bandwidth, are there specific problems people are struggling with regarding the size of NumPy binaries? Right now a wheel is 16 MB. If we increase that by 10%/50%/100%, are we causing a real problem for someone?

Thanks,
Ralf
On Friday 26 April 2019 at 11:10:56 SAST, Ralf Gommers wrote:

Hi Ralf,
Right now a wheel is 16 MB. If we increase that by 10%/50%/100% - are we causing a real problem for someone?
Access to high bandwidth is not universal at all, and in many countries (I'd even say in most countries around the world) 16 MB is a significant amount of data, so increasing it is a burden.

Cheers,
Éric.
-- Un clavier azerty en vaut deux ("an AZERTY keyboard is worth two")
Éric Depagne
Here is a baseline: https://en.wikipedia.org/wiki/List_of_countries_by_Internet_connection_speeds . Probably a good idea to throttle the values at 60% of the bandwidth; that gives you a crude average of the delay it would cause per 1 MB worldwide.

On Fri, Apr 26, 2019 at 11:49 AM Éric Depagne <eric@depagne.org> wrote:
On Friday 26 April 2019 at 12:49:39 SAST, Ilhan Polat wrote:

Hi Ilhan,

That's an interesting link, but they provide the average, which is not a very good indicator. I myself have a 100 Mb/s link where I live, so given that Akamai ranks my country at an average speed of 6.7 Mb/s, a lot of people must have connections that do not reach 1 Mb/s. Of course, many of those will not be interested in downloading NumPy, so that might not be an issue.

Éric.
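A minimal sketch of the rule of thumb above, assuming the suggested 60% utilisation factor and using the link speeds mentioned in this thread as illustrative values:

```python
# Download time for a NumPy wheel at a few effective connection speeds,
# throttling each nominal speed to 60% as a crude real-world estimate.

WHEEL_MB = 16                      # current wheel size (megabytes)
SCENARIOS = [0.10, 0.50, 1.00]     # the 10%/50%/100% growth scenarios

def seconds_to_download(size_mb, speed_mbit, utilisation=0.6):
    """Seconds to transfer size_mb megabytes over a speed_mbit Mb/s link."""
    return size_mb * 8 / (speed_mbit * utilisation)

for speed in (1, 6.7, 100):        # slow link, Akamai-style average, fast home link
    base = seconds_to_download(WHEEL_MB, speed)
    extra = ", ".join(
        f"+{g:.0%}: +{seconds_to_download(WHEEL_MB * g, speed):.0f}s"
        for g in SCENARIOS
    )
    print(f"{speed:>5} Mb/s: {base:.0f}s today ({extra})")
```

On a sub-1 Mb/s link the current wheel already takes a few minutes, so even the 10% scenario adds a noticeable delay, while on a fast link all scenarios stay in the low seconds.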
Hi,

Obviously this is a trade-off; if we can increase binary size we can add more optimizations, for more platforms, and development will be a little faster, because we won't have to spend time optimizing for binary size.

If people on slow internet connections had to download NumPy multiple times, I guess this would be an issue, but I've lived behind excruciatingly slow and unreliable connections, in Cuba, and for a download you do a few times a year, I very much doubt there would be a practical difference between 16 MB and, say, 20 MB. If it's 16 MB vs 50 MB, then I think it's worth having the discussion, with the relevant trade-offs.

Cheers,
Matthew

On Fri, Apr 26, 2019 at 1:30 PM Éric Depagne <eric@depagne.org> wrote:
Hi,

We understand that it can be a burden; of course a larger binary is bad, but that bad usually also comes with good, like better performance or more features.

How much of a burden is it, and where is the line between "I need twice as long to download it", which is just annoying, and "I cannot use it anymore", because for example it no longer fits onto my device?

Are there actual environments, or do you know of any, where the size of the NumPy binary has an impact on whether it can be deployed at all, or where it is preferable for NumPy to be small rather than fast or full of features?

This is interesting to us for judging how to handle marginal improvements that come with relatively large increases in binary size. With some use-case information we can better estimate where it is worthwhile to think about alternatives, or to spend more benchmarking work determining the most important cases.

If such environments exist, there are options other than blocking or complicating future enhancements, for example a compile-time option to make the binary smaller again by stripping out hardware-specific code or avoiding size-expensive optimizations. But without concrete use cases this does not appear to be something worth spending time on.

On 26.04.19 11:47, Éric Depagne wrote:
On Friday 26 April 2019 at 21:13:22 SAST, Julian Taylor wrote:

Hi all,

It seems that my message was misinterpreted, so let me clarify a few things. I'm not saying that increasing the size of the binary is a bad thing, especially if there are lots of improvements behind the increase. My message was just a note to make sure that bandwidth availability is not forgotten, as it's fairly easy (I'm guilty of it myself) to take for granted that downloads will always be fast and hassle-free.

Concerning the environments where it matters: I currently live in South Africa, and even if things are improving fast in terms of bandwidth availability, there is still a long way to go before people can get fast access at home for an affordable fee. So I'd say that environments where the size of binaries has no impact are the clear minority here.

That said, I've raised the issue I wanted to raise, and you are aware of it, so I see no reason to make this thread any longer.

Cheers,
Éric.
On Sat, Apr 27, 2019 at 8:04 PM Éric Depagne <eric@depagne.org> wrote:
Thanks Eric, your point is clear and we definitely won't forget to consider users on older hardware or behind slow connections.
Here is my take on it: the case of this PR is borderline. If we wrote down a hard criterion, this PR likely would not meet it. Rationale: if we got 100 PRs like this, the average performance of NumPy for a user would not change all that much, but we would by then have blown up the size NumPy takes up (disk/RAM/download/etc.) by a factor of 2.4.

*However*, we won't get 100 PRs like this. So judging this one on such a criterion isn't quite right. We have this PR now, and it's good to go. Presumably it helps @qwhelan significantly. So I'm +0.5 for merging it.

Also note that Cython has the same problem: taking one function and putting it in a .pyx file gives a huge amount of bloat (example: `scipy.ndimage.label`). We had the same discussion there, but it never became a practical issue because there were not many other PRs like that.

tl;dr let's merge this, and let's try not to make these kinds of changes a habit.

Cheers,
Ralf
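The factor of 2.4 is the compounded effect of many small increases; a quick back-of-the-envelope check (the per-PR growth rate is inferred here, not stated in the thread):

```python
# If each of 100 PRs grows the wheel by a fixed fraction g, the total
# growth is (1 + g) ** 100.  Solve for the g implied by a 2.4x blow-up.
g = 2.4 ** (1 / 100) - 1
print(f"implied per-PR growth: {g:.3%}")            # just under 1% per PR
print(f"compounded over 100 PRs: {(1 + g) ** 100:.1f}x")
```

In other words, even sub-1% size increases per change become a doubling-plus once a hundred of them accumulate, which is why each individual PR looks harmless in isolation.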
_______________________________________________ NumPy-Discussion mailing list NumPy-Discussion@python.org https://mail.python.org/mailman/listinfo/numpy-discussion
participants (5)

- Ilhan Polat
- Julian Taylor
- Matthew Brett
- Ralf Gommers
- Éric Depagne