Re: Precision changes to sin/cos in the next release?

I wouldn't discount the performance impact on real-world benchmarks for these functions. Just to name a couple of examples:

* A 7x speedup of np.exp and np.log results in a 2x speedup of training neural networks like logistic regression [1]. I would expect np.tanh to show similar results for neural networks.
* Vectorizing even simple functions like np.maximum results in a 1.3x speedup of sklearn's KMeans algorithm [2].

Raghuveer

[1] https://github.com/numpy/numpy/pull/13134
[2] https://github.com/numpy/numpy/pull/14867
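
A minimal sketch of the shape of that first workload (an illustration only, not the benchmark from [1]; the data and hyperparameters here are made up): a plain-NumPy logistic regression calls np.exp in the sigmoid and np.log in the loss over the full batch on every iteration, so those two ufuncs sit directly in the hot loop alongside the matmuls.

    import numpy as np

    # Synthetic stand-in for the kind of workload in [1]: full-batch logistic
    # regression where np.exp (sigmoid) and np.log (cross-entropy loss) are
    # evaluated over the whole batch on every gradient step.
    rng = np.random.default_rng(0)
    X = rng.normal(size=(100_000, 50))
    y = (X @ rng.normal(size=50) + rng.normal(size=100_000) > 0).astype(np.float64)

    w, lr = np.zeros(50), 0.1
    for _ in range(100):
        p = 1.0 / (1.0 + np.exp(-(X @ w)))          # np.exp on 100k values
        p = np.clip(p, 1e-12, 1.0 - 1e-12)          # keep np.log finite
        loss = -np.mean(y * np.log(p) + (1 - y) * np.log(1 - p))
        w -= lr * (X.T @ (p - y)) / len(y)          # gradient step (matmul)
    print(f"final log-loss: {loss:.4f}")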

What is the effect of these changes on transcendental functions in the complex plane?

On Wed, May 31, 2023 at 12:37 PM Devulapalli, Raghuveer <raghuveer.devulapalli@intel.com> wrote:
Perfect, those are precisely the concrete use cases I would want to see so we can talk about the actual ramifications of the changes. These particular examples suggest to me that a module or package providing fast-inaccurate functions would be a good idea, but not across-the-board fast-inaccurate implementations (though it's worth noting that the exp/log/maximum replacements that you cite don't seem to be particularly inaccurate).

The performance improvements show up in situational use cases. Logistic regression is not really a neural network (unless you squint real hard), so there the loss function does take up a significant share of the whole runtime; the activation and loss functions of real neural networks take up a rather small amount of time compared to the matmuls. Nonetheless, people do optimize activation functions, but often by avoiding special functions entirely with ReLUs (which have other benefits in terms of nice gradients). Not sure anyone really uses tanh for serious work.

ML is a perfect use case for *opt-in* fast-inaccurate implementations. The whole endeavor is to replace complicated computing logic with a smaller number of primitives that you can optimize the hell out of, and let the model size and training data size handle the complications. And a few careful choices by the people implementing the marquee packages can have a large effect. In the case of transcendental activation functions in NNs, if you really want to optimize them, it's a good idea to trade *a lot* of accuracy (measured in %, not ULPs) for performance, in addition to doing it on GPUs. That makes changes to the `np.*` implementations mostly irrelevant for them, and you can get that performance without making anyone else pay for it.

Does anyone have compelling concrete use cases for accelerated trig functions, per se, rather than exp/log and friends? I'm more on board with accelerating exp/log than trig functions because of their role in ML and statistics (I'd still *prefer* to opt in, though). They don't have many special values, and where precision near those values matters there are alternates like expm1 and log1p in any case. But for trig functions, I'm much more likely to be doing geometry, where I'm with Archimedes: do not disturb my circles!

-- Robert Kern
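
To put a number on the "accuracy measured in %, not ULPs" trade-off mentioned above, here is a minimal sketch (an illustration only, not anything proposed for NumPy): a clipped [3/2] Padé approximant of tanh that is off by roughly a couple of percent at worst, the kind of cheap approximation an ML framework could opt into without np.tanh itself changing.

    import numpy as np

    def fast_tanh(x):
        # Clipped [3/2] Pade approximant of tanh -- illustrative only.
        # Worst-case absolute error is on the order of a couple of percent
        # (printed below), i.e. accuracy traded away in %, not ULPs.
        x = np.asarray(x, dtype=np.float64)
        return np.clip(x * (15.0 + x * x) / (15.0 + 6.0 * x * x), -1.0, 1.0)

    x = np.linspace(-6.0, 6.0, 1_000_001)
    print(f"max |fast_tanh - np.tanh| on [-6, 6]: "
          f"{np.max(np.abs(fast_tanh(x) - np.tanh(x))):.3e}")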

On Wed, 31 May 2023, 22:41 Robert Kern, <robert.kern@gmail.com> wrote:
> Not sure anyone really uses tanh for serious work.
At the risk of derailing the discussion, the case I can think of (though it's kind of niche) is using neural networks to approximate differential equations. Then you need nonlinearities in the gradients everywhere. I have also experimented with porting a few small networks and other ML models to NumPy by hand to make them easier to deploy. But then, performance in my use case wasn't critical.
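
For what it's worth, a hand port of a small network to NumPy tends to look something like this sketch (the weights are random placeholders here; a real port would load the trained parameters), and np.tanh is the only transcendental function in its hot path:

    import numpy as np

    # Placeholder weights for a tiny 1-16-1 tanh MLP; in a real port these
    # would be loaded from the trained model rather than drawn at random.
    rng = np.random.default_rng(0)
    W1, b1 = rng.normal(size=(16, 1)), np.zeros(16)
    W2, b2 = rng.normal(size=(1, 16)), np.zeros(1)

    def mlp(x):
        # x has shape (n, 1); the smooth tanh nonlinearity keeps the network
        # and its derivatives well defined everywhere, unlike a ReLU.
        h = np.tanh(x @ W1.T + b1)
        return h @ W2.T + b2

    print(mlp(np.linspace(-1.0, 1.0, 5).reshape(-1, 1)).ravel())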

participants (4)
- Andrew Nelson
- David Menéndez Hurtado
- Devulapalli, Raghuveer
- Robert Kern