[Numpy-discussion] NEP 38 - Universal SIMD intrinsics

Devulapalli, Raghuveer raghuveer.devulapalli at intel.com
Tue Feb 11 13:02:09 EST 2020


>> I think this doesn't quite answer the question. If I understand correctly, it's about a single instruction (e.g. one needs "VEXP2PD" and it's missing from the supported AVX512 instructions in master). I think the answer is yes, it needs to be added for other architectures as well.

That adds a lot of overhead to writing SIMD-based optimizations, which can discourage contributors. It’s also unreasonable to expect a developer to be familiar with the SIMD instruction sets of every architecture. On top of that, the performance implications aren’t clear: software implementations of hardware instructions might perform worse, and might not even produce the same result.
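To make that concern concrete, here is a minimal sketch (not NumPy's actual framework; the npyv_* names are hypothetical) of what a "universal" wrapper around AVX-512ER's VEXP2PD plus a software fallback could look like:

    /* Minimal sketch, assuming a hypothetical npyv_* naming scheme:
     * a universal exp2 that maps to VEXP2PD where AVX-512ER exists
     * and falls back to a scalar loop elsewhere. */
    #include <math.h>

    #ifdef __AVX512ER__
    #include <immintrin.h>
    typedef __m512d npyv_f64;
    static inline npyv_f64 npyv_exp2_f64(npyv_f64 a)
    {
        /* VEXP2PD: 2^x with 23 bits of mantissa accuracy */
        return _mm512_exp2a23_pd(a);
    }
    #else
    typedef struct { double v[8]; } npyv_f64;
    static inline npyv_f64 npyv_exp2_f64(npyv_f64 a)
    {
        /* Software fallback: likely slower, and libm's exp2() is more
         * accurate than VEXP2PD's 23 bits, so results can differ;
         * exactly the concern raised above. */
        for (int i = 0; i < 8; i++) {
            a.v[i] = exp2(a.v[i]);
        }
        return a;
    }
    #endif

The fallback preserves the semantics but neither the performance nor the exact rounding behavior of the hardware instruction.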

From: NumPy-Discussion <numpy-discussion-bounces+raghuveer.devulapalli=intel.com at python.org> On Behalf Of Ralf Gommers
Sent: Monday, February 10, 2020 9:17 PM
To: Discussion of Numerical Python <numpy-discussion at python.org>
Subject: Re: [Numpy-discussion] NEP 38 - Universal SIMD intrinsics



On Tue, Feb 4, 2020 at 2:00 PM Hameer Abbasi <einstein.edison at gmail.com> wrote:
—snip—

> 1) Once NumPy adds the framework and initial set of Universal Intrinsics, if contributors want to leverage a new architecture-specific SIMD instruction, will they be expected to add a software implementation of this instruction for all other architectures too?

In my opinion, if the instructions are lower-level, then yes. For example, one cannot add AVX-512 without also adding, say, AVX-256, AVX-128 and SSE*. However, I would not expect one person or team to be an expert in all assemblies, so intrinsics for one architecture can be developed independently of another.

I think this doesn't quite answer the question. If I understand correctly, it's about a single instruction (e.g. one needs "VEXP2PD" and it's missing from the supported AVX512 instructions in master). I think the answer is yes, it needs to be added for other architectures as well. Otherwise, if universal intrinsics are added ad-hoc and there's no guarantee that a universal instruction is available for all main supported platforms, then over time there won't be much that's "universal" about the framework.

This is a different question though from adding a new ufunc implementation. I would expect accelerating ufuncs via intrinsics that are already supported to be much more common than having to add new intrinsics. Does that sound right?
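For that common case, a ufunc inner loop written once against the universal layer might look roughly like this (a sketch using npyv_*-style names in the spirit of the NEP; the exact API is an assumption, and the code compiles as plain C when no SIMD extension is enabled):

    /* Sketch only: accelerating a ufunc with intrinsics that are
     * already in the universal set. The same source would compile to
     * SSE, AVX, NEON or VSX, so no new intrinsics are needed. */
    #include <stddef.h>

    void add_f64_loop(const double *a, const double *b,
                      double *out, size_t n)
    {
        size_t i = 0;
    #ifdef NPY_SIMD  /* assumed: defined when any SIMD extension is enabled */
        for (; i + npyv_nlanes_f64 <= n; i += npyv_nlanes_f64) {
            npyv_f64 va = npyv_load_f64(a + i);
            npyv_f64 vb = npyv_load_f64(b + i);
            npyv_store_f64(out + i, npyv_add_f64(va, vb));
        }
    #endif
        for (; i < n; i++) {  /* scalar tail, and non-SIMD builds */
            out[i] = a[i] + b[i];
        }
    }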


> 2) On whom does the burden lie to ensure that new implementations are benchmarked and show benefits on every architecture? What happens if optimizing a ufunc improves performance on one architecture and worsens it on another?

This is slightly hard to provide a recipe for. I suspect it may take a while before this becomes an issue, since we don't have much SIMD code to begin with. So adding new code with benchmarks will likely show improvements on all architectures (we should ensure benchmarks can be run via CI, otherwise it's too onerous). And if not and it's not easily fixable, the problematic platform could be skipped so performance there is unchanged.

Only once existing universal intrinsics are being tweaked will we have to be much more careful, I'd think.

Cheers,
Ralf



I would look at this from a maintainability point of view. If we are increasing the code size by 20% for a certain ufunc, there must be a demonstrable 20% increase in performance on any CPU. That is to say, micro-optimisation will be unwelcome, and code readability will be preferable. Usually we ask the submitter of the PR to test it on a machine they have on hand, and I would be inclined to keep this trend of self-reporting. Of course, if someone else came along and reported a performance regression of, say, 10%, then we would have increased code size by 20% for only a net 5% gain in performance (averaging the +20% and -10% results across the two machines), and the PR would have to be reverted.

—snip—

