On 11/2/20 8:02 pm, Devulapalli, Raghuveer wrote:
>
> On top of that the performance implications aren’t clear. Software
> implementations of hardware instructions might perform worse and might
> not even produce the same result.
>
The proposal for universal intrinsics does not enable replacing an
intrinsic on one platform with a software emulation on another: the
intrinsics are meant to be compile-time defines that overlay the
universal intrinsic with a platform-specific one. A new intrinsic can
be used on another platform only if that platform has a parallel
intrinsic; otherwise it cannot be used there at all:
"NPY_CPU_HAVE(FEATURE_NAME)" will always return false, so the compiler
will not even build a loop for that platform. I will try to clarify
that intention in the NEP.
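To make the compile-time overlay concrete, here is a minimal sketch in C. All SKETCH_/sketch_ names are hypothetical and are not NumPy's actual universal-intrinsic API; only the SSE and NEON intrinsics they map to are real. SKETCH_HAVE_SIMD_F32 plays roughly the role described above for NPY_CPU_HAVE: where no native equivalent exists, nothing is emulated, the SIMD loop is simply not compiled, and only the ordinary scalar loop remains.

```c
/* Sketch of a compile-time "universal intrinsic" overlay.
 * Hypothetical names (SKETCH_*, sketch_*), not NumPy's real API. */
#include <assert.h>
#include <stddef.h>

#if defined(__SSE__)
  #include <xmmintrin.h>
  #define SKETCH_HAVE_SIMD_F32 1
  typedef __m128 sketch_f32x4;
  #define sketch_load_f32(p)      _mm_loadu_ps(p)
  #define sketch_store_f32(p, v)  _mm_storeu_ps((p), (v))
  #define sketch_add_f32(a, b)    _mm_add_ps((a), (b))
#elif defined(__ARM_NEON)
  #include <arm_neon.h>
  #define SKETCH_HAVE_SIMD_F32 1
  typedef float32x4_t sketch_f32x4;
  #define sketch_load_f32(p)      vld1q_f32(p)
  #define sketch_store_f32(p, v)  vst1q_f32((p), (v))
  #define sketch_add_f32(a, b)    vaddq_f32((a), (b))
#else
  /* No parallel intrinsic on this platform: the universal intrinsic is
   * absent rather than emulated in software. */
  #define SKETCH_HAVE_SIMD_F32 0
#endif

/* Elementwise float add; the SIMD loop exists only where it was compiled in. */
void sketch_add_loop(const float *a, const float *b, float *out, size_t n)
{
    assert(n == 0 || (a != NULL && b != NULL && out != NULL));
    size_t i = 0;
#if SKETCH_HAVE_SIMD_F32
    /* Process four floats at a time with the platform's native intrinsic. */
    for (; i + 4 <= n; i += 4) {
        sketch_f32x4 va = sketch_load_f32(a + i);
        sketch_f32x4 vb = sketch_load_f32(b + i);
        sketch_store_f32(out + i, sketch_add_f32(va, vb));
    }
#endif
    /* Scalar loop: handles the tail, or everything if no SIMD loop was built. */
    for (; i < n; ++i) {
        out[i] = a[i] + b[i];
    }
}
```

Since SSE and NEON single-precision adds round exactly like the scalar add, both paths give identical results here; that would not automatically hold for an intrinsic such as fused multiply-add, which rounds once instead of twice.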
I hope there will not be much demand for non-universal intrinsics in
ufuncs; we will need to work this out case by case for each ufunc.
Does that sound reasonable? Are there intrinsics you have already used
that have no parallel on other platforms?
Intrinsics are not an irreversible change; they are, after all, private. The question is whether they are sufficiently useful to justify the time spent on them. I don't think we will know that until we attempt actual implementations. There will probably be some changes as a result of experience, but that is normal.
Chuck