On Wed, Feb 12, 2020 at 12:19 AM Matti Picus wrote:
On 11/2/20 8:02 pm, Devulapalli, Raghuveer wrote:
On top of that, the performance implications aren’t clear. Software implementations of hardware instructions might perform worse and might not even produce the same results.
The proposal for universal intrinsics does not enable replacing an intrinsic on one platform with a software emulation on another: the intrinsics are meant to be compile-time defines that overlay the universal intrinsic with a platform-specific one. For a new intrinsic to be used, it must have parallel intrinsics on the other platforms; otherwise it cannot be used there: "NPY_CPU_HAVE(FEATURE_NAME)" will always return false, so the compiler will not even build a loop for that platform. I will try to clarify that intention in the NEP.
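A minimal C sketch of the compile-time overlay described above may help. Everything prefixed `HYP_`, and the `add_f32` loop, are hypothetical illustration names, not NumPy's actual universal-intrinsic API; only the SSE (`_mm_*`) and NEON (`vld1q_f32` etc.) intrinsics are real. Each platform maps the same universal name onto its own native intrinsic. On a platform with no mapping, the SIMD loop is simply not compiled and only the plain scalar loop remains; nothing emulates the missing instruction in software.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical "universal" names overlaid on native intrinsics at compile time. */
#if defined(__SSE__)
  #include <xmmintrin.h>
  #define HYP_SIMD_F32 1
  #define HYP_F32_LANES 4
  typedef __m128 hyp_vf32;
  #define hyp_load_f32(p)     _mm_loadu_ps(p)
  #define hyp_store_f32(p, v) _mm_storeu_ps((p), (v))
  #define hyp_add_f32(a, b)   _mm_add_ps((a), (b))
#elif defined(__ARM_NEON)
  #include <arm_neon.h>
  #define HYP_SIMD_F32 1
  #define HYP_F32_LANES 4
  typedef float32x4_t hyp_vf32;
  #define hyp_load_f32(p)     vld1q_f32(p)
  #define hyp_store_f32(p, v) vst1q_f32((p), (v))
  #define hyp_add_f32(a, b)   vaddq_f32((a), (b))
#else
  /* No native mapping: the universal intrinsic is unavailable here. */
  #define HYP_SIMD_F32 0
#endif

/* Hypothetical element-wise add loop: out[i] = a[i] + b[i]. */
void add_f32(const float *a, const float *b, float *out, size_t n)
{
    assert(n == 0 || (a != NULL && b != NULL && out != NULL));
    size_t i = 0;
#if HYP_SIMD_F32
    /* Vector body: only compiled where the universal intrinsics exist. */
    for (; i + HYP_F32_LANES <= n; i += HYP_F32_LANES) {
        hyp_vf32 va = hyp_load_f32(a + i);
        hyp_vf32 vb = hyp_load_f32(b + i);
        hyp_store_f32(out + i, hyp_add_f32(va, vb));
    }
#endif
    /* Scalar loop: handles the tail, or the whole array when no SIMD path was built. */
    for (; i < n; i++) {
        out[i] = a[i] + b[i];
    }
}
```

The design choice the sketch tries to mirror is that availability is decided by the preprocessor, so an unsupported platform never gets a vector loop rather than getting a slower emulated one.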
I hope there will not be a demand to use many non-universal intrinsics in ufuncs; we will need to work this out on a case-by-case basis for each ufunc. Does that sound reasonable? Are there intrinsics you have already used that have no parallel on other platforms?
Intrinsics are not an irreversible change; they are, after all, private. The question is whether they are sufficiently useful to justify the time spent on them. I don't think we will know that until we attempt actual implementations. There will probably be some changes as a result of experience, but that is normal.

Chuck