Should we revert float16 loops (and what are the precision requirements)?
Hi all,
In the NumPy 2.4 cycle, some native float16 implementations were merged with rather low precision, leading to the following issue: https://github.com/numpy/numpy/issues/30821
That is, previously these used float (float32) loops, so ~0.5 ULP error; now it is 2+ ULP for many algorithms, on _some_ hardware: https://github.com/numpy/numpy/pull/23351
There is always the argument that users of float16 probably don't care about many ULP, but I guess they also have very few bits of precision to begin with? I don't have a strong opinion on it, but we are more and more in the position where it is unclear whether sacrificing a bit of precision is the right thing or not...
Similar questions actually arise for float32 math: is it OK to trade off precision for performance (or to what degree, since everything trades a bit)? We have had discussions around this before, but it is still a difficult trade-off to make and there is no choice that makes everyone happy. [1]
- Sebastian
[1] We can work towards something like `np.opts(precision="low")` or so, but that doesn't change the question of defaults much...
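
To put "very few bits of precision to begin with" into numbers, here is a minimal sketch using only `np.finfo`. The helper `relative_error_bound` is a hypothetical name introduced for illustration, and the bound it returns is only the rough relative error near 1.0, not a statement about any particular NumPy loop.

```python
import numpy as np

info = np.finfo(np.float16)
mantissa_bits = info.nmant   # explicitly stored fraction bits of float16
eps = float(info.eps)        # gap between 1.0 and the next float16, i.e. 1 ULP at 1.0


def relative_error_bound(ulps):
    """Rough relative error near 1.0 for a result that is `ulps` ULP off (illustrative helper)."""
    return ulps * eps


half_ulp = relative_error_bound(0.5)   # correctly rounded result
two_ulp = relative_error_bound(2.0)    # the "2+ ULP" case mentioned above

print(f"float16 mantissa bits: {mantissa_bits}")
print(f"0.5 ULP ~ {half_ulp:.2e} relative, 2 ULP ~ {two_ulp:.2e} relative")
```

With so few mantissa bits, each extra ULP of error costs a visible fraction of the available precision, which is the concern raised in the message.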
On Tue, Mar 10, 2026 at 1:28 PM Sebastian Berg <sebastian@sipsolutions.net> wrote:
[1] We can work towards something like `np.opts(precision="low")` or so, but that doesn't change the question of defaults much...
I do like the idea of having a precise/fast toggle. Until we can develop one, I think we should prefer precise. So we should revert and document somewhere that float16 (and the soon-to-be-incoming bfloat16) are, in NumPy, container types, and that all the math for them is done as float16.
Matti
On Wed, Mar 11, 2026 at 10:58 AM matti picus via NumPy-Discussion <numpy-discussion@python.org> wrote:
I do like the idea of having a precise/fast toggle. Until we can develop one, I think we should prefer precise. So we should revert and document somewhere that float16 (and the soon-to-be-incoming bfloat16) are, in NumPy, container types, and that all the math for them is done as float16.
You meant `float32` here. And yes, I agree.
Having a few code paths use platform/CPU-dependent instructions like the AVX512-xxx ones, and as a result having a small subset of the NumPy API with different accuracy/speed trade-offs, seems not all that useful to almost all users. It also makes it harder to build up a mental model of what NumPy is actually doing.
Cheers, Ralf
On Wed, 2026-03-11 at 11:59 +0100, Ralf Gommers via NumPy-Discussion wrote:
You meant `float32` here. And yes, I agree. Having a few code paths use
No, I meant float16. I don't think we have bad variability for float32 right now, and while there is a separate discussion to be had about float32, I think those paths would at least be consistent across architectures (as they would be custom implementations).
But it sounds like you agree with "revert" here, which would be my tendency, even if I don't have a clear picture of where to draw the line, since hardware/platform differences always exist to some degree.
- Sebastian
platform/CPU-dependent instructions like AVX512-xxx ones, and as a result having a small subset of the NumPy API have different accuracy/speed trade-offs seems not all that useful to almost all users. And makes it harder to build up a mental model of what NumPy is actually doing.
Cheers, Ralf
_______________________________________________
NumPy-Discussion mailing list -- numpy-discussion@python.org
To unsubscribe send an email to numpy-discussion-leave@python.org
https://mail.python.org/mailman3//lists/numpy-discussion.python.org
Member address: sebastian@sipsolutions.net
On Wed, Mar 11, 2026 at 12:13 PM Sebastian Berg <sebastian@sipsolutions.net> wrote:
No, I meant float16,
I was replying to "done as float16". Operations on arrays with dtype float16 are largely done as "upcast to float32, perform the operation, downcast again". That's what "container types" means (I've been calling them "storage types"; I'm not sure what the most standard name is here). So I agreed with everything you said, and was just pointing out what looks like a confusing typo.
I don't think we have a bad variability for float32 right now and while there is a different discussion to be had about float32, I think those paths would at least be consistent across architectures (as it would be custom implementations).
Yes, agreed: float32 operations are done in float32, with a few exceptions. And that's fine and expected.
But it sounds like you agree with "revert" here, which would be my tendency, even if I don't have a clear picture where to draw the line, since hardware/platform differences always exist to some degree.
Indeed, I agree. I think the design rules should be simple:
- Operations on both float32 and float64 should be done at their native precision without casts.
- Operations for float16, and a future bfloat16 if we add that, should be done in float32 when floating-point errors can occur. Lossless operations can of course be done without casts.
- Output dtypes should always be unchanged, so if there's an internal upcast, there should be a matching downcast.
There may be some exceptions for operations that are very sensitive to numerical errors accumulating; those should be documented and treated as exceptions to the general rules.
Cheers, Ralf
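
The float16 rule above (upcast, compute in float32, downcast so the output dtype is unchanged) can be sketched at the Python level as follows. This is only an illustration of the pattern, not how NumPy's inner loops are implemented; `float16_via_float32` is a hypothetical helper name.

```python
import numpy as np


def float16_via_float32(func, x):
    """Apply `func` to float16 data by computing in float32 and casting back.

    Hypothetical helper that mirrors the "container type" rule; it is not a NumPy API.
    """
    x = np.asarray(x, dtype=np.float16)
    wide = x.astype(np.float32)        # internal upcast
    result = func(wide)                # the math itself runs in float32
    return result.astype(np.float16)   # matching downcast: output dtype stays float16


x = np.linspace(0.0, 3.0, 7, dtype=np.float16)
y = float16_via_float32(np.exp, x)
print(y.dtype, y)
```

Because every architecture runs the same float32 routine and then rounds once to float16, the result does not depend on whether the CPU offers native float16 instructions.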
On Wed, 2026-03-11 at 12:42 +0100, Ralf Gommers via NumPy-Discussion wrote:
I was replying to "done as float16". Operations on arrays with dtype float16 are largely done as "upcast to float32, perform the operation, downcast again". That's what "container types" means (I've been calling them "storage types", not sure what's the most standard name here). So I agreed with everything you said, and was just pointing out what looks like a confusing typo.
Ah sorry, I just over-read that one...
For float16, things are tricky, but I guess we may also want to just cement that we do not introduce serious hardware differences (which probably cements float32 use de facto). If the wind changes slightly more there, I am not sure whether we shouldn't allow a normal error range that is still very slightly larger than that of a float32 version.
For float32, we really do need clearer guidelines/discussion. E.g. https://github.com/numpy/numpy/pull/29699 proposes 4 ULP versions for float32 sin/cos, and that makes me very nervous. I might suggest that, for that, we compare strictly with the precision of system math libraries and make sure we default to something that is very comparable or better. (The actual precision of math functions varies quite widely, after all.)
- Sebastian
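
One way to make such a comparison concrete is to measure the error of a float32 function in ULPs against a higher-precision reference. The sketch below assumes the float64 result of the same function is accurate enough to serve as that reference; `max_ulp_error` is a hypothetical helper, and the numbers it prints depend on the platform and NumPy build.

```python
import numpy as np


def max_ulp_error(func, x32):
    """Largest error of func(x32), in float32 ULPs, against a float64 reference.

    Hypothetical helper; assumes func evaluated in float64 is a good enough reference.
    """
    x32 = np.asarray(x32, dtype=np.float32)
    got = func(x32)                                    # float32 result under test
    ref = func(x32.astype(np.float64))                 # float64 reference value
    ulp = np.spacing(np.abs(ref).astype(np.float32))   # size of one float32 ULP at the reference
    err = np.abs(got.astype(np.float64) - ref) / ulp.astype(np.float64)
    return float(err.max())


x = np.linspace(-np.pi, np.pi, 10001, dtype=np.float32)
sin_err = max_ulp_error(np.sin, x)
cos_err = max_ulp_error(np.cos, x)
print(f"max ULP error: sin={sin_err:.2f}, cos={cos_err:.2f}")
```

Running the same script on different machines, or against a system math library wrapped as `func`, gives a like-for-like number to compare with a proposed error budget.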
On Wed, Mar 11, 2026 at 1:03 PM Ralf Gommers via NumPy-Discussion <numpy-discussion@python.org> wrote:
You meant `float32` here. And yes, I agree. Having a few code paths use platform/CPU-dependent instructions like AVX512-xxx ones, and as a result having a small subset of the NumPy API have different accuracy/speed trade-offs seems not all that useful to almost all users. And makes it harder to build up a mental model of what NumPy is actually doing.
Cheers, Ralf
Ahh, yes sorry. The last word should have been "float32". Matti
Participants (3): matti picus, Ralf Gommers, Sebastian Berg