
On Wed, Mar 30, 2011 at 9:42 PM, Robert Kern <robert.kern@gmail.com> wrote:
On 3/30/11 3:05 PM, Mark Dickinson wrote:
[OT]: How is NumPy's float16 type implemented? Is it clever enough to do correct rounding for all basic arithmetic operations, or does it suffer from the double-rounding problems that you'd get from (convert operands to float64; do op in float64; round back to float16)?
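(Aside, not from the thread: the failure mode Mark is asking about is easiest to see with a raw conversion rather than an arithmetic operation. The constants below are illustrative choices of mine; they pick a real number just above a float16 halfway point, where an intermediate float32 rounding erases the tiebreaker.)

    import numpy as np

    # The float16 ulp at 1.0 is 2**-10, so 1 + 2**-11 is exactly halfway
    # between 1.0 and the next float16 up, 1 + 2**-10.  Pick a real number
    # a hair above that halfway point (exact as a float64 literal):
    x = 1 + 2**-11 + 2**-26

    # One rounding (float64 -> float16): the 2**-26 tail breaks the tie
    # upward, as correct rounding requires.
    print(np.float16(x))              # 1.001

    # Two roundings (float64 -> float32 -> float16): the float32 step
    # discards the 2**-26 tail, landing exactly on the halfway point,
    # and round-half-to-even then rounds down instead.
    print(np.float16(np.float32(x)))  # 1.0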
We do the latter, I'm afraid. Except with float32 instead of float64.
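(Not from the thread: a minimal sketch of the scheme Robert describes, with a made-up helper name half_add. Upcasting a float16 to float32 is exact, so the only roundings are the float32 add and the conversion back.)

    import numpy as np

    def half_add(a, b):
        """Emulate float16 addition as described above: upcast the
        operands exactly to float32, add there, round back to float16."""
        a, b = np.float16(a), np.float16(b)
        return np.float16(np.float32(a) + np.float32(b))

    # 2**-11 is half a float16 ulp at 1.0.  The float32 sum 1 + 2**-11
    # is exact, and rounding it back to float16 ties to even, giving 1.0,
    # the same answer a single correctly rounded float16 add would give.
    print(half_add(1.0, 2.0**-11))                 # 1.0
    print(np.float16(1.0) + np.float16(2.0**-11))  # 1.0, NumPy's own result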
[Still OT] Confession time: after asking this question, I had a sneaking suspicion that it was a stupid one. And having had time to think a bit, it turns out that it is. As far as I can tell, assuming round-half-to-even, there *are* no double rounding problems for primitive arithmetic operations under the (convert operands to float32; do the operation in float32; convert back) scheme. This is probably a well-known fact in numerics land, and I feel embarrassed for not noticing.

The key point is that the precision of float32 (24 bits) is at least double that of float16 (11 bits) plus a couple of extra bits: 24 >= 2*11 + 2. Given that, it's easy to see that there can be no problems with multiplication, a mite harder to see that addition and subtraction are fine, and just a tiny bit harder again to show that division of two float16s can never give a result that'll be rounded the wrong way under the double (to float32, then to float16) rounding.

Sheepishly,

Mark
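(Closing aside, not part of the thread: Mark's claim is easy to spot-check by brute force. The sketch below compares the upcast-to-float32 scheme against a correctly rounded reference computed with exact rational arithmetic; round_to_f16 is a hypothetical helper written just for this check, and it ignores NaN and infinite inputs for brevity.)

    import itertools
    import operator
    from fractions import Fraction

    import numpy as np

    def round_to_f16(x: Fraction) -> np.float16:
        """Round a rational to float16 in a single round-half-to-even
        step (overflow goes to inf; NaN inputs are not handled)."""
        if x == 0:
            return np.float16(0.0)
        sign = 1
        if x < 0:
            sign, x = -1, -x
        # Find e with 2**e <= x < 2**(e + 1).
        e = x.numerator.bit_length() - x.denominator.bit_length()
        if Fraction(2) ** e > x:
            e -= 1
        # Spacing of the float16 grid around x: 2**(e - 10) for normal
        # numbers, a fixed 2**-24 in the subnormal range.
        q = Fraction(2) ** max(e - 10, -24)
        n = x // q              # integer part: x = n*q + r, 0 <= r < q
        r = x - n * q
        if 2 * r > q or (2 * r == q and n % 2 == 1):
            n += 1              # round up; exact ties go to even n
        return np.float16(sign * float(n * q))  # n*q is exact in float64

    # A sample of random finite, nonzero float16 operands.
    rng = np.random.default_rng(0)
    bits = rng.integers(0, 2**16, size=4000, dtype=np.uint16)
    vals = bits.view(np.float16)
    vals = vals[np.isfinite(vals) & (vals != 0)][:60]

    for a, b in itertools.product(vals, repeat=2):
        for op in (operator.add, operator.sub, operator.mul, operator.truediv):
            # What NumPy does: one operation in float32, rounded back.
            via32 = np.float16(op(np.float32(a), np.float32(b)))
            # Correctly rounded reference from the exact rational result.
            exact = op(Fraction(*a.as_integer_ratio()),
                       Fraction(*b.as_integer_ratio()))
            assert via32 == round_to_f16(exact)
    print("no double-rounding mismatches found")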