
On Wed, Mar 30, 2011 at 9:42 PM, Robert Kern <robert.kern@gmail.com> wrote:
On 3/30/11 3:05 PM, Mark Dickinson wrote:
[OT]: How is NumPy's float16 type implemented? Is it clever enough to do correct rounding for all basic arithmetic operations, or does it suffer from the double-rounding problems that you'd get from (convert operands to float64; do op in float64; round back to float16)?
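(Aside, not from the thread: the failure mode Mark is asking about is easiest to see with a raw conversion rather than an arithmetic operation. The constants below are illustrative choices of mine; they pick a real number just above a float16 halfway point, where an intermediate float32 rounding erases the tiebreaker.)

    import numpy as np

    # The float16 ulp at 1.0 is 2**-10, so 1 + 2**-11 is exactly halfway
    # between 1.0 and the next float16 up, 1 + 2**-10.  Pick a real number
    # a hair above that halfway point (exact as a float64 literal):
    x = 1 + 2**-11 + 2**-26

    # One rounding (float64 -> float16): the 2**-26 tail breaks the tie
    # upward, as correct rounding requires.
    print(np.float16(x))              # 1.001

    # Two roundings (float64 -> float32 -> float16): the float32 step
    # discards the 2**-26 tail, landing exactly on the halfway point,
    # and round-half-to-even then rounds down instead.
    print(np.float16(np.float32(x)))  # 1.0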
We do the latter, I'm afraid. Except with float32 instead of float64.
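(Not from the thread: a minimal sketch of the scheme Robert describes, with a made-up helper name half_add. Upcasting a float16 to float32 is exact, so the only roundings are the float32 add and the conversion back.)

    import numpy as np

    def half_add(a, b):
        """Emulate float16 addition as described above: upcast the
        operands exactly to float32, add there, round back to float16."""
        a, b = np.float16(a), np.float16(b)
        return np.float16(np.float32(a) + np.float32(b))

    # 2**-11 is half a float16 ulp at 1.0.  The float32 sum 1 + 2**-11
    # is exact, and rounding it back to float16 ties to even, giving 1.0,
    # the same answer a single correctly rounded float16 add would give.
    print(half_add(1.0, 2.0**-11))                 # 1.0
    print(np.float16(1.0) + np.float16(2.0**-11))  # 1.0, NumPy's own result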
[Still OT] Confession time: after asking this question, I had a sneaking suspicion that it was a stupid one. And having had time to think a bit, it turns out that it is. As far as I can tell, assuming round-half-to-even, there *are* no double rounding problems for primitive arithmetic operations under the (convert operands to float32; do the operation in float32; convert back) scheme. This is probably a well-known fact in numerics land, and I feel embarrassed for not noticing.

The key point is that the precision of float32 (24 bits) is at least double that of float16 (11 bits) plus a couple of extra bits: 24 >= 2*11 + 2. Given that, it's easy to see that there can be no problems with multiplication, a mite harder to see that addition and subtraction are fine, and just a tiny bit harder again to show that division of two float16s can never give a result that'll be rounded the wrong way under the double (to float32, then to float16) rounding.

Sheepishly,

Mark
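(Closing aside, not part of the thread: Mark's claim is easy to spot-check by brute force. The sketch below compares the upcast-to-float32 scheme against a correctly rounded reference computed with exact rational arithmetic; round_to_f16 is a hypothetical helper written just for this check, and it ignores NaN and infinite inputs for brevity.)

    import itertools
    import operator
    from fractions import Fraction

    import numpy as np

    def round_to_f16(x: Fraction) -> np.float16:
        """Round a rational to float16 in a single round-half-to-even
        step (overflow goes to inf; NaN inputs are not handled)."""
        if x == 0:
            return np.float16(0.0)
        sign = 1
        if x < 0:
            sign, x = -1, -x
        # Find e with 2**e <= x < 2**(e + 1).
        e = x.numerator.bit_length() - x.denominator.bit_length()
        if Fraction(2) ** e > x:
            e -= 1
        # Spacing of the float16 grid around x: 2**(e - 10) for normal
        # numbers, a fixed 2**-24 in the subnormal range.
        q = Fraction(2) ** max(e - 10, -24)
        n = x // q              # integer part: x = n*q + r, 0 <= r < q
        r = x - n * q
        if 2 * r > q or (2 * r == q and n % 2 == 1):
            n += 1              # round up; exact ties go to even n
        return np.float16(sign * float(n * q))  # n*q is exact in float64

    # A sample of random finite, nonzero float16 operands.
    rng = np.random.default_rng(0)
    bits = rng.integers(0, 2**16, size=4000, dtype=np.uint16)
    vals = bits.view(np.float16)
    vals = vals[np.isfinite(vals) & (vals != 0)][:60]

    for a, b in itertools.product(vals, repeat=2):
        for op in (operator.add, operator.sub, operator.mul, operator.truediv):
            # What NumPy does: one operation in float32, rounded back.
            via32 = np.float16(op(np.float32(a), np.float32(b)))
            # Correctly rounded reference from the exact rational result.
            exact = op(Fraction(*a.as_integer_ratio()),
                       Fraction(*b.as_integer_ratio()))
            assert via32 == round_to_f16(exact)
    print("no double-rounding mismatches found")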