Adding a half-float (16-bit) type to PEP 3118 (and possibly the struct module?)

Hello,

Numpy 1.6.0 adds support for a half-float (16-bit) data type, but cannot currently export a buffer interface to the data, since the closest type that PEP 3118 supports is an unsigned short ('H'). This makes working with the data from outside numpy (for example, from Cython) difficult: even if numpy were to expose a buffer interface to the data, nothing in it would indicate that the data needs special treatment to be interpreted correctly (numpy does this with bit-shifting functions that convert it to a float32, but it has access to the array dtype, which isn't available through the buffer interface, per my understanding).

What would be required to get a float16 data type added to PEP 3118 (either implicitly via inclusion in the struct module, or explicitly in the PEP itself)? I'm not currently a contributor to python, numpy or cython, but am prepared to provide patches.

Some of my exploratory work for numpy and cython (which is my driving use case) is below. Numpy seems to use the 'e' format character, so I stuck with that.

Thanks,
Eli

http://en.wikipedia.org/wiki/Half_precision_floating-point_format
https://github.com/wickedgrey/cython
https://github.com/wickedgrey/numpy
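To make the ambiguity in the message above concrete, here is a minimal sketch of the same two bytes read once as an unsigned short and once as a half-float. It assumes a Python whose struct module accepts the 'e' format code (3.6 or later, so not any release available when this thread was written); the variable names are illustrative only.

```python
import struct

# The two bytes of the half-float value 1.0, little-endian: sign 0,
# biased exponent 15, fraction 0, i.e. the bit pattern 0x3C00.
raw = bytes([0x00, 0x3C])

# A consumer that is only told 'H' sees a plain integer...
as_ushort = struct.unpack('<H', raw)[0]

# ...while the producer meant an IEEE 754 binary16 value.
as_half = struct.unpack('<e', raw)[0]

print(as_ushort)  # 15360
print(as_half)    # 1.0
```

Nothing in the 'H' code tells the consumer that 15360 should really be read as 1.0, which is the gap a dedicated format code is meant to close.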

On Wed, Mar 30, 2011 at 2:37 PM, Eli Stevens (Gmail) <wickedgrey@gmail.com> wrote: ..
I would like to see a patch adding float16 to struct and ctypes modules together with the buffer support. Adding features to PEP 3118 that cannot be exercised by the standard library is not a good idea. (Case in point: support for multi-dimensional arrays.)

On Wed, Mar 30, 2011 at 7:54 PM, Alexander Belopolsky <alexander.belopolsky@gmail.com> wrote:
I would like to see a patch adding float16 to struct and ctypes modules together with the buffer support.
I'm not sure how much sense this makes for ctypes, given that float16 isn't a datatype supported by most C implementations. Mark

On Wed, Mar 30, 2011 at 3:02 PM, Mark Dickinson <dickinsm@gmail.com> wrote:
"On ARM targets, GCC supports half-precision (16-bit) floating point via the __fp16 type." http://gcc.gnu.org/onlinedocs/gcc/Half_002dPrecision.html However, before ctypes can support this, half-floats' support should be added to libffi through platform specific assembly hackery. So I withdraw my suggestion that ctypes support should be a prerequisite for float16 support in the buffer protocol, but I still would like to see it in struct. BTW, what letter code is proposed for half-floats? The only unassigned letter in the word "half" is 'a'. Maybe it is time to extend struct and buffer format specification to include field bit-width?

On 3/31/11 10:52 AM, Alexander Belopolsky wrote:
The proposed letter code is 'e', as used in numpy. I'm not sure of the logic that went behind the choice, except perhaps that 'e' is near 'd' and 'f'. It's not too late to change, though. I don't know of any other group that has decided on any such letter code for half-floats. -- Robert Kern "I have come to believe that the whole world is an enigma, a harmless enigma that is made terrible by our own mad attempt to interpret it as though it had an underlying truth." -- Umberto Eco

On Thu, Mar 31, 2011 at 12:36 PM, Robert Kern <robert.kern@gmail.com> wrote: ..
The proposed letter code is 'e', as used in numpy. I'm not sure of the logic that went behind the choice, except perhaps that 'e' is near 'd' and 'f'.
So it is 'e' for half, 'f' for single and 'd' for double. Given that in the English alphabet the order is d, e, f, I find this choice rather unintuitive.
It's not too late to change, though. I don't know of any other group that has decided on any such letter code for half-floats.
There is a language, Q, that uses "e" for single-precision floats. They call C-float "real" and C-double "float". See <http://www.kx.com/q/d/primer.htm>. Codes "e" for binary32 and "f" for binary64 make some sense alphabetically, but would suggest "d" for binary16, which would work neither for Python nor for Q, because "d" is double in Python and date in Q. Note that IEEE 754-2008 also defines binary128, a quadruple-precision format. If we keep assigning single-letter codes to datatypes, the struct/buffer format will soon resemble strftime, with every letter of the English alphabet having some (often non-obvious) meaning. (If we have to choose a single-letter code, I would vote for 'a' for hAlf and 'u' for qUad.) I would rather see some syntax that would allow multi-character type specifications. For example, {binary16} for half-floats and {binary128} for quadruple-precision floats. This syntax could also support a 3rd-party type registry and private extensions.

On 3/31/11 12:58 PM, Alexander Belopolsky wrote:
Oh, we're well down that path. :-)
'u' is already reserved in PEP 3118, and 'a' is already used in numpy, though not in the PEP 3118 interface implementation.
PEP 3118 does define a parametric 't' type: 16t would be a 16-bit field with undefined internal format. Elsewhere in the thread I suggested an extension to this that adds a freeform name to the type, to allow 3rd parties to agree on new types without needing changes to PEP 3118 or needing more single-letter codes. E.g.
16t{halffloat} -> IEEE 754-2008 half-float
128t{quadfloat} -> IEEE 754-2008 quad-float
96t{80bitfloat} -> 80-bit extended precision float stored in 96 bits
etc.
-- Robert Kern
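As a rough sketch of how a consumer might handle the named bit-field syntax proposed above: the following assumes the "<bits>t{<name>}" spelling from the examples in this message, which is only a proposal in this thread and not part of PEP 3118 or the struct module. The function names and the registry contents are hypothetical.

```python
import re

# Hypothetical syntax from the thread: "<bits>t{<name>}", e.g. "16t{halffloat}".
_NAMED_BITFIELD = re.compile(r'(\d+)t\{([A-Za-z0-9_]+)\}')

# Illustrative registry of names that two cooperating libraries might agree
# on; these names are not defined by any PEP.
KNOWN_TYPES = {
    ('halffloat', 16): 'IEEE 754-2008 binary16',
    ('quadfloat', 128): 'IEEE 754-2008 binary128',
}


def parse_named_bitfield(fmt):
    """Split a string such as '16t{halffloat}' into (bit width, type name)."""
    match = _NAMED_BITFIELD.fullmatch(fmt)
    if match is None:
        raise ValueError('not a named bit-field format: %r' % (fmt,))
    return int(match.group(1)), match.group(2)


def describe(fmt):
    """Look up a parsed format; reject names or widths we do not know."""
    bits, name = parse_named_bitfield(fmt)
    try:
        return KNOWN_TYPES[(name, bits)]
    except KeyError:
        raise ValueError('unknown type %r with width %d' % (name, bits)) from None


print(parse_named_bitfield('96t{80bitfloat}'))
print(describe('16t{halffloat}'))
```

A consumer that does not recognize the name can still fall back to treating the field as an opaque run of bits, or reject the buffer outright, which matches the point made elsewhere in the thread that consumers are free to reject format codes they do not understand.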

On Wed, Mar 30, 2011 at 7:37 PM, Eli Stevens (Gmail) <wickedgrey@gmail.com> wrote:
Hmm. A partial list of requirements:
(1) An open bugs.python.org issue.
(2) Someone to provide patches (it sounds like you're up for this).
(3) Someone else willing to review those patches (this is the hard part).
(4) General agreement in the b.p.o. issue that this is a worthwhile feature to include; a disagreement here would punt the issue back into python-dev or python-ideas territory for wider discussion.
It probably doesn't make sense to try to update the PEP itself: just propose the addition to the struct module in an issue. Work on the struct part of PEP 3118 is somewhat stalled at the moment; I had assigned some of those issues to myself, but unassigned them after finding I didn't really have proper time to think about them. If you could help out with some of the other open PEP 3118 issues, that might go a long way towards persuading someone to review your changes. For myself, I have mixed feelings on the proposed addition: while I can see how the half-precision floats would be useful in NumPy, it's not so clear that they'd be useful to Python itself. It feels a little bit odd to have NumPy driving Python additions that may not be of that much interest to non-NumPy users. Mark

On 3/30/11 1:54 PM, Mark Dickinson wrote:
Like Ellipsis, multidimensional extended slicing, complex numbers, and non-bool rich comparisons? :-) I think the major point in its favor is that PEP 3118 defines a protocol for third party libraries to communicate, the most notable of which really was numpy. Python itself needs only a subset of that, which was mostly already capably handled by the old buffer protocol. Still, it's worth defining the standard to allow third parties to communicate the full spectrum of things they want to tell each other. -- Robert Kern

On Wed, Mar 30, 2011 at 8:53 PM, Robert Kern <robert.kern@gmail.com> wrote:
Like Ellipsis, multidimensional extended slicing, complex numbers, and non-bool rich comparisons? :-)
Indeed! (BTW, I didn't know that Python's complex numbers were NumPy influenced: thanks for that.)
Yes, that makes sense. It's not very clear to me what the scope of the Python additions would be. [OT]: How is NumPy's float16 type implemented? Is it clever enough to do correct rounding for all basic arithmetic operations, or does it suffer from the double-rounding problems that you'd get from (convert operands to float64; do op in float64; round back to float16)? Mark

On 3/30/11 3:05 PM, Mark Dickinson wrote:
As far as I can tell (and I've really only looked at PEP 3118 in any detail today), only producers and consumers of the buffer actually care about the contents of the format string, and consumers are free to reject format codes that they don't understand. I think you can just treat the section of the PEP defining the format codes as informational, much like the DB-API only a little more rigorous. Adding support for it to the struct module is a good bonus. As a digression, it would be great if the format codes were defined in an extensible fashion, such that two agreeing third parties could talk to each other using their own format codes without having to modify the PEP. It already contains a little bit of this with the 't' code. If you could add a distinguishing name as well (besides the ':name:' syntax, which is reserved for adding names to fields, not types), then numpy and Cython could simply agree that '16t{half}', for example, meant a half-float without having to wait for the PEP to be modified.
We do the latter, I'm afraid. Except with float32 instead of float64. https://github.com/numpy/numpy/blob/master/numpy/core/src/umath/loops.c.src#... https://github.com/numpy/numpy/blob/master/numpy/core/src/npymath/halffloat.... -- Robert Kern

On Wed, Mar 30, 2011 at 9:42 PM, Robert Kern <robert.kern@gmail.com> wrote:
[Still OT] Confession time: after asking this question, I had a sneaking suspicion that it was a stupid one. And having had time to think a bit, it turns out that it is. As far as I can tell, assuming round-half-to-even, there *are* no double rounding problems for primitive arithmetic operations with the (convert operands to float32; do operation in float32; convert back) approach. This is probably a well-known fact in numerics land, and I feel embarrassed for not noticing. The key point is that the precision of float32 (24-bit precision) is at least double that of float16 (11-bit precision), plus a couple of extra bits; it's then easy to see that there can be no problems with multiplication, a mite harder to see that addition and subtraction are fine, and just a tiny bit harder again to show that division of two float16s can never give a result that'll be rounded the wrong way under the double (to float32, then to float16) rounding. Sheepishly, Mark
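The claim above can be spot-checked empirically. The sketch below is not a proof and is not how NumPy implements float16; it assumes a Python whose struct module supports the 'e' code (3.6 or later), and it assumes that struct rounds to nearest with ties to even when packing 'e' and 'f'. All helper names are made up for illustration. Operands are limited to magnitudes of at most 100 so that every sum, difference and product is exactly representable in a Python float and nothing overflows float16. Division is left out because the exact quotient is generally not representable in binary64.

```python
import random
import struct


def to_half(x):
    """Round a Python float (binary64) to the nearest binary16 value."""
    return struct.unpack('<e', struct.pack('<e', x))[0]


def to_single(x):
    """Round a Python float (binary64) to the nearest binary32 value."""
    return struct.unpack('<f', struct.pack('<f', x))[0]


def random_half(rng, limit=100.0):
    """Draw a random finite binary16 value with magnitude <= limit."""
    while True:
        bits = rng.getrandbits(16)
        value = struct.unpack('<e', struct.pack('<H', bits))[0]
        if abs(value) <= limit:  # False for NaN; also rejects infinities
            return value


def count_mismatches(trials=20000, seed=0):
    """Count cases where rounding via binary32 differs from rounding once."""
    rng = random.Random(seed)
    mismatches = 0
    for _ in range(trials):
        a, b = random_half(rng), random_half(rng)
        # With the operand limit above, each of these is exact in binary64.
        for exact in (a + b, a - b, a * b):
            direct = to_half(exact)                 # one rounding
            via_single = to_half(to_single(exact))  # two roundings
            if direct != via_single:
                mismatches += 1
    return mismatches


print(count_mismatches())
```

If the argument in the message holds, the count should be zero for every seed; a nonzero count would point to either a flaw in the argument or a violated assumption in this sketch.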

On Tue, Apr 5, 2011 at 1:28 AM, Mark Dickinson <dickinsm@gmail.com> wrote:
[Still OT] Confession time: after asking this question, I had a sneaking suspicion that it was a stupid one.
If anyone was looking for a field absolutely rife with answers that are "simple, obvious and dead wrong", avoiding cumulative errors in binary float manipulation would have to be a prime candidate. Cheers, Nick. -- Nick Coghlan | ncoghlan@gmail.com | Brisbane, Australia

On Wed, Mar 30, 2011 at 1:05 PM, Mark Dickinson <dickinsm@gmail.com> wrote:
(BTW, I didn't know that Python's complex numbers were NumPy influenced: thanks for that.)
You have Jim Hugunin to thank for that. I can still recall the exact location at the third Python conference (http://www.python.org/workshops/1995-12/) where Jim cornered and convinced me to add complex numbers (I don't recall which other features were part of the deal). Of course we have Jim to thank for NumPy, Jython, and IronPython as well. :-) -- --Guido van Rossum (python.org/~guido)

Robert Kern wrote:
But that's impossible -- there's no way the buffer protocol can explicitly cover all possible data types that any third party application might need to deal with. There needs to be some common ground, and the buffer protocol currently defines that as the set of standard C data types. -- Greg

On 3/30/11 4:03 PM, Greg Ewing wrote:
And several more. I think that it would be reasonable to add more when two libraries come with a solid use case, like communicating the half-floats that are standard in OpenCL and other GPU languages. What do you think of my idea for adding extensibility to the format syntax, which should allow two libraries to communicate new types without having to modify the PEP every time? -- Robert Kern

On 3/30/2011 2:54 PM, Mark Dickinson wrote:
To start, email the two authors.
If this were added to the PEP, it would be included in http://bugs.python.org/issue3132
(2) Someone to provide patches (it sounds like you're up for this).
Or do a review of Meador Inge's latest (last January) patch version.
See above.
I think that a patch to _struct.c should include all the 3118 additions, and not just this one. Searching All Text for 'pep 3118' on the tracker returns 7 open issues.
I am pretty sure both extended slices and Ellipsis were first added for Numpy's ancestor Numerical Python. In any case, the intent of the pep seems to be that struct be expanded to match NumPy. "Additions to the struct string-syntax The struct string-syntax is missing some characters to fully implement data-format descriptions already available elsewhere (in ctypes and NumPy for example)." Some of the additions (such as pointers) already seem less useful than float16, which I presume struct would just expand to (or compress from) a normal, usable, Python float. -- Terry Jan Reedy

On Mar 30, 2011, at 11:37 AM, Eli Stevens (Gmail) wrote:
+1 I would support adding float16 to the struct module. It's a well-defined format, so we might as well provide an accessor. Just open a feature request for it. Any issues surrounding its use (e.g. double rounding) are no different than the usual float/double conversion issues. Raymond

On Wed, Mar 30, 2011 at 3:03 PM, Raymond Hettinger <raymond.hettinger@gmail.com> wrote:
This seems like a simple solution, however: On Wed, Mar 30, 2011 at 1:24 PM, Terry Reedy <tjreedy@udel.edu> wrote:
If this were added to the PEP, it would be included in http://bugs.python.org/issue3132
I'm still working through the issue/patch, but it seems to be concerned with how to handle long (long?) doubles cleanly on various platforms with varying levels of support for them (at least, that's the impression I got; I'm still a little unclear about what exactly was deficient prior to the patch). That seems like it would be a separate issue to me; can you explain in more detail how they're related? Is it just that the new work should be based on the post-patch source? Also, am I correct in my understanding that any code changes to _struct.c, etc. would not show up in a production release before 3.3? I'm based out of a strictly 2.7 shop, so if I'm going to need to develop patches, I'll have to make sure I have some place to test things (for our purposes, we just need a spec that numpy and cython can standardize on, but if a patch to the struct module is what it's going to take to make that happen, I'll give it a shot :). Eli

On Mar 30, 2011, at 4:34 PM, Eli Stevens (Gmail) wrote:
I think the struct module addition for float16 could be handled separately and much more easily (since a half would fit in a double, but a long long double won't).
Also, am I correct in my understanding that any code changes to _struct.c, etc. would not show up in a production release before 3.3?
Yes, that's right. If you need something for today, it's not hard to write pure python code using struct to read in an int16 and then do the bit manipulations to pick apart the sign, exponent, and mantissa to create the float value.
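A minimal sketch of the pure-Python approach described above, assuming the IEEE 754-2008 binary16 layout (1 sign bit, 5 exponent bits with bias 15, 10 fraction bits). The function names are hypothetical, and the code reads the two bytes with the unsigned 'H' code rather than a signed int16 so that the sign bit can be masked off directly.

```python
import struct


def half_bits_to_float(bits):
    """Convert a 16-bit integer holding a binary16 pattern to a Python float."""
    sign = -1.0 if (bits >> 15) & 1 else 1.0
    exponent = (bits >> 10) & 0x1F   # 5-bit biased exponent
    fraction = bits & 0x3FF          # 10-bit fraction field

    if exponent == 0:
        # Zero or subnormal: no implicit leading 1, fixed scale of 2**-24.
        return sign * fraction * 2.0 ** -24
    if exponent == 0x1F:
        # All-ones exponent encodes infinity (fraction 0) or NaN.
        return sign * float('inf') if fraction == 0 else float('nan')
    # Normal number: implicit leading 1, exponent bias of 15.
    return sign * (1.0 + fraction / 1024.0) * 2.0 ** (exponent - 15)


def unpack_half(data, byteorder='<'):
    """Decode two bytes as a half-float using only the 'H' struct code."""
    (bits,) = struct.unpack(byteorder + 'H', data)
    return half_bits_to_float(bits)


print(unpack_half(b'\x00\x3c'))  # 1.0
print(unpack_half(b'\x00\xc0'))  # -2.0
print(unpack_half(b'\xff\x7b'))  # 65504.0
```

Every binary16 value is exactly representable as a Python float, so decoding in this direction involves no rounding; only the reverse conversion, from float to half, has to pick a rounding rule.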
I don't follow what your issue is? Can you check-out a copy of the current Hg repository and build your patch against the default branch? Raymond

On Wed, Mar 30, 2011 at 5:06 PM, Raymond Hettinger <raymond.hettinger@gmail.com> wrote:
I think the struct module addition for float16 could be handled separately and much more easily (since a half would fit in a double, but a long long double won't).
Okay, if no other objections get raised, I'll open a new issue for it (probably sometime tomorrow).
If you need something for today, it's not hard to write pure python code using struct to read in an int16 and then do the bit manipulations to pick apart the sign, exponent, and mantissa to create the float value.
My particular use case focuses on getting float16 data from numpy (which handles the bit fiddling already) to be exposed in cython (which doesn't ATM, and can't know to do so without a specific float16 data format type). Step one of that is to update the spec to include a float16 type, either by changing PEP 3118, or adding it to the struct module (which is referenced by the PEP). Once that happens, I think there's a valid case to be made for numpy to export the float16 via the buffer interface, and a decent shot at getting some special case code added to cython. I don't need any CPython code changes for my use case, I don't think.
I don't follow what your issue is? Can you check-out a copy of the current Hg repository and build your patch against the default branch?
Sorry, I'm juggling three different threads on this topic (python-ideas, cython-users, numpy-discussion), and am doing a poor job of keeping the contexts sorted out. :) Yes, I will try and compile/test CPython and build a patch for _struct.c from the current repo. Thanks! Eli

On Wed, Mar 30, 2011 at 5:32 PM, Eli Stevens (Gmail) <wickedgrey@gmail.com> wrote:
The issue is here: http://bugs.python.org/issue11734
Yes, I will try and compile/test CPython and build a patch for _struct.c from the current repo.
I am a little unclear as to the purpose of structmember.{h,c}; does the .h file need a line for the half-float type along the lines of:

    #define T_HALFFLOAT 21

And do PyMember_GetOne and PyMember_SetOne need corresponding entries in their switch statements? Something like:

    case T_HALFFLOAT:
        // FIXME: need half support
        break;
    case T_FLOAT:
        v = PyFloat_FromDouble((double)*(float*)addr);
        break;

And:

    case T_HALFFLOAT:{
        // FIXME: needs half support
        break;
        }
    case T_FLOAT:{
        double double_val = PyFloat_AsDouble(v);
        if ((double_val == -1) && PyErr_Occurred())
            return -1;
        *(float*)addr = (float)double_val;
        break;
        }

The unit tests I've added for the struct.pack and struct.unpack functions don't seem to need it, but I want to make sure there isn't something I'm missing. Apologies if this should be moved to python-dev; just let me know and I can repost there. Thanks! Eli

Just out of curiosity, is the layout of numpy's float16 based on any existing standard, or is it something purely invented by numpy? If it's a standard format, that would lend more weight to the idea of supporting it in the buffer interface. -- Greg

On Wed, Mar 30, 2011 at 7:32 PM, Greg Ewing <greg.ewing@canterbury.ac.nz> wrote:
Per my understanding (I haven't gone and cross-checked the code with the spec, however), it's based on IEEE 754-2008: http://en.wikipedia.org/wiki/Half_precision_floating-point_format Eli

On 3/30/2011 10:32 PM, Greg Ewing wrote:
I understood Robert Kern's statement "I think that it would be reasonable to add more when two libraries come with a solid use case, like communicating the half-floats that are standard in OpenCL and other GPU languages." to mean that numpy adopted it from OpenCL, etc. If so, I think Python should definitely add it. -- Terry Jan Reedy

On Wed, Mar 30, 2011 at 2:37 PM, Eli Stevens (Gmail) <wickedgrey@gmail.com> wrote: ..
I would like to see a patch adding float16 to struct and ctypes modules together with the buffer support. Adding features to PEP 3118 that cannot be exercised by the standard library is not a good idea. (Case in point: support for multi-dimensional arrays.)

On Wed, Mar 30, 2011 at 7:54 PM, Alexander Belopolsky <alexander.belopolsky@gmail.com> wrote:
I would like to see a patch adding float16 to struct and ctypes modules together with the buffer support.
I'm not sure how much sense this makes for ctypes, given that float16 isn't a datatype supported by most C implementations. Mark

On Wed, Mar 30, 2011 at 3:02 PM, Mark Dickinson <dickinsm@gmail.com> wrote:
"On ARM targets, GCC supports half-precision (16-bit) floating point via the __fp16 type." http://gcc.gnu.org/onlinedocs/gcc/Half_002dPrecision.html However, before ctypes can support this, half-floats' support should be added to libffi through platform specific assembly hackery. So I withdraw my suggestion that ctypes support should be a prerequisite for float16 support in the buffer protocol, but I still would like to see it in struct. BTW, what letter code is proposed for half-floats? The only unassigned letter in the word "half" is 'a'. Maybe it is time to extend struct and buffer format specification to include field bit-width?

On 3/31/11 10:52 AM, Alexander Belopolsky wrote:
The proposed letter code is 'e', as used in numpy. I'm not sure of the logic that went behind the choice, except perhaps that 'e' is near 'd' and 'f'. It's not too late to change, though. I don't know of any other group that has decided on such any kind of letter code for half-floats. -- Robert Kern "I have come to believe that the whole world is an enigma, a harmless enigma that is made terrible by our own mad attempt to interpret it as though it had an underlying truth." -- Umberto Eco

On Thu, Mar 31, 2011 at 12:36 PM, Robert Kern <robert.kern@gmail.com> wrote: ..
The proposed letter code is 'e', as used in numpy. I'm not sure of the logic that went behind the choice, except perhaps that 'e' is near 'd' and 'f'.
So it is 'e' for half, 'f' for single and 'd' for double. Given that in English alphabet the order is d, e, f, I find this choice rather unintuitive.
It's not too late to change, though. I don't know of any other group that has decided on such any kind of letter code for half-floats.
There is a language, Q, that uses "e" for single-precision floats. They call C-float "real" and C-double "float". See <http://www.kx.com/q/d/primer.htm>. Codes "e" for binary32 and "f" for binary64 make some sense alphabetically, but would suggest "d" for binary16, which would neither work for Python nor for Q because "d" is double in Python and date in Q. Note that IEEE 754-2008 also defines a binary128, quadruple precision format. If we keep assigning single letter codes to datatypes, struct/buffer format will soon resemble strftime with every letter of English alphabet having some (often non-obvious) meaning. (If we have to choose a single-letter code, I would vote for 'a' for hAlf and 'u' for qUad.) I would rather see some syntax that would allow multi-character type specifications. For example, {binary16} for half-floats and {binary128} for quadruple precision floats. This syntax may allow for support 3rd party type registry and private extensions.

On 3/31/11 12:58 PM, Alexander Belopolsky wrote:
Oh, we're well down that path. :-)
'u' is already reserved in PEP 3118, and 'a' is already used in numpy, though not in the PEP 3118 interface implementation.
PEP 3118 does define a parametric 't' type: 16t would be a 16-bit field with undefined internal format. Elsewhere in the thread I suggested an extension to this add a freeform name to this type to allow 3rd parties agree on new types without needing changes to PEP 3118 or needing more single-letter codes. E.g. 16t{halffloat} -> IEEE 754-2008 half-float 128t{quadfloat} -> IEEE 754-2008 quad-float 96t{80bitfloat} -> 80-bit extended precision float stored in 96 bits etc. -- Robert Kern "I have come to believe that the whole world is an enigma, a harmless enigma that is made terrible by our own mad attempt to interpret it as though it had an underlying truth." -- Umberto Eco

On Wed, Mar 30, 2011 at 7:37 PM, Eli Stevens (Gmail) <wickedgrey@gmail.com> wrote:
Hmm. A partial list of requirements: (1) An open bugs.python.org issue. (2) Someone to provide patches (it sounds like you're up for this). (3) Someone else willing to review those patches (this is the hard part). (4) General agreement in the b.p.o. issue that this is a worthwhile feature to include; a disagreement here would punt the issue back into python-dev or python-ideas territory for wider discussion. It probably doesn't make sense to try to update the PEP itself: just propose the addition to the struct module in an issue. Work on the struct part of PEP 3118 is somewhat stalled at the moment; I had assigned some of those issues to myself, but unassigned them after finding I didn't really have proper time to think about them. If you could help out with some of the other open PEP 3118 issues, that might go a long way towards persuading someone to review your changes. For myself, I have mixed feelings on the proposed addition: while I can see how the half-precision floats would be useful in NumPy, it's not so clear that they'd be useful to Python itself. It feels a little bit odd to have NumPy driving Python additions that may not be of that much interest to non-NumPy users. Mark

On 3/30/11 1:54 PM, Mark Dickinson wrote:
Like Ellipsis, multidimensional extended slicing, complex numbers, and non-bool rich comparisons? :-) I think the major point in its favor is that PEP 3118 defines a protocol for third party libraries to communicate, the most notable of which really was numpy. Python itself needs only a subset of that, which was mostly already capably handled by the old buffer protocol. Still, it's worth defining the standard to allow third parties to communicate the full spectrum of things they want to tell each other. -- Robert Kern "I have come to believe that the whole world is an enigma, a harmless enigma that is made terrible by our own mad attempt to interpret it as though it had an underlying truth." -- Umberto Eco

On Wed, Mar 30, 2011 at 8:53 PM, Robert Kern <robert.kern@gmail.com> wrote:
Like Ellipsis, multidimensional extended slicing, complex numbers, and non-bool rich comparisons? :-)
Indeed! (BTW, I didn't know that Python's complex numbers were NumPy influenced: thanks for that.)
Yes, that makes sense. It's not very clear to me what the scope of the Python additions would be. [OT]: How is NumPy's float16 type implemented? Is it clever enough to do correct rounding for all basic arithmetic operations, or does it suffer from the double-rounding problems that you'd get from (convert operands to float64; do op in float64; round back to float16)? Mark

On 3/30/11 3:05 PM, Mark Dickinson wrote:
As far as I can tell (and I've really only looked at PEP 3118 in any detail today), only producers and consumers of the buffer actually care about the contents of the format string, and consumers are free to reject format codes that they don't understand. I think you can just treat the section of the PEP defining the format codes as informational, much like the DB-API only a little more rigorous. Adding support for it to the struct module is a good bonus. As a digression, it would be great if the format codes were defined in an extensible fashion, such that two agreeing third parties could talk to each other using their own format codes without having to modify the PEP. It already contains a little bit of this with the 't' code. If you could add a distinguishing name as well (besides the ':name:' syntax, which is reserved for adding names to fields, not types), then numpy and Cython could simply agree that '16t{half}', for example, meant a half-float without having to wait for the PEP to be modified.
We do the latter, I'm afraid. Except with float32 instead of float64. https://github.com/numpy/numpy/blob/master/numpy/core/src/umath/loops.c.src#... https://github.com/numpy/numpy/blob/master/numpy/core/src/npymath/halffloat.... -- Robert Kern "I have come to believe that the whole world is an enigma, a harmless enigma that is made terrible by our own mad attempt to interpret it as though it had an underlying truth." -- Umberto Eco

On Wed, Mar 30, 2011 at 9:42 PM, Robert Kern <robert.kern@gmail.com> wrote:
[Still OT] Confession time: after asking this question, I had a sneaking suspicion that it was a stupid one. And having had time to think a bit, it turns out that it is. As far as I can tell, assuming round-half-to-even, there *are* no double rounding problems for primitive arithmetic operations with the (convert operands to float32; do operation in float32; convert back). This is probably a well-known fact in numerics land, and I feel embarrassed for not noticing. The key point is that the precision of float32 (24 bit precision) is at least double that of float16 (11 bit precision), plus a couple of extra bits; it's then easy to see that there can be no problems with multiplication, a mite harder to see that addition and subtraction are fine, and just a tiny bit harder again to show that division of two float16s can never give a result that'll be rounded the wrong way under the double (to float32, then to float16) rounding. Sheepishly, Mark

On Tue, Apr 5, 2011 at 1:28 AM, Mark Dickinson <dickinsm@gmail.com> wrote:
[Still OT] Confession time: after asking this question, I had a sneaking suspicion that it was a stupid one.
If anyone was looking for a field absolutely rife with answers that are "simple, obvious and dead wrong", avoiding cumulative errors in binary float manipulation would have to be a prime candidate. Cheers, Nick. -- Nick Coghlan | ncoghlan@gmail.com | Brisbane, Australia

On Wed, Mar 30, 2011 at 1:05 PM, Mark Dickinson <dickinsm@gmail.com> wrote:
(BTW, I didn't know that Python's complex numbers were NumPy influenced: thanks for that.)
You have Jim Hugunin to thank for that. I can still recall the exact location at the third Python conference (http://www.python.org/workshops/1995-12/) where Jim cornered and convinced me to add complex numbers (I don't recall which other features were part of the deal). Of course we have Jim to thank for NumPy, Jython, and IronPython as well. :-) -- --Guido van Rossum (python.org/~guido)

Robert Kern wrote:
But that's impossible -- there's no way the buffer protocol can explicitly cover all possible data types that any third party application might need to deal with. There needs to be some common ground, and the buffer protocol currently defines that as the set of standard C data types. -- Greg

On 3/30/11 4:03 PM, Greg Ewing wrote:
And several more. I think that it would be reasonable to add more when two libraries come with a solid use case, like communicating the half-floats that are standard in OpenCL and other GPU languages. What do you think of my idea for adding extensibility to the format syntax, which should allow two libraries to communicate new types without having to modify the PEP every time? -- Robert Kern "I have come to believe that the whole world is an enigma, a harmless enigma that is made terrible by our own mad attempt to interpret it as though it had an underlying truth." -- Umberto Eco

On 3/30/2011 2:54 PM, Mark Dickinson wrote:
To start, email the two authors.
If this were added to the PEP, it would be included in http://bugs.python.org/issue3132
(2) Someone to provide patches (it sounds like you're up for this).
Or do a review of Meador Inge's latest (last January) patch version.
See above.
I think that a patch to _struct.py should include all the 3118 additions, and not just this one. Searching All Test 'pep 3118' on the tracker returns 7 open issues.
I am pretty sure both extended slices and Ellipsis were first added for Numpy's ancestor Numerical Python. In any case, the intent of the pep seems to be that struct be expanded to match NumPy. "Additions to the struct string-syntax The struct string-syntax is missing some characters to fully implement data-format descriptions already available elsewhere (in ctypes and NumPy for example)." Some of the additions (such as pointers) already seem less useful than float16, which I presume struct would just expand to (or compress from) a normal, usable, Python float. -- Terry Jan Reedy

On Mar 30, 2011, at 11:37 AM, Eli Stevens (Gmail) wrote:
+1 I would support adding float16 to the struct module. It's a well defined format so we might as well provide an accessor. Just open a feature request for it. Any issues surrounding its use (i.e. double-rounding) are no different than the usual float/double conversion issues. Raymond
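To make the double-rounding remark concrete, here is a minimal sketch. It assumes a Python whose struct module accepts an 'e' (half-float) format code, which is true of Python 3.6 and later but not of the versions current in this thread; the helper names to_half and to_single are hypothetical. The input is chosen to sit just above the midpoint between two adjacent halves, so rounding it once gives a different answer than rounding it to a C float first and then to a half.

```python
import struct

def to_half(value):
    """Round a Python float (a C double) to the nearest half, returned as a float."""
    return struct.unpack("<e", struct.pack("<e", value))[0]

def to_single(value):
    """Round a Python float to the nearest 32-bit float, returned as a float."""
    return struct.unpack("<f", struct.pack("<f", value))[0]

# Adjacent halves near 1.0 are 2**-10 apart, so 1 + 2**-11 is an exact midpoint.
# x lies a tiny amount (2**-40) above that midpoint.
x = 1.0 + 2.0 ** -11 + 2.0 ** -40

direct = to_half(x)            # one rounding: x is above the midpoint, so it goes up
twice = to_half(to_single(x))  # float32 drops the 2**-40, leaving an exact tie,
                               # which round-half-to-even then resolves downward
print(direct, twice)
```

The same effect exists for double to float to anything narrower, which is the sense in which the issue is not specific to float16.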

On Wed, Mar 30, 2011 at 3:03 PM, Raymond Hettinger <raymond.hettinger@gmail.com> wrote:
This seems like a simple solution, however: On Wed, Mar 30, 2011 at 1:24 PM, Terry Reedy <tjreedy@udel.edu> wrote:
If this were added to the PEP, it would be included in http://bugs.python.org/issue3132
I'm still working through the issue/patch, but it seems to be concerned with how to handle long (long?) doubles cleanly on various platforms with varying levels of support for it (at least, that's the impression I got; I'm still a little unclear about what exactly was deficient prior to the patch). That seems like it would be a separate issue to me; can you explain in more detail how they're related? Is it just that the new work should be based on the source post patch? Also, am I correct in my understanding that any code changes to _struct.c, etc. would not show up in a production release before 3.3? I'm based out of a strictly 2.7 shop, so if I'm going to need to develop patches, I'll have to make sure I have some place to test things (for our purposes, we just need a spec that numpy and cython can standardize on, but if a patch to the struct module is what it's going to take to make that happen, I'll give it a shot :). Eli

On Mar 30, 2011, at 4:34 PM, Eli Stevens (Gmail) wrote:
I think the struct module addition for float16 could be handled separately and much more easily (since a half would fit in a double, but a long long double won't).
Also, am I correct in my understanding that any code changes to _struct.c, etc. would not show up in a production release before 3.3?
Yes, that's right. If you need something for today, it's not hard to write pure python code using struct to read in an int16 and then do the bit manipulations to pick apart the sign, exponent, and mantissa to create the float value.
I don't follow what your issue is? Can you check-out a copy of the current Hg repository and build your patch against the default branch? Raymond
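The pure-Python approach suggested above (read a 16-bit integer with struct, then pick apart the fields) could look roughly like the following. This is a sketch only: it assumes the IEEE 754 half-precision layout of 1 sign bit, 5 exponent bits with a bias of 15, and 10 mantissa bits, and the function names half_bits_to_float and unpack_half are hypothetical, not part of struct or numpy.

```python
import struct

def half_bits_to_float(bits):
    """Convert a 16-bit half-precision bit pattern (an int) to a Python float."""
    sign = -1.0 if (bits >> 15) & 0x1 else 1.0
    exponent = (bits >> 10) & 0x1F
    mantissa = bits & 0x3FF
    if exponent == 0:
        # Zero or subnormal: no implicit leading 1, scale is 2**-14 * 2**-10.
        return sign * mantissa * 2.0 ** -24
    if exponent == 0x1F:
        # All-ones exponent encodes infinity (mantissa 0) or NaN (otherwise).
        return sign * float("inf") if mantissa == 0 else float("nan")
    # Normal number: implicit leading 1 and an exponent bias of 15.
    return sign * (1.0 + mantissa / 1024.0) * 2.0 ** (exponent - 15)

def unpack_half(data, offset=0):
    """Read one little-endian half-float from a bytes-like object."""
    (bits,) = struct.unpack_from("<H", data, offset)
    return half_bits_to_float(bits)

print(unpack_half(b"\x00\x3c"))  # bit pattern 0x3C00
```

Because it only relies on the 'H' format code, this works without any change to the struct module; the cost is that a consumer has to know out of band that the unsigned shorts are really half-floats, which is the problem described at the start of the thread.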

On Wed, Mar 30, 2011 at 5:06 PM, Raymond Hettinger <raymond.hettinger@gmail.com> wrote:
I think the struct module addition for float16 could be handled separately and much more easily (since a half would fit in a double, but a long long double won't).
Okay, if no other objections get raised, I'll open a new issue for it (probably sometime tomorrow).
If you need something for today, it's not hard to write pure python code using struct to read in an int16 and then do the bit manipulations to pick apart the sign, exponent, and mantissa to create the float value.
My particular use case focuses on getting float16 data from numpy (which handles the bit fiddling already) to be exposed in cython (which doesn't ATM, and can't know to do so without a specific float16 data format type). Step one of that is to update the spec to include a float16 type, either by changing PEP 3118, or adding it to the struct module (which is referenced by the PEP). Once that happens, I think there's a valid case to be made for numpy to export the float16 via the buffer interface, and a decent shot at getting some special case code added to cython. I don't need any CPython code changes for my use case, I don't think.
I don't follow what your issue is? Can you check-out a copy of the current Hg repository and build your patch against the default branch?
Sorry, I'm juggling three different threads on this topic (python-ideas, cython-users, numpy-discussion), and am doing a poor job of keeping the contexts sorted out. :) Yes, I will try and compile/test CPython and build a patch for _struct.c from the current repo. Thanks! Eli

On Wed, Mar 30, 2011 at 5:32 PM, Eli Stevens (Gmail) <wickedgrey@gmail.com> wrote:
The issue is here: http://bugs.python.org/issue11734
Yes, I will try and compile/test CPython and build a patch for _struct.c from the current repo.
I am a little unclear as to the purpose of structmember.{h,c}; does the .h file need a line for the half-float type along the lines of:

    #define T_HALFFLOAT 21

And do PyMember_GetOne and PyMember_SetOne need corresponding entries in their switch statements? Something like:

    case T_HALFFLOAT:
        // FIXME: need half support
        break;
    case T_FLOAT:
        v = PyFloat_FromDouble((double)*(float*)addr);
        break;

And:

    case T_HALFFLOAT:{
        // FIXME: needs half support
        break;
    }
    case T_FLOAT:{
        double double_val = PyFloat_AsDouble(v);
        if ((double_val == -1) && PyErr_Occurred())
            return -1;
        *(float*)addr = (float)double_val;
        break;
    }

The unit tests I've added for the struct.pack and struct.unpack functions don't seem to need it, but I want to make sure there isn't something I'm missing. Apologies if this should be moved to python-dev; just let me know and I can repost there. Thanks! Eli

Just out of curiosity, is the layout of numpy's float16 based on any existing standard, or is it something purely invented by numpy? If it's a standard format, that would lend more weight to the idea of supporting it in the buffer interface. -- Greg

On Wed, Mar 30, 2011 at 7:32 PM, Greg Ewing <greg.ewing@canterbury.ac.nz> wrote:
Per my understanding (I haven't gone and cross-checked the code with the spec, however), it's based on IEEE 754-2008: http://en.wikipedia.org/wiki/Half_precision_floating-point_format Eli
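For reference, the half-precision format on that page is 1 sign bit, 5 exponent bits (bias 15) and 10 fraction bits. The sketch below splits a few values into those fields. It assumes a struct module that supports the 'e' format code (Python 3.6 and later, so not the versions discussed in this thread), and half_fields is a hypothetical helper name.

```python
import struct

def half_fields(value):
    """Return (sign, biased_exponent, fraction) for value encoded as a half-float."""
    # Pack as a half, then reread the same two bytes as an unsigned short.
    (bits,) = struct.unpack("<H", struct.pack("<e", value))
    return bits >> 15, (bits >> 10) & 0x1F, bits & 0x3FF

for sample in (1.0, -2.0, 65504.0):
    print(sample, half_fields(sample))
```

65504.0 is the largest finite half: its biased exponent is 30 (one below the all-ones value reserved for infinity and NaN) and all ten fraction bits are set.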

On 3/30/2011 10:32 PM, Greg Ewing wrote:
I understood Robert Kern's statement "I think that it would be reasonable to add more when two libraries come with a solid use case, like communicating the half-floats that are standard in OpenCL and other GPU languages." to mean that numpy adopted it from OpenCL, etc. If so, I think Python should definitely add it. -- Terry Jan Reedy

participants (10)
- Alexander Belopolsky
- Eli Stevens (Gmail)
- Greg Ewing
- Guido van Rossum
- Mark Dickinson
- MRAB
- Nick Coghlan
- Raymond Hettinger
- Robert Kern
- Terry Reedy