
Hi, After getting rather confused, I concluded that float128 on a couple of Intel systems I have is in fact an 80-bit extended precision number: http://en.wikipedia.org/wiki/Extended_precision
>>> np.finfo(np.float128).nmant
63
>>> np.finfo(np.float128).nexp
15
That is rather confusing. What is the rationale for calling this float128? It is not IEEE 754 float128, and yet it seems to claim so. Best, Matthew

On 32-bit systems it consumes 96 bits (3 x 32), hence float96; on 64-bit machines it consumes 128 bits (2 x 64). The variable size is chosen for efficient addressing, while the calculation in hardware is carried out in the 80-bit FPU (x87) registers. Nadav
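(As an illustrative check of the above -- the specific values assume a 64-bit x86 Linux build of numpy; a 32-bit build would typically report an itemsize of 12:)

import numpy as np

# Storage size follows the platform's alignment rules, not the precision:
# typically 12 bytes (96 bits) on 32-bit x86 and 16 bytes (128 bits) on
# 64-bit x86, while the underlying x87 format is 80 bits either way.
print(np.dtype(np.longdouble).itemsize)   # e.g. 16 on 64-bit x86 Linux
print(np.finfo(np.longdouble).nmant)      # 63 -> 80-bit extended precision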

Hi, On Sat, Oct 15, 2011 at 11:04 PM, Nadav Horesh <nadavh@visionsense.com> wrote:
On 32-bit systems it consumes 96 bits (3 x 32), hence float96; on 64-bit machines it consumes 128 bits (2 x 64). The variable size is chosen for efficient addressing, while the calculation in hardware is carried out in the 80-bit FPU (x87) registers.
Right - but the problem here is that it is very confusing. There is something called binary128 in the IEEE standard, and what numpy has is not that. float16, float32 and float64 all correspond to IEEE standard formats called binary16, binary32 and binary64. Thus it was natural for me to assume, wrongly, that float128 was the IEEE binary128 format. I'd therefore assume that it could store all the integers up to 2**113 exactly, and so on.

On the other hand, if I found out that the float80 dtype in fact took 128 bits of storage, I'd rightly conclude that the data were being padded out with zeros and not be very surprised. I think I'd also find it easier to understand what was going on if there were float80 types on both 32-bit and 64-bit, but with different itemsizes. If there were one float80 type (with different itemsizes on 32- and 64-bit) then I would not have to write guarding try .. excepts around my use of the types to stay compatible across platforms. So float80 on both platforms seems like the less confusing option to me. Best, Matthew
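(A concrete sketch of the difference, assuming an x86 system where long double is 80-bit extended precision:)

import numpy as np

# With a 64-bit significand (80-bit extended precision), 2**64 + 1 is not
# representable, so the addition below rounds the 1 away.  A true IEEE
# binary128 type, with its 113-bit significand, would keep the sum exact.
x = np.longdouble(2) ** 64
print(x + 1 == x)    # True for 80-bit extended precision (and for float64)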

On Sun, Oct 16, 2011 at 8:04 AM, Matthew Brett <matthew.brett@gmail.com> wrote:
Right - but the problem here is that it is very confusing. There is something called binary128 in the IEEE standard, and what numpy has is not that. float16, float32 and float64 all correspond to IEEE standard formats called binary16, binary32 and binary64.
This one is easy: few CPUs support the 128-bit float specified in the IEEE standard (the only ones I know of are the expensive IBM ones). Then there are the cases where it is implemented in software (SPARC uses the double-pair, IIRC). So you would need binary80, binary96, binary128, binary128_double_pair, etc. That would be a nightmare to support, and also not portable: what does binary80 become on PPC? What does binary96 become on 32-bit Intel? Or on Windows (where long double is the same as double for Visual Studio)? binary128 should only be thought of as a (bad) synonym for np.longdouble. David

Hi, On Sun, Oct 16, 2011 at 12:28 AM, David Cournapeau <cournape@gmail.com> wrote:
This one is easy: few CPUs support the 128-bit float specified in the IEEE standard (the only ones I know of are the expensive IBM ones). Then there are the cases where it is implemented in software (SPARC uses the double-pair, IIRC).
So you would need binary80, binary96, binary128, binary128_double_pair, etc. That would be a nightmare to support, and also not portable: what does binary80 become on PPC? What does binary96 become on 32-bit Intel? Or on Windows (where long double is the same as double for Visual Studio)?
binary128 should only be thought of as a (bad) synonym for np.longdouble.
What would be the nightmare to support - the different names for the different precisions? How many do we in fact support, apart from float80? Is there some reason the support burden is lower if lots of different precisions get the same name? See you, Matthew

On Sun, Oct 16, 2011 at 8:33 AM, Matthew Brett <matthew.brett@gmail.com> wrote:
What would be the nightmare to support - the different names for the different precisions?
Well, if you have an array of np.float80, what does it do on PPC, or Windows, or Solaris? You will have a myriad of incompatible formats, and the only thing you gain by naming them differently is that you lose the ability to use the code on different platforms. The alternative is to implement a quadruple precision number in software. Using extended precision is fundamentally non-portable on today's CPUs. David

Hi, On Sun, Oct 16, 2011 at 1:18 AM, David Cournapeau <cournape@gmail.com> wrote:
What would be the nightmare to support - the different names for the different precisions?
Well, if you have an array of np.float80, what does it do on PPC, or Windows, or Solaris? You will have a myriad of incompatible formats, and the only thing you gain by naming them differently is that you lose the ability to use the code on different platforms. The alternative is to implement a quadruple precision number in software.
The thing you gain by naming them correctly is that the person using the format knows what it is. If we use float64 we know what that is. If we are using float128, we've got no idea what it is. I had actually guessed that numpy had some software emulation for IEEE float128. I don't know how I would have known otherwise.

Obviously what I'm proposing is that the names follow the precisions of the numbers, not the itemsize. If what we actually have is something that is sometimes called float128, sometimes float96, that is always what C thinks of as long double, then surely the best option would be:

float80 floatLD

for intel 32 and 64 bit, and then

floatPPC floatLD

for whatever PPC has, and so on. See you, Matthew

Hi, On Sun, Oct 16, 2011 at 11:40 AM, Matthew Brett <matthew.brett@gmail.com> wrote:
The thing you gain by naming them correctly is the person using the format knows what it is.
If we use float64 we know what that is. If we are using float128, we've got no idea what it is.
I had actually guessed that numpy had some software emulation for IEEE float128. I don't know how I would have known otherwise.
Obviously what I'm proposing is that the names follow the precisions of the numbers, not the itemsize.
If what we actually have is something that is sometimes called float128, sometimes float96, that is always what C thinks of as long double, then surely the best option would be:
float80 floatLD
for intel 32 and 64 bit, and then
floatPPC floatLD
Sorry - I missed out: where floatLD is float80, and floatPPC is floatLD. Matthew

On Sun, Oct 16, 2011 at 7:40 PM, Matthew Brett <matthew.brett@gmail.com> wrote:
Hi,
If we use float64 we know what that is. If we are using float128, we've got no idea what it is.
I think there is no arguing here: the ideal solution would be to follow what happens with the 32- and 64-bit representations. But this is impossible on today's architectures, because the 2008 version of the IEEE 754 standard is not supported yet.
I had actually guessed that numpy had some software emulation for IEEE float128. I don't know how I would have known otherwise.
Obviously what I'm proposing is that the names follow the precisions of the numbers, not the itemsize.
If what we actually have is something that is sometimes called float128, sometimes float96, that is always what C thinks of as long double, then surely the best option would be:
float80 floatLD
for intel 32 and 64 bit, and then
floatPPC floatLD
for whatever PPC has, and so on.
If all you want is a common name, there is already one: np.longdouble. This is an alias for the more platform-specific name. cheers, David
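(For example, on a 64-bit x86 Linux build -- other platforms may not even define np.float128, which is rather the point:)

import numpy as np

# np.float128 (where it exists) is just another name for the platform's
# long double scalar type; np.longdouble is the portable way to spell it.
print(np.longdouble)                                   # e.g. <class 'numpy.float128'>
print(np.longdouble is getattr(np, 'float128', None))  # True on such a build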

Hi, On Sun, Oct 16, 2011 at 2:11 PM, David Cournapeau <cournape@gmail.com> wrote:
I think there is no arguing here: the ideal solution would be to follow what happens with the 32- and 64-bit representations. But this is impossible on today's architectures, because the 2008 version of the IEEE 754 standard is not supported yet.
If we agree that float128 is a bad name for something that isn't IEEE binary128, and there is already a longdouble type (thanks for pointing that out), then what about:
- Deprecating float128 / float96 as names
- Preferring longdouble for cross-platform == fairly big float of some sort
- Specific names according to format (float80 etc)?
See you, Matthew

On Sun, Oct 16, 2011 at 3:04 PM, Matthew Brett <matthew.brett@gmail.com> wrote:
If we agree that float128 is a bad name for something that isn't IEEE binary128, and there is already a longdouble type (thanks for pointing that out), then what about:
- Deprecating float128 / float96 as names
- Preferring longdouble for cross-platform == fairly big float of some sort
+1

I understand the argument that you don't want to call it "float80" because not all machines support a float80 type. But I don't understand why we would solve that problem by making up two *more* names (float96, float128) that describe types that *no* machines actually support... this is incredibly confusing. I guess the question is, how do we deprecate a top-level name inside the np namespace?
- Specific names according to format (float80 etc)?
This part doesn't even seem necessary right now -- we could always add it later if machines start supporting multiple >64-bit float types at once, and in the meantime it doesn't add much. You can always use finfo if you're curious what longdouble means locally. -- Nathaniel
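(For instance, a rough sketch of such a check -- the mapping from nmant to a format name is my own illustrative guess, not exhaustive:)

import numpy as np

# Identify the local long double flavour from its finfo parameters rather
# than from the float96/float128 name.
nmant = np.finfo(np.longdouble).nmant
kind = {52: 'IEEE double (long double == double, e.g. MSVC)',
        63: 'x87 80-bit extended precision',
        112: 'IEEE binary128'}.get(nmant, 'something else (e.g. PPC double-double)')
print(kind, '-- stored in', np.dtype(np.longdouble).itemsize, 'bytes')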

On Sun, Oct 16, 2011 at 4:16 PM, Nathaniel Smith <njs@pobox.com> wrote:
I understand the argument that you don't want to call it "float80" because not all machines support a float80 type. But I don't understand why we would solve that problem by making up two *more* names (float96, float128) that describe types that *no* machines actually support... this is incredibly confusing.
Well, float128 and float96 aren't interchangeable across architectures because of the different alignments, C long double isn't portable either, and float80 doesn't seem to be available anywhere. What concerns me is the difference between extended and quad precision, both of which can occupy 128 bits. I've complained about that for several years now, but as to extended precision, just don't use it. It will never be portable. Chuck

On Sun, Oct 16, 2011 at 4:29 PM, Charles R Harris <charlesr.harris@gmail.com> wrote:
Well, float128 and float96 aren't interchangeable across architectures because of the different alignments, C long double isn't portable either, and float80 doesn't seem to be available anywhere. What concerns me is the difference between extended and quad precision, both of which can occupy 128 bits. I've complained about that for several years now, but as to extended precision, just don't use it. It will never be portable.
I think part of the confusion here is about when a type is named like 'float<N>', does 'N' refer to the size of the data or to the minimum alignment? I have a strong intuition that it should be the former, and I assume Matthew does too. If we have a data structure like

struct { uint8_t flags; void * data; }

then 'flags' will actually get 32 or 64 bits of space... but we would never, ever refer to it as a uint32 or a uint64! I know these extended precision types are even weirder because the compiler will insert that padding unconditionally, but the intuition still stands, and obviously some proportion of the userbase will share it.

If our API makes smart people like Matthew spend a week going around in circles, then our API is dangerously broken!

The solution is just to call it 'longdouble', which clearly communicates 'this does some quirky thing that depends on your C compiler and architecture'. -- Nathaniel
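(A numpy analogue of that struct, for illustration -- the exact itemsize depends on the platform's alignment rules:)

import numpy as np

# With aligned layout, the one-byte 'flags' field is followed by padding
# before the 8-byte 'data' field, yet nobody would call it a uint64.
dt = np.dtype([('flags', np.uint8), ('data', np.uint64)], align=True)
print(dt.itemsize)          # 16 on a typical 64-bit platform: 1 + 7 padding + 8
print(dt.fields['flags'])   # (dtype('uint8'), 0) -- still just one byte of data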

On Sun, Oct 16, 2011 at 6:13 PM, Nathaniel Smith <njs@pobox.com> wrote:
I think part of the confusion here is about when a type is named like 'float<N>', does 'N' refer to the size of the data or to the minimum alignment? I have a strong intuition that it should be the former, and I assume Matthew does too. If we have a data structure like struct { uint8_t flags; void * data; }
We need both in theory. In practice floats and doubles are pretty well defined these days, but long doubles depend on architecture and compiler for alignment, and even for representation in the case of PPC. I don't regard these facts as obscure if one is familiar with floating point, but most folks aren't, and I agree that it can be misleading if one assumes that types and storage space are strongly coupled. This also ties in to the problem with ints and longs, which may both be int32 despite having different C names.
then 'flags' will actually get 32 or 64 bits of space... but we would never, ever refer to it as a uint32 or a uint64! I know these extended precision types are even weirder because the compiler will insert that padding unconditionally, but the intuition still stands, and obviously some proportion of the userbase will share it.
If our API makes smart people like Matthew spend a week going around in circles, then our API is dangerously broken!
I think "dangerously" is a bit overly dramatic.
The solution is just to call it 'longdouble', which clearly communicates 'this does some quirky thing that depends on your C compiler and architecture'.
Well, I don't know. If someone is unfamiliar with floats, I would expect them to post a bug report if a file of longdouble type written on a 32-bit system couldn't be read on a 64-bit system. It might be better to somehow combine both the IEEE type and the storage alignment. Chuck
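(A minimal sketch of that failure mode, using a hypothetical file name and raw tofile/fromfile dumps:)

import numpy as np

# Raw dumps record only bytes: on 32-bit x86 Linux each long double takes
# 12 bytes, on 64-bit x86 it takes 16, so a file written on one cannot be
# read back as np.longdouble on the other without misinterpreting the data.
a = np.arange(4, dtype=np.longdouble)
a.tofile('extended.bin')                                # itemsize-dependent bytes
b = np.fromfile('extended.bin', dtype=np.longdouble)    # only safe on the same platform
assert (a == b).all()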

On 10/17/2011 03:22 AM, Charles R Harris wrote:
On Sun, Oct 16, 2011 at 6:13 PM, Nathaniel Smith <njs@pobox.com> wrote:
The solution is just to call it 'longdouble', which clearly communicates 'this does some quirky thing that depends on your C compiler and architecture'.
Well, I don't know. If someone is unfamiliar with floats, I would expect them to post a bug report if a file of longdouble type written on a 32-bit system couldn't be read on a 64-bit system. It might be better to somehow combine both the IEEE type and the storage alignment.
np.float80_96
np.float80_128

?

Dag Sverre

On Mon, Oct 17, 2011 at 2:20 AM, Dag Sverre Seljebotn <d.s.seljebotn@astro.uio.no> wrote:
np.float80_96
np.float80_128
Heh, my thoughts too. Chuck

Hi, On Sun, Oct 16, 2011 at 6:22 PM, Charles R Harris <charlesr.harris@gmail.com> wrote:
Well, I don't know. If someone is unfamiliar with floats I would expect they would post a complaint about bugs if a file of longdouble type written on a 32 bit system couldn't be read on a 64 bit system. It might be better to somehow combine both the ieee type and the storage alignment.
David was pointing out that e.g. np.float128 could be a different thing on SPARC, PPC and Intel, so it seems to me that float128 is a false friend if we think it at all likely that people will use platforms other than Intel. Personally, if I saw 'longdouble' as a datatype, it would not surprise me if it weren't portable across platforms, including 32- and 64-bit. float80_96 and float80_128 seem fine to me, but it would also be good to suggest longdouble as the default name for the platform-specific higher-precision datatype, to make code portable across platforms. See you, Matthew
participants (6)
- Charles R Harris
- Dag Sverre Seljebotn
- David Cournapeau
- Matthew Brett
- Nadav Horesh
- Nathaniel Smith