[Numpy-discussion] Questions about the array interface.

Scott Gilbert xscottg at yahoo.com
Sat Apr 9 09:36:05 EDT 2005


--- konrad.hinsen at laposte.net wrote:
> 
> This discussion has been coming up regularly for a few years. Until now  
> the consensus has always been that Python should make no assumptions  
> that go beyond what a C compiler can promise. Which means no  
> assumptions about floating-point representation.
> 
> Of course the computing world is changing, and IEEE format may well be  
> ubiquitous by now. Vaxes must be in the museum by now. But how about  
> mainframes? IBM mainframes didn't use IEEE when I used them (last time  
> 15 years ago), and they are still around, possibly compatible with their  
> ancestors.
> 

I've been following this mailing list for a few years now, but I skip a lot
of threads.  I almost certainly skipped this topic in the past since it
wasn't relevant to me.  I'm only interested in it now since it's relevant
to this data interchange business, so I'm sorry if this is a rehash...

Trying to stay portable is a good goal, and I can understand why Python
proper would try to adhere to the restrictions it does.  Despite the claim,
Python makes plenty of assumptions that a standards-conformant C compiler
could break.  If numpy doesn't make some assumptions about floating-point
representation, it's going to kill the possibility of passing data across
machines, and that's pretty unacceptable.

I'm not comfortable saying "ubiquitous" since I don't know what the
mainframe or super computing community is making use of, and I don't know
what sort of little machines Python is running on.  The closest thing to a
mainframe that I've ever used was a Convex, and I never knew what its
floating-point representation was.  However, I know that x86, PPC, AMD-64,
IA64, Alpha, Sparc, and whatever HPUX and SGIs are running on all use
IEEE-754 format.  That's probably 99.999% of all machines capable of
running Python, and at least that percentage of users.

It would be a shame to gum up this typecode thing for situations that don't
occur in practice.  If it has to be done, then I recommend we use the '@'
code in place of '<' or '>' for platforms that are out of the ordinary.
It's important to specify that '@' is only to be used on floating-point
data that is not IEEE-754.  In this case it doesn't mean "native" like it
does in the struct module; it means "weird" :-).
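Just to make the analogy concrete, here's how the three prefixes behave in
the struct module today.  This is only the struct module's existing
behavior, not a sketch of the proposed typecodes:

    import struct

    # '<' and '>' force standard size with an explicit byte order:
    # an 8-byte IEEE-754 double, little-endian and big-endian.
    little = struct.pack('<d', 1.0)
    big    = struct.pack('>d', 1.0)
    assert little == big[::-1]      # same bits, opposite byte order

    # '@' is struct's native mode: native byte order, size, and
    # alignment, whatever the local C compiler uses.  That's the
    # meaning I'm suggesting we *not* carry over -- in the proposal
    # above, '@' would only flag a non-IEEE float format.
    native = struct.pack('@d', 1.0)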


>
> Another detail to consider is that although most machines use the IEEE  
> representation, hardly any respects the IEEE rules for floating point  
> operations in all detail. In particular, trusting that Inf and NaN will  
> be treated as IEEE postulates is a risky business.
> 

See, that's the thing.  Why burden the way you label the data with the
restrictions of the current machine?  You can take the data off the
machine.  Whether or not I can rely on what NaN*Inf will give me, I know
that I can take NaN and Inf to another machine and get the same
interpretation of the data.

This whole thread started because Andrew Straw showed that
struct.pack('<d',nan) causes an exception, but that's just a limitation in
the struct module.  He was definitely running it on a machine that was
capable of representing an 8-byte little-endian NaN.  He doesn't need a new
typecode until he tries to transport data from some esoteric mainframe.
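In the meantime, here's a rough workaround sketch, assuming an IEEE-754
host -- the bit-pattern constants and the helper name are just mine for
illustration:

    import struct

    # IEEE-754 double-precision bit patterns: exponent all ones,
    # fraction zero for +Inf, fraction non-zero for a quiet NaN.
    POS_INF_BITS   = 0x7FF0000000000000
    QUIET_NAN_BITS = 0x7FF8000000000000

    def ieee_double_le(bits):
        """Return the 8-byte little-endian IEEE-754 encoding of 'bits'."""
        return struct.pack('<Q', bits)

    nan_bytes = ieee_double_le(QUIET_NAN_BITS)  # what '<d' of a NaN should give
    inf_bytes = ieee_double_le(POS_INF_BITS)

    # Native mode ('@') just copies the raw bytes, so reading the value
    # back works even where the '<d' path complains -- assuming a
    # little-endian IEEE host (swap the bytes first on a big-endian one):
    value, = struct.unpack('@d', inf_bytes)     # -> inf

Nothing clever there; it just leans on the fact that the 8-byte pattern
means the same thing on every IEEE machine, which is really the point.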


Cheers,
    -Scott