Re: [Numpy-discussion] Questions about the array interface.

--- konrad.hinsen@laposte.net wrote:
This discussion has been coming up regularly for a few years. Until now the consensus has always been that Python should make no assumptions that go beyond what a C compiler can promise. Which means no assumptions about floating-point representation.
Of course the computing world is changing, and IEEE format may well be ubiquitous by now. Vaxes must be in the museum by now. But how about mainframes? IBM mainframes didn't use IEEE when I used them (last time 15 years ago), and they are still around, possibly compatible with their ancestors.
I've been following this mailing list for a few years now, but I skip a lot of threads. I almost certainly skipped this topic in the past since it wasn't relevant to me. I'm only interested in it now since it's relevant to this data interchange business, so I'm sorry if this is a rehash...
Trying to stay portable is a good goal, and I can understand why Python proper would try to adhere to the restrictions it does. Despite the claim, Python makes plenty of assumptions that a standards-conformant C compiler could break. If numpy doesn't make some assumptions about floating-point representation, it's going to kill the possibility of passing data across machines, and that's pretty unacceptable.
I'm not comfortable saying "ubiquitous" since I don't know what the mainframe or supercomputing community is making use of, and I don't know what sort of little machines Python is running on. The closest thing to a mainframe that I've ever used was a Convex, and I never knew what its floating-point representation was. However, I know that x86, PPC, AMD-64, IA64, Alpha, Sparc, and whatever HPUX and SGIs are running on all use IEEE-754 format. That's probably 99.999% of all machines capable of running Python, and at least that percentage of users.
It would be a shame to gum up this typecode thing for situations that don't occur in practice. If it has to be done, then I recommend we use the '@' code in place of the '<' or '>' for platforms that are out of the ordinary. It's important to specify that '@' is only to be used on floating-point data that is not IEEE-754. In this case it doesn't mean "native" like it does in the struct module, it means "weird" :-).
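For reference, here is a minimal sketch of how the '<', '>' and '@' prefixes behave in the struct module today, which is the behavior the typecode proposal borrows from. It only shows struct's existing meaning ('@' as "native"), not the proposed "weird" meaning; the variable names are just for illustration.

```python
import struct
import sys

value = 1.0  # as an IEEE-754 double this is the bit pattern 0x3FF0000000000000

little = struct.pack('<d', value)  # explicit little-endian layout
big = struct.pack('>d', value)     # explicit big-endian layout
native = struct.pack('@d', value)  # whatever layout this machine uses

# The two explicit layouts hold the same eight bytes in opposite order.
assert little == big[::-1]

# On an IEEE-754 machine, the native layout matches one of the two.
assert native == (little if sys.byteorder == 'little' else big)
```

Because '<d' and '>d' pin down both byte order and format, bytes labelled that way mean the same thing on any receiving machine, which is the whole point of putting the marker in the typecode.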
Another detail to consider is that although most machines use the IEEE representation, hardly any respect the IEEE rules for floating-point operations in full detail. In particular, trusting that Inf and NaN will be treated as IEEE postulates is a risky business.
See, that's the thing. Why burden how you label the data with the restrictions of the current machine? You can take the data off the machine. Whether or not I can rely on what NaN*Inf will give me, I know that I can take NaN and Inf to another machine and get the same interpretation of the data.
This whole thread started because Andrew Straw showed that struct.pack('<d',nan) causes an exception, but that's just a limitation in the struct module. He was definitely running it on a machine that was capable of representing an 8-byte little-endian NaN. He doesn't need a new typecode until he tries to transport data from some esoteric mainframe.
Cheers,
-Scott
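To illustrate the point about carrying NaN and Inf between machines, here is a rough sketch that builds the 8-byte little-endian patterns from their integer bit layouts and reads them back as doubles. It assumes an IEEE-754 machine on the reading side; the helper names are made up for this example. Going through the 'Q' integer code sidesteps asking struct to pack a NaN float, which was the operation that failed in the original report (as far as I know, current Python versions pack NaN directly without complaint).

```python
import math
import struct

# Quiet NaN and +Inf as IEEE-754 double-precision bit patterns.
NAN_BITS = 0x7FF8000000000000
INF_BITS = 0x7FF0000000000000


def bits_to_le_bytes(bits):
    """Lay out a 64-bit pattern as eight little-endian bytes."""
    return struct.pack('<Q', bits)


def le_bytes_to_double(raw):
    """Interpret eight little-endian bytes as an IEEE-754 double."""
    return struct.unpack('<d', raw)[0]


# These byte strings are what would travel between machines.
nan_bytes = bits_to_le_bytes(NAN_BITS)
inf_bytes = bits_to_le_bytes(INF_BITS)

# Any machine that reads them as '<d' gets the same interpretation.
nan_value = le_bytes_to_double(nan_bytes)
inf_value = le_bytes_to_double(inf_bytes)

assert math.isnan(nan_value)
assert math.isinf(inf_value) and inf_value > 0
```

Nothing here depends on how the local hardware evaluates NaN*Inf; only the labelled byte layout matters.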
participants (1): Scott Gilbert