[Python-Dev] Re: marshal / unmarshal

Tim Peters tim.peters at gmail.com
Tue Apr 12 01:39:23 CEST 2005

[Michael Hudson]
> I've just submitted http://python.org/sf/1180995 which adds format
> codes for binary marshalling of floats if version > 1, but it doesn't
> quite have the effect I expected (see below):

> >>> inf = 1e308*1e308
> >>> nan = inf/inf
> >>> marshal.dumps(nan, 2)
> Traceback (most recent call last):
>  File "<stdin>", line 1, in ?
> ValueError: unmarshallable object

I don't understand.  Does "binary marshalling" _not_ mean just copying
the bytes on a 754 platform?  If so, that won't work.  I pointed out
the relevant comments before:

/* The pack routines write 4 or 8 bytes, starting at p.
 * Bug:  What this does is undefined if x is a NaN or infinity.
 * Bug:  -0.0 and +0.0 produce the same string.
 */
PyAPI_FUNC(int) _PyFloat_Pack4(double x, unsigned char *p, int le);
PyAPI_FUNC(int) _PyFloat_Pack8(double x, unsigned char *p, int le);

> frexp(nan, &e), it turns out, returns nan,

This is an undefined case in C89 (all 754 special values are).
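
A minimal Python sketch of why frexp()-based packing trips over the special values. It is not the actual _PyFloat_Pack8 code: `pack8_via_frexp` is a hypothetical name, and the sketch assumes `math.frexp` hands infinities and NaNs back unchanged, as current CPython does. It deliberately keeps an `x < 0` sign test, so -0.0 and +0.0 still produce the same string:

```python
import math


def pack8_via_frexp(x):
    """Build big-endian IEEE 754 double bytes from frexp() arithmetic.

    Hypothetical illustration only, not the real _PyFloat_Pack8.
    """
    sign = 0
    if x < 0:  # -0.0 < 0 is false, so the sign of -0.0 is lost here
        sign = 1
        x = -x

    f, e = math.frexp(x)  # x == f * 2**e, with 0.5 <= f < 1 for finite nonzero x
    if f == 0.0:
        biased, mant = 0, 0
    elif not (0.5 <= f < 1.0):
        # Infinities and NaNs come back from frexp() as themselves.
        raise ValueError("frexp() result out of range")
    elif e - 1 < -1022:
        # Subnormal: no implicit leading 1 bit, exponent field is 0.
        biased, mant = 0, int(math.ldexp(x, 1074))
    else:
        # Normal: rescale to 1.0 <= m < 2.0 and drop the implicit leading 1.
        m = f * 2.0
        biased = (e - 1) + 1023
        mant = int((m - 1.0) * 2**52)

    bits = (sign << 63) | (biased << 52) | mant
    return bits.to_bytes(8, "big")
```

With this sketch, finite values pack to the usual 754 bytes, `pack8_via_frexp(-0.0)` equals `pack8_via_frexp(0.0)`, and an infinity or NaN raises ValueError instead of producing anything meaningful.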

> which results in this (to be expected if you read _PyFloat_Pack8 and
> know that I'm using a new-ish GCC -- it might be different for MSVC 6).
> Also (this is the same thing, really):

Right.  So is pickling with proto >= 1.  Changing the pack/unpack
routines to copy bytes instead (when possible) "fixes" all of these
things at one stroke, on boxes where it applies.

> >>> struct.pack('>d', inf)
> Traceback (most recent call last):
>  File "<stdin>", line 1, in ?
> SystemError: frexp() result out of range

> Although I was a little surprised by this:

> >>> struct.pack('d', inf)
> '\x7f\xf0\x00\x00\x00\x00\x00\x00'

> (this is a big-endian system).  Again, reading the source explains the
> behaviour.

>>> OK, so the worst that could happen here is that moving marshal data
>>> from one box to another could turn one sort of NaN into another?

>> Right.  Assuming source and destination boxes both use 754 format, and
>> the implementation adjusts endianness if necessary.

> Well, I was assuming marshal would do floats little-endian-wise, as it
> does for integers.

Then on a big-endian 754 system, loads() will have to reverse the
bytes in the little-endian marshal bytestring, and dumps() likewise. 
That's all "if necessary" meant -- sometimes cast + memcpy isn't
enough, regardless of which byte order marshal decides to use.
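
The "copy the bytes, reverse if necessary" idea can be sketched in Python, with `struct.pack("d", ...)` standing in for the C-level cast + memcpy. The names `dumps_double_le` and `loads_double_le` are hypothetical, and the sketch assumes a box with 754 doubles whose byte order matches `sys.byteorder`; a box with the mixed word order mentioned below would need more than a plain reversal.

```python
import struct
import sys


def dumps_double_le(x):
    """Return the 8 native bytes of x, reordered to little-endian."""
    raw = struct.pack("d", x)  # native layout, the moral equivalent of memcpy
    if sys.byteorder == "big":
        raw = raw[::-1]  # big-endian box: reverse to get little-endian
    return raw


def loads_double_le(data):
    """Inverse of dumps_double_le: little-endian bytes back to a float."""
    if sys.byteorder == "big":
        data = data[::-1]  # restore native order before reinterpreting
    return struct.unpack("d", data)[0]
```

Because the bytes are copied rather than recomputed, infinities, NaNs and the sign of -0.0 survive the round trip on such a box.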

>> Heh.  I have a vague half-memory of _some_ box that stored the two
>> 4-byte "words" in an IEEE double in one order, but the bytes within
>> each word in the opposite order.  It's always something ...

> I recall stories of machines that stored the bytes of long in some
> crazy order like that.  I think Python would already be broken on such
> a system, but, also, don't care.

Python does very little that depends on internal native byte order,
and C hides it in the absence of casting abuse.  Copying internal
native bytes across boxes is plain ugly -- can't get more brittle than
that.  In this case it looks like a good tradeoff, though.

> ...
> Well, they can always not #define HAVE_IEEE_DOUBLES and not suffer all
> that much (this is what I meant by false negatives below).
> ...
> It just strikes me as silly to test at runtime something that is so
> obviously not going to change between invocations.  But it's not a big
> deal either way.

It isn't to me either.  It just strikes me as silly to give porters
another thing to wonder about and screw up when it's possible to solve
it completely with a few measly runtime cycles <wink>.

>>> Something along these lines:
>>> double x = 1.5;
>>> is_big_endian_ieee_double = sizeof(double) == 8 && \
>>>       memcmp((char*)&x, "\077\370\000\000\000\000\000\000", 8) == 0;

>> Right, it's that easy

> Cool.

>> -- at least under MSVC and gcc.

> Huh?  Now it's my turn to be confused (for starters, under MSVC ieee
> doubles really can be assumed...).

So you have no argument with the "at least under MSVC" part <wink>. 
There's nothing to worry about here -- I was just tweaking.
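
For what it's worth, the runtime check discussed above could look like this in Python. It is only a sketch: `detect_double_format` and `BIG_ENDIAN_1_5` are hypothetical names, `struct.pack("d", ...)` stands in for looking at the raw bytes of a C double, and the probe value 1.5 is taken from the quoted C snippet.

```python
import struct

# Big-endian 754 bytes of 1.5, i.e. "\077\370\000..." in the C snippet's octal.
BIG_ENDIAN_1_5 = b"\x3f\xf8\x00\x00\x00\x00\x00\x00"


def detect_double_format():
    """Guess the native double format from the bytes of 1.5."""
    raw = struct.pack("d", 1.5)  # native bytes of the C double
    if len(raw) != 8:
        return "unknown"
    if raw == BIG_ENDIAN_1_5:
        return "IEEE, big-endian"
    if raw == BIG_ENDIAN_1_5[::-1]:
        return "IEEE, little-endian"
    return "unknown"
```

A single probe value cannot rule out every exotic format, so a match is a strong hint rather than a proof; anything that doesn't match falls back to "unknown" and the slower portable path.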
