[Patches] [ python-Patches-532180 ] fix xmlrpclib float marshalling bug
SourceForge.net
noreply@sourceforge.net
Fri, 28 Mar 2003 15:25:54 -0800
Patches item #532180, was opened at 2002-03-19 23:28
You can respond by visiting:
https://sourceforge.net/tracker/?func=detail&atid=305470&aid=532180&group_id=5470
Category: Library (Lib)
Group: Python 2.3
>Status: Closed
>Resolution: Rejected
Priority: 5
Submitted By: Brian Quinlan (bquinlan)
Assigned to: Fredrik Lundh (effbot)
Summary: fix xmlrpclib float marshalling bug
Initial Comment:
As it stands now, xmlrpclib can send doubles, such as
1.#INF, that are not part of the XML-RPC standard.
This patch causes a ValueError to be raised instead.
----------------------------------------------------------------------
>Comment By: Martin v. Löwis (loewis)
Date: 2003-03-29 00:25
Message:
Logged In: YES
user_id=21627
I'll conclude that it is a lot of tedious work for no
reason, and close this patch.
----------------------------------------------------------------------
Comment By: Tim Peters (tim_one)
Date: 2002-03-21 00:55
Message:
Logged In: YES
user_id=31435
Python's internal format buffers are too small to use C %f
in its full generality, so you're suggesting something
there that's much harder to get done than you suspect.
Note that %f isn't a cure-all anyway, as in either Python or
C, e.g., '%f' % 1e-10 throws away all information,
producing a string of zeroes. What you did is usually much
better than that.
Let's wait to hear what /F wants to do. If he's inclined
to take this part of the spec at face value, I can work
with him to write a "conforming" float->string that's
numerically sound. Else it's a lot of tedious work for no
reason.
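
The information loss described above is easy to reproduce. A minimal sketch, assuming a current Python interpreter, where repr() returns the shortest string that reads back as the same float (the interpreter discussed in this thread may have printed different digits):

```python
x = 1e-10

fixed = '%f' % x   # default precision: six digits after the point
exact = repr(x)    # shortest string that reads back as the same float

print(fixed)                # 0.000000
print(float(fixed) == x)    # False: the value collapsed to zero
print(float(exact) == x)    # True: repr() round-trips
```

The names `fixed` and `exact` are only illustrative; the point is that a fixed number of decimal places cannot carry small magnitudes, while repr() keeps them at the cost of using an exponent.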
----------------------------------------------------------------------
Comment By: Brian Quinlan (bquinlan)
Date: 2002-03-21 00:24
Message:
Logged In: YES
user_id=108973
OK, this floating point stuff is over my head.
Is it OK that it loses accuracy?
- No
Is it OK that it produces 16 trailing zeroes for 1e-250?
- Yes
Is it OK that it raises OverflowError for the normal double
1e-300?
- No
Would exposing and using the C %f specifier, along with
repr, make for identical roundtrips?
----------------------------------------------------------------------
Comment By: Tim Peters (tim_one)
Date: 2002-03-20 23:53
Message:
Logged In: YES
user_id=31435
I don't use XML-RPC, so I'm assigning this to /F (it was
his code at the start, and he wants to keep it in synch
with his company's version).
Formatting floats is a difficult job if you pay attention
to accuracy. The original code had the property that
converting a Python float to an XML-RPC string, then back
to a float again, reproduced the original input exactly.
The code in the patch enjoys that property only by
accident; much of the time a roundtrip conversion using it
won't reproduce the number that was passed in. Is that
OK? There's no way to tell, since the XML-RPC spec has
scant idea what it's doing here, so leaves important
questions unanswered. OTOH, it seems to me that the
*point* of this protocol is to transport values across
boxes, so of course it should move heaven and earth to
transport them faithfully.
Is it OK that it loses accuracy? Is it OK that it produces
16 trailing zeroes for 1e-250? Is it OK that it raises
OverflowError for the normal double 1e-300? No matter
what's asked, the spec has no answers.
----------------------------------------------------------------------
Comment By: Brian Quinlan (bquinlan)
Date: 2002-03-20 21:48
Message:
Logged In: YES
user_id=108973
Ooops, I already wrote the converter (see new patch). I'm
not very concerned about sending 300 character strings for
large doubles, but I guess someone might be. I am concerned
about how large and ugly the code is.
XML-RPC is very poorly specified but the grammar for
doubles seems reasonably clear (silly, but clear).
If you don't like my double marshalling code, could you
please just check in your infinity/NaN detection code (also
part of my patch)?
----------------------------------------------------------------------
Comment By: Tim Peters (tim_one)
Date: 2002-03-20 21:13
Message:
Logged In: YES
user_id=31435
If you think XML-RPC users are keen to see multi-hundred
character strings produced for ordinary doubles, Python
isn't going to be much help (you'll have to write your own
float -> string conversion); or if you think they're happy
to get an exception if they want to pass (e.g.) 1e20, you
can keep using repr() and complain because repr(1e20)
produces an exponent.
"decimal format" is simply two extremely common words
pasted together <+.9 wink>. I expect the Python docs here
ended up so vague because whoever wrote this part of the
docs didn't know the full story and didn't have time to
figure it out.
But I expect the same is true of the part of this spec
dealing with doubles (it doesn't define what it means
by "double-precision", and then goes on to say stuff that
doesn't make sense for what C or Java mean by double, or by
what IEEE-754 means by double precision -- it's off in its
own world, so if you take it at face value you'll have to
guess what the world is, and implement it yourself).
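
The trade-off in the first paragraph above (exponent-free output versus very long strings) can be sketched as follows. This is only an illustration under stated assumptions: `to_fixed` is a hypothetical helper, not part of xmlrpclib or of the patch, and it relies on a current Python where repr() round-trips and the decimal module is available.

```python
from decimal import Decimal

def to_fixed(x):
    """Hypothetical helper: render a finite float without an exponent."""
    # repr() supplies the shortest digits that round-trip; Decimal's 'f'
    # format then expands any exponent into plain digits.
    return format(Decimal(repr(x)), 'f')

for value in (1.5, 1e20, 1e300, 1e-300):
    text = to_fixed(value)
    assert 'e' not in text.lower()   # no exponent notation
    assert float(text) == value      # the round trip is exact
    print(value, '->', len(text), 'characters')
```

The strings for 1e300 and 1e-300 run to roughly three hundred characters each, which is the cost being discussed. Note also that whole numbers come out without a period (for example 1e20), so output like this would still need adjusting to satisfy a grammar that requires one.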
----------------------------------------------------------------------
Comment By: Brian Quinlan (bquinlan)
Date: 2002-03-20 20:32
Message:
Logged In: YES
user_id=108973
I think that we should be flexible about the data that we
accept but rigorous about the data that we generate. So the
sign should always be sent but not required.
"decimal format" appears in the Python documentation
(http://www.python.org/doc/current/lib/typesseq-strings.html)
so it is probably a documentation bug if the
meaning is not widely known.
I parsed it as "not exponential format".
My question was whether the %f Python format specifier
simply mapped to the C %f format specifier. But, based on
the output of a simple C program, that does not appear to
be the case.
----------------------------------------------------------------------
Comment By: Tim Peters (tim_one)
Date: 2002-03-20 20:04
Message:
Logged In: YES
user_id=31435
Well, Brian, the spec clearly disallows 1.0 too -- if you
want to take that spec seriously, you can implement what it
says and we'll redirect the complaints to your personal
email account <wink>.
I can't parse your question about the C library (like, I
don't know what you mean by "decimal format").
----------------------------------------------------------------------
Comment By: Brian Quinlan (bquinlan)
Date: 2002-03-20 19:57
Message:
Logged In: YES
user_id=108973
Whether it was intended or not, the spec clearly disallows
it.
I noticed the %f behavior too, which is interesting because
the Python docs say:
    f    Floating point decimal format
I wonder if it is the underlying C library refusing to
write large float values in decimal format.
----------------------------------------------------------------------
Comment By: Tim Peters (tim_one)
Date: 2002-03-20 19:08
Message:
Logged In: YES
user_id=31435
Ack, I take part of that back: it's Python's
implementation of '%f' that can produce exponent notation.
There's no simple way to get the effect of C's %f from
Python. It's clear as mud whether "the spec" *intended* to
outlaw exponent notation.
----------------------------------------------------------------------
Comment By: Tim Peters (tim_one)
Date: 2002-03-20 18:53
Message:
Logged In: YES
user_id=31435
"%f" can produce exponent notation too, which is also not
allowed by this pseudo-spec.
    r = repr(some_double)
    if 'n' in r or 'N' in r:
        raise ValueError(...)
is robust, will work fine x-platform, and isn't insane
<wink>.
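
Wrapped into a function, the check above might look like the sketch below. `check_finite` and the error message are hypothetical names chosen for illustration; the example also assumes a current Python, where float('inf') and float('nan') are available on every platform.

```python
def check_finite(value):
    """Hypothetical helper: reject floats that XML-RPC cannot represent."""
    r = repr(value)
    # Spellings of infinity and NaN ('inf', 'nan', '1.#INF', '-1.#IND', ...)
    # all contain an 'n'; the repr of an ordinary float never does.
    if 'n' in r or 'N' in r:
        raise ValueError("cannot marshal %s as an XML-RPC double" % r)
    return r

print(check_finite(1.5))            # 1.5
for bad in (float('inf'), float('-inf'), float('nan')):
    try:
        check_finite(bad)
    except ValueError as exc:
        print('rejected:', exc)
```

Returning the repr string also means the caller does not have to compute it a second time when writing the value out.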
----------------------------------------------------------------------
Comment By: Brian Quinlan (bquinlan)
Date: 2002-03-20 18:31
Message:
Logged In: YES
user_id=108973
Eric Kidd's XML-RPC C uses sprintf("%f") for marshalling
and strtod for unmarshalling.
Let me design a more robust patch.
----------------------------------------------------------------------
Comment By: Tim Peters (tim_one)
Date: 2002-03-20 17:23
Message:
Logged In: YES
user_id=31435
The spec appears worse than useless to me here -- whoever
wrote it just made stuff up. They don't appear to know
anything about floats or about grammar specification. Do
you really want to allow "+." and disallow "1.0"? This
seems a case where the spec is so braindead that nobody (in
their mind <wink>) will implement it as given. What do
other implementations do?
----------------------------------------------------------------------
Comment By: Martin v. Löwis (loewis)
Date: 2002-03-20 17:03
Message:
Logged In: YES
user_id=21627
You are right. An even better patch would check for
compliance with the protocol. Currently, the xmlrpc spec says:
# There is no representation for infinity or negative
# infinity or "not a number". At this time, only decimal
# point notation is allowed, a plus or a minus, followed by
# any number of numeric characters, followed by a period
# and any number of numeric characters. Whitespace is not
# allowed. The range of allowable values is
# implementation-dependent, is not specified.
That would be best validated with a regular expression.
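
A minimal sketch of such a check. Two readings are shown because the quoted wording is ambiguous: `STRICT` takes the text literally (a mandatory sign, and "any number" of digits including none), while `LENIENT` is an assumed, more forgiving variant that is not taken from the spec. Both pattern names and the `matches` helper are hypothetical.

```python
import re

# Literal reading of the quoted grammar: a sign, digits, a period, digits,
# where "any number" is taken to include zero digits.
STRICT = re.compile(r'[+-][0-9]*\.[0-9]*')

# Assumed lenient variant: optional sign, at least one digit before the period.
LENIENT = re.compile(r'[+-]?[0-9]+\.[0-9]*')

def matches(pattern, text):
    """Return True if the whole string fits the pattern."""
    return pattern.fullmatch(text) is not None

for text in ('+1.0', '1.0', '+.', '1e+20', 'inf'):
    print(text, matches(STRICT, text), matches(LENIENT, text))
```

Under the literal reading, "+." is accepted and "1.0" is rejected, which shows how much depends on how the spec's sentence is interpreted. Either pattern rejects exponent notation and the inf/NaN spellings.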
----------------------------------------------------------------------
Comment By: Tim Peters (tim_one)
Date: 2002-03-20 16:02
Message:
Logged In: YES
user_id=31435
Note that the patch only catches "the problem" on a
platform whose C library can't read back its own float
output. Windows is in that class, but many other platforms
aren't.
It would be better to see whether 'n' or 'N' appear in the
repr() (that would catch variations of 'inf', 'INF', 'NaN'
and 'IND', while no "normal" float contains n).
----------------------------------------------------------------------
Comment By: Martin v. Löwis (loewis)
Date: 2002-03-20 08:28
Message:
Logged In: YES
user_id=21627
It seems repr of the float is computed twice in every case.
I recommend saving the result of the first computation.
----------------------------------------------------------------------