Hello,

There appears to be extremely minimal documentation on how floats are formatted on output. All I really see is that float.__str__() is float.__repr__(). So that means that float->str->float does not result in a different value.

It would be nice if the output format for float was documented, to the extent this is possible. #python suggested that I propose a patch, but I see no way to write a documentation patch without having any clue about what Python promises, whether in the CPython implementation or as part of a specification for Python.

What are the promises Python makes about the str() of a float? Will it produce 1.0 today and 1.0e0 or +1.0 tomorrow? When is the result in exponential notation and when not? Does any of this depend on the underlying OS or hardware or Python implementation? Etc.

I'm guessing that Python is consistent with an IEEE 754 "external character sequence", but don't know what the IEEE specification says or whether python conforms.

I don't really care whether there's documentation for __str__() or __repr__() or something else. I'm just thinking that there should be some way to guarantee a well defined "useful" float output formatting. By "useful" I mean in exponential notation when non-exponential notation is over-long.

I am writing a program that sometimes prints python floats and want to be able to document what is printed. Right now I can't truly guarantee anything, other than the nan and inf and -inf representations. (I feel comfortable with nan and the like because I don't see it likely that their representations will change.)

Of course I could always re-implement Python's float.__repr__() in Python so as to have full control, but this should be pointless. Python's output representation is unlikely to change and Python should be able to make sufficient promises about its existing float representation.

I suppose there are similar issues with integers, but the varieties of floating point number implementations and the existence of both exponential and non-exponential representations make float particularly problematic and representations potentially mercurial.

I also don't know if documentation changes with regard to external representations would require a PEP.

I have found the following related information:

Use shorter float repr when possible
https://bugs.python.org/issue1580
https://github.com/python/cpython/blob/master/Python/pystrtod.c#L831

String conversion and formatting
https://docs.python.org/3/c-api/conversion.html

sys.float_repr_style
https://docs.python.org/3/library/sys.html#sys.float_repr_style

object.__str__(self)
https://docs.python.org/3/reference/datamodel.html#object.__str__

At the end of the day I don't _really_ care. But having put thought into the matter I care enough to write this email and ask the question.

Regards,

Karl <kop@karlpinc.com>
Free Software: "You don't pay back, you pay forward."
                 -- Robert A. Heinlein
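For reference, the questions about exponential notation can be probed directly. The sketch below only records what CPython 3 prints when sys.float_repr_style is 'short'; it is an observation of current behaviour, not a documented promise, and other implementations may differ. The sample values are chosen for illustration.

```python
import sys

# Observed CPython 3 behaviour with the "short" repr style; not a language guarantee.
samples = [1.0, 1e15, 1e16, 0.0001, 0.00001, -0.0, float("inf"), float("nan")]

print("float_repr_style:", sys.float_repr_style)
for x in samples:
    text = repr(x)
    # nan never compares equal to itself, so it reports False here.
    print(f"{text:>22}  float(repr(x)) == x: {float(text) == x}")
```

On such a build 1e15 prints as 1000000000000000.0 while 1e16 prints as 1e+16, and 0.0001 prints in full while 0.00001 prints as 1e-05.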
On 1/20/2020 10:59 PM, Karl O. Pinc wrote:
> Hello,
>
> There appears to be extremely minimal documentation on how floats are formatted on output. All I really see is that float.__str__() is float.__repr__(). So that means that float->str->float does not result in a different value.
>
> It would be nice if the output format for float was documented, to the extent this is possible. #python suggested that I propose a patch, but I see no way to write a documentation patch without having any clue about what Python promises, whether in the CPython implementation or as part of a specification for Python.
>
> What are the promises Python makes about the str() of a float? Will it produce 1.0 today and 1.0e0 or +1.0 tomorrow? When is the result in exponential notation and when not? Does any of this depend on the underlying OS or hardware or Python implementation? Etc.

For what it's worth, float's repr internally uses a format of '.17g'. So, format(value, '.17g') will be equal to repr(f), where f is any float.

I think (but don't exactly recall, it's been a while) that you'll get different values if sys.float_repr_style is 'short' or not. I don't know if any current systems don't support 'short'.

I don't know if this is documented. I'm also not sure if this is considered a CPython implementation detail or not, but I would argue that it is.

Eric
21.01.20 10:37, Eric V. Smith wrote:
> For what it's worth, float's repr internally uses a format of '.17g'. So, format(value, '.17g') will be equal to repr(f), where f is any float.

It was in Python 2, but since Python 3.1 it returns the shortest unambiguous representation, which may be shorter than 17 digits.

https://docs.python.org/3/whatsnew/3.1.html#other-language-changes
https://bugs.python.org/issue1580
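The difference is easy to see on a single value. This is a small sketch assuming a build where sys.float_repr_style is 'short'; the value 0.1 is just an illustrative choice.

```python
import sys

x = 0.1

print(sys.float_repr_style)   # reports which repr algorithm this build uses
print(repr(x))                # shortest string that converts back to x
print(format(x, ".17g"))      # correctly rounded to 17 significant digits

# Both strings name the same float, they only differ in length.
assert float(repr(x)) == float(format(x, ".17g")) == x
```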
On 1/21/2020 4:32 AM, Serhiy Storchaka wrote:
> 21.01.20 10:37, Eric V. Smith wrote:
> > For what it's worth, float's repr internally uses a format of '.17g'. So, format(value, '.17g') will be equal to repr(f), where f is any float.
>
> It was in Python 2, but since Python 3.1 it returns the shortest unambiguous representation, which may be shorter than 17 digits.
>
> https://docs.python.org/3/whatsnew/3.1.html#other-language-changes
> https://bugs.python.org/issue1580

Yes (I wrote a lot of that), but '.17g' doesn't mean to always show 17 digits. See https://github.com/python/cpython/blob/master/Python/pystrtod.c#L825 where the repr (which is format_code == 'r') is translated to format_code = 'g' and precision = 17.

But I was wrong about them being equivalent: 'g' will drop the trailing '.0' if it exists, and repr() will not (via flags = Py_DTSF_ADD_DOT_0).

And I'm almost positive that 2.7 also uses short float repr, but doesn't have str == repr for floats. But 2.7 is dead to me (despite the fact that I use it extensively for one client!), so I'm not going to bother double checking.

Eric
21.01.20 12:37, Eric V. Smith wrote:
> Yes (I wrote a lot of that), but '.17g' doesn't mean to always show 17 digits. See https://github.com/python/cpython/blob/master/Python/pystrtod.c#L825 where the repr (which is format_code == 'r') is translated to format_code = 'g' and precision = 17.
>
> But I was wrong about them being equivalent: 'g' will drop the trailing '.0' if it exists, and repr() will not (via flags = Py_DTSF_ADD_DOT_0).

This is not the only difference between '.17g' and repr().

>>> '%.17g' % 1.23456789
'1.2345678899999999'
>>> format(1.23456789, '.17g')
'1.2345678899999999'
>>> repr(1.23456789)
'1.23456789'
On 1/21/2020 2:02 PM, Serhiy Storchaka wrote:
> 21.01.20 12:37, Eric V. Smith wrote:
> > Yes (I wrote a lot of that), but '.17g' doesn't mean to always show 17 digits. See https://github.com/python/cpython/blob/master/Python/pystrtod.c#L825 where the repr (which is format_code == 'r') is translated to format_code = 'g' and precision = 17.
> >
> > But I was wrong about them being equivalent: 'g' will drop the trailing '.0' if it exists, and repr() will not (via flags = Py_DTSF_ADD_DOT_0).
>
> This is not the only difference between '.17g' and repr().
>
> >>> '%.17g' % 1.23456789
> '1.2345678899999999'
> >>> format(1.23456789, '.17g')
> '1.2345678899999999'
> >>> repr(1.23456789)
> '1.23456789'

Huh. That's interesting. Thanks!

Eric
[Serhiy Storchaka]
> This is not the only difference between '.17g' and repr().
>
> >>> '%.17g' % 1.23456789
> '1.2345678899999999'
> >>> format(1.23456789, '.17g')
> '1.2345678899999999'
> >>> repr(1.23456789)
> '1.23456789'
More amazingly ;-), repr() isn't even always the same as a %g format specifying exactly the same number of digits as repr() produces. That's because repr(x) currently returns the shortest string such that eval(repr(x)) == x, but %g rounds correctly to the given number of digits. Not always the same thing!
>>> x = 2.0 ** 89
>>> print(repr(x))
6.189700196426902e+26
>>> print("%.16g" % x)  # repr produced 16 digits
6.189700196426901e+26
The repr() output is NOT correctly rounded. To see which one is correctly rounded, here's an easy way:
>>> import decimal
>>> decimal.Decimal(x)
Decimal('618970019642690137449562112')
The "37449562112" is rounded off, and is less than half a unit in the last place, so correct rounding truncates the last digit to 1. But there is no string with 16 digits other than the incorrectly rounded one repr() returns that gives x back. In particular, the correctly rounded 16 digit string does not:
>>> 6.189700196426901e+26  # 16-digit correctly rounded fails
6.189700196426901e+26
>>> x == _
False

To my mind it's idiotic(*) that "shortest string" requires incorrect rounding in some cases.

In Python's history, eval(repr(x)) == x is something that was always intended, so long as the writing and reading was done by the same Python instance on the same machine. Maybe it's time to document that ;-)

But CPython goes far beyond that now, also supplying correct rounding, _except_ for repr's output, where - for reasons already illustrated - "correct rounding" and "shortest" can't always both be satisfied.

(*) What wouldn't be idiotic? For repr(x) to return the shortest _correctly rounded_ string such that eval(repr(x)) == x. In the example, that would require repr(x) to produce a 17-digit output (and 17 is the most that's ever needed for a Python float on boxes with IEEE doubles). But "shortest string" was taken ultra literally by the people who first worked out routines capable of doing that, so has become a de facto standard now.
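The alternative described in the footnote can be sketched in a few lines. This is only an illustration, not how CPython does anything: shortest_correctly_rounded is a hypothetical helper, it assumes IEEE 754 doubles (so 17 significant digits always suffice), and it relies on %g rounding correctly as described above.

```python
def shortest_correctly_rounded(x: float) -> str:
    """Return the shortest correctly rounded %g string that converts back to x.

    Hypothetical helper, not part of Python. Assumes IEEE 754 doubles.
    """
    if x != x or x in (float("inf"), float("-inf")):
        return repr(x)  # nan and the infinities have no digits to round
    for digits in range(1, 18):
        text = "%.*g" % (digits, x)  # correctly rounded to `digits` digits
        if float(text) == x:
            return text
    raise AssertionError("17 digits should always round-trip an IEEE 754 double")


x = 2.0 ** 89
print(repr(x))                        # shortest string that round-trips
print(shortest_correctly_rounded(x))  # one digit longer for this value
```

For most values the two agree; 2.0 ** 89 is one of the cases where the correctly rounded candidate has to grow to 17 digits before it converts back to x.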
On Mon, Jan 20, 2020 at 09:59:07PM -0600, Karl O. Pinc wrote:
> It would be nice if the output format for float was documented, to the extent this is possible.

I don't think we should make any promises about the repr() of floats. We've already changed the format at least twice:

- once to switch to the shortest unambiguous representation;
- and once to shift to a more consistent output for NANs.

(NANs on Windows prior to 2.6 used to be displayed as '1.#IND', if I recall correctly.)

We may never want to change output format again, but if we document a certain format that will be read by people as a guarantee, and that closes the door to any change without a long and tedious deprecation period.

If anyone wants a guaranteed output format for floats, they ought to use the various string formatting operations, which offer guaranteed formatting outputs. Or build your own formatter.

I think that the most we should promise is that (with the exception of NANs) float -> repr -> float should round-trip with no change in value.

> I don't really care whether there's documentation for __str__() or __repr__() or something else. I'm just thinking that there should be some way to guarantee a well defined "useful" float output formatting.

https://docs.python.org/3/library/stdtypes.html#printf-style-string-formatti...
https://docs.python.org/3/library/string.html#format-string-syntax

--
Steven
On Tue, 21 Jan 2020 21:09:57 +1100 Steven D'Aprano <steve@pearwood.info> wrote:
> On Mon, Jan 20, 2020 at 09:59:07PM -0600, Karl O. Pinc wrote:
> > It would be nice if the output format for float was documented, to the extent this is possible.
>
> I don't think we should make any promises about the repr() of floats. We've already changed the format at least twice:
>
> - once to switch to the shortest unambiguous representation;
> - and once to shift to a more consistent output for NANs.
>
> (NANs on Windows prior to 2.6 used to be displayed as '1.#IND', if I recall correctly.)
>
> We may never want to change output format again, but if we document a certain format that will be read by people as a guarantee, and that closes the door to any change without a long and tedious deprecation period.

Understood. But you still might want to document, or even define in the language, that you're outputting the shortest unambiguous representation. Or other such broad principles like IEEE 754 representation compatibility. This is a suggestion, I don't want to advocate.
> If anyone wants a guaranteed output format for floats, they ought to use the various string formatting operations, which offer guaranteed formatting outputs. Or build your own formatter.
>
> I think that the most we should promise is that (with the exception of NANs) float -> repr -> float should round-trip with no change in value.

That would be nice, and is the sort of general principle I'm thinking of. Another one might be "a sign is only printed for negative numbers".

I guess I will advocate for _some_ specification built into Python's definition. Otherwise everybody should _always_ build their own formatter; lest they wake up one morning and find that int zero prints as "+0".

As mentioned, parts of this discussion could also apply to other numeric types.
> > I don't really care whether there's documentation for __str__() or __repr__() or something else. I'm just thinking that there should be some way to guarantee a well defined "useful" float output formatting.
>
> https://docs.python.org/3/library/stdtypes.html#printf-style-string-formatti...
> https://docs.python.org/3/library/string.html#format-string-syntax

Thanks. For some reason nobody in #python pointed me to the 'g' format type. That resolves my issue.

Unfortunately, because 'g' can strip the trailing ".0" floats formatted with it no longer satisfy the float->str->float immutability property. I can always:

out = f'{num:g}'
print(out if 'e' in out or '.' in out else f'{out}.0')

sort of logic. (With handling for INF and NAN.) A cleaner format would be nice but this works. (The #g format leaves multiple trailing zeros, which is too different from the "minimal" form __repr__() produces.)

FYI. It wouldn't hurt to have the PyOS_double_to_string() docs https://docs.python.org/3/c-api/conversion.html point out that "format" uses the codes as defined in your formatting links above. Digging around got me to PyOS_double_to_string() whereupon I was left in the dark about the meaning of the "format" codes.

Thank you all for the help.

Regards,

Karl <kop@karlpinc.com>
Free Software: "You don't pay back, you pay forward."
                 -- Robert A. Heinlein
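The "sort of logic" above, with the INF and NAN handling filled in, might look like the following sketch. The function name format_float and the default of '.17g' are assumptions made for illustration (a bare 'g' keeps only 6 significant digits, so it would not round-trip most values).

```python
import math


def format_float(num: float, spec: str = ".17g") -> str:
    """Format num with a 'g'-style spec, restoring the '.0' that 'g' strips.

    Hypothetical helper; spec is assumed to be a 'g' presentation type.
    """
    out = format(num, spec)
    if math.isinf(num) or math.isnan(num):
        return out  # 'inf', '-inf' and 'nan' must not get a '.0' suffix
    if "e" in out or "." in out:
        return out  # already unmistakably a float
    return out + ".0"


for value in (94.0, 0.5, 1e22, float("inf"), float("nan")):
    print(format_float(value))
```

The special case is needed because neither 'inf' nor 'nan' contains an 'e' or a '.', so the plain test would otherwise append '.0' to them.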
On Tue, Jan 21, 2020 at 09:01:29AM -0600, Karl O. Pinc wrote:
> Understood. But you still might want to document, or even define in the language, that you're outputting the shortest unambiguous representation.

I'm not even sure I would want to do that. That would make it a language guarantee and force all implementations to follow. Jython and IronPython may prefer to follow the repr used by Java and .Net; if not those two implementations, other implementations might want to do something similar.

> I guess I will advocate for _some_ specification built into Python's definition. Otherwise everybody should _always_ build their own formatter; lest they wake up one morning and find that int zero prints as "+0".

We're not talking about ints, we're talking about floats. There's only one reasonable way to print ints that everyone expects, and that doesn't include putting a spurious sign on zero.

As far as I know, ints print the same in just about every single programming language that uses base ten Arabic-Hindu digits 0...9. It's kind of a universal.
> > > I don't really care whether there's documentation for __str__() or __repr__() or something else. I'm just thinking that there should be some way to guarantee a well defined "useful" float output formatting.
> >
> > https://docs.python.org/3/library/stdtypes.html#printf-style-string-formatti...
> > https://docs.python.org/3/library/string.html#format-string-syntax
>
> Thanks. For some reason nobody in #python pointed me to the 'g' format type. That resolves my issue.
>
> Unfortunately, because 'g' can strip the trailing ".0" floats formatted with it no longer satisfy the float->str->float immutability property.

I don't see why. Any string you get back from %g ought to convert back to a float without loss of precision, the trailing '.0' should not affect it. Can you give an example where it does?

It seems to work for me.

py> x = 94.0
py> float('%g' % x) == x
True

Do you care about having the shortest representation, or consistent representation? If you want a consistent representation, then I understand that %.17e is guaranteed to round-trip exactly for all floats (C doubles):

py> '%.17e' % 94.0
'9.40000000000000000e+01'

If you care about length, "94" is shorter than "94.0" and it still losslessly converts back to the float 94.0:

py> '%.17g' % 94.0
'94'

repr() will (I think) round trip, but it won't necessarily be the shortest, and it won't be consistent.

--
Steven
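One caveat worth checking: a bare %g uses a default precision of 6 significant digits, so it round-trips 94.0 but not every float. An explicit precision of 17 is what makes the conversion lossless, assuming IEEE 754 doubles. The value below is just an illustrative choice.

```python
x = 1.23456789

short = "%g" % x      # default precision: 6 significant digits
full = "%.17g" % x    # 17 significant digits

print(short, float(short) == x)   # digits were dropped, so not the same float
print(full, float(full) == x)     # converts back to exactly x
```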
On 1/21/2020 11:52 AM, Steven D'Aprano wrote:
> > > > I don't really care whether there's documentation for __str__() or __repr__() or something else. I'm just thinking that there should be some way to guarantee a well defined "useful" float output formatting.
> > >
> > > https://docs.python.org/3/library/stdtypes.html#printf-style-string-formatti...
> > > https://docs.python.org/3/library/string.html#format-string-syntax
> >
> > Thanks. For some reason nobody in #python pointed me to the 'g' format type. That resolves my issue.
> >
> > Unfortunately, because 'g' can strip the trailing ".0" floats formatted with it no longer satisfy the float->str->float immutability property.
>
> I don't see why. Any string you get back from %g ought to convert back to a float without loss of precision, the trailing '.0' should not affect it. Can you give an example where it does?
>
> It seems to work for me.
>
> py> x = 94.0
> py> float('%g' % x) == x
> True

The reason repr adds the '.0' that 'g' does not is to avoid this problem:

>>> type(eval(repr(17.0))) == type(17.0)
True
>>> type(eval(format(17.0, '.17g'))) == type(17.0)
False

Eric
On Wed, Jan 22, 2020 at 4:03 AM Eric V. Smith <eric@trueblade.com> wrote:
> The reason repr adds the '.0' that 'g' does not is to avoid this problem:
>
> >>> type(eval(repr(17.0))) == type(17.0)
> True
> >>> type(eval(format(17.0, '.17g'))) == type(17.0)
> False

The OP wasn't asking about eval, though, but about float. If you're depending on the ability to eval the repr of a float, you also have to concern yourself with inf and nan, which are not builtin names. But I believe float(repr(x)) == x for any float x.

ChrisA
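The inf and nan point can be checked with a short sketch. eval() is given an empty namespace here so that no name defined elsewhere can satisfy the lookup; note that nan is also the one value for which float(repr(x)) == x is false, simply because nan never compares equal to itself.

```python
import math

inf_text = repr(float("inf"))   # 'inf'
nan_text = repr(float("nan"))   # 'nan'

# float() accepts both strings.
assert float(inf_text) == float("inf")
assert math.isnan(float(nan_text))   # comparing with == would be False for nan

# eval() treats them as plain names, which are not defined by default.
failed = []
for text in (inf_text, nan_text):
    try:
        eval(text, {})
    except NameError:
        failed.append(text)

print("eval() raised NameError for:", failed)
```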
On 1/21/2020 1:32 PM, Chris Angelico wrote:
> On Wed, Jan 22, 2020 at 4:03 AM Eric V. Smith <eric@trueblade.com> wrote:
> > The reason repr adds the '.0' that 'g' does not is to avoid this problem:
> >
> > >>> type(eval(repr(17.0))) == type(17.0)
> > True
> > >>> type(eval(format(17.0, '.17g'))) == type(17.0)
> > False
>
> The OP wasn't asking about eval, though, but about float. If you're depending on the ability to eval the repr of a float, you also have to concern yourself with inf and nan, which are not builtin names. But I believe float(repr(x)) == x for any float x.

None the less, it's why repr adds the '.0' that 'g' does not.

Eric
On Tue, 21 Jan 2020 09:01:29 -0600 "Karl O. Pinc" <kop@karlpinc.com> wrote:
> I guess I will advocate for _some_ specification built into Python's definition. Otherwise everybody should _always_ build their own formatter; lest they wake up one morning and find that int zero prints as "+0".

Having made a suggestion I've followed up with a pull request.

https://github.com/python/cpython/pull/18111

I think I have come up with a very minimal and sane set of restrictions on the default Numeric string representations. Having done that, I'm less interested in spending a lot more time on this. I'd be happy to explain my wording choices, and equally happy to have the pull request immediately rejected.

The pull request is presently failing the check for news. (I'm not entirely clear on how to satisfy the requirement, or whether I could come up with a good news entry. I'll wait to resolve this if it looks like the patch is going anywhere.) There should probably also be unit tests. But again, I'll wait to see if this is going anywhere.

FYI, it was remarkably easy to build the docs. But the contribution process goes through an annoying number of corporations (github, the contributor signature...) and login steps. (The contributor signature needs to clear at your end.)

Regards,

Karl <kop@karlpinc.com>
Free Software: "You don't pay back, you pay forward."
                 -- Robert A. Heinlein
participants (6)

- Chris Angelico
- Eric V. Smith
- Karl O. Pinc
- Serhiy Storchaka
- Steven D'Aprano
- Tim Peters