[Python-ideas] Fixed point format for numbers with locale based separators
Eric V. Smith
eric at trueblade.com
Sat Jan 5 19:48:27 EST 2019
On 1/5/2019 3:03 PM, Łukasz Stelmach wrote:
> Barry Scott <barry at barrys-emacs.org> writes:
>> On Friday, 4 January 2019 14:57:53 GMT Łukasz Stelmach wrote:
>>>
>>> I would like to present two pull requests[1][2] implementing fixed point
>>> presentation of numbers and ask for comments. The first is mine. I
>>> learnt about the second after publishing mine.
>>>
>>> The only format using decimal separator from locale data for
>>> float/complex/decimal numbers at the moment is "n" which behaves like
>>> "g". The drawback of these formats, which I would like to overcome, is the
>>> inability to print numbers spanning more than one order of magnitude with
>>> the same number of decimal digits without "manually" (with some additional
>>> custom code) adjusting the precision. The other option is to "manually"
>>> replace the "." printed by "f" with the locale's decimal separator. Neither
>>> of these options is appealing to me.
>>>
>>> Formatting 1.23456789 * n (LC_ALL=pl_PL.UTF-8)
>>>
>>> | n | ".2f"    | ".3n"    |
>>> |---+----------+----------|
>>> | 1 | 1.23     | 1,23     |
>>> | 2 | 12.35    | 12,3     |
>>> | 3 | 123.46   | 123      |
>>> | 4 | 1234.57  | 1,23e+03 |
>>
>> Can you use locale.format_string() to solve this?
>
> I am afraid I can't. I am using a library called pint[1] in my
> project. It allows me to choose how its objects are formatted but it uses
> format() internally. It adds some custom extensions to format strings
> which, as far as I can tell, makes it hard if not impossible to patch it
> to use locale.format_string(). But this is rather an excuse.
I do think that this is a compelling use case for "f" style locale-aware
formatting. I support adding it in some format or another (pun intended).
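To make the problem concrete, here is a small sketch of what "f" and "n" do today. It pins LC_NUMERIC to the "C" locale so the output is reproducible (so "." appears where pl_PL would print ","), and compare_formats is just an illustrative name, not anything in the stdlib:

```python
import locale

# Pin LC_NUMERIC to the "C" locale so the output is deterministic; under
# pl_PL.UTF-8 the "n" results would use "," as the decimal separator.
locale.setlocale(locale.LC_NUMERIC, "C")


def compare_formats(base=1.23456789, steps=4):
    """Return (".2f", ".3n") string pairs for base * 10**k, k = 0..steps-1."""
    rows = []
    for k in range(steps):
        value = base * 10 ** k
        rows.append((format(value, ".2f"), format(value, ".3n")))
    return rows


for fixed, general in compare_formats():
    # "f" keeps two decimals throughout; "n" follows "g" and eventually
    # drops decimals and switches to exponent notation.
    print(f"{fixed:>8} | {general}")
```

The "f" column keeps a fixed number of decimals but ignores the locale, while the "n" column is locale-aware but changes shape as the magnitude grows.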

My only concern is how to paint the bike shed. Should we just use
another format spec "type" character instead of "f", as the two linked
issues propose? Or maybe use an additional "alternate form" style
character, so that we could use different locale options, either now or
in the future? https://bugs.python.org/issue33731 is similar to
https://bugs.python.org/issue34311 but proposes using LC_MONETARY
instead of LC_NUMERIC.

I'm not suggesting we solve every possible problem here, but we should
at least avoid painting ourselves into a corner. We should leave room to
expand things in the future, if needed, without using up tons of format
spec "type" characters for every permutation of "type" plus LC_MONETARY
or LC_NUMERIC.

Here's a straw man:

The current specification for the format spec is:

    [[fill]align][sign][#][0][width][grouping_option][.precision][type]

Let's say we change it to:

    [[fill]align][sign][#][*|$][0][width][grouping_option][.precision][type]

(I think that's unambiguous, but I'd have to think it through some more)

Let's call the new [*|$] character the "locale character".
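One way to poke at the ambiguity question is to write the straw-man grammar down as a regular expression and try awkward inputs, such as "*" used as a fill character. This is only a sketch: SPEC_RE and parse_spec are hypothetical names, the "locale" group is the new slot, and a regex that finds one parse is not a proof that the grammar is unambiguous.

```python
import re

# Hypothetical grammar from the straw man; "locale" is the new [*|$] slot.
# As in the existing mini-language, a fill character is only recognised
# when it is immediately followed by an align character.
SPEC_RE = re.compile(
    r"(?:(?P<fill>.)?(?P<align>[<>=^]))?"
    r"(?P<sign>[-+ ])?"
    r"(?P<alt>#)?"
    r"(?P<locale>[*$])?"
    r"(?P<zero>0)?"
    r"(?P<width>\d+)?"
    r"(?P<grouping>[,_])?"
    r"(?:\.(?P<precision>\d+))?"
    r"(?P<type>[bcdeEfFgGnosxX%])?",
    re.DOTALL,
)


def parse_spec(spec):
    """Split a straw-man format spec into its named fields."""
    match = SPEC_RE.fullmatch(spec)
    if match is None:
        raise ValueError(f"invalid format spec: {spec!r}")
    return match.groupdict()


# "*" directly before an align character is a fill, otherwise it is the
# locale character.
print(parse_spec("*<10.2f"))
print(parse_spec("*10.2f"))
```

With this reading, "*<10.2f" parses with "*" as the fill, "*10.2f" parses with "*" as the locale character, and "*<*10.2f" uses both.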

If the locale character is "*", use locale-aware formatting for the
given "type", with LC_NUMERIC. So, "*g" would be equivalent to the
existing "n", and "*f" would give you the current "f" formatting, except
using LC_NUMERIC for the decimal point. If the locale character is "$",
use locale-aware formatting with LC_MONETARY. So then we could use "$g",
"$f", etc.

These locale characters would also work with int, so "*d" would make "n"
obsolete (but I'm not proposing to remove it).

These should also work with these "type" values for floats: '%', 'f',
'F', 'g', 'G', 'e', 'E', and None (as defined in the docs to mean a
missing "type", not a real None value).
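The intended semantics can be roughly emulated today by formatting normally and then swapping in the separator from locale.localeconv(). The helper below is a hypothetical illustration, not a proposed implementation: it handles only int and float, ignores grouping and fill characters, and its fallback to "." for an empty separator is my own assumption.

```python
import locale


def format_with_locale_char(value, spec, locale_char, conv=None):
    """Emulate the straw man's "*" / "$" locale character (hypothetical).

    `spec` is an ordinary format spec without fill or grouping, such as
    ".2f" or ".3g". `conv` is a localeconv()-style dict and defaults to the
    current locale's settings.
    """
    if conv is None:
        conv = locale.localeconv()
    key = {"*": "decimal_point", "$": "mon_decimal_point"}[locale_char]
    # Assumption: fall back to "." when the locale leaves the separator
    # empty, which a locale may do for mon_decimal_point.
    separator = conv.get(key) or "."
    text = format(value, spec)
    # Without fill or grouping, int and float output has at most one ".".
    return text.replace(".", separator, 1)


# Example with an explicit, made-up conv dict instead of a real locale.
demo_conv = {"decimal_point": ",", "mon_decimal_point": ";"}
print(format_with_locale_char(1234.5678, ".2f", "*", demo_conv))
print(format_with_locale_char(1234.5678, ".2f", "$", demo_conv))
```

Passing conv explicitly keeps the example independent of which locales are installed; leaving it out uses whatever locale the process has set.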

I don't know if there are any cases where '#' alternate form would be
used with '*' or '$'. If not, then maybe we could make the format spec
be the slightly simpler:

    [[fill]align][sign][#|*|$][0][width][grouping_option][.precision][type]

But it's probably worth keeping '#' orthogonal to the locale character.
Maybe someday we'll want to use them together.

The locale character should be supported in the numeric types that
support the default format spec mini-language: int, float, decimal, and
complex, at least. I'd have to grep for others.

I think that for format spec "type" values where it doesn't make sense,
using these new locale characters would raise ValueError. For example,
since "b" output can never be locale-aware, "*b" would be an error, much
like ",b" is currently an error.
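For comparison, this is how ",b" fails today, followed by a sketch of the analogous check. LOCALE_AWARE_TYPES and check_locale_char are hypothetical names, and treating every type not listed above as an error is an assumption on my part:

```python
# Existing behaviour: "," grouping is rejected for the binary type.
try:
    format(5, ",b")
    comma_b_rejected = False
except ValueError:
    comma_b_rejected = True

# Hypothetical: the types named in the straw man as working with the locale
# character; "" stands for a missing type. Anything else is assumed to be
# an error.
LOCALE_AWARE_TYPES = frozenset({"", "d", "%", "f", "F", "g", "G", "e", "E"})


def check_locale_char(locale_char, type_char):
    """Raise ValueError if the locale character makes no sense for the type."""
    if locale_char and type_char not in LOCALE_AWARE_TYPES:
        raise ValueError(f"Cannot specify {locale_char!r} with {type_char!r}.")


check_locale_char("*", "f")  # accepted
try:
    check_locale_char("*", "b")
except ValueError as exc:
    print(exc)
```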

I'm not married to '*' for LC_NUMERIC, although I think '$' makes sense
for LC_MONETARY.

Again, this is just a straw man proposal that would require fleshing
out. I think it might also require a PEP, but it would be as simple as
PEP 378 for adding comma grouping formatting. Somewhere to memorialize
the decision and how we got there, including rejected alternate
proposals, would be a good thing.

Eric