Fixed point format for numbers with locale based separators
Hi,

I would like to present two pull requests[1][2] implementing fixed point presentation of numbers and ask for comments. The first is mine. I learnt about the second after publishing mine.

The only format that currently uses the decimal separator from locale data for float/complex/decimal numbers is "n", which behaves like "g". The drawback of these formats that I would like to overcome is the inability to print numbers ranging over more than one order of magnitude with the same number of decimal digits without "manually" (with some additional custom code) adjusting the precision. The other option is to "manually" replace the "." printed by "f" with the local decimal separator. Neither of these options appeals to me.

Formatting 1.23456789 * n (LC_ALL=pl_PL.UTF-8)

| n | ".2f"    | ".3n"    |
|---+----------+----------|
| 1 | 1.23     | 1,23     |
| 2 | 12.35    | 12,3     |
| 3 | 123.46   | 123      |
| 4 | 1234.57  | 1,23e+03 |

In the application I want to create I am going to present users with numbers ranging over up to 3 orders of magnitude, and I (my users) want them to be presented consistently with regard to the number of decimal digits AND I want to conform to the rules of my users' languages. And I would like to avoid the exponent notation by all means.

I can't say much about James Emerton's implementation or his intentions, but please take a look at our patches and give your comments so that either of us, or both together, can implement this feature.

PS. In theory both implementations could be merged, because James chose "l" to use the LC_MONETARY category and I chose "m" to use LC_NUMERIC.

[1] https://github.com/python/cpython/pull/11405
[2] https://github.com/python/cpython/pull/8612

--
Have a nice day,
Łukasz Stelmach
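For readers who want to reproduce the limitation described above, here is a minimal sketch. It is not taken from either pull request. The helper name `fixed_with_separator` and its `decimal_point` parameter are made up for illustration, and the separator is passed in explicitly so the example does not depend on which locales are installed.

```python
import locale


def fixed_with_separator(value, precision=2, decimal_point=None):
    """Format value like "f", then swap "." for a decimal separator.

    This is the "manual" workaround mentioned in the message, not the
    proposed feature. When decimal_point is None, the separator of the
    current LC_NUMERIC locale is used.
    """
    if decimal_point is None:
        decimal_point = locale.localeconv()["decimal_point"]
    text = format(value, ".{}f".format(precision))
    return text.replace(".", decimal_point)


# "n" follows "g": its precision counts significant digits, so larger
# values lose decimals and eventually switch to exponent notation.
for n in range(4):
    value = 1.23456789 * 10**n
    print(format(value, ".2f"), format(value, ".3n"),
          fixed_with_separator(value, 2, ","))
```

The third column is what a locale-aware fixed point format would be expected to print for a locale whose decimal separator is a comma.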
On Fri, Jan 04, 2019 at 03:57:53PM +0100, Łukasz Stelmach wrote:
Hi,
I would like to present two pull requests[1][2] implementing fixed point presentation of numbers and ask for comments. The first is mine. I learnt about the second after publishing mine.
Before I look at the implementation, can you explain the functional requirements please? In other words, what is the new feature you hope to have accepted? Explain the intention and the API (the interface). The implementation is the least important part :-)

[...]

Formatting 1.23456789 * n (LC_ALL=pl_PL.UTF-8)

| n | ".2f"    | ".3n"    |
|---+----------+----------|
| 1 | 1.23     | 1,23     |
| 2 | 12.35    | 12,3     |
| 3 | 123.46   | 123      |
| 4 | 1234.57  | 1,23e+03 |

I'm afraid I cannot work out what that table means. You say "Formatting 1.23... * n" (multiplying by n) but the results shown aren't multiplied by n=2, n=3, n=4 as the table suggests.

Can you show what Python code you expect will produce the expected output?

Thank you.

--
Steve
Steven D'Aprano <steve@pearwood.info> writes:
[...]
Before I look at the implementation, can you explain the functional requirements please?
As I stated in the original message below the table:
In the application I want to create I am going to present users with numbers ranging over up to 3 orders of magnitude, and I (my users) want them to be presented consistently with regard to the number of decimal digits AND I want to conform to the rules of my users' languages. And I would like to avoid the exponent notation by all means.
The pint[1] library I use implements formatting of physical quantities using the format()/__format__ code. As far as I can tell, my patch for Python is shorter and more straightforward than a patch for pint to use locale.format(). Because the "g"-based "n" formatter has been present since advanced string formatting was described in PEP 3101, I think it is necessary to add an "m" formatter based on "f". The advanced string formatting facility in Python is very convenient, and programmers shouldn't be forced to use locale.format() like this

    "The total length of {} sticks is {} meters.".format(n_sticks, locale.format("%.2f", l_sticks))

instead of

    "The total length of {} sticks is {:.2f} meters.".format(n_sticks, l_sticks)
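A runnable version of the two-step workaround is sketched below. It uses locale.format_string() rather than locale.format(), since the latter was deprecated in Python 3.7 and removed in 3.12. The function name `sticks_message` and the sample values are illustrative only.

```python
import locale


def sticks_message(n_sticks, l_sticks):
    # locale.format_string() takes a %-style format and applies the
    # current LC_NUMERIC settings; the resulting text is then handed to
    # str.format() as an already formatted string.
    length = locale.format_string("%.2f", l_sticks)
    return "The total length of {} sticks is {} meters.".format(n_sticks, length)


print(sticks_message(3, 4.56789))
```

The point of the complaint stands out here: the number has to be formatted outside the format string, so the precision no longer lives in the template.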
In other words, what is the new feature you hope to have accepted? Explain the intention and the API (the interface). The implementation is the least important part :-)
I wish to add a new formatter "m" for float/complex/decimal numbers, which behaves like the existing "f" but uses the decimal separator from the locale database. There is an "n" formatter, which behaves like "g", but it does not fit my needs.
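The intended behaviour of "m" can be approximated today in pure Python. The sketch below is only an emulation under stated assumptions, not the code from the pull request: `LocaleFloat` and `_decimal_point` are hypothetical names, only the decimal separator is localized (grouping is not), and the fill character is assumed not to be ".".

```python
import locale


class LocaleFloat(float):
    """Hypothetical float subclass that understands an "m" format type."""

    def _decimal_point(self):
        # Decimal separator of the current LC_NUMERIC locale.
        return locale.localeconv()["decimal_point"]

    def __format__(self, spec):
        if spec.endswith("m"):
            # Format as "f", then substitute the separator. Assumes the
            # fill character (if any) is not ".".
            text = format(float(self), spec[:-1] + "f")
            return text.replace(".", self._decimal_point())
        # Every other spec is handled by the built-in float formatting.
        return format(float(self), spec)


length = LocaleFloat(1234.56789)
print("The total length is {:.2m} meters.".format(length))
```

Because the spec is consumed by `__format__`, the same template syntax works in str.format(), format() and f-strings, which is the convenience the proposal is after.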
[...]
Formatting 1.23456789 * n (LC_ALL=pl_PL.UTF-8)

| n | ".2f"    | ".3n"    |
|---+----------+----------|
| 1 | 1.23     | 1,23     |
| 2 | 12.35    | 12,3     |
| 3 | 123.46   | 123      |
| 4 | 1234.57  | 1,23e+03 |
I'm afraid I cannot work out what that table means. You say "Formatting 1.23... * n" (multiplying by n) but the results shown aren't multiplied by n=2, n=3, n=4 as the table suggests.
Can you show what Python code you expect will produce the expected output?
    for n in range(1, 5):
        print("| {} | {:8.2f} | {:8.3n} |".format(n, 1.23456789 * 10**n, 1.23456789 * 10**n))

[1] http://pint.readthedocs.io/

--
It was my pleasure.                    --- Rurku. --- ...
>Łukasz<              --- It's good that you are listening to me.
On Friday, 4 January 2019 14:57:53 GMT Łukasz Stelmach wrote:
Hi,
I would like to present two pull requests[1][2] implementing fixed point presentation of numbers and ask for comments. The first is mine. I learnt about the second after publishing mine.
The only format that currently uses the decimal separator from locale data for float/complex/decimal numbers is "n", which behaves like "g". The drawback of these formats that I would like to overcome is the inability to print numbers ranging over more than one order of magnitude with the same number of decimal digits without "manually" (with some additional custom code) adjusting the precision. The other option is to "manually" replace the "." printed by "f" with the local decimal separator. Neither of these options appeals to me.

Formatting 1.23456789 * n (LC_ALL=pl_PL.UTF-8)

| n | ".2f"    | ".3n"    |
|---+----------+----------|
| 1 | 1.23     | 1,23     |
| 2 | 12.35    | 12,3     |
| 3 | 123.46   | 123      |
| 4 | 1234.57  | 1,23e+03 |
Can you use locale.format_string() to solve this?

I used this to test:

    import locale

    n = 1.23456789
    for order in range(5):
        m = n * (10**order)
        for lang in ('en_GB.utf8', 'pl_PL.utf8'):
            locale.setlocale(locale.LC_ALL, lang)
            print( 'python %%.2f in %s: %.2f' % (lang, m) )
            print( locale.format_string('locale %%.2f in %s: %.2f', (lang, m), grouping=True) )
        print()

Which outputs:

    python %.2f in en_GB.utf8: 1.23
    locale %.2f in en_GB.utf8: 1.23
    python %.2f in pl_PL.utf8: 1.23
    locale %.2f in pl_PL.utf8: 1,23

    python %.2f in en_GB.utf8: 12.35
    locale %.2f in en_GB.utf8: 12.35
    python %.2f in pl_PL.utf8: 12.35
    locale %.2f in pl_PL.utf8: 12,35

    python %.2f in en_GB.utf8: 123.46
    locale %.2f in en_GB.utf8: 123.46
    python %.2f in pl_PL.utf8: 123.46
    locale %.2f in pl_PL.utf8: 123,46

    python %.2f in en_GB.utf8: 1234.57
    locale %.2f in en_GB.utf8: 1,234.57
    python %.2f in pl_PL.utf8: 1234.57
    locale %.2f in pl_PL.utf8: 1 234,57

    python %.2f in en_GB.utf8: 12345.68
    locale %.2f in en_GB.utf8: 12,345.68
    python %.2f in pl_PL.utf8: 12345.68
    locale %.2f in pl_PL.utf8: 12 345,68

Barry
Barry Scott <barry@barrys-emacs.org> writes:
[...]
Can you use locale.format_string() to solve this?
I am afraid I can't. I am using a library called pint[1] in my project. It allows me to choose how its objects are formatted, but it uses format() internally. It adds some custom extensions to format strings which, as far as I can tell, make it hard if not impossible to patch it to use locale.format_string(). But this is rather an excuse. I think I had this problem some time ago and got away with locale.format_string() then, but honestly I think format()/string.format/__format__ should support a locale-aware "f", just as there is "n" that behaves like "g".

[1] http://pint.readthedocs.io/

--
It was my pleasure.                    --- Rurku. --- ...
>Łukasz<              --- It's good that you are listening to me.
On 1/5/2019 3:03 PM, Łukasz Stelmach wrote:
Barry Scott <barry@barrys-emacs.org> writes:
[...]
Can you use locale.format_string() to solve this?
I am afraid I can't. I am using a library called pint[1] in my project. It allows me to choose how its objects are formatted, but it uses format() internally. It adds some custom extensions to format strings which, as far as I can tell, make it hard if not impossible to patch it to use locale.format_string(). But this is rather an excuse.
I do think that this is a compelling use case for "f" style locale-aware formatting. I support adding it in some format or another (pun intended).

My only concern is how to paint the bike shed. Should we just use another format spec "type" character instead of "f", as the two linked issues propose? Or maybe use an additional "alternate form" style character, so that we could use different locale options, either now or in the future? https://bugs.python.org/issue33731 is similar to https://bugs.python.org/issue34311 but proposes using LC_MONETARY instead of LC_NUMERIC.

I'm not suggesting we solve every possible problem here, but we at least shouldn't paint ourselves into a corner and instead allow a future where we could expand things, if needed, and without using up tons of format spec "type" characters for every permutation of "type" plus LC_MONETARY or LC_NUMERIC.

Here's a straw man:

The current specification for the format spec is:
[[fill]align][sign][#][0][width][grouping_option][.precision][type]

Let's say we change it to:
[[fill]align][sign][#][*|$][0][width][grouping_option][.precision][type]

(I think that's unambiguous, but I'd have to think it through some more)

Let's call the new [*|$] character the "locale character".

If the locale character is "*", use locale-aware formatting for the given "type", with LC_NUMERIC. So, "*g" would be equivalent to the existing "n", and "*f" would give you the current "f" formatting, except using LC_NUMERIC for the decimal point.

If the locale character is "$" use locale-aware LC_MONETARY. So then we could use "$g", "$f", etc.

These locale characters would also work with int, so "*d" would make "n" obsolete (but I'm not proposing to remove it).

These should also work with these "type" values for floats: '%', 'f', 'F', 'g', 'G', 'e', 'E', and None (as defined in the docs to mean a missing "type", not a real None value).

I don't know if there are any cases where '#' alternate form would be used with '*' or '$'. If not, then maybe we could make the format spec be the slightly simpler:
[[fill]align][sign][#|*|$][0][width][grouping_option][.precision][type]

But it's probably worth keeping '#' orthogonal to the locale character. Maybe someday we'll want to use them together.

The locale character should be supported in the numeric types that support the default format spec mini-language: int, float, decimal, and complex, at least. I'd have to grep for others.

I think that for format spec "type" values where it doesn't make sense, using these new locale characters would raise ValueError. For example, since "b" output can never be locale-aware, "*b" would be an error, much like ",b" is currently an error.

I'm not married to '*' for LC_NUMERIC, although I think '$' makes sense for LC_MONETARY.

Again, this is just a straw man proposal that would require fleshing out. I think it might also require a PEP, but it would be as simple as PEP 378 for adding comma grouping formatting. Somewhere to memorialize the decision and how we got there, including rejected alternate proposals, would be a good thing.

Eric
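The straw-man grammar can be explored with a small parser. This is a sketch only: the regular expression, the names `SPEC`, `LOCALE_TYPES`, `CATEGORIES` and `parse_spec`, and the choice to reject every "type" outside the list given in the message are assumptions made for illustration. Nothing here exists in CPython.

```python
import re

# Straw-man grammar from the message:
# [[fill]align][sign][#][*|$][0][width][grouping_option][.precision][type]
SPEC = re.compile(r"""
    (?:(?P<fill>.)?(?P<align>[<>=^]))?
    (?P<sign>[-+ ])?
    (?P<alt>\#)?
    (?P<locale>[*$])?
    (?P<zero>0)?
    (?P<width>\d+)?
    (?P<grouping>[_,])?
    (?:\.(?P<precision>\d+))?
    (?P<type>[bcdeEfFgGnosxX%])?
""", re.VERBOSE)

# Types listed in the message as candidates for locale-aware output
# (None means the type is missing). Rejecting all others is an assumption.
LOCALE_TYPES = {None, "d", "e", "E", "f", "F", "g", "G", "%"}
CATEGORIES = {"*": "LC_NUMERIC", "$": "LC_MONETARY"}


def parse_spec(spec):
    """Split a format spec into its fields and name the locale category."""
    match = SPEC.fullmatch(spec)
    if match is None:
        raise ValueError("invalid format spec: {!r}".format(spec))
    fields = match.groupdict()
    if fields["locale"] and fields["type"] not in LOCALE_TYPES:
        raise ValueError(
            "locale character not allowed with type {!r}".format(fields["type"]))
    fields["category"] = CATEGORIES.get(fields["locale"])
    return fields


print(parse_spec("*.2f")["category"])
print(parse_spec("$,.2f")["category"])
```

One thing the sketch makes visible: a "*" followed by an alignment character, as in "*<10.2f", is still read as a fill character, exactly as in today's format specs, so the locale character only takes effect when no alignment follows it.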
On 6 January 2019 at 01:48, "Eric V. Smith" <eric@trueblade.com> wrote:
[...]
I do think that this is a compelling use case for "f" style locale-aware formatting. I support adding it in some format or another (pun intended).
My only concern is how to paint the bike shed. Should we just use another format spec "type" character instead of "f", as the two linked issues propose? Or maybe use an additional "alternate form" style character, so that we could use different locale options, either now or in the future? https://bugs.python.org/issue33731 is similar to https://bugs.python.org/issue34311 but proposes using LC_MONETARY instead of LC_NUMERIC.
I'm not suggesting we solve every possible problem here, but we at least shouldn't paint ourselves into a corner and instead allow a future where we could expand things, if needed, and without using up tons of format spec "type" characters for every permutation of "type" plus LC_MONETARY or LC_NUMERIC.
Here's a straw man:
The current specification for the format spec is: [[fill]align][sign][#][0][width][grouping_option][.precision][type]
Let's say we change it to: [[fill]align][sign][#][*|$][0][width][grouping_option][.precision][type]
(I think that's unambiguous, but I'd have to think it through some more)
Let's call the new [*|$] character the "locale character".
[...]

OK, it doesn't sound bad at all, and I wonder if there is *any* other situation that may allow/require choosing between different categories of locale data to format the same value. If so (I need to read some more about locale data), I think your idea can be extended even further. Let's use 'Lx' as an even more general 'locale control sequence', where 'x' is a locale category in general (LC_CTYPE, LC_). Should we support only POSIX categories[1], or extensions like LC_PAPER in glibc or other OS/library too?

BTW, is there any scanf() equivalent in Python that uses the same syntax as format()? It might benefit from such control sequences even more.
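As a starting point for the question about categories, the snippet below lists what Python's locale module exposes on the running platform and shows that numeric and monetary conventions are reported separately. It is a quick inspection aid, not part of any proposal. I would not expect glibc-only extensions such as LC_PAPER to show up, but the listing reports whatever is actually available.

```python
import locale

# Category constants exposed by Python's locale module on this platform.
categories = sorted(name for name in dir(locale) if name.startswith("LC_"))
print(categories)

# localeconv() reports numeric and monetary conventions separately, which
# is the distinction the "*" and "$" locale characters would select.
conv = locale.localeconv()
print("LC_NUMERIC  decimal point:", repr(conv["decimal_point"]))
print("LC_MONETARY decimal point:", repr(conv["mon_decimal_point"]))
```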
Again, this is just a straw man proposal that would require fleshing out. I think it might also require a PEP, but it would be as simple as PEP 378 for adding comma grouping formatting. Somewhere to memorialize the decision and how we got there, including rejected alternate proposals, would be a good thing.
Challenge accepted (-; Where do I start?

[1] https://pubs.opengroup.org/onlinepubs/009695399/basedefs/xbd_chap07.html

--
Kind regards,
Łukasz Stelmach
Eric V. Smith wrote: ...
The current specification for the format spec is:
[[fill]align][sign][#][0][width][grouping_option][.precision][type]
Let's say we change it to:
[[fill]align][sign][#][*|$][0][width][grouping_option][.precision][type]
...
Let's call the new [*|$] character the "locale character".
+1
I'm not married to '*' for LC_NUMERIC, although I think '$' makes sense for LC_MONETARY.
If the "locale character" were placed after (or just in front of) the type, it would be possible to use '#' instead of '*' as the modifier, which, for my taste, is a better fit for LC_NUMERIC.

Syntax would be
[[fill]align][sign][#][0][width][grouping_option][.precision][type][#|$]
or
[[fill]align][sign][#][0][width][grouping_option][.precision][#|$][type]

Michael
[[fill]align][sign][#][0][width][grouping_option][.precision][#|$][type]

Could not distinguish the first and second #: everything between them is optional.

Eric

On 12/20/2019 7:13 AM, Michael Amrhein wrote:
[...]
Yes, you are right. Would it be a way out to bind the "locale character" to the type?

[[fill]align][sign][#][0][width][grouping_option][.precision][type[#|$]]
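The ambiguity, and the effect of binding the locale character to the type, can be checked mechanically. The sketch below models only a simplified fragment of each grammar ('#', width, precision, locale character, type); the names `PREFIX_FORM`, `SUFFIX_FORM` and `readings` are invented for this illustration. It matches the leading '#' once greedily and once lazily and collects the distinct readings.

```python
import re

# Simplified fragments (an assumption): fill, align, sign, zero padding and
# grouping are left out because they do not affect the '#' question.
# {q} is replaced by "?" (greedy) or "??" (lazy) for the leading '#'.
PREFIX_FORM = (r"(?P<alt>#){q}(?P<width>\d+)?(?:\.(?P<precision>\d+))?"
               r"(?P<locale>[#$])?(?P<type>[a-zA-Z%])?")
SUFFIX_FORM = (r"(?P<alt>#){q}(?P<width>\d+)?(?:\.(?P<precision>\d+))?"
               r"(?:(?P<type>[a-zA-Z%])(?P<locale>[#$])?)?")


def readings(template, spec):
    """Return the distinct (alt, locale, type) readings of spec."""
    found = set()
    for quantifier in ("?", "??"):  # greedy first, then lazy
        match = re.fullmatch(template.format(q=quantifier), spec)
        if match:
            found.add((match["alt"], match["locale"], match["type"]))
    return found


# Locale character in front of the type: "#f" can be read two ways.
print(readings(PREFIX_FORM, "#f"))
# Locale character bound to the type: "#f" and "f#" each have one reading.
print(readings(SUFFIX_FORM, "#f"))
print(readings(SUFFIX_FORM, "f#"))
```

Under these simplifications the suffix form is unambiguous because a locale character can only appear once a type has been seen.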
participants (5)
- Barry Scott
- Eric V. Smith
- Michael Amrhein
- Steven D'Aprano
- Łukasz Stelmach