Formatting mini-language suggestion
The current formatting mini-language provisions left/right/center alignment, prefixes for 0b 0x 0o, and rules on when to show the plus-sign. I think it would be far more useful to provision a simple way of specifying a thousands separator. Financial users in particular find the locale approach to be frustrating and non-obvious. Putting in a thousands separator is a common task for output destined to be read by non-programmers.
Raymond
Raymond Hettinger <python <at> rcn.com> writes:
Financial users in particular find the locale approach to be frustrating and non-obvious. Putting in a thousands separator is a common task for output destined to be read by non-programmers.
Please note that for it to be useful in all parts of the world, it must also allow changing the decimal point.
On Wed, Mar 11, 2009 at 6:01 PM, Antoine Pitrou <solipsis@pitrou.net> wrote:
Raymond Hettinger <python <at> rcn.com> writes:
Financial users in particular find the locale approach to be frustrating and non-obvious. Putting in a thousands separator is a common task for output destined to be read by non-programmers.
Please note that for it to be useful in all parts of the world, it must also allow changing the decimal point.
Now that this cat is out of the bag (or should I say now that this can of worms is opened :-) I suggest moving this to python-ideas and writing a proper PEP. I expect that nobody likes that idea, but it seems better than the alternative, which is to let the programmer who gets to implement it design it...
--
--Guido van Rossum (home page: http://www.python.org/~guido/)
[Guido van Rossum]
I suggest moving this to python-ideas and writing a proper PEP.
Okay, it's moved. Will write up a PEP, research what other languages do, and collect everyone's ideas on what to put in the shed (hundreds and ten-thousands grouping, various choices of decimal points, Mayan number systems and whatnot). Will use Nick's simple proposal as a starting point.
[Nick Coghlan]
[[fill]align][sign][#][0][minimumwidth][,][.precision][type]
Other suggestions and comments welcome. Raymond
Raymond Hettinger wrote:
The current formatting mini-language provisions left/right/center alignment, prefixes for 0b 0x 0o, and rules on when to show the plus-sign. I think it would be far more useful to provision a simple way of specifying a thousands separator.
Financial users in particular find the locale approach to be frustrating and non-obvious. Putting in a thousands separator is a common task for output destined to be read by non-programmers.
+1 for the general idea.
A specific syntax proposal:
[[fill]align][sign][#][0][minimumwidth][,sep][.precision][type]
'sep' is the new field that defines the thousands separator. It appears immediately before the precision specifier and starts with a leading comma.
I believe this syntax is unambiguous and backwards compatible because the only other place a comma might appear (the fill field) is required to be followed by an alignment character.
Cheers, Nick.
--
Nick Coghlan | ncoghlan@gmail.com | Brisbane, Australia
On Mar 11, 2009, at 9:06 PM, Nick Coghlan wrote:
Raymond Hettinger wrote:
The current formatting mini-language provisions left/right/center alignment, prefixes for 0b 0x 0o, and rules on when to show the plus-sign. I think it would be far more useful to provision a simple way of specifying a thousands separator.
Financial users in particular find the locale approach to be frustrating and non-obvious. Putting in a thousands separator is a common task for output destined to be read by non-programmers.
+1 for the general idea.
A specific syntax proposal:
[[fill]align][sign][#][0][minimumwidth][,sep][.precision][type]
'sep' is the new field that defines the thousands separator. It appears immediately before the precision specifier and starts with a leading comma.
I believe this syntax is unambiguous and backwards compatible because the only other place a comma might appear (the fill field) is required to be followed by an alignment character.
You might be interested to know that in India, the commas don't come every 3 digits. In India, they come every two digits, after the first three. Thus one billion = 1,00,00,00,000. How are you gonna represent *that* in a formatting mini-language? :)
See also http://en.wikipedia.org/wiki/Indian_numbering_system
James
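The grouping James describes (the last three digits first, then pairs) is easy to state in code, even though no field in the proposed format spec expresses it. A minimal sketch, assuming a hypothetical helper named group_indian that is not part of any library:

```python
def group_indian(n: int, sep: str = ",") -> str:
    """Group an integer Indian-style: last three digits, then pairs.

    Hypothetical helper for illustration only; not a standard-library API.
    """
    sign = "-" if n < 0 else ""
    digits = str(abs(n))
    if len(digits) <= 3:
        return sign + digits

    # Split off the rightmost three digits, then peel pairs off the rest.
    head, tail = digits[:-3], digits[-3:]
    pairs = []
    while head:
        pairs.append(head[-2:])
        head = head[:-2]

    return sign + sep.join(reversed(pairs)) + sep + tail


print(group_indian(1000000000))
```

The point of the sketch is that this rule needs two group sizes, which is why a single "comma every N digits" flag cannot cover it.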
[James Y Knight]
You might be interested to know that in India, the commas don't come every 3 digits. In India, they come every two digits, after the first three. Thus one billion = 1,00,00,00,000. How are you gonna represent *that* in a formatting mini-language? :)
It is not the goal to replace locale or to accommodate every possible convention. The goal is to make a common task easier for many users. The current, default use of the period as a decimal point has not proven to be a problem even though that convention is not universal. For a thousands separator, a comma is a decent choice that makes it easy to follow on with s.replace(',', '_') or somesuch. This simple utility could help a lot of programmers make their output look more professional and readable. I hope the idea doesn't get sunk by a desire to over-parameterize and cover every possible use case. My pocket calculators all support thousands separators but in Python, we have to do a funky dance for even this most basic bit of formatting. I'd like to think that in 2009 we could show a little progress beyond C's printf() or Fortran's write() formats.
Raymond
>>> import locale
>>> locale.setlocale(locale.LC_ALL, 'English_United States.1252')
'English_United States.1252'
>>> conv = locale.localeconv()  # get a mapping of conventions
>>> x = 1234567.8
>>> locale.format("%d", x, grouping=True)
'1,234,567'
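Raymond's follow-on idea can be sketched as below. It assumes the ',' option behaves as proposed in this thread (a comma every three digits, which is how it was later specified in PEP 378); the helper name with_separator is hypothetical and only for illustration.

```python
def with_separator(value: int, sep: str = "_") -> str:
    """Format an integer with commas, then swap in another separator.

    Hypothetical helper; relies on the ',' format option discussed above.
    """
    return format(value, ",").replace(",", sep)


print(with_separator(1234567))        # underscore-separated groups
print(with_separator(1234567, " "))   # space-separated groups
```

This only stays simple for integers. Once a decimal point is involved and the separator is itself a period, a plain replace is no longer enough.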
Raymond Hettinger wrote:
[James Y Knight]
You might be interested to know that in India, the commas don't come every 3 digits. In India, they come every two digits, after the first three. Thus one billion = 1,00,00,00,000. How are you gonna represent *that* in a formatting mini-language? :)
It is not the goal to replace locale or to accommodate every possible convention. The goal is to make a common task easier for many users. The current, default use of the period as a decimal point has not proven to be a problem even though that convention is not universal. For a thousands separator, a comma is a decent choice that makes it easy to follow on with s.replace(',', '_') or somesuch.
In that case, I would simplify my suggestion to:
[[fill]align][sign][#][0][minimumwidth][,][.precision][type]
Addition to mini language documentation: The ',' option indicates that commas should be included in the output as a thousands separator. As with locales which do not use a period as the decimal point, locales which use a different convention for digit separation will need to use the locale module to obtain appropriate formatting.
Guido has asked for a PEP to be developed on python-ideas to define the deliberately limited scope though, so I'm going to bow out of the conversation now...
Cheers, Nick.
--
Nick Coghlan | ncoghlan@gmail.com | Brisbane, Australia
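To make the simplified syntax concrete, here is a short sketch of how the ',' option combines with the neighbouring fields (alignment, width, sign, precision). It assumes the option works as Nick describes; this is the form that PEP 378 later specified, so the calls below run on a modern Python.

```python
amount = 1234567

plain = format(amount, ",")          # [,] on its own
aligned = format(amount, ">15,")     # [align][minimumwidth][,]
signed = format(amount, "+,")        # [sign][,]
money = format(1234567.891, ",.2f")  # [,][.precision][type]

print(plain, aligned, signed, money, sep="\n")
```

Note that the comma sits between the width and the precision, exactly where the grammar line places it.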
On Mar 11, 2009, at 11:40 PM, Nick Coghlan wrote:
Raymond Hettinger wrote:
It is not the goal to replace locale or to accommodate every possible convention. The goal is to make a common task easier for many users. The current, default use of the period as a decimal point has not proven to be a problem even though that convention is not universal. For a thousands separator, a comma is a decent choice that makes it easy to follow on with s.replace(',', '_') or somesuch.
In that case, I would simplify my suggestion to:
[[fill]align][sign][#][0][minimumwidth][,][.precision][type]
Addition to mini language documentation: The ',' option indicates that commas should be included in the output as a thousands separator. As with locales which do not use a period as the decimal point, locales which use a different convention for digit separation will need to use the locale module to obtain appropriate formatting.
This proposal has the advantage that you're not overly specifying the behavior in the format string itself.
That is: the "," option is really just indicating "please insert separators". With the current locale-ignorant implementation, that'd just mean "a comma every 3 digits". But it leaves the door open for a locale-sensitive variant of the format to be added in the future without conflicting with the instructions in the format string. (as the ability to specify an arbitrary character, or the ability to specify a comma instead of a period for the decimal point would).
I'm not against Raymond's proposal, just against doing a *bad* job of making it work in multiple locales. Locale conventions can be complex, and are going to be best represented outside the format string.
(BTW: single quote is used by printf for the grouping flag rather than comma)
James
James Y Knight wrote:
On Mar 11, 2009, at 11:40 PM, Nick Coghlan wrote:
Raymond Hettinger wrote:
It is not the goal to replace locale or to accommodate every possible convention. The goal is to make a common task easier for many users. The current, default use of the period as a decimal point has not proven to be a problem even though that convention is not universal. For a thousands separator, a comma is a decent choice that makes it easy to follow on with s.replace(',', '_') or somesuch.
In that case, I would simplify my suggestion to:
[[fill]align][sign][#][0][minimumwidth][,][.precision][type]
Addition to mini language documentation: The ',' option indicates that commas should be included in the output as a thousands separator. As with locales which do not use a period as the decimal point, locales which use a different convention for digit separation will need to use the locale module to obtain appropriate formatting.
This proposal has the advantage that you're not overly specifying the behavior in the format string itself.
That is: the "," option is really just indicating "please insert separators". With the current locale-ignorant implementation, that'd just mean "a comma every 3 digits". But it leaves the door open for a locale-sensitive variant of the format to be added in the future without conflicting with the instructions in the format string. (as the ability to specify an arbitrary character, or the ability to specify a comma instead of a period for the decimal point would).
I'm not against Raymond's proposal, just against doing a *bad* job of making it work in multiple locales. Locale conventions can be complex, and are going to be best represented outside the format string.
How about having a country code field, e.g. "en-us" would format according to the US locale, "in" according to India, "ch" according to China, etc.? That way the format string would become very simple (although the lib maintainer would need to know customs from all over the world). Then have a special country code that is a placeholder for whatever locale the machine is set to.
[Lie Ryan]
How about having a country code field, e.g. "en-us" would format according to the US locale, "in" according to India, "ch" according to China, etc.? That way the format string would become very simple (although the lib maintainer would need to know customs from all over the world). Then have a special country code that is a placeholder for whatever locale the machine is set to.
Am moving the discussion to the python-ideas list (at Guido's request). My proposal is strictly limited to the builtin, non-locale dependent formatting. Improvements to the locale module are probably a subject for another day.
Raymond
-On [20090312 06:50], Lie Ryan (lie.1296@gmail.com) wrote:
How about having a country code field, e.g. "en-us" would format according to the US locale, "in" according to India, "ch" according to China, etc.? That way the format string would become very simple (although the lib maintainer would need to know customs from all over the world). Then have a special country code that is a placeholder for whatever locale the machine is set to.
Then you are effectively duplicating what is already available via CLDR [1] and Babel [2].
[1] http://www.unicode.org/cldr/
[2] http://babel.edgewall.org/
--
Jeroen Ruigrok van der Werven <asmodai(-at-)in-nomine.org> / asmodai
イェルーン ラウフロック ヴァン デル ウェルヴェン
http://www.in-nomine.org/ | http://www.rangaku.org/ | GPG: 2EAC625B
Any road leads to the end of the world...
James Y Knight <foom@fuhm.net> writes:
You might be interested to know that in India, the commas don't come every 3 digits. In India, they come every two digits, after the first three. Thus one billion = 1,00,00,00,000. How are you gonna represent *that* in a formatting mini-language? :)
Likewise, China uses four-digit groupings (per “myriad”) <URL:http://en.wikipedia.org/wiki/Chinese_numerals#Reading_and_transcribing_numbe...>.
--
“Self-respect: The secure feeling that no one, as yet, is suspicious.” (Henry L. Mencken)
Ben Finney
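Uniform groupings of a size other than three, such as the four-digit case Ben mentions, can be sketched with one parameter. The helper name group_digits and its size argument are assumptions made for illustration; nothing like it is proposed in the thread.

```python
def group_digits(n: int, size: int = 3, sep: str = ",") -> str:
    """Insert sep between fixed-size digit groups, counted from the right.

    Hypothetical helper for illustration only.
    """
    if size < 1:
        raise ValueError("size must be at least 1")
    sign = "-" if n < 0 else ""
    digits = str(abs(n))

    # The leftmost group takes whatever is left over (or a full group).
    first = len(digits) % size or size
    chunks = [digits[:first]]
    chunks += [digits[i:i + size] for i in range(first, len(digits), size)]
    return sign + sep.join(chunks)


print(group_digits(123456789, size=4))
```

With size=3 this reproduces the comma-every-three-digits behaviour for integers; it still cannot express the mixed Indian grouping.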
James Y Knight wrote:
You might be interested to know that in India, the commas don't come every 3 digits. In India, they come every two digits, after the first three. Thus one billion = 1,00,00,00,000. How are you gonna represent *that* in a formatting mini-language? :)
We outsource it. Send the number by email to a service centre in India, where an employee formats it for us and sends it back.
--
Greg
Nick Coghlan wrote:
[[fill]align][sign][#][0][minimumwidth][,sep][.precision][type]
'sep' is the new field that defines the thousands separator.
Wouldn't it be better to use a locale setting for this, instead of having to specify it in every format string? If an app is using a particular thousands separator in one place, it will probably want to use it everywhere.
--
Greg
Greg Ewing writes:
Nick Coghlan wrote:
[[fill]align][sign][#][0][minimumwidth][,sep][.precision][type]
'sep' is the new field that defines the thousands separator.
Wouldn't it be better to use a locale setting for this, instead of having to specify it in every format string?
Maybe, but the POSIX locale concept is broken (because it's process-global) in many contexts. Viz.
If an app is using a particular thousands separator in one place, it will probably want to use it everywhere.
Not if that app is internationalized (eg, a webapp that serves both Americans and Chinese).
Stephen J. Turnbull wrote:
Greg Ewing writes:
If an app is using a particular thousands separator in one place, it will probably want to use it everywhere.
Not if that app is internationalized (eg, a webapp that serves both Americans and Chinese).
I don't think you'll want to code the separators into all your format strings in that case, either. You'll want some sort of context that you set up for the page you're about to serve.
--
Greg
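One way to picture the kind of per-page context Greg has in mind, without touching the process-global POSIX locale, is a small value object that carries the separators. This is only a sketch under assumptions: the NumberContext class, its fields, and its fmt method are hypothetical names, and it builds on the ',' option discussed earlier in the thread.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class NumberContext:
    """Hypothetical per-request formatting context (not a real API)."""
    thousands_sep: str = ","
    decimal_point: str = "."

    def fmt(self, value: float, precision: int = 2) -> str:
        # Format with the fixed ',' and '.' conventions first ...
        text = format(value, f",.{precision}f")
        # ... then map both characters in a single pass, so that swapping
        # ',' and '.' does not clobber one with the other.
        table = str.maketrans({",": self.thousands_sep,
                               ".": self.decimal_point})
        return text.translate(table)


us = NumberContext()
de = NumberContext(thousands_sep=".", decimal_point=",")
print(us.fmt(1234567.891))
print(de.fmt(1234567.891))
```

Because each context is an ordinary object, a web app could create one per request and serve differently formatted pages from the same process.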
Greg Ewing writes:
I don't think you'll want to code the separators into all your format strings in that case, either. You'll want some sort of context that you set up for the page you're about to serve.
Sure. But the POSIX locale is not a good solution, nor is it a building block for such a solution. If this PEP *can't* provide such a building block, too bad, life is like that. If it can, but we don't bother to look for it because of locale, for shame!
participants (10)
- Antoine Pitrou
- Ben Finney
- Greg Ewing
- Guido van Rossum
- James Y Knight
- Jeroen Ruigrok van der Werven
- Lie Ryan
- Nick Coghlan
- Raymond Hettinger
- Stephen J. Turnbull