Rough draft: Proposed format specifier for a thousands separator (discussion moved from python-dev)

Motivation: Provide a simple, non-locale-aware way to format a number with a thousands separator. Adding thousands separators is one of the simplest ways to improve the professional appearance and readability of output exposed to end users. In the finance world, output with commas is the norm. Finance users and non-professional programmers find the locale approach to be frustrating, arcane and non-obvious. It is not the goal to replace locale or to accommodate every possible convention. The goal is to make a common task easier for many users.
Research so far: Scanning the web, I've found that thousands separators are usually one of COMMA, PERIOD, SPACE, or UNDERSCORE. The COMMA is used when a PERIOD is the decimal separator. James Knight observed that Indian/Pakistani numbering systems group by hundreds. Ben Finney noted that Chinese group by ten-thousands. Visual Basic and its brethren (like MS Excel) use a completely different style and have ultra-flexible custom format specifiers like: "_($* #,##0_)".
Proposal I (from Nick Coghlan): A comma will be added to the format() specifier mini-language:
[[fill]align][sign][#][0][minimumwidth][,][.precision][type]
The ',' option indicates that commas should be included in the output as a thousands separator. As with locales which do not use a period as the decimal point, locales which use a different convention for digit separation will need to use the locale module to obtain appropriate formatting. The proposal works well with floats, ints, and decimals. It also allows easy substitution for other separators. For example:
format(n, "6,f").replace(",", "_")
This technique is completely general but it is awkward in the one case where the commas and periods need to be swapped:
format(n, "6,f").replace(",", "X").replace(".", ",").replace("X", ".")
Proposal II (to meet Antoine Pitrou's request): Make both the thousands separator and decimal separator user specifiable but not locale aware. For simplicity, limit the choices to a comma, period, space, or underscore.
[[fill]align][sign][#][0][minimumwidth][T[tsep]][dsep precision][type]
Examples:
format(1234, "8.1f") --> '  1234.0'
format(1234, "8,1f") --> '  1234,0'
format(1234, "8T.,1f") --> ' 1.234,0'
format(1234, "8T .f") --> ' 1 234,0'
format(1234, "8d") --> '    1234'
format(1234, "8T,d") --> '   1,234'
This proposal meets most needs (except for people wanting grouping for hundreds or ten-thousands), but iIt comes at the expense of being a little more complicated to learn and remember. Also, it makes it more challenging to write custom __format__ methods that follow the format specification mini-language.
For the locale module, just the "T" is necessary in a formatting string since the tool already has procedures for figuring out the actual separators from the local context.
Comments and suggestions are welcome but I draw the line at Mayan numbering conventions ;-)
Raymond
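A minimal sketch of the substitution technique described under Proposal I. It assumes a Python version that accepts the ',' option in format() (the option was later adopted through PEP 378); the helper name swap_separators is made up for this illustration.

```python
def swap_separators(n, spec=",.2f"):
    """Format n with comma grouping, then swap ',' and '.'.

    Hypothetical helper: it applies the three-step replace() trick from
    the proposal, using "X" as a temporary placeholder.
    """
    s = format(n, spec)
    return s.replace(",", "X").replace(".", ",").replace("X", ".")


# Simple substitution of another separator for the comma.
print(format(1234567.891, ",.2f").replace(",", "_"))  # 1_234_567.89

# The awkward case: commas and periods swapped.
print(swap_separators(1234567.891))  # 1.234.567,89
```

The placeholder step is needed because a direct replace of ',' by '.' would make the two separators indistinguishable before the second replace runs.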

Thanks for doing this, Raymond. I don't have any comments on the specific proposals, yet. I'm still thinking it over. But here are a few comments. Raymond Hettinger wrote:
Motivation:
You might want to mention the existing 'n' format type. I don't think it's widely known. It handles the odd cases of locales that have odd groupings, such as James Knight's example from India (1,00,00,00,000). James: If you know the locale name for that, I'd like to know it. It would be handy for testing. floats are not terribly useful for 'n', however:
>>> format(1000000, 'n')
'1,000,000'
>>> format(1000000.111111, 'n')
'1e+06'
>>> format(100000.111111, 'n')
'100,000'
Proposal I (from Nick Coghlan): A comma will be added to the format() specifier mini-language:
[[fill]align][sign][#][0][minimumwidth][,][.precision][type]
Could you add the existing PEP-3101 specifier, just so we know what we're changing (and so that I don't have to look it up constantly!)? [[fill]align][sign][#][0][width][.precision][type] (As an aside, I copied this from http://docs.python.org/library/string.html#formatstrings, I just noticed that PEP 3101 differs in the name of the width/minwidth field.)
for hundreds or ten-thousands), but iIt comes at the expense of
Typo (iIt).
Also, it makes it more challenging to write custom __format__ methods that follow the format specification mini-language.
For this exact reason, I've always wanted to add a method somewhere that parses the mini-language. The code exists in the C implementation, it would just need to be exposed, probably returning a namedtuple with the various fields.
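As a rough illustration of what such a parsing helper could return, here is a pure-Python sketch for the existing specifier [[fill]align][sign][#][0][width][.precision][type]. The names FormatSpec and parse_format_spec are assumptions made for this example; this is not the C implementation and not an existing standard-library API.

```python
import re
from collections import namedtuple

# Hypothetical result type: one field per part of the mini-language.
FormatSpec = namedtuple(
    "FormatSpec", "fill align sign alt zero width precision type"
)

_SPEC_RE = re.compile(
    r"(?:(?P<fill>.)?(?P<align>[<>=^]))?"  # optional fill, only valid with align
    r"(?P<sign>[-+ ])?"
    r"(?P<alt>#)?"
    r"(?P<zero>0)?"
    r"(?P<width>\d+)?"
    r"(?:\.(?P<precision>\d+))?"
    r"(?P<type>[a-zA-Z%])?"
    r"\Z"
)


def parse_format_spec(spec):
    """Split a format specifier into its fields, or raise ValueError."""
    m = _SPEC_RE.match(spec)
    if m is None:
        raise ValueError("invalid format specifier: %r" % (spec,))
    d = m.groupdict()
    return FormatSpec(
        fill=d["fill"],
        align=d["align"],
        sign=d["sign"],
        alt=d["alt"] is not None,
        zero=d["zero"] is not None,
        width=int(d["width"]) if d["width"] else None,
        precision=int(d["precision"]) if d["precision"] else None,
        type=d["type"],
    )


print(parse_format_spec("*>+#010.3f"))
```

A custom __format__ method could call such a helper and then handle only the fields it cares about, instead of re-implementing the parsing.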
For the locale module, just the "T" is necessary in a formatting string since the tool already has procedures for figuring out the actual separators from the local context.
Is this needed at all? That is, having just the "T"? How is this different from using type=n? Having asked the question, I guess the answer is it lets you use it with the more useful float type=f.
Comments and suggestions are welcome but I draw the line at Mayan numbering conventions ;-)
That's only a problem until December 21, 2012 anyway! Eric.

Raymond Hettinger wrote:
James Knight observed that Indian/Pakistani numbering systems group by hundreds.
I'm not 100% sure here, but I believe that in India, they insert a separator after the first 3 digits, then another after 2 more digits, then every 3 digits after that (not sure if they use commas or periods, I think commas): 1,000,000,00,000 -bruce

Make both the thousands separator and decimal separator user specifiable but not locale aware.
-1.0 as it stands (or -1,0 if you prefer).
When you say 'user' you mean 'developer'. Having the developer choose the separators means it *won't* be what the user wants. Why would you stick in separators if not to display to a user? If I'm French then all decimal points should be ',' not '.' regardless of what language the developer speaks, right?
A format specifier that says "please use the locale-specific separators when formatting this number" would be fine. We already have 'n' for this but suppose we choose ';' as the character for this (chosen because it looks like a '.' or a ',' which are two of the three most common choices). For example format(x, '6;d') == format(x, '6n') and you can use ';' with any number type: format(x, '6;.3f') or format(x, '10;g'). I'd be inclined to always group in units of four digits if someone writes format(x, '6;x'). --- Bruce

[Bruce Leban]
If I'm French then all decimal points should be ',' not '.' regardless of what language the developer speaks, right?
We already have a locale aware solution and that should be used for internationalized apps. The locale module is not going away. This proposal is for everyday programs for local consumption (most scripts never get internationalized). I would even venture that most Python scripts are not written by professional programmers. If an accountant needs to knock-out a quick report, he/she should have a simple means of basic formatting without invoking all of the locale machinery. Raymond

Raymond, aren't you equating "local" with the US? The locale module lets you take the locale as a separate parameter. I agree we should not try to duplicate it (though it's a bad API since it relies on global state -- that doesn't work very well in multi-threaded or web apps). But it does make sense for an accountant in France or Holland to hardcode her desire for a decimal comma and thousand-separating periods, as otherwise her boss won't be able to interpret the output.
On Thu, Mar 12, 2009 at 11:28 AM, Raymond Hettinger <python@rcn.com> wrote:
[Bruce Leban]
If I'm French then all decimal points should be ',' not '.' regardless of what language the developer speaks, right?
We already have a locale aware solution and that should be used for internationalized apps. The locale module is not going away.
This proposal is for everyday programs for local consumption (most scripts never get internationalized). I would even venture that most Python scripts are not written by professional programmers. If an accountant needs to knock-out a quick report, he/she should have a simple means of basic formatting without invoking all of the locale machinery.
Raymond
-- --Guido van Rossum (home page: http://www.python.org/~guido/)

[GvR]
Raymond, aren't you equating "local" with the US?
Not at all. "For local consumption" meant anything that isn't distributed as a fully internationalized app. Right now, all our reprs and string interpolations are not locale-aware (i.e. float reprs are hardwired to use periods for the decimal separator). Those tools are pretty useful to us in day-to-day work. I'm just proposing to extend those non-locale-aware capabilities to include a thousands separator. For a fully internationalized app, I would use something like Babel which addresses the challenge in a comprehensive and uniform manner.
The locale module lets you take the locale as a separate parameter. I agree we should not try to duplicate it (though it's a bad API since it relies on global state -- that doesn't work very well in multi-threaded or web apps).
But it does make sense for an accountant in France or Holland to hardcode her desire for a decimal comma and thousand-separating periods, as otherwise their boss won't be able to interpret the output.
Well said. Raymond

On 3/12/09, Raymond Hettinger <python@rcn.com> wrote:
If an accountant needs to knock-out a quick report, he/she should have a simple means of basic formatting without invoking all of the locale machinery.
Fair enough. But what does a thousands separator provide that the "n" type doesn't already provide? (Well, except that n isn't as well known -- but initially this won't be either.) Do you want to avoid using locale even in the background? Do you want to avoid having to set a locale in the program startup? Do you want a better default for locale? Do you really want a different type, such as "m" for money? (That sounds sensible to me, except that there are so many different standard ways to format money, even within the US, so I'm not sure a single format would do it.) -jJ

[Jim Jewett]
Fair enough. But what does a thousands separator provide that the "n" type doesn't already provide? (Well, except that n isn't as well known -- but initially this won't be either.)
It's nice to have a non-locale-aware alternative so you can say explicitly what you want. This is especially helpful in Guido's example where you need to format for a different locale than the one that is currently on your machine (i.e. the global state doesn't match the target). FWIW, C-Sharp provides both ways, a locale-aware "n" format and a hard-wired explicit thousands separator. See the updated PEP for examples and a link.
Do you want to avoid using locale even in the background?
I thought locale was always there.
Do you want to avoid having to set a locale in the program startup?
Yes. I don't think most casual users should have to figure that out. It's a little too magical and arcane:
>>> import locale
>>> locale.setlocale(locale.LC_ALL, 'English_United States.1252')
Do you want a better default for locale?
The default does suck:
>>> format(1237, "n")
'1237'
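A small sketch of the default behaviour mentioned here. It explicitly selects the portable "C" locale (an assumption made so that the result does not depend on the machine's settings); under that locale the 'n' type inserts no separators, and floats get 'g'-style output.

```python
import locale

# Select the portable "C" locale explicitly; this is also the state a
# program is in when it never calls setlocale().
locale.setlocale(locale.LC_ALL, "C")

print(format(1237, "n"))            # 1237  (no grouping in the "C" locale)
print(format(1000000.111111, "n"))  # 1e+06 (floats are formatted like 'g')
```

Getting grouped output from 'n' therefore requires choosing a locale first, and the available locale names differ between platforms.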
Do you really want a different type, such as "m" for money?
I don't but I'm sure someone does. I did write a money formatter sample recipe for the decimal docs so people would have something to model from. FWIW, I've always thought it weird that the currency symbol could shift with a locale setting. ISTM, that if you change the symbol, you also have to change the amount that goes with it :-) Raymond

Raymond Hettinger wrote:
[Jim Jewett]
Fair enough. But what does a thousands separator provide that the "n" type doesn't already provide? (Well, except that n isn't as well known -- but initially this won't be either.)
It's nice to have a non-locale-aware alternative so you can say explicitly what you want. This is especially helpful in Guido's example where you need to format for a different locale than the one that is currently on your machine (i.e. the global state doesn't match the target).
I've always thought that we should have a utility function which formats a number based on the same settings that are in the locale, but not actually use the locale. Something like: format_number(123456787654321.123, decimal_point=',', thousands_sep=' ', grouping=[4, 3, 2])
'12 34 56 78 765 4321,123'
That would get rid of threading issues, and you wouldn't have to worry about what locales were installed. I basically have this function in the various formatting routines, it just needs to be pulled out and exposed.
Do you really want a different type, such as "m" for money?
I don't but I'm sure someone does. I did write a money formatter sample recipe for the decimal docs so people would have something to model from.
This becomes easier with the hypothetical "format_number" routine. But this is all orthogonal to the str.format() discussion. Eric.

Eric Smith wrote:
I've always thought that we should have a utility function which formats a number based on the same settings that are in the locale, but not actually use the locale. Something like:
format_number(123456787654321.123, decimal_point=',', thousands_sep=' ', grouping=[4, 3, 2])
'12 34 56 78 765 4321,123'
To be maximally useful (for example, so it could be used in Decimal to implement locale formatting), maybe it should work on strings:
format_number(whole_part='123456787654321', fractional_part='123', decimal_point=',', thousands_sep=' ', grouping=[4, 3, 2])
'12 34 56 78 765 4321,123'
format_number(whole_part='123456787654321', decimal_point=',', thousands_sep='.', grouping=[4, 3, 2])
'12.34.56.78.765.4321'
I think such a method, along with locale.localeconv(), would be the workhorse for much of the formatting we've been talking about. It could be fleshed out with the sign and other remaining fields from localeconv(). The key point is that it takes everything as parameters and doesn't use any global state. In particular, it by itself would not reference the locale. I'll probably add such a routine anyway, even if it doesn't get documented as a public API. Eric.
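A sketch of how the string-based variant could behave, written only to reproduce the examples above. format_number is the hypothetical routine from this message, not an existing function; the default argument values and the rule that the last grouping entry repeats are assumptions of this sketch.

```python
def format_number(whole_part, fractional_part="", decimal_point=".",
                  thousands_sep=",", grouping=(3,)):
    """Group the digits of whole_part from the right and join the parts.

    grouping lists group sizes starting at the rightmost group; the last
    entry is assumed to repeat for all remaining digits. Nothing here
    consults the locale or any other global state.
    """
    sizes = list(grouping)
    if not sizes or any(size <= 0 for size in sizes):
        raise ValueError("grouping must contain positive group sizes")

    groups = []
    rest = whole_part
    index = 0
    while rest:
        size = sizes[min(index, len(sizes) - 1)]
        groups.append(rest[-size:])  # peel one group off the right
        rest = rest[:-size]
        index += 1

    result = thousands_sep.join(reversed(groups))
    if fractional_part:
        result += decimal_point + fractional_part
    return result


print(format_number(whole_part="123456787654321", fractional_part="123",
                    decimal_point=",", thousands_sep=" ", grouping=[4, 3, 2]))
# 12 34 56 78 765 4321,123
```

With grouping=[3, 2] the same sketch produces the Indian-style grouping mentioned earlier in the thread: format_number("1000000000", grouping=[3, 2]) gives '1,00,00,00,000'.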

Today's updates to http://www.python.org/dev/peps/pep-0378/
* Specify what width means when thousands separators are present.
* Clarify that the locale module is not being proposed to change.
* Add research on what is done in C-Sharp, MS-Excel, COBOL, and CommonLisp.
* Add more examples.
Raymond

Eric Smith wrote:
format_number(whole_part='123456787654321', decimal_point=',', thousands_sep='.', grouping=[4, 3, 2]) '12.34.56.78.765.4321'
Maybe the 'thousands_sep' parameter should be called 'grouping_sep' (since it doesn't always group by thousands)? -bruce frederiksen

Bruce Frederiksen wrote:
Eric Smith wrote:
format_number(whole_part='123456787654321', decimal_point=',', thousands_sep='.', grouping=[4, 3, 2]) '12.34.56.78.765.4321'
Maybe the 'thousands_sep' parameter should be called 'grouping_sep' (since it doesn't always group by thousands)?
-bruce frederiksen
thousands_sep is the locale.localeconv() name, which I suggest we use. I suggest that this particular API only support the LC_NUMERIC fields (decimal_point, grouping, thousands_sep), and that maybe we have a separate format_money which supports the LC_MONETARY fields. Eric.

I've always thought that we should have a utility function which formats a number based on the same settings that are in the locale, but not actually use the locale. Something like:
format_number(123456787654321.123, decimal_point=',', thousands_sep=' ', grouping=[4, 3, 2])
'12 34 56 78 765 4321,123'
To be maximally useful (for example, so it could be used in Decimal to implement locale formatting), maybe it should work on strings:
format_number(whole_part='123456787654321', fractional_part='123', decimal_point=',', thousands_sep=' ', grouping=[4, 3, 2]) '12 34 56 78 765 4321,123'
Whoa guys! I think you're treading very far away from and rejecting the whole idea of PEP 3101 which was to be the one ring to bind them all with format(obj, fmt) having just two arguments and doing nothing but passing them on to obj.__format__() which would be responsible for parsing a format string.
Also, even if you wanted a flexible, clear, separate tool just for number formatting, I don't think keyword arguments are the way to go. That is a somewhat heavy approach with limited flexibility. The research in PEP 378 shows that for languages needing fine control and extreme versatility in formatting, some kind of picture string is the way to go. MS Excel is a champ at number/date formatting strings: #,##0 and whatnot. They allow negatives to have placeholders, trailing minus signs, parentheses, etc. Columns can be aligned neatly, any type of padding can be used, any type of separator may be specified. The COBOL picture statements also offer flexibility and clarity. Mini-languages of some sort beat the heck out of functions with a zillion optional arguments.
Raymond "Working with creative thinkers can be like herding cats."

Raymond Hettinger wrote:
Whoa guys! I think you're treading very far away from and rejecting the whole idea of PEP 3101 which was to be the one ring to bind them all with format(obj, fmt) having just two arguments and doing nothing but passing them on to obj.__format__() which would be responsible for parsing a format string.
I completely agree. That's why I said "But this is all orthogonal to the str.format() discussion." I meant "orthogonal" in the "unrelated" sense. I'm completely on board with your PEP 378 as a simple way just to get some simple formatting into numbers.
Also, even if you wanted a flexible, clear, separate tool just for number formatting, I don't think keyword arguments are the way to go. That is a somewhat heavy approach with limited flexibility. The research in PEP 378 shows that for languages needing fine control and extreme versatility in formatting, some kind of picture string is the way to go. MS Excel is a champ at number/date formatting strings: #,##0 and whatnot. They allow negatives to have placeholders, trailing minus signs, parentheses, etc. Columns can be aligned neatly, any type of padding can be used, any type of separator may be specified. The COBOL picture statements also offer flexibility and clarity. Mini-languages of some sort beat the heck out of functions with a zillion optional arguments.
I think picture based is okay and has its place, but a routine like my proposed format_number (which I know is a bad name) is really the heavy lifter for all locale-based number formatting. Decimal shouldn't really have to completely reimplement locale-based formatting, especially when it already exists in the core. I just want to expose it. Eric.

[Eric Smith]
Decimal shouldn't really have to completely reimplement locale-based formatting, especially when it already exists in the core. I just want to expose it.
I see. Sounds like you're looking for the parser to have some hooks so that people writing new __format__ methods don't have to start from scratch.
I completely agree. That's why I said "But this is all orthogonal to the str.format() discussion." I meant "orthogonal" in the "unrelated" sense.
Makes sense. Hopefully, we can get this thread back on track for evaluating the proposal for a minor buildout to the existing mini-language. Raymond

Raymond Hettinger wrote:
[Eric Smith]
Decimal shouldn't really have to completely reimplement locale-based formatting, especially when it already exists in the core. I just want to expose it.
I see. Sounds like you're looking for the parser to have some hooks so that people writing new __format__ methods don't have to start from scratch.
Not necessarily hooks, but some support routines. I think the standard format specifier parser should be exposed, and also the locale-based formatter should be exposed. These are both unrelated to PEP 378, but they could be used to implement it. They'd be especially useful for non-builtin types like Decimal.
Makes sense. Hopefully, we can get this thread back on track for evaluating the proposal for a minor buildout to the existing mini-language.
Right. Apologies for hijacking it, and especially for not making it clear that I was veering off subject. Eric.

[Eric Smith]
Right. Apologies for hijacking it, and especially for not making it clear that I was veering off subject.
No problem. It was an interesting side discussion. I've updated the PEP to include your variant that doesn't use T. The examples show that it is much cleaner looking and self-evident. Raymond

On 3/12/09, Eric Smith <eric@trueblade.com> wrote:
Eric Smith wrote:
... formats a number based on the same settings that are in the locale, but not actually use the locale. ...
The key point is that it takes everything as parameters and doesn't use any global state. In particular, it by itself would not reference the locale.
Why not? You'll need *some* default for decimal_point, and the one from localeconv makes at least as much sense as a hard-coded default.
I agree that it shouldn't *change* anything in the locale, and any keywords explicitly passed in should override locale, but if it never looks at locale, you'll get patterns like
import locale
kw = dict(locale.localeconv())
kw['thousands_sep'] = ' '
new_util_func(number, **kw)
-jJ
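The "locale supplies the defaults, explicit keywords override" behaviour could be sketched as follows. locale.localeconv() and its 'decimal_point', 'thousands_sep' and 'grouping' keys are real; the helper name numeric_settings is an assumption made for this illustration.

```python
import locale

# The LC_NUMERIC fields returned by locale.localeconv().
_NUMERIC_KEYS = ("decimal_point", "thousands_sep", "grouping")


def numeric_settings(**overrides):
    """Return the numeric settings of the current locale, with any
    explicitly passed keyword arguments taking precedence."""
    unknown = set(overrides) - set(_NUMERIC_KEYS)
    if unknown:
        raise TypeError("unexpected settings: %s" % sorted(unknown))
    conv = locale.localeconv()  # reads the locale, never changes it
    settings = {key: conv[key] for key in _NUMERIC_KEYS}
    settings.update(overrides)
    return settings


print(numeric_settings(thousands_sep=" "))
```

The resulting dictionary could then be passed as keyword arguments to whatever formatting routine ends up being exposed.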

On Thu, 12 Mar 2009 23:50:17 -0400, Jim Jewett <jimjjewett@gmail.com> wrote:
Why not? You'll need *some* default for decimal_point, and the one from localeconv makes at least as much sense as a hard-coded default.
I agree that it shouldn't *change* anything in the locale, and any keywords explicitly passed in should override locale, but if it never looks at locale, you'll get patterns like
I think this makes much sense. Actually, there may be a principle similar to 'cascade overriding' in CSS sheets: the last one who speaks wins. In the case of number formatting, this could be eg a cascade of: locale format --> coded format --> end-user config format denis ------ la vita e estrany

On Thu, 12 Mar 2009 19:31:39 -0400, Eric Smith <eric@trueblade.com> wrote:
I've always thought that we should have a utility function which formats a number based on the same settings that are in the locale, but not actually use the locale. Something like:
format_number(123456787654321.123, decimal_point=',', thousands_sep=' ', grouping=[4, 3, 2])
'12 34 56 78 765 4321,123'
To be maximally useful (for example, so it could be used in Decimal to implement locale formatting), maybe it should work on strings:
format_number(whole_part='123456787654321', fractional_part='123', decimal_point=',', thousands_sep=' ', grouping=[4, 3, 2]) '12 34 56 78 765 4321,123'
format_number(whole_part='123456787654321', decimal_point=',', thousands_sep='.', grouping=[4, 3, 2]) '12.34.56.78.765.4321'
I find the overall problem of providing an interface to specify a number format rather challenging. The issue I see is to design a formatting pattern that is simple, clear, _and_ practical. A practical pattern is easy to specify, but then it becomes rather illegible and/or hard to remember, while a legible one ends up excessively verbose.
I have the impression, but I may well be wrong, that in contrast to a format, a *formatted number* seems easy to scan -- with human eyes. So, as a crazy idea, I wonder whether we shouldn't let the user provide an example formatted number instead. This may address most use cases, but probably not all.
To make things easier, why not specify a canonical number, such as '-123456.789', of which the user should define the formatted version? Then a smart parser could deduce the format to be applied to further numbers. Below is a purely artificial example.
-123456.789 --> kg 00_123_456,79-
format:
unit: 'kg'
unit_pos: LEFT
unit_sep: ' '
thousand_sep: '_'
fract_sep: ','
sign_pos: RIGHT
sign_sep: None
padding_char: '0'
There are obvious issues:
* Does rounding apply to the whole precision (number of significant digits), or to the fractional part only? Then, should the format be interpreted as the most common case (probably fractional rounding), provide a disambiguation flag, or provide a flag for the non-default case only? What if rounding applies after a large number of digits? Should we instead allow the user to provide a longer number?
* Similar for padding: does it apply to the length of the whole number or to the integral part (common in financial apps to align decimal signs)? What if the padding applies to a smaller number of digits than that of the canonical number? Should we instead allow the user to provide a shorter number?
* probably more...
The space of valid formats can be specified using a parsing grammar, so that a parse failure indicates an invalid format, and a "tagged" parse tree provides all the information needed to construct a format object.
I really do not know whether this idea is stupid or worth being explored ;-) [But I would happily try it for personal use, at least as an everyday fast-and-easy feature.]
Denis ------ la vita e estrany
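A toy sketch of the "deduce the format from an example" idea, limited to the two separators. It assumes the example is a rendering of the canonical number 123456.789 and ignores units, sign position, padding and rounding, which are the hard parts listed above. The function name infer_separators is made up for this illustration.

```python
import re

# Look for the canonical digits 123 and 456 with at most one non-digit
# between them, optionally followed by a non-digit and a further digit.
_CANONICAL_RE = re.compile(r"123(\D?)456(?:(\D)\d)?")


def infer_separators(example):
    """Return (thousands_sep, decimal_sep) deduced from an example string.

    thousands_sep is '' when the digits are not grouped; decimal_sep is
    None when the example shows no fractional part.
    """
    m = _CANONICAL_RE.search(example)
    if m is None:
        raise ValueError("example does not look like 123456.789: %r" % (example,))
    return m.group(1), m.group(2)


print(infer_separators("kg 00_123_456,79-"))  # ('_', ',')
print(infer_separators("123,456.789"))        # (',', '.')
```

Even this small case shows the ambiguity problem: the sketch only works because the canonical digits are known in advance and cannot be confused with separators.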

spir wrote:
I find the overall problem of providing an interface to specify a number format rather challenging. The issue I see is to design a formatting pattern that is simple, clear, _and_ practical. A practical pattern is easy to specify, but then it becomes rather illegible and/or hard to remember, while a legible one ends up excessively verbose.
I have the impression, but I may well be wrong, that in contrast to a format, a *formatted number* seems easy to scan -- with human eyes. So, as a crazy idea, I wonder whether we shouldn't let the user provide an example formatted number instead. This may address most use cases, but probably not all.
To make things easier, why not specify a canonical number, such as '-123456.789', of which the user should define the formatted version? Then a smart parser could deduce the format to be applied to further numbers. Below is a purely artificial example.
-123456.789 --> kg 00_123_456,79-
format: unit: 'kg' unit_pos: LEFT unit_sep: ' ' thousand_sep: '_' fract_sep: ',' sign_pos: RIGHT sign_sep: None padding_char: '0'
There are obvious issues: * Does rounding apply to the whole precision (number of significant digits), or to the fractional part only? Then, should the format be interpreted as the most common case (probably fractional rounding), provide a disambiguation flag, or provide a flag for the non-default case only? What if rounding applies after a large number of digits? Should we instead allow the user to provide a longer number? * Similar for padding: does it apply to the length of the whole number or to the integral part (common in financial apps to align decimal signs)? What if the padding applies to a smaller number of digits than that of the canonical number? Should we instead allow the user to provide a shorter number? * probably more...
The space of valid formats can be specified using a parsing grammar, so that a parse failure indicates an invalid format, and a "tagged" parse tree provides all the information needed to construct a format object.
I really do not know whether this idea is stupid or worth being explored ;-) [But I would happily try it for personal use, at least as an everyday fast-and-easy feature.]
Your proposal (other than being harder to implement) is similar to the way Excel handles formatting, but instead of a sample number, they use # as a placeholder. If you really want to test-implement it, better try using that. And I think it is impossible for the parser to be smart enough to recognize that the sign should be put at the rear (the smartest parser might only treat it as a literal negative). Also it is highly inflexible: what about a custom positive sign? What if I want to use a literal -? What about literal numbers? What about non-Latin numbers?

On Fri, 13 Mar 2009 22:46:32 +1100, Lie Ryan <lie.1296@gmail.com> wrote:
Your proposal (other than being harder to implement) is similar to the way Excel handles formatting, but instead of a sample number, they use # as a placeholder. If you really want to test-implement it, better try using that.
Right. I also think now that "picture strings" pointed in the PEP are a better option for such needs. While they probably cannot handle issues such as ambiguity of precision or padding without additional parameters, neither. The only advantage of my proposal is that the user provides an example, instead of an abstract representation.
And I think it is impossible for the parser to be that smart to recognize that sign pos should be put in the rear (the smartest parser might only treat it as literal negative).
? Either I do not understand, or it is wrong. You can well have a parse expression allowing either a front or a rear sign, as long as there is a non-ambiguous sign-pattern. What does 'literal negative' mean?
Also it is highly inflexible, what about custom positive sign? What if I want to use literal -? What about literal number? What about non-latin number?
~ true. But this applies to any formatting rule, no? You have to specify, e.g., which code point ranges are allowed for valid digits, and those must not overlap with the code points allowed as signs, separators, or whatever. Custom signs are not a problem, as long as they do not conflict with digits or separators. The same goes for non-Latin digits. These points are not specific to my proposal; they apply to any kind of formatting.
What if I want to use literal -? What about literal number?
I do not understand your point. Denis ------ la vita e estrany

spir wrote:
On Fri, 13 Mar 2009 22:46:32 +1100, Lie Ryan <lie.1296@gmail.com> wrote:
spir wrote:
On Thu, 12 Mar 2009 19:31:39 -0400, Eric Smith <eric@trueblade.com> wrote:
I've always thought that we should have a utility function which formats a number based on the same settings that are in the locale, but not actually use the locale. Something like:
>>> format_number(123456787654321.123, decimal_point=',', thousands_sep=' ', grouping=[4, 3, 2])
'12 34 56 78 765 4321,123'
To be maximally useful (for example, so it could be used in Decimal to implement locale formatting), maybe it should work on strings:
>>> format_number(whole_part='123456787654321', fractional_part='123', decimal_point=',', thousands_sep=' ', grouping=[4, 3, 2])
'12 34 56 78 765 4321,123'
>>> format_number(whole_part='123456787654321', decimal_point=',', thousands_sep='.', grouping=[4, 3, 2])
'12.34.56.78.765.4321'
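A minimal sketch of how the string-based variant might behave. The function name and keyword arguments follow the message above; the body is an assumption, not an existing library function. Here the last entry of `grouping` is simply repeated until the digits run out, which is what the example output implies; the lists returned by `locale.localeconv()` use explicit terminators instead.

```python
def format_number(whole_part, fractional_part="", decimal_point=".",
                  thousands_sep=",", grouping=(3,)):
    """Group the digits of whole_part from the right (hypothetical helper).

    grouping lists the group sizes starting at the decimal point; the last
    size is reused for all remaining digits (an assumption of this sketch).
    """
    sizes = list(grouping)
    groups = []
    rest = whole_part
    index = 0
    while rest:
        size = sizes[min(index, len(sizes) - 1)]
        groups.append(rest[-size:])   # peel one group off the right end
        rest = rest[:-size]
        index += 1
    result = thousands_sep.join(reversed(groups))
    if fractional_part:
        result += decimal_point + fractional_part
    return result


print(format_number("123456787654321", "123", decimal_point=",",
                    thousands_sep=" ", grouping=[4, 3, 2]))
print(format_number("123456787654321", decimal_point=",",
                    thousands_sep=".", grouping=[4, 3, 2]))
```

Working on digit strings rather than floats sidesteps any rounding question, which is why the same helper could serve Decimal as suggested.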
I find the overall problem of providing an interface to specify a number format rather challenging. The issue I see is to design a formatting pattern that is simple, clear, _and_ practical. A practical pattern is easy to specify, but then it becomes rather illegible and/or hard to remember, while a legible one ends up excessively verbose.
I have the impression, though I may well be wrong, that unlike a format, a *formatted number* is easy to scan with human eyes. So, as a crazy idea, I wonder whether we should let the user provide an example formatted number instead. This may address most use cases, but probably not all.
To make things easier, why not specify a canonical number, such as '-123456.789', for which the user defines the formatted version? A smart parser could then deduce the format to apply to other numbers. Below is a purely artificial example.
-123456.789 --> kg 00_123_456,79-
format:
unit: 'kg'
unit_pos: LEFT
unit_sep: ' '
thousand_sep: '_'
fract_sep: ','
sign_pos: RIGHT
sign_sep: None
padding_char: '0'
There are obvious issues:
* Does rounding apply to the whole precision (number of significant digits), or to the fractional part only? Then, should the format be interpreted as the most common case (probably fractional rounding), should there be a disambiguation flag, or a flag for the non-default case only? What if rounding applies after a large number of digits? Should we instead allow the user to provide a longer number?
* Similarly for padding: does it apply to the length of the whole number or to the integral part (common in financial apps, to align decimal points)? What if the padding applies to fewer digits than the canonical number has? Should we instead allow the user to provide a shorter number?
* Probably more...
The space of valid formats can be specified using a parsing grammar, so that a parse failure indicates invalid format, and a "tagged" parse tree provides all the information needed to construct a format object.
I really do not know whether this idea is stupid or worth exploring ;-) [But I would happily try it for personal use, at least as an everyday fast-and-easy feature.]
Your proposal (other than being harder to implement) is similar to the way Excel handles formatting, but instead of a sample number, it uses # as a placeholder. If you really want to test-implement it, better try using that.
Right. I also now think that the "picture strings" mentioned in the PEP are a better option for such needs, although they probably cannot handle issues such as the ambiguity of precision or padding without additional parameters either. The only advantage of my proposal is that the user provides an example instead of an abstract representation.
And I think it is impossible for the parser to be that smart to recognize that sign pos should be put in the rear (the smartest parser might only treat it as literal negative).
? Either I do not understand, or it is wrong.
Partially wrong, when I said "literal negative" I really meant "literal -".
You can well have a parse expression allowing either a front or a rear sign, as long as there is a non-ambiguous sign-pattern. What does 'literal negative' mean?
But what if I want ~ to denote negative number?
Also it is highly inflexible, what about custom positive sign? What if I want to use literal -? What about literal number? What about non-latin number?
~ true. But this applies to any formatting rule, no?
Yes, but using number example introduces lots of ambiguities. You must use parameters to avoid these ambiguities.
You have to specify, e.g., which code point ranges are allowed for valid digits, and those must not overlap with the code points allowed as signs, separators, or whatever.
Custom signs are not a problem, as long as they do not conflict with digits or separators. The same goes for non-Latin digits. These points are not specific to my proposal; they apply to any kind of formatting.
How would the example format interpret this: 123 456~ when I want ~ to be the negative sign? What if I want < for negative and > for positive? Those are quite hypothetical, but if we're talking about languages that don't use Latin numerals, that sort of thing is very likely to happen.
What if I want to use literal -? What about literal number?
I do not understand your point.
What if I want my number to look like this: 123-4567? An example format would have a hard time guessing whether the "-" should be a negative sign or a literal "-". Maybe you can use escape characters, but that would turn the strongest point of the example format against itself.
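To make the disagreement concrete, here is a rough sketch of the "parsing grammar" idea under deliberately narrow assumptions: the example must render the canonical number -123456.789, the only sign is '-', and separators are single characters that are neither digits nor '-'. The name `deduce_format` and the returned keys are hypothetical; unit and padding handling are left out.

```python
import re

# Hypothetical grammar for a rendering of the canonical number -123456.789.
_EXAMPLE = re.compile(
    r"(?P<lead>-?)"                            # sign in front, if any
    r"123(?P<tsep>[^0-9-]?)456"                # optional thousands separator
    r"(?:(?P<dsep>[^0-9-])(?P<frac>[0-9]+))?"  # optional fractional part
    r"(?P<trail>-?)"                           # sign at the rear, if any
)


def deduce_format(example):
    """Return the format settings implied by an example, or raise ValueError."""
    m = _EXAMPLE.fullmatch(example)
    if m is None or (m.group("lead") and m.group("trail")):
        raise ValueError("cannot deduce a format from %r" % (example,))
    if m.group("lead"):
        sign_pos = "LEFT"
    elif m.group("trail"):
        sign_pos = "RIGHT"
    else:
        sign_pos = None
    frac = m.group("frac")
    return {
        "thousand_sep": m.group("tsep") or None,
        "fract_sep": m.group("dsep"),
        "precision": len(frac) if frac else 0,
        "sign_pos": sign_pos,
    }


print(deduce_format("123_456,79-"))
for ambiguous in ("123 456~", "123-4567"):
    try:
        deduce_format(ambiguous)
    except ValueError as exc:
        print(exc)
```

A rear sign is recognised without trouble, but the two examples raised in the reply are rejected: a fixed grammar cannot know that '~' is meant as a sign, or whether '-' in the middle is a literal, without extra parameters.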

Today's updates to: http://www.python.org/dev/peps/pep-0378/
* Summarize commentary to date.
* Add APOSTROPHE and non-breaking SPACE to the list of separators.
* Add more links to external references.
* Detail issues with the locale module.
* Clarify how proposal II is parsed.

Raymond Hettinger wrote:
Today's updates to: http://www.python.org/dev/peps/pep-0378/
* Summarize commentary to date. * Add APOSTROPHE and non-breaking SPACE to the list of separators. * Add more links to external references. * Detail issues with the locale module. * Clarify how proposal II is parsed.
+1 for proposal 2. Comment on locale: it was designed, perhaps 30 years ago, for *national* programming (hence the global locale setting). The doc should really describe it as for 'nationalization' rather than for 'internationalization'. For *global* (international) programming, all the formatting functions should either take a locale dict or be instance methods of a Locale class whose instances are individual locales. With this PEP implemented, we could potentially replace locale with a platform- and implementation-language-independent countrybase and country module with a Country class using the expanded str.format strings. The only thing not directly handled, as far as I can see, is groupings other than by threes, which would have to be handled by other means. Terry Jan Reedy

Raymond Hettinger wrote:
Today's updates to: http://www.python.org/dev/peps/pep-0378/
* Summarize commentary to date. * Add APOSTROPHE and non-breaking SPACE to the list of separators. * Add more links to external references. * Detail issues with the locale module. * Clarify how proposal II is parsed.
Still doesn't specify how to deal with digits beyond the decimal point. I don't really care what the choice is, but I do care that the choice is specified. Is the precision in digits, or is it the width of the post-decimal-point field? If the latter, does a precision of 4 end with a comma or not?
In particular, what should (format(9876.54321, "13,.5f"), format(9876.54321, "12,.4f")) produce? Possible "reasonable" answers:
A ' 9,876.54321', ' 9,876.5432'
B ' 9,876.543,21', ' 9,876.543,2'
C ' 9,876.543,2', ' 9,876.543,'
D ' 9,876.543,2', ' 9,876.543'
I prefer B, but I can see an argument for any of the four above.
--Scott David Daniels Scott.Daniels@Acm.Org

Scott David Daniels wrote:
Still doesn't specify how to deal with digits beyond the decimal point. I don't really care what the choice is, but I do care that the choice is specified. Is the precision in digits, or is it the width of the post-decimal-point field? If the latter, does a precision of 4 end with a comma or not?
In particular, what should (format(9876.54321, "13,.5f"), format(9876.54321, "12,.4f")) produce? Possible "reasonable" answers: A ' 9,876.54321', ' 9,876.5432' B ' 9,876.543,21', ' 9,876.543,2' C ' 9,876.543,2', ' 9,876.543,' D ' 9,876.543,2', ' 9,876.543' I prefer B, but I can see an argument for any of the four above.
The C locale functions don't support grouping to the right of the decimal. I don't think I've ever seen a system that supports it. Do you have any examples? I'd say A. Eric.

Scott David Daniels wrote:
Still doesn't specify [how to deal with] digits beyond the decimal point.... what should (format(9876.54321, "13,.5f"), format(9876.54321, "12,.4f")) produce? A ' 9,876.54321', ' 9,876.5432' B ' 9,876.543,21', ' 9,876.543,2' C ' 9,876.543,2', ' 9,876.543,' D ' 9,876.543,2', ' 9,876.543' I prefer B, but I can see an argument for any of the four above.
Eric Smith wrote:
The C locale functions don't support grouping to the right of the decimal. I don't think I've ever seen a system that supports it. Do you have any examples?
I've only used separators to check digits below the decimal point. Most high-precision tables of constants that I've seen use 5-digit grouping (e.g. wikipedia for pi): 3.14159 26535 89793 23846 26433 83279 50288 41971 69399 37510 But 3 on the left and 5 on the right really seems to be too much.
I'd say A.
For me, A and B are the "preferable" solutions; I just think the PEP needs to say what it chooses. --Scott David Daniels Scott.Daniels@Acm.Org

Scott David Daniels wrote:
Raymond Hettinger wrote:
Today's updates to: http://www.python.org/dev/peps/pep-0378/
* Summarize commentary to date. * Add APOSTROPHE and non-breaking SPACE to the list of separators. * Add more links to external references. * Detail issues with the locale module. * Clarify how proposal II is parsed. Still doesn't specify how to deal with digits beyond the decimal point. I don't really care what the choice is, but I do care that the choice is specified. Is the precision in digits, or is it the width of the post-decimal-point field? If the latter, does a precision of 4 end with a comma or not?
In particular, what should (format(9876.54321, "13,.5f"), format(9876.54321, "12,.4f")) produce? Possible "reasonable" answers: A ' 9,876.54321', ' 9,876.5432' B ' 9,876.543,21', ' 9,876.543,2' C ' 9,876.543,2', ' 9,876.543,' D ' 9,876.543,2', ' 9,876.543' I prefer B, but I can see an argument for any of the four above.
--Scott David Daniels Scott.Daniels@Acm.Org
Has anyone mentioned yet that in German you write the following? 10.000.000,000.001 (In German, "," and "." are swapped.) Is this aspect taken into account? How is i18n/l10n managed? -panzi

[Mathias Panzenböck]
Has anyone mentioned yet that in German you write the following? 10.000.000,000.001
These are all red herrings. The proposal is not about internationalization and it says as much. There is no doubt that everyone and his brother can think up a different convention for writing down numbers. The PEP proposes a non-localized way to specify one of several separators to group thousands to the left of the decimal point. At least one way (spaces or underscores) should be readable, understandable, and useful to folks from many diverse backgrounds. It is not the intention to be able to reproduce everything that a person can think up. That would be a fool's errand. Raymond

Scott David Daniels wrote:
B ' 9,876.543,21', ' 9,876.543,2' C ' 9,876.543,2', ' 9,876.543,' D ' 9,876.543,2', ' 9,876.543'
What??? On the planet I come from, nobody uses separators for digits *after* the decimal point, unless perhaps if they're spaces. Certainly never commas. -- Greg

Raymond Hettinger wrote:
* Summarize commentary to date. * Add APOSTROPHE and non-breaking SPACE to the list of separators. * Add more links to external references. * Detail issues with the locale module. * Clarify how proposal II is parsed.
[Scott David Daniels]
Still doesn't specify to digits beyond the decimal point.
Will clarify that the intent is to put thousands separators only to the left of the decimal point.
In particular, what should (format(9876.54321, "13,.5f"), format(9876.54321, "12,.4f")) produce? Possible "reasonable" answers: A ' 9,876.54321', ' 9,876.5432' B ' 9,876.543,21', ' 9,876.543,2' C ' 9,876.543,2', ' 9,876.543,' D ' 9,876.543,2', ' 9,876.543' I prefer B, but I can see an argument for any of the four above.
Am proposing A. That matches the existing precedent in the locale module:
>>> locale.setlocale(locale.LC_ALL, 'English_United States.1252')
'English_United States.1252'
>>> locale.format("%15.8f", pi*1000, grouping=True)
' 3,141.59265359'
It also matches what my adding machines have done, what my HP calculator does, how Excel handles thousands grouping, and the other examples cited in the PEP. Am thinking that anything else would be a new, made-up requirement. The closest I've seen to this is grouping of digits in long sequences of pi and in logarithm tables. It may be useful to someone somewhere, but am not going to propose it for the PEP. Raymond
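For reference, the `,` option as it later shipped (Python 2.7 and 3.1 onward) groups only the digits to the left of the decimal point, which is answer A. A quick check of the two specs from the question, assuming an interpreter with PEP 378 support:

```python
value = 9876.54321

# The precision counts fractional digits; the comma only affects the
# integer part, and the width pads the whole field on the left.
wide = format(value, "13,.5f")
narrow = format(value, "12,.4f")

print(repr(wide), len(wide))
print(repr(narrow), len(narrow))
```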

spir wrote:
I have the impression, though I may well be wrong, that unlike a format, a *formatted number* is easy to scan with human eyes. So, as a crazy idea, I wonder whether we should let the user provide an example formatted number instead. This may address most use cases, but probably not all.
To make things easier, why not specify a canonical number, such as '-123456.789', for which the user defines the formatted version? A smart parser could then deduce the format to apply to other numbers. Below is a purely artificial example.
-123456.789 --> kg 00_123_456,79-
format:
unit: 'kg'
unit_pos: LEFT
unit_sep: ' '
thousand_sep: '_'
fract_sep: ','
sign_pos: RIGHT
sign_sep: None
padding_char: '0'
Once the .format language is expanded to be able to define grouping separators, one will be able to define functions to turn such templates into field specs. How many options are allowed would depend on the function.

Jim Jewett <jimjjewett@...> writes:
Do you want to avoid using locale even in the background? Do you want to avoid having to set a locale in the program startup? Do you want a better default for locale?
As Guido said, a problem is that locale relies on shared state. That makes it very painful to use (any module setting the locale to a value which suits its semantics can negatively impact other modules or libraries in your application). But even worse is that the desired locale is not necessarily installed. For example, if I develop an app for French users but it is hosted on a US server, perhaps the 'fr_FR' locale won't be available at all.
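Both problems show up in even a small wrapper around the locale module. The sketch below (the helper name `format_with_locale` is hypothetical) has to save and restore the process-wide setting around each call, and it has to cope with the requested locale being absent from the host:

```python
import locale


def format_with_locale(value, locale_name):
    """Format value with grouping under locale_name, or return None.

    The locale is process-wide state, so it is saved and restored here;
    other threads would still observe the temporary change.
    """
    saved = locale.setlocale(locale.LC_NUMERIC)  # query the current setting
    try:
        locale.setlocale(locale.LC_NUMERIC, locale_name)
    except locale.Error:
        return None  # the locale is not installed on this machine
    try:
        return locale.format_string("%.2f", value, grouping=True)
    finally:
        locale.setlocale(locale.LC_NUMERIC, saved)


# Whether this prints a formatted number or None depends on the host.
print(format_with_locale(1234567.891, "fr_FR.UTF-8"))
```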

Bruce Leban wrote:
When you say 'user' you mean 'developer'. Having the developer choose the separators means it *won't* be what the user wants. Why would you stick in separators if not to display to a user?
I agree. I don't see a use case for hard-coding non-standard separators into every format string. So I'm +1 on proposal I and -1 on proposal II. Also +1 on providing a "use the locale" option that's orthogonal to the type specifier. -- Greg

Antoine Pitrou wrote:
Greg Ewing <greg.ewing@...> writes:
I agree. I don't see a use case for hard-coding non-standard separators into every format string.
Sorry, but what do you call "non-standard" exactly?
I mean something other than "," and ".". My point is that while it's perfectly reasonable for, e.g. a French programmer to want to format his numbers with dots and commas the other way around, it's *not* reasonable to force him to tediously specify it in each and every format specifier he writes. There needs to be some way of setting it once for the whole program, otherwise it just won't be practical. -- Greg

Greg Ewing <greg.ewing@...> writes:
My point is that while it's perfectly reasonable for, e.g. a French programmer to want to format his numbers with dots and commas the other way around, it's *not* reasonable to force him to tediously specify it in each and every format specifier he writes.
A program often formatting numbers the same way can factor that into dedicated helpers:
def format_float(f): return "{0:T.,2f}".format(f)
or even:
format_float = "{0:T.,2f}".format
Regards Antoine.

Antoine Pitrou wrote:
A program often formatting numbers the same way can factor that into dedicated helpers:
If that's an acceptable thing to do on a daily basis, then we don't need format strings at all. -- Greg

Antoine Pitrou wrote:
Greg Ewing <greg.ewing@...> writes:
If that's an acceptable thing to do on a daily basis, then we don't need format strings at all.
Because you can do all your formatting by calling a function to format each number and then concatenating the results with whatever other text you want. You can do that now, but someone invented format strings, so they must have wanted a more convenient way of going about it. -- Greg

Greg Ewing <greg.ewing@...> writes:
You can do that now, but someone invented format strings, so they must have wanted a more convenient way of going about it.
I don't see how that contradicts what I said and you don't seem eager to produce understandable explanations, so I'll leave it there.

Greg Ewing wrote:
Antoine Pitrou wrote:
A program often formatting numbers the same way can factor that into dedicated helpers:
If that's an acceptable thing to do on a daily basis, then we don't need format strings at all.
Given that the helper functions *use* format strings, or could even be a method bound to a format string, that seems like an odd claim ;-).

Antoine Pitrou wrote:
Greg Ewing <greg.ewing@...> writes:
My point is that while it's perfectly reasonable for, e.g. a French programmer to want to format his numbers with dots and commas the other way around, it's *not* reasonable to force him to tediously specify it in each and every format specifier he writes.
A program often formatting numbers the same way can factor that into dedicated helpers:
def format_float(f): return "{0:T.,2f}".format(f)
or even:
format_float = "{0:T.,2f}".format
Or:
float_fmt = 'T.,2f'
then you can re-use it everywhere, and multiple times in a single .format() expression:
'{0:{fmt}} {1:{fmt}}'.format(3.14, 2.72, fmt=float_fmt)
(Try that with %-formatting! :-) Or with a slight modification to the work I'm doing to implement auto-numbering:
'{:{fmt}} {:{fmt}}'.format(3.14, 2.78, fmt=float_fmt)
(but this is a different issue!) Eric.
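The `T.,2f` spec in these messages is the syntax under discussion at the time and is not valid in released Python. The same helper patterns can be tried with the comma-only spec that the thread eventually settles on; a small sketch, assuming Python 3.1 or later:

```python
# One shared spec: thousands separator plus two fractional digits.
float_fmt = ",.2f"

# A bound method of the format string acts as a ready-made helper.
format_float = ("{0:" + float_fmt + "}").format
print(format_float(1234567.891))

# The spec can also be passed in as a nested replacement field.
print("{0:{fmt}} {1:{fmt}}".format(3.14, 2.72, fmt=float_fmt))

# With auto-numbered fields the positional indexes can be dropped.
print("{:{fmt}} {:{fmt}}".format(3.14, 2.78, fmt=float_fmt))
```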

On Thu, Mar 12, 2009 at 8:40 AM, Bruce Frederiksen <dangyogi@gmail.com> wrote:
Raymond Hettinger wrote:
James Knight observed that Indian/Pakistani numbering systems group by hundreds.
I'm not 100% sure here, but I believe that in India, they insert a separator after the first 3 digits, then another after 2 more digits, then every 3 digits after that (not sure if they use commas or periods, I think commas):
1,000,000,00,000
Not quite. I'm not Indian, but based off Wikipedia (http://en.wikipedia.org/wiki/Lakh): "after the first three digits, a comma divides every two rather than every three digits, thus: Indian system: 12,12,12,123 5,05,000 7,00,00,00,000" Cheers, Chris -- I have a blog: http://blog.rebertia.com

I vote we move ahead with Proposal II from PEP 378. I don't think there's anything else to add to the discussion. Eric.

On Mon, 16 Mar 2009 12:51:24 -0400, Eric Smith <eric@trueblade.com> wrote:
I vote we move ahead with Proposal II from PEP 378. I don't think there's anything else to add to the discussion.
Eric.
Agree. denis ------ la vita e estrany

Guido, The conversation on the thousands separator seems to have wound down and the PEP has stabilized: http://www.python.org/dev/peps/pep-0378/ Please pronounce. Raymond ----- Original Message ----- From: "spir" <denis.spir@free.fr> To: <python-ideas@python.org> Sent: Monday, March 16, 2009 11:59 AM Subject: Re: [Python-ideas] Rough draft: Proposed format specifier for a thousands separator (discussion moved from python-dev)
On Mon, 16 Mar 2009 12:51:24 -0400, Eric Smith <eric@trueblade.com> wrote:
I vote we move ahead with Proposal II from PEP 378. I don't think there's anything else to add to the discussion.
Eric.
Agree.
denis ------ la vita e estrany

On Mon, Mar 16, 2009 at 12:19 PM, Raymond Hettinger <python@rcn.com> wrote:
The conversation on the thousands separator seems to have wound down and the PEP has stabilized: http://www.python.org/dev/peps/pep-0378/
Please pronounce.
That's not a PEP, it's just a summary of a discussion without any choice. :-) Typically PEPs put the discussion of alternatives in some section at the end, after the specification and other stuff relevant going forward. Just to add more fuel to the fire, did anyone propose refactoring the problem into (a) a way to produce output with a thousands separator, and (b) a way to localize formats? We could solve (a) by adding a comma to all numeric format languages along Nick's proposal, and we could solve (b) either now or later by adding some other flag that means "use locale-specific numeric formatting for this value". Or perhaps there could be two separate flags corresponding to the grouping and monetary arguments to locale.format(). I'd be happy to punt on (b) until later. This is somewhat analogous to the approach for strftime() which has syntax to invoke locale-specific formatting (%a, %A, %b, %B, %c, %p, %x, %X). I guess in the end this means I am in favor of Nick's alternative. One thing I don't understand: the PEP seems to exclude the 'e' and 'g' format. I would think that in case 'g' defers to 'f' it should act the same, and in case it defers to 'e', well, in the future (under (b) above) that could still change the period into a comma, right? -- --Guido van Rossum (home page: http://www.python.org/~guido/)

[Guido van Rossum]
Typically PEPs put the discussion of alternatives in some section at the end, after the specification and other stuff relevant going forward.
Okay, re-arranged to make it more peppy.
I guess in the end this means I am in favor of Nick's alternative.
Was hoping you would be more attracted to the other proposal, which meets more people's needs right out of the box. No matter what country you're in, it's nice to have the option to switch to spaces or underscores regardless of your local convention. In the end, most respondents seemed to support the more flexible version (Eric's proposal).
One thing I don't understand: the PEP seems to exclude the 'e' and 'g' format. I would think that in case 'g' defers to 'f' it should act the same, and in case it defers to 'e', well, in the future (under (b) above) that could still change the period into a comma, right?
Makes sense. So noted in the PEP. Raymond

Raymond Hettinger wrote:
I guess in the end this means I am in favor of Nick's alternative.
Was hoping you would be more attracted to the other proposal, which meets more people's needs right out of the box. No matter what country you're in, it's nice to have the option to switch to spaces or underscores regardless of your local convention.
I actually prefer proposal II as well. It provides a decent quick solution for one-off scripts and debugging output, while leaving proper l10n/i18n support to the appropriate (heavier) tools. Cheers, Nick. -- Nick Coghlan | ncoghlan@gmail.com | Brisbane, Australia ---------------------------------------------------------------

On Mon, Mar 16, 2009 at 3:24 PM, Raymond Hettinger <python@rcn.com> wrote:
[Guido van Rossum]
Typically PEPs put the discussion of alternatives in some section at the end, after the specification and other stuff relevant going forward.
Okay, re-arranged to make it more peppy.
I guess in the end this means I am in favor of Nick's alternative.
Was hoping you would be more attracted to the other proposal, which meets more people's needs right out of the box. No matter what country you're in, it's nice to have the option to switch to spaces or underscores regardless of your local convention.
Your preference wasn't clear from the PEP. :-)
In the end, most respondents seemed to support the more flexible version (Eric's proposal).
Well, Python survived for about 19 years without having a way to override the decimal point *except* by using the locale module. I guess that divides our users in two classes: (1) Those for whom the default (C) locale is sufficient -- either because they live in the US (1a), or because they're used to programming languages' US-centric approach (1b). (2) Those who absolutely need their numbers formatted for a locale -- either because they want to write heavy-duty localized code (2a), or because their locale doesn't use a comma and their end users would be upset to see US-formatted numbers (2b). For category (1), Nick's minimal proposal is good enough; someone in category (1b) who can live with a US-centric decimal point can also live with a US-centric thousands separator. For category (2a), Eric's proposal is not good enough. Which leaves category (2b), which must be pretty small because they've apparently put up with using the locale module anyway.
One thing I don't understand: the PEP seems to exclude the 'e' and 'g' format. I would think that in case 'g' defers to 'f' it should act the same, and in case it defers to 'e', well, in the future (under (b) above) that could still change the period into a comma, right?
Makes sense. So noted in the PEP.
On Mon, Mar 16, 2009 at 3:37 PM, Nick Coghlan <ncoghlan@gmail.com> wrote:
I actually prefer proposal II as well. It provides a decent quick solution for one-off scripts and debugging output, while leaving proper l10n/i18n support to the appropriate (heavier) tools.
For debugging output and one-offs I don't think the period-vs-comma issue matters much; I'd expect those all to fall in category (1). Another way to look at it is: adding a thousands separator makes a *huge* difference for a large group of potential users, because interpreting numbers with more than 5 or 6 digits is very cumbersome otherwise. However adding a facility to specify a different character for the decimal point and for the separator only matters for a much smaller group of people (2b only), and IMO isn't worth the extra syntactic complexities. I would much rather add syntactic complexity to address a larger issue like (2a). I also have to say that I find Eric's proposal a bit ambiguous: why shouldn't {:8,d} mean "insert commas between thousands"? -- --Guido van Rossum (home page: http://www.python.org/~guido/)

I also have to say that I find Eric's proposal a bit ambiguous: why shouldn't {:8,d} mean "insert commas between thousands"?
It does. That is the sixth example listed:
format(1234, "8.1f") --> ' 1234.0'
format(1234, "8,1f") --> ' 1234,0'
format(1234, "8.,1f") --> ' 1.234,0'
format(1234, "8 ,f") --> ' 1 234,0'
format(1234, "8d") --> ' 1234'
format(1234, "8,d") --> ' 1,234'
format(1234, "8_d") --> ' 1_234'
Raymond

On Mon, Mar 16, 2009 at 4:02 PM, Raymond Hettinger <python@rcn.com> wrote:
I also have to say that I find Eric's proposal a bit ambiguous: why shouldn't {:8,d} mean "insert commas between thousands"?
It does. That is the sixth example listed:
format(1234, "8.1f") --> ' 1234.0'
format(1234, "8,1f") --> ' 1234,0'
format(1234, "8.,1f") --> ' 1.234,0'
format(1234, "8 ,f") --> ' 1 234,0'
format(1234, "8d") --> ' 1234'
format(1234, "8,d") --> ' 1,234'
format(1234, "8_d") --> ' 1_234'
Argh! So "8,1f" means "use comma instead of point" whereas "8,1d" means "use comma as 1000 separator"? You guys can't seriously propose that. -- --Guido van Rossum (home page: http://www.python.org/~guido/)

Argh! So "8,1f" means "use comma instead of point" whereas "8,1d" means "use comma as 1000 separator"?
They both mean use the comma for the thousands separator. The decimal separator only gets overridden as part of the precision specification if provided: format(1234, "8,1f") --> ' 1234,0' Originally, I proposed prefixing the thousands separator with the letter T: format(1234, "8T,d") --> ' 1,234'. That made it crystal clear that the next character was the thousands separator. But people found it to be ugly and reacted badly. Eric then noticed that the T wasn't essential as long as the decimal separator is tightly associated with the precision specifier. If you find that to be screwy, then I guess Nick's comma-only alternative wins. Or, there is an alternative that is a little more flexible. Make the thousands separator one of SPACE, UNDERSCORE, COMMA, or APOSTROPHE, leaving out the DOT, which is reserved to be the sole decimal separator. That is unambiguous but doesn't help folks who want both a DOT thousands separator and a COMMA decimal separator. Raymond

On Mon, Mar 16, 2009 at 4:25 PM, Raymond Hettinger <python@rcn.com> wrote:
Argh! So "8,1f" means "use comma instead of point" whereas "8,1d" means "use comma as 1000 separator"?
They both mean use the comma for the thousands separator. The decimal separator only gets overridden as part of the precision specification if provided: format(1234, "8,1f") --> ' 1234,0'
So I misread, but it is exceedingly subtle indeed: apparently if there's *one* special character it's the decimal point with 'f' and the thousands separator with 'd'; only 'f' supports *two* special characters and then the *first* one is the decimal point. The fact that we need so many emails to sort this out makes it clear that this proposal will lead to endless user confusion.
Originally, I proposed prefixing the thousands separator with the letter T: format(1234, "8T,d") --> ' 1,234'. That made it crystal clear that the next character was the thousands separator. But people found it to be ugly and reacted badly. Eric then noticed that the T wasn't essential as long as the decimal separator is tightly associated with the precision specifier.
If you find that to be screwy, then I guess Nick's comma-only alternative wins.
Yes.
Or, there is an alternative that is a little more flexible. Make the thousands separator one of SPACE, UNDERSCORE, COMMA, or APOSTROPHE, leaving out the DOT which is reserved to be the sole decimal separator. That is unambiguous but doesn't help folks who want both a DOT thousands separator and COMMA decimal separator.
Right. Let's go ahead with Nick's proposal and put ways of specifying alternate separators (either via the locale or hardcoded) on the back burner. Note that, unlike with the original % syntax, in .format() strings we can easily append extra syntax to the end. E.g. format(1234.5, "08,.1f;L") could mean "use the locale", whereas format(1234.5, "08,.1f;T=_;D=,") could mean "use '_' for thousands, ',' for decimal point". But please, let's put this off and get Nick's simple proposal in first. -- --Guido van Rossum (home page: http://www.python.org/~guido/)
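The `;T=_;D=,` suffix was only floated here and is not part of any released format mini-language. As an illustration of how little is needed on top of the comma-only proposal, the following sketch emulates it in a wrapper; the name `format_ext` and its handling of the suffix are assumptions made for this example, and the `;L` locale flag is left unsupported.

```python
def format_ext(value, spec):
    """Emulate the hypothetical ';T=x;D=y' suffix on top of format().

    The part before the first ';' is an ordinary format spec that uses ','
    for grouping; T= and D= then rename the thousands and decimal
    characters in the result.
    """
    base, _, extras = spec.partition(";")
    table = {}
    for item in filter(None, extras.split(";")):
        key, eq, char = item.partition("=")
        if key == "T" and eq:
            table[ord(",")] = char
        elif key == "D" and eq:
            table[ord(".")] = char
        else:
            raise ValueError("unsupported option %r" % (item,))
    # translate() maps both characters in a single pass, so swapping ','
    # and '.' needs no temporary placeholder. Fill characters that happen
    # to be ',' or '.' would be translated as well.
    return format(value, base).translate(table)


print(format_ext(1234.5, ",.1f;T=_;D=,"))
print(format_ext(1234567.891, ",.2f;T=.;D=,"))
```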

On Mon, Mar 16, 2009 at 7:33 PM, Raymond Hettinger <python@rcn.com> wrote:
Right. Let's go ahead with Nick's proposal and put ways of specifying alternate separators (either via the locale or hardcoded) on the back burner
Mark PEP 378 as accepted with Nick's original comma-only version?
OK, done. Looking forward to a swift implementation! -- --Guido van Rossum (home page: http://www.python.org/~guido/)

Guido van Rossum wrote:
On Mon, Mar 16, 2009 at 7:33 PM, Raymond Hettinger <python@rcn.com> wrote:
Right. Let's go ahead with Nick's proposal and put ways of specifying alternate separators (either via the locale or hardcoded) on the back burner Mark PEP 378 as accepted with Nick's original comma-only version?
OK, done. Looking forward to a swift implementation!
I'm on it. Eric.

On Tue, Mar 17, 2009 at 3:14 AM, Guido van Rossum <guido@python.org> wrote:
On Mon, Mar 16, 2009 at 7:33 PM, Raymond Hettinger <python@rcn.com> wrote:
Mark PEP 378 as accepted with Nick's original comma-only version?
OK, done. Looking forward to a swift implementation!
I'll implement this for Decimal; it shouldn't take long. One question from the PEP, which I've been too slow to read until this morning: should commas appear in the zero-filled part of a number? That is, should format(1234, "09,d") give '00001,234' or '0,001,234'? The PEP specifies that format(1234, "08,d") should give '0001,234', but that's something of a special case: ',001,234' isn't really a viable alternative. Mark

[Mark]
One question from the PEP, which I've been too slow to read until this morning: should commas appear in the zero-filled part of a number?
I think it should. That lets all the commas and decimals line up vertically. Anything else would look weird.
>>> for n in seq:
...     print format(n, "09,d")
1,234,567
0,000,001
0,255,989
Raymond

Mark Dickinson wrote:
The PEP specifies that format(1234, "08,d") should give '0001,234', but that's something of a special case: ',001,234' isn't really a viable alternative.
Both of those look equally unviable to me. I don't think I'd ever use zero filling together with commas myself, as it looks decidedly weird, but if I had to pick a meaning for format(1234, "08,d") I think I would make it ' 001,234' the reasoning being that since a comma falls on the first position of an 8-char field, you can never put a digit there, and putting a comma at the beginning is no use. If there are more than 6 digits, then you get a comma plus an extra digit, making the field overflow to 9 characters, e.g. format(1234567, "08,d") gives '1,234,567' -- Greg

On Tue, Mar 17, 2009 at 11:04 AM, Greg Ewing <greg.ewing@canterbury.ac.nz> wrote:
[...] as it looks decidedly weird, but if I had to pick a meaning for format(1234, "08,d") I think I would make it
' 001,234'
Yes, that looks better than either of the alternatives I gave. I think I prefer that commas *do* appear in the zero padding, though as Eric says, it does add some extra complication to the code. In the case of the decimal code that complication is significant, mainly because of the need to figure out how much space is available for the zeros *before* doing the comma insertion. Mark

Mark Dickinson wrote:
On Tue, Mar 17, 2009 at 11:04 AM, Greg Ewing <greg.ewing@canterbury.ac.nz> wrote:
[...] as it looks decidedly weird, but if I had to pick a meaning for format(1234, "08,d") I think I would make it
' 001,234'
Yes, that looks better than either of the alternatives I gave.
I think I prefer that commas *do* appear in the zero padding, though as Eric says, it does add some extra complication to the code. In the case of the decimal code that complication is significant, mainly because of the need to figure out how much space is available for the zeros *before* doing the comma insertion.
If you look at _PyString_InsertThousandsGrouping, you'll see that it gets called twice. Once to compute the size, and once to actually do the inserting.
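The two-pass idea can be illustrated in pure Python. This is only a sketch under stated assumptions: grouped_length and insert_grouping are hypothetical names, groups are fixed at three digits, and the real _PyString_InsertThousandsGrouping is C code whose signature is not reproduced here.

```python
def grouped_length(n_digits, group=3):
    """Pass 1: size of the grouped string, computed without building it."""
    if n_digits == 0:
        return 0
    return n_digits + (n_digits - 1) // group


def insert_grouping(digits, sep=",", group=3):
    """Pass 2: fill a buffer of the precomputed size from right to left."""
    out = [""] * grouped_length(len(digits), group)
    pos = len(out) - 1
    for i, ch in enumerate(reversed(digits)):
        if i and i % group == 0:
            out[pos] = sep          # separator between complete groups
            pos -= 1
        out[pos] = ch
        pos -= 1
    return "".join(out)


print(grouped_length(7))            # 9
print(insert_grouping("1234567"))   # 1,234,567
```

Knowing the size up front is what lets a caller decide how many padding zeros fit before any commas are inserted.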

Mark Dickinson wrote:
On Tue, Mar 17, 2009 at 3:14 AM, Guido van Rossum <guido@python.org> wrote:
On Mon, Mar 16, 2009 at 7:33 PM, Raymond Hettinger <python@rcn.com> wrote:
Mark PEP 378 as accepted with Nick's original comma-only version? OK, done. Looking forward to a swift implementation!
I'll implement this for Decimal; it shouldn't take long.
One question from the PEP, which I've been too slow to read until this morning: should commas appear in the zero-filled part of a number? That is, should format(1234, "09,d") give '00001,234' or '0,001,234'? The PEP specifies that format(1234, "08,d") should give '0001,234', but that's something of a special case: ',001,234' isn't really a viable alternative.
Hmm. No good answers here. I'd vote for not putting the commas in the leading zeros. I don't think anyone would ever actually use this combination, and putting commas there complicates things due to the special case with the first digit. Plus, they're not inserted by the 'n' formatter, and no one has complained (which might mean no one's using it, of course). In 2.6:
>>> import locale
>>> locale.setlocale(locale.LC_ALL, 'en_US.UTF8')
'en_US.UTF8'
>>> format(12345, '010n')
'000012,345'
>>> format(12345, '09n')
'00012,345'
>>> format(12345, '08n')
'0012,345'
>>> format(12345, '07n')
'012,345'
>>> format(12345, '06n')
'12,345'

On Tue, Mar 17, 2009 at 11:15 AM, Eric Smith <eric@trueblade.com> wrote:
Hmm. No good answers here. I'd vote for not putting the commas in the leading zeros. I don't think anyone would ever actually use this combination, and putting commas there complicates things due to the special case with the first digit.
Plus, they're not inserted by the 'n' formatter, and no one has complained (which might mean no one's using it, of course).
But they *are* inserted by locale.format, and presumably no-one has complained about that either. :-)
format('%014f', 123.456, grouping=1) '0,000,123.456000'
It appears that locale.format adds the thousand separators after the fact, so the issue with the leading comma doesn't come up. That also means that the relationship between the field width (14 in this case) and the string length (16) is somewhat obscured. Mark
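The width mismatch can be reproduced without the locale module. The following is a rough sketch of "pad first, group afterwards", not locale.format's actual code: pad_then_group and group_digits are hypothetical helpers, the separator is hardcoded to a comma, and only non-negative values are handled.

```python
def group_digits(digits, sep=","):
    """Insert sep between every three digits, counting from the right."""
    parts = []
    while len(digits) > 3:
        parts.insert(0, digits[-3:])
        digits = digits[:-3]
    parts.insert(0, digits)
    return sep.join(parts)


def pad_then_group(value, width):
    """Zero-pad to the requested width first, then add separators."""
    padded = "%0*f" % (width, value)          # e.g. '0000123.456000'
    integer, dot, fraction = padded.partition(".")
    return group_digits(integer) + dot + fraction


result = pad_then_group(123.456, 14)
print(result, len(result))   # 0,000,123.456000 16
```

Because the separators are added after padding, the two commas push the 14-character field out to 16 characters.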

Mark Dickinson wrote:
On Tue, Mar 17, 2009 at 11:15 AM, Eric Smith <eric@trueblade.com> wrote:
Hmm. No good answers here. I'd vote for not putting the commas in the leading zeros. I don't think anyone would ever actually use this combination, and putting commas there complicates things due to the special case with the first digit.
Plus, they're not inserted by the 'n' formatter, and no one has complained (which might mean no one's using it, of course).
But they *are* inserted by locale.format, and presumably no-one has complained about that either. :-)
format('%014f', 123.456, grouping=1) '0,000,123.456000'
It appears that locale.format adds the thousand separators after the fact, so the issue with the leading comma doesn't come up. That also means that the relationship between the field width (14 in this case) and the string length (16) is somewhat obscured.
Ick. Presumably you specified a width because that's how wide you wanted the output to be! I still like leaving the commas out of leading zeros.

On Tue, Mar 17, 2009 at 6:24 AM, Eric Smith <eric@trueblade.com> wrote:
Mark Dickinson wrote:
On Tue, Mar 17, 2009 at 11:15 AM, Eric Smith <eric@trueblade.com> wrote:
Hmm. No good answers here. I'd vote for not putting the commas in the leading zeros. I don't think anyone would ever actually use this combination, and putting commas there complicates things due to the special case with the first digit.
Plus, they're not inserted by the 'n' formatter, and no one has complained (which might mean no one's using it, of course).
But they *are* inserted by locale.format, and presumably no-one has complained about that either. :-)
format('%014f', 123.456, grouping=1)
'0,000,123.456000'
It appears that locale.format adds the thousand separators after the fact, so the issue with the leading comma doesn't come up. That also means that the relationship between the field width (14 in this case) and the string length (16) is somewhat obscured.
Ick. Presumably you specified a width because that's how wide you wanted the output to be!
I still like leaving the commas out of leading zeros.
Ick, the discrepancy between the behavior of locale.format() and PEP 378 is unfortunate. I agree that the given width should include the commas, but I strongly feel that leading zeros should be comma-fied just like everything else. -- --Guido van Rossum (home page: http://www.python.org/~guido/)

Guido van Rossum wrote:
Ick, the discrepancy between the behavior of locale.format() and PEP 378 is unfortunate. I agree that the given width should include the commas, but I strongly feel that leading zeros should be comma-fied just like everything else.
And what happens when the comma would be the first character? ,012,345 0012,345 or something else?

On Tue, Mar 17, 2009 at 2:58 PM, Eric Smith <eric@trueblade.com> wrote:
And what happens when the comma would be the first character?
,012,345 0012,345
or something else?
Options are: (A) ",012,345" (B) "0012,345" (C) " 012,345" (D) "0,012,345" (E) write-in option here I vote for (D): it's one character too large, but the given precision is only supposed to be a minimum anyway. We already end up with a length-9 string when formatting 1234567. (D) is the minimum width string that: doesn't look weird (like (A) and (B)), has length at least 8, and is still in the right basic format (C) would be my second choice, but I find the extra space padding to be somewhat arbitrary (why a space? why not some other padding character?) Mark
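A minimal sketch of option (D), treating the width as a minimum and never letting a comma land in the first position. The helper names zero_fill_grouped and group_digits are hypothetical, and this is not meant to mirror how any real implementation does it:

```python
def group_digits(digits, sep=","):
    """Insert sep between every three digits, counting from the right."""
    parts = []
    while len(digits) > 3:
        parts.insert(0, digits[-3:])
        digits = digits[:-3]
    parts.insert(0, digits)
    return sep.join(parts)


def zero_fill_grouped(n, width):
    """Option (D): add leading zeros until the grouped string is at least
    `width` characters long; the result always starts with a digit or sign."""
    sign = "-" if n < 0 else ""
    digits = str(abs(n))
    while len(sign) + len(group_digits(digits)) < width:
        digits = "0" + digits
    return sign + group_digits(digits)


print(zero_fill_grouped(12345, 8))     # 0,012,345
print(zero_fill_grouped(1234567, 8))   # 1,234,567
```

Both calls produce nine characters for a requested width of eight, which is the "one character too large" trade-off described above.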

Le Tue, 17 Mar 2009 15:12:20 +0000, Mark Dickinson <dickinsm@gmail.com> s'exprima ainsi:
On Tue, Mar 17, 2009 at 2:58 PM, Eric Smith <eric@trueblade.com> wrote:
And what happens when the comma would be the first character?
,012,345 0012,345
or something else?
Options are:
(A) ",012,345" (B) "0012,345" (C) " 012,345" (D) "0,012,345" (E) write-in option here
I vote for (D): it's one character too large, but the given precision is only supposed to be a minimum anyway. We already end up with a length-9 string when formatting 1234567.
(D) is the minimum width string that: doesn't look weird (like (A) and (B)), has length at least 8, and is still in the right basic format
(C) would be my second choice, but I find the extra space padding to be somewhat arbitrary (why a space? why not some other padding character?)
I agree with all the comments above. * A is ... (censured). * B does not comply with user choice. * D is the best in theory, but would trouble table-like vertical alignment. * So remains only C for me. Also, the issue here comes from user inconsistency: a (total) width of 8 simply cannot fit with group separators every 3 digits (warning?). At best, there should be some information on this topic to avoid bad surprises, but then the implementation should not care much.
Mark
Denis ------ la vita e estrany

On Tue, Mar 17, 2009 at 8:12 AM, Mark Dickinson <dickinsm@gmail.com> wrote:
On Tue, Mar 17, 2009 at 2:58 PM, Eric Smith <eric@trueblade.com> wrote:
And what happens when the comma would be the first character?
,012,345 0012,345
or something else?
Options are:
(A) ",012,345" (B) "0012,345"
Neither (A) nor (B) is acceptable.
(C) " 012,345" (D) "0,012,345" (E) write-in option here
I vote for (D): it's one character too large, but the given precision is only supposed to be a minimum anyway. We already end up with a length-9 string when formatting 1234567.
(D) is the minimum width string that: doesn't look weird (like (A) and (B)), has length at least 8, and is still in the right basic format
(C) would be my second choice, but I find the extra space padding to be somewhat arbitrary (why a space? why not some other padding character?)
It's tough to choose between (C) and (D). I guess we'll have to look at use cases for leading zeros. I can think of two use cases for leading zeros: (1) To avoid font-width issues -- many variable-width fonts are designed so that all digits have the same width, but their (default) space is much narrower. (2) To avoid fraud when printing certain documents -- it's easier to insert a '1' in front of a small number than to change a '0' into something else. Since both use cases are trying to avoid spaces, I think (D) is the winner here. -- --Guido van Rossum (home page: http://www.python.org/~guido/)

Guido van Rossum wrote:
(1) To avoid font-width issues -- many variable-width fonts are designed so that all digits have the same width, but their (default) space is much narrower.
That's a good point. This alone doesn't necessarily rule out (A), though. It could be considered a case of user stupidity if they specify a field width that results in a comma at the beginning and don't like the result. It doesn't necessarily rule out (C) either, since there will always be a space at the beginning unless the value overflows, and then all your alignment guarantees are blown away anyhow.
(2) To avoid fraud when printing certain documents -- it's easier to insert a '1' in front of a small number than to change a '0' into something else.
However it's easy to add a '1' before a string of leading zeroes if there's a sliver of space available, so it's better still to fill with some other character such as '*'. You need a cooperative font for that to work. -- Greg

On Tue, Mar 17, 2009 at 2:41 PM, Greg Ewing <greg.ewing@canterbury.ac.nz> wrote:
Guido van Rossum wrote:
(1) To avoid font-width issues -- many variable-width fonts are designed so that all digits have the same width, but their (default) space is much narrower.
That's a good point.
This alone doesn't necessarily rule out (A), though. It could be considered a case of user stupidity if they specify a field width that results in a comma at the beginning and don't like the result.
(A) is ruled out on the basis of aesthetics alone.
It doesn't necessarily rule out (C) either, since there will always be a space at the beginning unless the value overflows, and then all your alignment guarantees are blown away anyhow.
(2) To avoid fraud when printing certain documents -- it's easier to insert a '1' in front of a small number than to change a '0' into something else.
However it's easy to add a '1' before a string of leading zeroes if there's a sliver of space available, so it's better still to fill with some other character such as '*'. You need a cooperative font for that to work.
What I've seen is the '$' sign immediately in front, e.g. $001,000.00. -- --Guido van Rossum (home page: http://www.python.org/~guido/)

Guido van Rossum wrote:
(2) To avoid fraud when printing certain documents -- it's easier to insert a '1' in front of a small number than to change a '0' into something else. However it's easy to add a '1' before a string of leading zeroes if there's a sliver of space available, so it's better still to fill with some other character such as '*'. You need a cooperative font for that to work.
What I've seen is the '$' sign immediately in front, e.g. $001,000.00.
I think I'd rather see something like: $==1,000.00== I wouldn't use zeroes, if I were the bank. It is bad on the aesthetics, and too easy to fraud.

On Wed, 18 Mar 2009 01:42:27 pm Lie Ryan wrote:
Guido van Rossum wrote:
(2) To avoid fraud when printing certain documents -- it's easier to insert a '1' in front of a small number than to change a '0' into something else.
However it's easy to add a '1' before a string of leading zeroes if there's a sliver of space available, so it's better still to fill with some other character such as '*'. You need a cooperative font for that to work.
What I've seen is the '$' sign immediately in front, e.g. $001,000.00.
I think I'd rather see something like: $==1,000.00==
I wouldn't use zeroes, if I were the bank. It is bad on the aesthetics, and too easy to fraud.
What I've generally seen on cheques is $****1,000.00 -- Steven D'Aprano

Steven D'Aprano wrote:
What I've generally seen on cheques is $****1,000.00
Interestingly, str.format will actually be able to produce that directly in 3.1: "${:*>,.2f}".format(value) ...although that makes seq[::-1] look positively coherent :) Wondering-who-will-ask-for-a-{!verbose}-string-formatting-flag'ly, Nick. -- Nick Coghlan | ncoghlan@gmail.com | Brisbane, Australia ---------------------------------------------------------------

What I've generally seen on cheques is $****1,000.00
Interestingly, str.format will actually be able to produce that directly in 3.1:
"${:*>,.2f}".format(value)
What we have already in SVN courtesy of Mark Dickinson:
>>> from decimal import Decimal
>>> value = Decimal(1000)
>>> "${:*>12,.2f}".format(value)
'$****1,000.00'
Raymond

Raymond Hettinger wrote:
What I've generally seen on cheques is $****1,000.00
Interestingly, str.format will actually be able to produce that directly in 3.1:
"${:*>,.2f}".format(value)
What we have already in SVN courtesy of Mark Dickinson:
>>> from decimal import Decimal
>>> value = Decimal(1000)
>>> "${:*>12,.2f}".format(value)
'$****1,000.00'
Anything but zeroes that isn't too similar to numeric character should be fine for "finance-related number". PS: On this side of the world, the commas and the dots are reversed so I would not dream any solution that doesn't encompass at least that (which doesn't require additional function wrapping). I'd personally prefer fully customizable separator, as my personal preference is using space and decimal commas PPS: I HAVE A HISTORY OF BEING ADMITTED TO A MENTAL INSTITUTION AFTER SEEING NUMBERS WITH COMMAS USED AS THOUSAND SEPARATOR. PPPS: The next statement is a lie. PPPPS: The mental institution thing is true.

Lie Ryan wrote:
Anything but zeroes that isn't too similar to numeric character should be fine for "finance-related number".
PS: On this side of the world, the commas and the dots are reversed so I would not dream any solution that doesn't encompass at least that (which doesn't require additional function wrapping). I'd personally prefer fully customizable separator, as my personal preference is using space and decimal commas PPS: I HAVE A HISTORY OF BEING ADMITTED TO A MENTAL INSTITUTION AFTER SEEING NUMBERS WITH COMMAS USED AS THOUSAND SEPARATOR. PPPS: The next statement is a lie. PPPPS: The mental institution thing is true. PPPPPS: The first postscript includes financial institution PPPPPPS: The fact that you can wrap the formatting in function call is not an excuse for not providing fully customizable separators. PPPPPPPS: The financial world != American financial institutions

Lie Ryan <lie.1296@...> writes:
PPS: I HAVE A HISTORY OF BEING ADMITTED TO A MENTAL INSTITUTION AFTER SEEING NUMBERS WITH COMMAS USED AS THOUSAND SEPARATOR. PPPS: The next statement is a lie. PPPPS: The mental institution thing is true.
I am fully sympathetic.
PPPPPPPS: The financial world != American financial institutions
Agreed, but they have the largest debts. Therefore, real-life examples of commas used as thousands separators should include a negative sign.

Antoine Pitrou wrote:
Agreed, but they have the largest debts. Therefore, real-life examples of commas used as thousands separators should include a negative sign.
A. :) B. All I can suggest is to try to think of the "commas as separators in format()" situation as being in the same vein as that whole "let use English keywords where possible" idea :) Hopefully a way will be found to provide a less English-centric but still easy to use formatting system eventually, but in the meantime Python *is* a language that looks like English pseudocode... Cheers, Nick -- Nick Coghlan | ncoghlan@gmail.com | Brisbane, Australia ---------------------------------------------------------------

Le Thu, 19 Mar 2009 08:33:43 +1000, Nick Coghlan <ncoghlan@gmail.com> s'exprima ainsi:
B. All I can suggest is to try to think of the "commas as separators in format()" situation as being in the same vein as that whole "let use English keywords where possible" idea :)
This is a wrong rationale. The readers of python keywords are the community of pythonistas (*); while the readers of documents produced by apps written in python can be any kind of people. "1,234,567.89" is more or less illegible for people not used to english conventions. Specifying the separator(s) is definitely a bad idea imo. I have not understood the proposal to be intended only for debug, but for all kinds of quick and/or unpublished development. Even in the first case, having numbers output in the format your eyes are used to is a nice & worthwhile help. Imagine you -- and all programmers, and millions of users -- would have to cope with numbers like "1.234.567,89" all the time only because someone decided (for any reason) that separators must be fixed, and this format is the obvious one. Denis (*) ditto about english naming, comments, & doc inside standard library ------ la vita e estrany

On Thu, 19 Mar 2009 08:12:20 pm spir wrote:
Le Thu, 19 Mar 2009 08:33:43 +1000,
Nick Coghlan <ncoghlan@gmail.com> s'exprima ainsi:
B. All I can suggest is to try to think of the "commas as separators in format()" situation as being in the same vein as that whole "let use English keywords where possible" idea :)
This is a wrong rationale. The readers of python keywords are the community of pythonistas (*); while the readers of documents produced by apps written in python can be any kind of people. "1,234,567.89" is more or less illegible for people not used to english conventions. Specifying the separator(s) is definitely a bad idea imo. I have not understood the proposal to be intended only for debug, but for all kinds of quick and/or unpublished development. Even in the first case, having numbers output in the format your eyes are used to is a nice & worthwhile help. Imagine you -- and all programmers, and millions of users -- would have to cope with numbers like "1.234.567,89" all the time only because someone decided (for any reason) that separators must be fixed, and this format is the obvious one.
It would be sub-optimal but hardly "more or less illegible". But then I'm not American and therefore I'm already used to people misspelling colour as "color", centre as "center", and biscuit as "cookie" *wink* Nevertheless, I agree that for output, we shouldn't hard-code the decimal and thousands separator as "." and "," respectively -- although as an English-speaker, I'd be happy for those choices to be the default. But surely with Raymond and Mark's idea about passing a dict derived from locale, this is no longer an issue? Are hard-coded separators still on the table? -- Steven D'Aprano

Steven D'Aprano wrote:
But surely with Raymond and Mark's idea about passing a dict derived from locale, this is no longer an issue? Are hard-coded separators still on the table?
That's a separate discussion, not part of PEP 377. The comma in PEP 377 is hardcoded, just like the decimal point. If formatting becomes more configurable it will be via a new PEP. What I don't get here is that anyone writing "quick and dirty" scripts that still needed locale appropriate output appropriate for non-developer end users* already couldn't use %-formatting or str.format for the task. The decimal point was wrong and there was no way at all to insert a thousands separator. If it's only a matter of localisation, then the locale module can do the job and the affected developers are probably already using it. If it's a matter of internationalisation, then that involves a lot more than just a comma here and there, and again, affected developers will already be using an appropriate tool. The PEP provides a quick way to make big numbers more readable when the intended audience is either the developer themselves (i.e. debugging messages), or an audience of IT types (e.g. system administrators). Yes, it is inadequate in many situations for formatting strings for display to non-developer end users - that isn't a new problem, and PEP 377 doesn't make it any worse than it already was. Cheers, Nick. *(Note that such scripts actually sound neither quick nor dirty to me - as soon as you're producing output for non-developers you have to pay far more attention to the formatting and other presentation aspects, whether those readers are native English speakers or not) -- Nick Coghlan | ncoghlan@gmail.com | Brisbane, Australia ---------------------------------------------------------------

Eric Smith wrote:
That's a separate discussion, not part of PEP 377. The comma in PEP 377 is hardcoded, just like the decimal point. If formatting becomes more configurable it will be via a new PEP.
For the record, it's PEP 378.
Sorry about that - got my PEP numbers mixed up (377 is floating around in my brain since I still have to update it with Guido's rejection). Cheers, Nick. -- Nick Coghlan | ncoghlan@gmail.com | Brisbane, Australia ---------------------------------------------------------------

spir <denis.spir@free.fr> wrote:
Le Thu, 19 Mar 2009 08:33:43 +1000, Nick Coghlan <ncoghlan@gmail.com> s'exprima ainsi:
B. All I can suggest is to try to think of the "commas as separators in format()" situation as being in the same vein as that whole "let use English keywords where possible" idea :)
This is a wrong rationale. The readers of python keywords is the community of pythonistas (*); while the readers of documents produced by apps written in python can be any kind of people. "1,234,567.89" is more or less illegible
But the thing currently approved, using ',' to indicate that thousands separators should be used, is _exactly_ like the keyword situation. It's something that the programmer types and reads. Controlling what character actually gets used in the output is a separate issue that still needs to be addressed, to my understanding. For now, we are defaulting to English, just like usual ;) -- R. David Murray http://www.bitdance.com

Guido van Rossum wrote:
I agree that the given width should include the commas, but I strongly feel that leading zeros should be comma-fied just like everything else.
I think we need some use cases before a proper decision can be made about this. If you were using comma-separated zero-filled numbers, what would your objective be, and what choice would best fulfill it? -- Greg

[Guido van Rossum]
I agree that the given width should include the commas, but I strongly feel that leading zeros should be comma-fied just like everything else.
+1 [Greg Ewing]
I think we need some use cases before a proper decision can be made about this. If you were using comma-separated zero-filled numbers, what would your objective be, and what choice would best fulfill it?
I gave one example of writing out numbers in columns and that makes it clear that putting commas in the leading zeros is the right thing to do (anything else looks unusably weird). Also, as Guido pointed-out, anyone specifying zero-padding is saying that they intend to not be showing spaces where digits would go. Our choice ought to respect that intention. Raymond

Le Wed, 18 Mar 2009 08:52:20 +1200, Greg Ewing <greg.ewing@canterbury.ac.nz> s'exprima ainsi:
Guido van Rossum wrote:
I agree that the given width should include the commas, but I strongly feel that leading zeros should be comma-fied just like everything else.
I think we need some use cases before a proper decision can be made about this. If you were using comma-separated zero-filled numbers, what would your objective be, and what choice would best fulfill it?
I think the point is just this:
0,000,000.89
1,234,567.89
looks right.
  0000000.89
1,234,567.89
looks wrong.
0000000.89
1,234,567.89
looks wrong.
000000000.89
1,234,567.89
looks wrong.
------ la vita e estrany

Mark Dickinson wrote:
format('%014f', 123.456, grouping=1)
'0,000,123.456000'
That also means that the relationship between the field width (14 in this case) and the string length (16) is somewhat obscured.
I'd consider that part a bug that we shouldn't imitate. The field width should always be what you say it is, unless the value is too big to fit. -- Greg

Greg Ewing wrote:
Mark Dickinson wrote:
format('%014f', 123.456, grouping=1)
'0,000,123.456000'
That also means that the relationship between the field width (14 in this case) and the string length (16) is somewhat obscured.
I'd consider that part a bug that we shouldn't imitate. The field width should always be what you say it is, unless the value is too big to fit.
Should there be an option for using hard-width? If hard-width flag is on, then if the value is too big to fit, then the number will get trimmed instead of changing the width (and perhaps there would be prepend character). So: width: 4, number: 123456, ppchar "<456" So not to break table alignment...
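A rough sketch of what such a hard-width mode might do. Nothing like this exists in format(); the function name hard_width and its ppchar parameter are hypothetical, taken only from the example in the message, and width is assumed to be at least 1.

```python
def hard_width(number, width, ppchar="<"):
    """Never exceed `width`: if the number is too long, keep its
    rightmost digits and mark the truncation with `ppchar`."""
    text = str(number)
    if len(text) <= width:
        return text.rjust(width)
    # Reserve one column for the marker, keep the last width-1 characters.
    return ppchar + text[len(text) - (width - 1):]


print(hard_width(123456, 4))   # <456
print(hard_width(42, 4))       #   42
```

The column stays aligned, at the cost of silently dropping the most significant digits.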

Eric Smith wrote:
Mark Dickinson wrote:
One question from the PEP, which I've been too slow to read until this morning: should commas appear in the zero-filled part of a number? That is, should format(1234, "09,d") give '00001,234' or '0,001,234'? The PEP specifies that format(1234, "08,d") should give '0001,234', but that's something of a special case: ',001,234' isn't really a viable alternative.
Hmm. No good answers here. I'd vote for not putting the commas in the leading zeros. I don't think anyone would ever actually use this combination, and putting commas there complicates things due to the special case with the first digit.
Plus, they're not inserted by the 'n' formatter, and no one has complained (which might mean no one's using it, of course).
In 2.6:
>>> import locale
>>> locale.setlocale(locale.LC_ALL, 'en_US.UTF8')
'en_US.UTF8'
>>> format(12345, '010n')
'000012,345'
>>> format(12345, '09n')
'00012,345'
>>> format(12345, '08n')
'0012,345'
>>> format(12345, '07n')
'012,345'
>>> format(12345, '06n')
'12,345'
I think this is a bug that should be fixed in the same way we implement it for PEP 378. It's more complex for 'n', because you might have funny groupings (like every 3, then 2). But I hope our solution for PEP 378 will generalize to this case, too. Eric.

Guido van Rossum wrote:
On Mon, Mar 16, 2009 at 12:19 PM, Raymond Hettinger <python@rcn.com> wrote:
The conversation on the thousands separator seems to have wound down and the PEP has stabilized: http://www.python.org/dev/peps/pep-0378/
Please pronounce.
That's not a PEP, it's just a summary of a discussion without any choice. :-)
I hope Raymond can understand this. To me, the choice presented is to add the Main Proposal syntax extension, or not.
Typically PEPs put the discussion of alternatives in some section at the end, after the specification and other stuff relevant going forward.
You want more alternatives than Nick's Alternative Proposal, discussed at the end? I believe most of the other ideas on the list were directed at some sense of (b) below.
Just to add more fuel to the fire, did anyone propose refactoring the problem into (a) a way to produce output with a thousands separator, and (b) a way to localize formats?
Since a way to produce output with a choice of thousands separators is a necessary part of a way to localize formats, I am not sure of what distinction you are trying to draw. 'Localize formats' has two quite distinct meanings: 'format this number in a particular way (which can vary from number to number or at least user to user)' versus 'format all numbers according to a particular national standard'.
We could solve (a) by adding a comma to all numeric format languages along Nick's proposal,
Raymond's current proposal, based on discussion, is to offer users a choice of 5 chars as thousands separators (and allow a choice of decimal separator). Nick's proposal is to only offer comma as thousands separator. While the latter meets my current parochial needs, I favor the more inclusive approach.
and we could solve (b) either now
Raymond's main proposal partially solves that now (which is to say, completely solves it now for most of the world) in the first sense I gave for (b), on a case-by-case basis.
or later by adding some other flag that means "use locale-specific numeric formatting for this value".
As I understand from Raymond's introductory comments and those in the locale module docs, the global C locale setting is not intended to be changed on an output-by-output basis. Hence, while useful for nationalizing software, it is not so useful for individualized output from global software.
perhaps there could be two separate flags corresponding to the grouping and monetary arguments to locale.format().
The flags just say to use the global locale settings, which have the limitations indicated above. Raymond's proposal is that a Python programmer should be better able to say "Format this number how I (or a particular user) want it to be formatted, regardless of the 'locale' setting".
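A minimal sketch of the distinction being drawn here, assuming a Python version in which the ',' option was eventually accepted (2.7/3.1 and later). locale.format_string() stands in for the locale.format() call mentioned above, which later Python versions deprecated in its favor; the sample value is illustrative only.

```python
import locale

n = 1234567

# Locale-based grouping consults the process-wide locale setting.
# Pin it to "C" so the result is reproducible; "C" defines no grouping.
locale.setlocale(locale.LC_NUMERIC, "C")
via_locale = locale.format_string("%d", n, grouping=True)

# The ',' option in the format spec never consults the locale.
via_spec = format(n, ",")

print(via_locale)  # 1234567
print(via_spec)    # 1,234,567
```

Under a different global locale the first result changes while the second does not, which is the per-output control the proposal is after.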
I'd be happy to punt on (b) until later.
This is somewhat analogous to the approach for strftime() which has syntax to invoke locale-specific formatting (%a, %A, %b, %B, %c, %p, %x, %X).
With the attendant pluses and minuses.
I guess in the end this means I am in favor of Nick's alternative.
I fail to see how this follows from your previous comments.
One thing I don't understand: the PEP seems to exclude the 'e' and 'g' format.
Both proposals claim to include e and g. However, since thousands separators only apply to the left of the decimal point, and e notation only has one digit to the left, no thousands separator proposal will apply to e (and g when it produces e). The only known separator used to the left is a space, typically in groups of 5 digits, in some math tables. The decimal separator part of the PEP *does* apply to e and g.
I would think that in case 'g' defers to 'f' it should act the same, and in case it defers to 'e', well, in the future (under (b) above) that could still change the period into a comma, right?
With the main proposal, one could simply specify, for instance, '8,1f' instead of '8.1f' to make that change *now*. I consider that much better than post-processing, which Nick's alternative would continue to require, and which gets worse with thousands separators added. Terry Jan Reedy
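To illustrate the earlier point about 'e' and 'g', here is a small sketch using the ',' option in the form that eventually shipped (the comma-only form, Python 2.7/3.1 and later), not the main proposal's separator syntax. The sample value and precisions are arbitrary choices for illustration.

```python
x = 1234567.891

# 'f' has many digits left of the decimal point, so grouping is visible.
fixed = format(x, ",.2f")           # '1,234,567.89'

# 'e' has a single digit left of the point, so ',' has nothing to group.
exponent = format(x, ",.3e")        # '1.235e+06'

# 'g' falls back to exponent notation when the precision is too small...
general_short = format(x, ",g")     # '1.23457e+06'

# ...and behaves like 'f' (with grouping) when the precision is large enough.
general_long = format(x, ",.10g")   # '1,234,567.891'

for s in (fixed, exponent, general_short, general_long):
    print(s)
```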

Concerning the difficulty of exchanging "." and "," by post-processing, it might be generally useful to have a swap(s1, s2) method on strings that would replace occurrences of s1 by s2 and vice versa. -- Greg

Greg Ewing wrote:
Concerning the difficulty of exchanging "." and "," by post-processing, it might be generally useful to have a swap(s1, s2) method on strings that would replace occurrences of s1 by s2 and vice versa.
I would appreciate having that. There are a lot of small jobs where str.translate and re are overkill, but s.replace(s1, TEMPCHAR); is awkward, since you're not sure what you can safely use as a tempchar. -- Carl Johnson
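A sketch of what such a swap could look like as a plain function. `swap` here is a hypothetical helper written for illustration, not an existing str method; it does the exchange in a single pass so no temporary character is needed. For single characters, str.translate with str.maketrans gives the same result.

```python
import re


def swap(text, s1, s2):
    """Hypothetical helper: exchange every s1 with s2 and vice versa."""
    mapping = {s1: s2, s2: s1}
    # Try the longer string first in case one is a prefix of the other.
    alternatives = sorted(mapping, key=len, reverse=True)
    pattern = re.compile("|".join(re.escape(s) for s in alternatives))
    # One pass over the text, so replaced pieces are never replaced again.
    return pattern.sub(lambda m: mapping[m.group(0)], text)


formatted = format(1234567.891, ",.2f")   # '1,234,567.89'
swapped = swap(formatted, ",", ".")       # '1.234.567,89'

# Single-character case using only str methods.
translated = formatted.translate(str.maketrans(",.", ".,"))

print(swapped)
print(translated)
```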

Our emails crossed. On Mon, Mar 16, 2009 at 5:01 PM, Terry Reedy <tjreedy@udel.edu> wrote:
Guido van Rossum wrote:
On Mon, Mar 16, 2009 at 12:19 PM, Raymond Hettinger <python@rcn.com> wrote:
The conversation on the thousands separator seems to have wound down and the PEP has stabilized: http://www.python.org/dev/peps/pep-0378/
Please pronounce.
That's not a PEP, it's just a summary of a discussion without any choice. :-)
-- --Guido van Rossum (home page: http://www.python.org/~guido/)

On Mon, Mar 16, 2009 at 4:45 PM, Greg Ewing <greg.ewing@canterbury.ac.nz> wrote:
Eric Smith <eric@trueblade.com> wrote:
I vote we move ahead with Proposal II from PEP 378.
Looks fairly good to me.
Of course this is by now ambiguous -- the latest version of the PEP no longer numbers the versions I and II, and has Nick's version second. (Which may be reversed by the time you read this if Raymond keeps updating the PEP in real time. :-) -- --Guido van Rossum (home page: http://www.python.org/~guido/)

I vote we move ahead with Proposal II from PEP 378.
Looks fairly good to me.
Of course this is by now ambiguous -- the latest version of the PEP no longer numbers the versions I and II, and has Nick's version second. (Which may be reversed by the time you read this if Raymond keeps updating the PEP in real time. :-)
To keep the conversation in sync with today's real-time updates, I've put back in the "perma-names", Proposal I (Nick's) and Proposal II (Eric's). Raymond

Guido van Rossum wrote:
Of course this is by now ambiguous -- the latest version of the PEP no longer numbers the versions I and II
To be clear, I'm in favour of Nick's version. (I share your concern about the apparent ambiguities in Eric's version -- it confused me too the first few times I read it!) -- Greg

Greg Ewing <greg.ewing@canterbury.ac.nz> wrote:
To be clear, I'm in favour of Nick's version.
(I share your concern about the apparent ambiguities in Eric's version -- it confused me too the first few times I read it!)
I'll chime in in favor of the simpler proposal and leaving the 'specify what characters to use' ability for later. That's the way I've felt from the beginning of the discussion, for what it's worth. It feels like the factoring Guido talked about ("yes I want thousands separators" and then separately "here's what I want to use for thousands/decimal separators") is the correct way to break down the problem. -- R. David Murray http://www.bitdance.com
participants (19)
- Antoine Pitrou
- Bruce Frederiksen
- Bruce Leban
- Carl Johnson
- Chris Rebert
- Eric Smith
- Greg Ewing
- Guido van Rossum
- Jim Jewett
- Lie Ryan
- Mark Dickinson
- Mathias Panzenböck
- Nick Coghlan
- R. David Murray
- Raymond Hettinger
- Scott David Daniels
- spir
- Steven D'Aprano
- Terry Reedy