[Python-ideas] Rough draft: Proposed format specifier for a thousands separator (discussion moved from python-dev)

Fri Mar 13 12:46:32 CET 2009

spir wrote:
> On Thu, 12 Mar 2009 19:31:39 -0400,
> Eric Smith <eric at trueblade.com> wrote:
> 
>>> I've always thought that we should have a utility function which formats 
>>> a number based on the same settings that are in the locale, but not 
>>> actually use the locale. Something like:
>>>
>>> format_number(123456787654321.123, decimal_point=',', thousands_sep=' ',
>>>               grouping=[4, 3, 2])  
>>> '12 34 56 78 765 4321,123'
>> To be maximally useful (for example, so it could be used in Decimal to 
>> implement locale formatting), maybe it should work on strings:
>>
>>  >>> format_number(whole_part='123456787654321',
>>  ...              fractional_part='123',
>>  ...              decimal_point=',',
>>  ...              thousands_sep=' ',
>>  ...              grouping=[4, 3, 2])
>>  '12 34 56 78 765 4321,123'
>>
>>  >>> format_number(whole_part='123456787654321',
>>  ...              decimal_point=',',
>>  ...              thousands_sep='.',
>>  ...              grouping=[4, 3, 2])
>>  '12.34.56.78.765.4321'
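For what it's worth, the string-based variant is small enough to sketch. format_number below is a hypothetical helper, not an existing API. It assumes the locale-style reading of grouping that the examples use (grouping[0] is the rightmost group, and the last size is reused for the remaining digits), also honours the two terminators that locale.localeconv() uses (a trailing 0 repeats the previous size, locale.CHAR_MAX stops grouping), and leaves signs out entirely.

```python
import locale

def format_number(whole_part, fractional_part='', decimal_point='.',
                  thousands_sep=',', grouping=(3,)):
    """Hypothetical helper: group the digit string whole_part right to left.

    grouping[0] is the size of the rightmost group, grouping[1] the next one
    to the left, and so on; the last size is reused for any remaining digits.
    """
    sizes = list(grouping)
    if sizes and sizes[-1] == 0:        # locale convention: 0 = repeat last size
        sizes.pop()
    groups = []
    digits = whole_part
    i = 0
    while digits:
        size = sizes[min(i, len(sizes) - 1)] if sizes else len(digits)
        if size == locale.CHAR_MAX:     # locale convention: no further grouping
            size = len(digits)
        groups.append(digits[-size:])
        digits = digits[:-size]
        i += 1
    result = thousands_sep.join(reversed(groups))
    if fractional_part:
        result += decimal_point + fractional_part
    return result

print(format_number('123456787654321', '123', ',', ' ', [4, 3, 2]))
# 12 34 56 78 765 4321,123
print(format_number('123456787654321', decimal_point=',',
                    thousands_sep='.', grouping=[4, 3, 2]))
# 12.34.56.78.765.4321
```

Working on strings also sidesteps float precision: a float such as 123456787654321.123 cannot hold all of those fractional digits.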
>  
> 
> I find the overall problem of providing an interface to specify a number format rather challenging. The issue, as I see it, is designing a formatting pattern that is simple, clear, _and_ practical. A practical pattern is easy to specify, but it then becomes rather illegible and/or hard to remember, while a legible one ends up excessively verbose.
> 
> I have the impression, though I may well be wrong, that unlike a format specification, a *formatted number* is easy to scan with human eyes. So, as a crazy idea, I wonder whether we should let the user provide an example formatted number instead. This might cover most use cases, but probably not all.
> 
> To make things easier, why not specify a canonical number, such as '-123456.789', for which the user supplies the formatted version? A smart parser could then deduce the format to apply to further numbers. Below is a purely artificial example.
> 
> -123456.789   -->   kg 00_123_456,79-
> 
> format:
>    unit: 'kg'
>    unit_pos: LEFT
>    unit_sep: ' '
>    thousand_sep: '_'
>    fract_sep: ','
>    sign_pos: RIGHT
>    sign_sep: None
>    padding_char: '0'
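A rough sketch of what such a format object could look like once the sample has been parsed. NumberFormat and its apply() method are hypothetical. The int_width and frac_digits fields are not in the list above; they are assumed here because the sample implies them (eight padded integer digits, two fractional digits). Grouping is fixed at three digits, and rounding is applied to the fractional part only, which is just one answer to the rounding question.

```python
from dataclasses import dataclass
from decimal import Decimal, ROUND_HALF_UP
from typing import Optional

LEFT, RIGHT = 'LEFT', 'RIGHT'

@dataclass
class NumberFormat:
    # Hypothetical format object; field names follow the list above.
    unit: str = ''
    unit_pos: str = LEFT
    unit_sep: str = ' '
    thousand_sep: str = ''
    fract_sep: str = '.'
    sign_pos: str = LEFT
    sign_sep: Optional[str] = None   # None means no gap between sign and digits
    padding_char: str = '0'
    # Assumed extras, implied by the sample but not listed above:
    int_width: int = 0               # minimum number of integer digits
    frac_digits: int = 2             # fractional digits kept after rounding

    def apply(self, value):
        # Round the fractional part only, half away from zero.
        rounded = Decimal(str(value)).quantize(
            Decimal(1).scaleb(-self.frac_digits), rounding=ROUND_HALF_UP)
        negative = rounded < 0
        int_part, _, frac_part = format(abs(rounded), 'f').partition('.')
        # Pad the integral part only, then group it in threes.
        int_part = int_part.rjust(self.int_width, self.padding_char)
        groups = []
        while int_part:
            groups.append(int_part[-3:])
            int_part = int_part[:-3]
        body = self.thousand_sep.join(reversed(groups))
        if frac_part:
            body += self.fract_sep + frac_part
        if negative:
            sep = self.sign_sep or ''
            body = body + sep + '-' if self.sign_pos == RIGHT else '-' + sep + body
        if self.unit:
            body = (self.unit + self.unit_sep + body if self.unit_pos == LEFT
                    else body + self.unit_sep + self.unit)
        return body

fmt = NumberFormat(unit='kg', unit_pos=LEFT, unit_sep=' ', thousand_sep='_',
                   fract_sep=',', sign_pos=RIGHT, sign_sep=None,
                   padding_char='0', int_width=8, frac_digits=2)
print(fmt.apply(-123456.789))   # kg 00_123_456,79-
```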
> 
> There are obvious issues:
> * Does rounding apply to the whole precision (number of significant digits), or to the fractional part only? Should the format then be interpreted according to the most common case (probably fractional rounding), with a disambiguation flag, or with a flag for the non-default case only? What if rounding applies after a large number of digits? Should we instead let the user provide a longer number?
> * Similarly for padding: does it apply to the length of the whole number or to the integral part only (common in financial apps, to align decimal points)? What if the padding applies to fewer digits than the canonical number has? Should we instead let the user provide a shorter number?
> * probably more...
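To make the rounding point concrete: the canonical sample cannot tell the two rules apart, because for -123456.789 they give the same digits, while they disagree on other numbers. The two helper names are hypothetical; they simply wrap Python's built-in 'f' and 'g' format types, with 8 significant digits because the sample "00_123_456,79" shows eight.

```python
def round_fractional(x, places=2):
    """Keep a fixed number of digits after the decimal point."""
    return format(x, '.%df' % places)

def round_significant(x, digits=8):
    """Keep a fixed number of significant digits in total."""
    return format(x, '.%dg' % digits)

# Both rules reproduce the "79" of the sample, then diverge on other numbers.
for x in (-123456.789, 9876543.21, 0.5):
    print(x, round_fractional(x), round_significant(x))
# -123456.789 -123456.79 -123456.79
# 9876543.21 9876543.21 9876543.2
# 0.5 0.50 0.5
```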
> 
> The space of valid formats can be specified using a parsing grammar, so that a parse failure indicates an invalid format, and a "tagged" parse tree provides all the information needed to construct a format object.
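As a rough sketch of that idea, a regular expression can stand in for the grammar: each named group acts as a tag, and a failed match (or a failed check afterwards) means an invalid format. deduce_format, SAMPLE_RE and the returned field names are hypothetical. The sketch only understands zero padding, a single thousands separator, a minus sign touching the digits at either end, an optional unit on either side, and it always expects a fractional part.

```python
import re
from decimal import Decimal, ROUND_HALF_UP

CANONICAL = Decimal('-123456.789')   # the agreed canonical number

# Stand-in grammar: [unit][space] [-] grouped-digits fract_sep digits [-] [space][unit]
SAMPLE_RE = re.compile(
    r'^(?:(?P<unit_l>[^\d\s-]+)(?P<usep_l>\s*))?'   # unit on the left
    r'(?P<sign_l>-?)'                               # sign on the left
    r'(?P<int>\d(?:\D?\d)*)'                        # digits, optional separators
    r'(?P<fract_sep>[^\d\s-])(?P<frac>\d+)'         # decimal separator + fraction
    r'(?P<sign_r>-?)'                               # sign on the right
    r'(?:(?P<usep_r>\s*)(?P<unit_r>[^\d\s-]+))?$'   # unit on the right
)

def deduce_format(sample):
    """Hypothetical: deduce format fields from a formatting of CANONICAL."""
    m = SAMPLE_RE.match(sample)
    if m is None:
        raise ValueError('not a recognizable formatting of %s' % CANONICAL)
    g = m.groupdict()
    digits = re.sub(r'\D', '', g['int'])
    seps = set(re.findall(r'\D', g['int']))
    if len(seps) > 1 or not digits.endswith('123456'):
        raise ValueError('integer part is not a grouping of 123456')
    padding = digits[:-6]
    if padding.strip('0'):
        raise ValueError('only zero padding is understood here')
    # The fraction must be .789 rounded (half up) to the sample's length.
    places = Decimal(1).scaleb(-len(g['frac']))
    expected = abs(CANONICAL).quantize(places, rounding=ROUND_HALF_UP)
    if expected != Decimal('123456.' + g['frac']):
        raise ValueError('fractional part is not a rounding of .789')
    if (g['sign_l'] == '-') == (g['sign_r'] == '-'):
        raise ValueError('exactly one minus sign is expected')
    return {
        'unit': g['unit_l'] or g['unit_r'] or '',
        'unit_pos': 'LEFT' if g['unit_l'] else 'RIGHT' if g['unit_r'] else None,
        'unit_sep': g['usep_l'] or g['usep_r'] or '',
        'thousand_sep': seps.pop() if seps else '',
        'fract_sep': g['fract_sep'],
        'sign_pos': 'LEFT' if g['sign_l'] else 'RIGHT',
        'sign_sep': None,            # this grammar never allows a gap here
        'padding_char': '0' if padding else None,
        'int_width': len(digits),
        'frac_digits': len(g['frac']),
    }

print(deduce_format('kg 00_123_456,79-'))
```

Note that the trailing minus is recognized only because the grammar explicitly allows a sign at the end; nothing is really inferred.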
> 
> I really don't know whether this idea is stupid or worth exploring ;-) [But I would happily try it for personal use, at least as an everyday, fast-and-easy feature.]

Your proposal (apart from being harder to implement) is similar to the
way Excel handles formatting, except that instead of a sample number,
Excel uses # as a digit placeholder. If you really want to test-implement
it, you'd be better off using that approach.
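For comparison, here is a toy interpreter for a tiny subset of Excel-style codes: # and 0 as digit placeholders, a comma in the integer part for thousands grouping, and a period for the decimal point. format_with_pattern is a hypothetical name; real Excel codes also support ';'-separated sections, literal text and much more, none of which is handled here.

```python
from decimal import Decimal, ROUND_HALF_UP

def format_with_pattern(value, pattern):
    """Hypothetical: apply a tiny subset of Excel-style codes (# 0 , .)."""
    int_pat, _, frac_pat = pattern.partition('.')
    grouped = ',' in int_pat              # a comma asks for thousands grouping
    min_int = int_pat.count('0')          # '0' forces a digit, '#' does not
    frac_min = frac_pat.count('0')
    frac_max = frac_min + frac_pat.count('#')
    rounded = Decimal(str(value)).quantize(Decimal(1).scaleb(-frac_max),
                                           rounding=ROUND_HALF_UP)
    negative = rounded < 0
    int_part, _, frac_part = format(abs(rounded), 'f').partition('.')
    # Drop optional ('#') trailing zeros, keep the mandatory ('0') ones.
    while len(frac_part) > frac_min and frac_part.endswith('0'):
        frac_part = frac_part[:-1]
    int_part = int_part.lstrip('0').rjust(min_int, '0')
    if grouped:
        groups = []
        while int_part:
            groups.append(int_part[-3:])
            int_part = int_part[:-3]
        int_part = ','.join(reversed(groups))
    sign = '-' if negative else ''
    return sign + int_part + ('.' + frac_part if frac_part else '')

print(format_with_pattern(-1234567.891, '#,##0.00'))   # -1,234,567.89
print(format_with_pattern(5, '000'))                   # 005
```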

And I think it is impossible for the parser to be smart enough to
recognize that the sign should be placed at the end (even the smartest
parser might only treat it as a literal minus sign). The scheme is also
highly inflexible: what about a custom positive sign? What if I want a
literal -? What about literal digits? What about non-Latin digits?
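On the last point: neither a sample number nor a # placeholder says which digit script to use, so that would need an explicit setting of its own. A minimal sketch of such a setting applied as a post-processing step, with a hypothetical localize_digits helper and Arabic-Indic digits (U+0660 to U+0669) as an arbitrary example. It deliberately leaves the separators alone, which a real locale might also want to change.

```python
# Arabic-Indic digits U+0660..U+0669, built from code points for clarity.
ARABIC_INDIC = ''.join(chr(0x0660 + i) for i in range(10))
TO_ARABIC_INDIC = str.maketrans('0123456789', ARABIC_INDIC)

def localize_digits(formatted, table=TO_ARABIC_INDIC):
    """Hypothetical: swap ASCII digits for another script; separators unchanged."""
    return formatted.translate(table)

s = localize_digits('-1,234,567.89')
# int() accepts any Unicode decimal digits once separators are removed.
print(int(s.split('.')[0].replace(',', '')))   # -1234567
```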