[Python-ideas] Rough draft: Proposed format specifier for a thousands separator (discussion moved from python-dev)

spir denis.spir at free.fr
Fri Mar 13 11:56:02 CET 2009


On Thu, 12 Mar 2009 19:31:39 -0400,
Eric Smith <eric at trueblade.com> wrote:

> > I've always thought that we should have a utility function which formats 
> > a number based on the same settings that are in the locale, but not 
> > actually use the locale. Something like:
> > 
> > format_number(123456787654321.123, decimal_point=',', thousands_sep=' ',
> >               grouping=[4, 3, 2])  
> >  >>> '12 34 56 78 765 4321,123'  
> 
> To be maximally useful (for example, so it could be used in Decimal to 
> implement locale formatting), maybe it should work on strings:
> 
>  >>> format_number(whole_part='123456787654321',  
>                fractional_part='123',
>                decimal_point=',',
>                thousands_sep=' ',
>                grouping=[4, 3, 2])
>  >>> '12 34 56 78 765 4321,123'  
> 
>  >>> format_number(whole_part='123456787654321',  
>                decimal_point=',',
>                thousands_sep='.',
>                grouping=[4, 3, 2])
>  >>> '12.34.56.78.765.4321'  
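
A minimal sketch of the string-based variant quoted above, not an existing library function. It assumes that `grouping` lists group sizes starting from the rightmost group and that the last size repeats for all remaining digits, which is the reading that reproduces both quoted results. The default values are my own guesses.

```python
from itertools import chain, repeat


def format_number(whole_part, fractional_part='', decimal_point='.',
                  thousands_sep=',', grouping=(3,)):
    """Group the digits of whole_part, working from the right.

    Assumption: grouping is non-empty and gives group sizes starting at
    the rightmost group; its last entry repeats for the remaining digits.
    """
    groups = []
    rest = whole_part
    # Walk the sizes right to left, repeating the final size as needed.
    for size in chain(grouping, repeat(grouping[-1])):
        if not rest:
            break
        groups.append(rest[-size:])
        rest = rest[:-size]
    result = thousands_sep.join(reversed(groups))
    if fractional_part:
        result += decimal_point + fractional_part
    return result


print(format_number(whole_part='123456787654321', fractional_part='123',
                    decimal_point=',', thousands_sep=' ',
                    grouping=[4, 3, 2]))
```

Working on strings keeps the function independent of float or Decimal, so any numeric type could split itself into digit strings and reuse it.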
 

I find the overall problem of providing an interface to specify a number format rather challenging. The issue I see is designing a formatting pattern that is simple, clear, _and_ practical. A practical pattern is easy to specify, but it then becomes rather illegible and/or hard to remember, while a legible one ends up excessively verbose.

I have the impression, though I may well be wrong, that unlike a format, a *formatted number* is easy to scan with human eyes. So, as a crazy idea, I wonder whether we shouldn't let the user provide an example formatted number instead. This may address most use cases, but probably not all.

To make things easier, why not specify a canonical number, such as '-123456.789', for which the user defines the formatted version? A smart parser could then deduce the format to apply to further numbers. Below is a purely artificial example.

-123456.789   -->   kg 00_123_456,79-

format:
   unit: 'kg'
   unit_pos: LEFT
   unit_sep: ' '
   thousand_sep: '_'
   fract_sep : ','
   sign_pos: RIGHT
   sign_sep: None
   padding_char: '0'
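
To make the format object concrete, here is a rough sketch of how such a record could be applied to a number. The class name `NumberFormat`, the method `apply`, and the two extra fields `int_digits` and `fract_digits` are hypothetical additions of mine; the field list above does not say how padding width or rounding would be stored. Rounding is applied to the fractional part only, and digits are grouped in threes.

```python
from dataclasses import dataclass
from decimal import Decimal
from typing import Optional

LEFT, RIGHT = 'left', 'right'


@dataclass
class NumberFormat:
    unit: str = ''
    unit_pos: str = RIGHT
    unit_sep: Optional[str] = None
    thousand_sep: str = ''
    fract_sep: str = '.'
    sign_pos: str = LEFT
    sign_sep: Optional[str] = None
    padding_char: str = ''
    # Assumed extras, not part of the field list in the message:
    int_digits: int = 0    # minimum number of integer digits after padding
    fract_digits: int = 2  # digits kept when rounding the fractional part

    def apply(self, value):
        d = Decimal(str(value))
        sign = '-' if d < 0 else ''
        # Round the fractional part only, then split into digit strings.
        quantum = Decimal(10) ** -self.fract_digits
        whole, _, fract = format(abs(d).quantize(quantum), 'f').partition('.')
        if self.padding_char:
            whole = whole.rjust(self.int_digits, self.padding_char)
        # Group in threes from the right (padding characters are grouped too).
        groups = []
        while whole:
            groups.append(whole[-3:])
            whole = whole[:-3]
        body = self.thousand_sep.join(reversed(groups))
        if fract:
            body += self.fract_sep + fract
        if sign:
            sep = self.sign_sep or ''
            body = sign + sep + body if self.sign_pos == LEFT else body + sep + sign
        if self.unit:
            sep = self.unit_sep or ''
            body = (self.unit + sep + body if self.unit_pos == LEFT
                    else body + sep + self.unit)
        return body


fmt = NumberFormat(unit='kg', unit_pos=LEFT, unit_sep=' ', thousand_sep='_',
                   fract_sep=',', sign_pos=RIGHT, sign_sep=None,
                   padding_char='0', int_digits=8, fract_digits=2)
print(fmt.apply('-123456.789'))  # kg 00_123_456,79-
```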

There are obvious issues:
* Does rounding apply to the whole precision (number of significant digits), or to the fractional part only? Should the format then be interpreted as the most common case (probably fractional rounding), with a disambiguation flag, or with a flag for the non-default case only? What if rounding applies after a large number of digits? Should we instead allow the user to provide a longer number?
* Similarly for padding: does it apply to the length of the whole number or to the integral part (common in financial apps to align decimal separators)? What if the padding applies to fewer digits than the canonical number has? Should we instead allow the user to provide a shorter number?
* probably more...
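
The rounding ambiguity in the first point can be shown with the standard `decimal` module. In this small illustration, the rendering '123456.79' of the canonical number is consistent with both "2 fractional digits" and "8 significant digits", yet the two readings diverge on another number. The variable names and the second test value are only for illustration.

```python
from decimal import Context, Decimal

canonical = Decimal('123456.789')
other = Decimal('1.23456789')

two_places = Decimal('0.01')      # reading 1: keep 2 fractional digits
eight_digits = Context(prec=8)    # reading 2: keep 8 significant digits

# Both readings agree on the canonical number...
print(canonical.quantize(two_places), eight_digits.plus(canonical))

# ...but disagree on a number of a different magnitude.
print(other.quantize(two_places), eight_digits.plus(other))
```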

The space of valid formats can be specified using a parsing grammar, so that a parse failure indicates an invalid format, and a "tagged" parse tree provides all the information needed to construct a format object.
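
As a very rough sketch of that idea, a regular expression with named groups can stand in for the grammar, and the match's group dictionary for the tagged parse tree. Everything here is an assumption for illustration: the function name `infer_format`, the fixed order of unit and sign, units made of letters only, grouping in threes, and the `int_digits`/`fract_digits` keys. It only recognises renderings of the canonical number -123456.789.

```python
import re

# One possible "grammar" for renderings of the canonical number -123456.789.
EXAMPLE = re.compile(r"""
    ^
    (?:(?P<unit_l>[^\W\d_]+)(?P<unit_sep_l>\s*))?   # optional unit on the left
    (?:(?P<sign_l>-)(?P<sign_sep_l>\s*))?           # optional sign on the left
    (?P<padded>.*?)                                 # padding, if any
    123(?P<thousand_sep>\D?)456                     # canonical integer part
    (?:(?P<fract_sep>[^\d\s-])(?P<fract>\d+))?      # optional fractional part
    (?:(?P<sign_sep_r>\s*)(?P<sign_r>-))?           # optional sign on the right
    (?:(?P<unit_sep_r>\s*)(?P<unit_r>[^\W\d_]+))?   # optional unit on the right
    $
""", re.VERBOSE)


def infer_format(example):
    """Deduce format fields from a rendering of -123456.789 (sketch only)."""
    m = EXAMPLE.match(example)
    if m is None:
        raise ValueError(f"cannot parse {example!r} as a format example")
    g = m.groupdict()
    if bool(g['sign_l']) == bool(g['sign_r']):
        raise ValueError("expected exactly one minus sign")
    if g['unit_l'] and g['unit_r']:
        raise ValueError("expected at most one unit")
    pad = g['padded']
    if g['thousand_sep']:
        pad = pad.replace(g['thousand_sep'], '')
    if len(set(pad)) > 1:
        raise ValueError("padding must repeat a single character")
    left_unit, left_sign = bool(g['unit_l']), bool(g['sign_l'])
    return {
        'unit': g['unit_l'] or g['unit_r'] or '',
        'unit_pos': 'left' if left_unit else 'right',
        'unit_sep': (g['unit_sep_l'] if left_unit else g['unit_sep_r']) or '',
        'thousand_sep': g['thousand_sep'],
        'fract_sep': g['fract_sep'] or '',
        'sign_pos': 'left' if left_sign else 'right',
        'sign_sep': (g['sign_sep_l'] if left_sign else g['sign_sep_r']) or '',
        'padding_char': pad[:1],
        'int_digits': len(pad) + 6,            # 6 digits in '123456'
        'fract_digits': len(g['fract'] or ''),
    }


print(infer_format('kg 00_123_456,79-'))
```

A real grammar would have to settle the rounding and padding questions listed above; this sketch simply counts the digits it sees.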

I really do not know whether this idea is stupid or worth exploring ;-) [But I would gladly try it for personal use, at least as an everyday fast-and-easy feature.]

Denis


------
la vita e estrany


