[Python-ideas] Rough draft: Proposed format specifier for a thousands separator (discussion moved from python-dev)

spir denis.spir at free.fr
Fri Mar 13 13:20:23 CET 2009


On Fri, 13 Mar 2009 22:46:32 +1100,
Lie Ryan <lie.1296 at gmail.com> wrote:

> spir wrote:
> > On Thu, 12 Mar 2009 19:31:39 -0400,
> > Eric Smith <eric at trueblade.com> wrote:
> > 
> >>> I've always thought that we should have a utility function which
> >>> formats a number based on the same settings that are in the locale, but
> >>> not actually use the locale. Something like:
> >>>
> >>> format_number(123456787654321.123, decimal_point=',', thousands_sep=' ',
> >>>               grouping=[4, 3, 2])  
> >>>  >>> '12 34 56 78 765 4321,123'  
> >> To be maximally useful (for example, so it could be used in Decimal to 
> >> implement locale formatting), maybe it should work on strings:
> >>
> >>  >>> format_number(whole_part='123456787654321',  
> >>                fractional_part='123',
> >>                decimal_point=',',
> >>                thousands_sep=' ',
> >>                grouping=[4, 3, 2])
> >>  >>> '12 34 56 78 765 4321,123'  
> >>
> >>  >>> format_number(whole_part='123456787654321',  
> >>                decimal_point=',',
> >>                thousands_sep='.',
> >>                grouping=[4, 3, 2])
> >>  >>> '12.34.56.78.765.4321'  
> >  
> > 
> > I find the overall problem of providing an interface to specify a number
> > format rather challenging. The issue I see is to design a formatting
> > pattern that is simple, clear, _and_ practical. A practical pattern is
> > easy to specify, but then it becomes rather illegible and/or hard to
> > remember, while a legible one ends up excessively verbose.
> > 
> > I have the impression, but I may well be wrong, that unlike a
> > format, a *formatted number* seems easy to scan -- with human
> > eyes. So, as a crazy idea, I wonder whether we shouldn't let the user
> > provide an example formatted number instead. This may address most use
> > cases, but probably not all.
> > 
> > To make things easier, why not specify a canonical number, such as
> > '-123456.789', of which the user should define the formatted version?
> > Then a smart parser could deduce the format to be applied to further
> > numbers. Below is a purely artificial example.
> > 
> > -123456.789   -->   kg 00_123_456,79-
> > 
> > format:
> >    unit: 'kg'
> >    unit_pos: LEFT
> >    unit_sep: ' '
> >    thousand_sep: '_'
> >    fract_sep : ','
> >    sign_pos: RIGHT
> >    sign_sep: None
> >    padding_char: '0'
> > 
> > There are obvious issues:
> > * Does rounding apply to the whole precision (number of significant
> > digits), or to the fractional part only? Then, should the format be
> > interpreted as the most common case (probably fractional rounding),
> > provide a disambiguation flag, or provide a flag for the non-default
> > case only? What if rounding applies after a large number of digits?
> > Should we instead allow the user to provide a longer number?
> > * Similarly for padding: does it apply to the length of the whole number
> > or to the integral part (common in financial apps to align decimal
> > signs)? What if the padding applies to a smaller number of digits than
> > that of the canonical number? Should we instead allow the user to
> > provide a shorter number?
> > * probably more...
> > 
> > The space of valid formats can be specified using a parsing grammar, so
> > that a parse failure indicates invalid format, and a "tagged" parse tree
> > provides all the information needed to construct a format object.
> > 
> > I really do not know whether this idea is stupid or worth being
> > explored ;-) [But I would happily try it for personal use, at least as
> > an everyday fast-and-easy feature.]
> 
> Your proposal (other than being harder to implement) is similar to the 
> way Excel handles formatting, but instead of a sample number, it uses # 
> as a placeholder. If you really want to test-implement it, better try 
> using that.

Right. I also think now that the "picture strings" pointed to in the PEP are a better option for such needs, although they probably cannot handle issues such as ambiguity of precision or padding without additional parameters either. The only advantage of my proposal is that the user provides an example instead of an abstract representation.

> And I think it is impossible for the parser to be that smart to 
> recognize that sign pos should be put in the rear (the smartest parser 
> might only treat it as literal negative).

? Either I do not understand, or this is wrong. You can perfectly well have a parse expression that allows either a front or a rear sign, as long as the sign pattern is unambiguous.
What does 'literal negative' mean?
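
To illustrate, here is a minimal sketch of such a parse expression, written as a regular expression rather than a full grammar. Everything in it is an assumption made for illustration: the names EXAMPLE_PATTERN and deduce_format are hypothetical, the unit is only looked for on the left, the padding character is limited to '0' or '*', and the ambiguity about rounding and padding width is not addressed at all. The named groups play the role of the "tags" in the parse tree.

```python
import re

CANONICAL = "-123456.789"  # the number every example is assumed to render

# Hypothetical pattern: the digits 123 and 456 are literal because the
# example must be a rendering of CANONICAL; everything around them is
# captured under a name.
EXAMPLE_PATTERN = re.compile(
    r"""
    ^
    (?:(?P<unit_left>[A-Za-z]+)(?P<unit_sep>\s?))?  # optional unit on the left
    (?P<sign_front>-)?                              # the sign may come first ...
    (?P<padding>[0*]*)                              # run of padding characters
    (?P<lead_sep>[^\d-]?)                           # separator after the padding
    123
    (?P<thousand_sep>[^\d-]?)                       # separator between groups
    456
    (?:(?P<fract_sep>[^\d-])(?P<fraction>\d+))?     # optional fractional part
    (?P<sign_rear>-)?                               # ... or last
    $
    """,
    re.VERBOSE,
)


def deduce_format(example):
    """Return a dict describing the format shown by a formatted example."""
    m = EXAMPLE_PATTERN.match(example)
    if m is None:
        raise ValueError(f"not a rendering of {CANONICAL}: {example!r}")
    front, rear = m["sign_front"], m["sign_rear"]
    if bool(front) == bool(rear):
        # Either no sign at all, or one at each end: the position is unknown.
        raise ValueError("the example must carry exactly one sign")
    return {
        "unit": m["unit_left"],
        "unit_sep": m["unit_sep"],
        "thousand_sep": m["thousand_sep"],
        "fract_sep": m["fract_sep"],
        "fract_digits": len(m["fraction"] or ""),
        "sign_pos": "LEFT" if front else "RIGHT",
        "padding_char": m["padding"][:1] or None,
    }
```

With this sketch, deduce_format('kg 00_123_456,79-') reports sign_pos 'RIGHT', thousand_sep '_', fract_sep ',' and padding_char '0', while deduce_format('-123,456.79') reports sign_pos 'LEFT'. An example without a sign, or with one at each end, is rejected as ambiguous.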

> Also it is highly inflexible, 
> what about custom positive sign? What if I want to use literal -? What 
> about literal number? What about non-latin number?

~ true. But this applies to any formatting rule, no? You have to specify e.g. which code point ranges are allowed as valid digits -- and these must not overlap with the code points allowed as signs, separators, or whatever.
Custom signs are not a problem, as long as they do not conflict with digits or separators. The same goes for non-Latin digits. These points are not specific to my proposal; they apply to any kind of formatting.
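
The same constraint shows up in the string-based format_number quoted at the top of this message. Below is a rough sketch of it, not Eric's actual implementation: the defaults, the rule that the last grouping size repeats (inferred from the expected outputs in the quote; the locale module uses a different terminator convention), and the digit check on separators are all my assumptions.

```python
def format_number(whole_part, fractional_part="", decimal_point=".",
                  thousands_sep=",", grouping=(3,)):
    """Group the digits of whole_part from the right.

    Assumption: grouping lists group sizes starting from the right,
    and the last size repeats for all remaining digits.
    """
    sizes = list(grouping)
    if not sizes or any(size <= 0 for size in sizes):
        raise ValueError("grouping must contain positive sizes")
    # Separators must not overlap with digits, whatever the script;
    # str.isdigit() also accepts non-Latin digits.
    for sep in (decimal_point, thousands_sep):
        if any(ch.isdigit() for ch in sep):
            raise ValueError(f"separator {sep!r} conflicts with digits")

    groups = []
    rest = whole_part
    index = 0
    while rest:
        size = sizes[min(index, len(sizes) - 1)]  # last size repeats
        groups.append(rest[-size:])
        rest = rest[:-size]
        index += 1

    result = thousands_sep.join(reversed(groups))
    if fractional_part:
        result += decimal_point + fractional_part
    return result


print(format_number(whole_part="123456787654321", fractional_part="123",
                    decimal_point=",", thousands_sep=" ",
                    grouping=[4, 3, 2]))
# prints: 12 34 56 78 765 4321,123
```

Passing a separator such as '0' raises ValueError here, which is the kind of conflict I mean: it has to be ruled out whether the format comes from keyword arguments, a picture string, or an example.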

> What if I want to use literal -? What about literal number?

I do not understand your point.

Denis
------
la vita e estrany


