[Python-ideas] Rough draft: Proposed format specifier for a thousands separator (discussion moved from python-dev)

Lie Ryan lie.1296 at gmail.com
Fri Mar 13 16:09:28 CET 2009


spir wrote:
> On Fri, 13 Mar 2009 22:46:32 +1100,
> Lie Ryan <lie.1296 at gmail.com> wrote:
> 
>> spir wrote:
>>> On Thu, 12 Mar 2009 19:31:39 -0400,
>>> Eric Smith <eric at trueblade.com> wrote:
>>>
>>>>> I've always thought that we should have a utility function which
>>>>> formats a number based on the same settings that are in the locale, but
>>>>> not actually use the locale. Something like:
>>>>>
>>>>>  >>> format_number(123456787654321.123, decimal_point=',', thousands_sep=' ',
>>>>>  ...               grouping=[4, 3, 2])
>>>>>  '12 34 56 78 765 4321,123'
>>>> To be maximally useful (for example, so it could be used in Decimal to 
>>>> implement locale formatting), maybe it should work on strings:
>>>>
>>>>  >>> format_number(whole_part='123456787654321',
>>>>  ...               fractional_part='123',
>>>>  ...               decimal_point=',',
>>>>  ...               thousands_sep=' ',
>>>>  ...               grouping=[4, 3, 2])
>>>>  '12 34 56 78 765 4321,123'
>>>>
>>>>  >>> format_number(whole_part='123456787654321',
>>>>  ...               decimal_point=',',
>>>>  ...               thousands_sep='.',
>>>>  ...               grouping=[4, 3, 2])
>>>>  '12.34.56.78.765.4321'
>>>  
>>>
>>> I find the overall problem of providing an interface to specify a number
>>> format rather challenging. The issue I see is to design a formatting
>>> pattern that is simple, clear, _and_ practical. A practical pattern is
>>> easy to specify, but then it becomes rather illegible and/or hard to
>>> remember, while a legible one ends up excessively verbose.
>>>
>>> I have the impression, but I may well be wrong, that unlike a format,
>>> a *formatted number* is easy to scan -- with human eyes. So, as a
>>> crazy idea, I wonder whether we shouldn't let the user provide an
>>> example formatted number instead. This may address most use cases,
>>> but probably not all.
>>>
>>> To make things easier, why not specify a canonical number, such as
>>> '-123456.789', of which the user should define the formatted version?
>>> Then a smart parser could deduce the format to be applied to further
>>> numbers. Below is a purely artificial example.
>>>
>>> -123456.789   -->   kg 00_123_456,79-
>>>
>>> format:
>>>    unit: 'kg'
>>>    unit_pos: LEFT
>>>    unit_sep: ' '
>>>    thousand_sep: '_'
>>>    fract_sep : ','
>>>    sign_pos: RIGHT
>>>    sign_sep: None
>>>    padding_char: '0'
>>>
>>> There are obvious issues:
>>> * Does rounding apply to the whole precision (number of significant
>>> digits), or to the fractional part only? Then, should the format be
>>> interpreted as the most common case (probably fractional rounding),
>>> provide a disambiguation flag, or provide a flag for the non-default
>>> case only? What if rounding applies after a large number of digits?
>>> Should we instead allow the user to provide a longer number?
>>> * Similarly for padding: does it apply to the length of the whole number
>>> or to the integral part (common in financial apps to align decimal
>>> signs)? What if the padding applies to a smaller number of digits than
>>> the canonical number has? Should we instead allow the user to provide a
>>> shorter number?
>>> * probably more...
>>>
>>> The space of valid formats can be specified using a parsing grammar, so
>>> that a parse failure indicates invalid format, and a "tagged" parse tree
>>> provides all the information needed to construct a format object.
>>>
>>> I really do not know whether this idea is stupid or worth being
>>> explored ;-) [But I would gladly try it for personal use, at least as
>>> an everyday fast-and-easy feature.]
>> Your proposal (other than being harder to implement) is similar to the
>> way Excel handles formatting, but instead of a sample number, it uses #
>> as a placeholder. If you really want to test-implement it, better try
>> using that.
> 
> Right. I also think now that the "picture strings" pointed to in the PEP are a better option for such needs, although they probably cannot handle issues such as ambiguity of precision or padding without additional parameters either. The only advantage of my proposal is that the user provides an example instead of an abstract representation.
> 
>> And I think it is impossible for the parser to be smart enough to
>> recognize that the sign should be placed at the rear (the smartest
>> parser might only treat it as a literal negative).
> 
> ? Either I do not understand, or it is wrong. 

Partially wrong; when I said "literal negative" I really meant "literal -".

> You can perfectly well have a parse expression allowing either a front or a rear sign, as long as there is a non-ambiguous sign pattern.
> What does 'literal negative' mean?

But what if I want ~ to denote a negative number?

>> Also, it is highly inflexible:
>> what about a custom positive sign? What if I want to use a literal -? What
>> about a literal number? What about non-Latin numerals?
> 
> ~ true. But this applies to any formatting rule, no? 

Yes, but using an example number introduces lots of ambiguities. You must
use parameters to avoid these ambiguities.
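
To illustrate the parameter-based approach, here is a minimal sketch of the format_number helper quoted earlier in this message. It is only one possible reading of that interface: the default values are assumptions, and it assumes grouping lists the group sizes starting from the right, with the last size repeating, which is what the quoted outputs suggest.

```python
def format_number(whole_part, fractional_part="", decimal_point=".",
                  thousands_sep=",", grouping=(3,)):
    """Group the digits of whole_part (a string) from the right.

    Assumption: grouping gives group sizes from the right, and the
    last size repeats for all remaining digits.
    """
    sizes = list(grouping)
    if not sizes or any(size <= 0 for size in sizes):
        raise ValueError("grouping must contain positive sizes")

    groups = []
    rest = whole_part
    index = 0
    while rest:
        # Past the end of the list, keep reusing the last size.
        size = sizes[min(index, len(sizes) - 1)]
        groups.append(rest[-size:])
        rest = rest[:-size]
        index += 1

    result = thousands_sep.join(reversed(groups))
    if fractional_part:
        result += decimal_point + fractional_part
    return result


print(format_number("123456787654321", "123", decimal_point=",",
                    thousands_sep=" ", grouping=[4, 3, 2]))
# 12 34 56 78 765 4321,123
```

Every choice is spelled out by a named parameter, so nothing has to be guessed from the shape of a sample number.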

> You have to specify e.g. which code point areas are allowed for valid digits -- and that must not overlap with code points allowed as sign, separators, or whatever.

> Custom signs are not a problem, as long as they do not conflict with digits or separators. The same goes for non-Latin numerals. These points are not specific to my proposal; they apply to any kind of formatting.

How would the example format interpret this:
123 456~

When I want ~ to be the negative sign?

What if I want < for negative and > for positive?
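
With explicit parameters these cases need no guessing. The following is a small sketch only; attach_sign and all of its parameter names are hypothetical and are not taken from the PEP or from any existing library.

```python
def attach_sign(body, negative, negative_sign="-", positive_sign="",
                sign_pos="left"):
    """Attach an explicitly chosen sign string to an already formatted number."""
    sign = negative_sign if negative else positive_sign
    if sign_pos == "left":
        return sign + body
    if sign_pos == "right":
        return body + sign
    raise ValueError("sign_pos must be 'left' or 'right'")


# '~' as a trailing negative sign
print(attach_sign("123 456", True, negative_sign="~", sign_pos="right"))
# 123 456~

# '<' for negative and '>' for positive
print(attach_sign("123 456", True, negative_sign="<", positive_sign=">"))
# <123 456
print(attach_sign("123 456", False, negative_sign="<", positive_sign=">"))
# >123 456
```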

Those are quite hypothetical, but if we're talking about languages that
don't use Latin numerals, that sort of thing is very likely to happen.

>> What if I want to use a literal -? What about a literal number?
> 
> I do not understand your point.

What if I want my number to look like this:
123-4567

An example-based format would have a hard time guessing whether the "-"
should be a negative sign or a literal "-". Maybe you can use escape
characters, but that would turn the strongest point of the example format
against itself.
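
By contrast, a placeholder-style picture has no such ambiguity. Below is a rough sketch in the spirit of the # placeholder mentioned above; it is not Excel's actual behaviour, the function name apply_picture is made up, and it assumes the picture contains exactly one # per digit.

```python
def apply_picture(digits, picture):
    """Fill each '#' in picture with the next digit; copy everything else as is."""
    if picture.count("#") != len(digits):
        raise ValueError("picture must contain one '#' per digit")
    remaining = iter(digits)
    return "".join(next(remaining) if ch == "#" else ch for ch in picture)


print(apply_picture("1234567", "###-####"))
# 123-4567
```

Here the "-" can only be a literal, because digits are never written out in the picture and the sign would have to be supplied separately.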



