[Python-ideas] Customizing format()

Tue Mar 17 22:43:10 CET 2009

Raymond Hettinger wrote:
> I've been exploring how to customize our thousands separators and decimal
> separators and wanted to offer up an idea.  It arose when I was looking
> at Java's DecimalFormat class and its customization tool,
> DecimalFormatSymbols:
> http://java.sun.com/javase/6/docs/api/java/text/DecimalFormat.html .
> Also, I looked at how regular expression patterns provide options to change
> the meaning of their special characters using (?iLmsux).
> 
> I.  Simplest version -- Translation pairs
> 
>    format(1234, "8,.1f")         -->   ' 1,234.0'
>    format(1234, "(,_)8,.1f")     -->   ' 1_234.0'
>    format(1234, "(,_)(.,)8,.1f") -->   ' 1_234,0'
> 
> This approach is very easy to implement and it doesn't make life difficult
> for the parser, which can continue to look for just a comma and period
> with their standardized meaning.  It also fits nicely in our current
> framework and doesn't require any changes to the format() builtin.  Of all
> the options, I find this one to be the easiest to read.
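
For concreteness, here is a minimal sketch of the translation-pair idea
layered on top of the existing format() builtin.  The helper name
format_with_pairs is hypothetical, and the sketch assumes every pair is
exactly "(xy)" and that all pairs come before the ordinary format spec.

```python
import re

# Hypothetical helper, not part of Python: peel leading "(xy)" pairs off the
# spec, format with the unmodified built-in format(), then swap characters.
_PAIR = re.compile(r"\((.)(.)\)")

def format_with_pairs(value, spec):
    table = {}
    pos = 0
    while True:
        m = _PAIR.match(spec, pos)
        if m is None:
            break
        # Map the standard character to its replacement.
        table[ord(m.group(1))] = m.group(2)
        pos = m.end()
    # str.translate applies all pairs in one pass, so "(,.)(.,)" swaps the
    # two characters instead of replacing one and then undoing it.
    return format(value, spec[pos:]).translate(table)

print(repr(format_with_pairs(1234, "8,.1f")))           # ' 1,234.0'
print(repr(format_with_pairs(1234, "(,_)8,.1f")))       # ' 1_234.0'
print(repr(format_with_pairs(1234, "(,_)(.,)8,.1f")))   # ' 1_234,0'
```

Because the translation happens after formatting, the standard parser
never sees the replacement characters.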

I strongly prefer suffix to prefix modification.  The format gives the 
overall structure of the output, the rest are details, which a reader 
may not care so much about.

> Also, this version makes it easy to employ a couple of techniques to 
> factor-out

These techniques apply to any "augment the basic format with an affix" 
method.

> formatting decisions.  Here's a gettext() style approach.
> 
>    def _(s):
>         return '(,.)(.,)' + s
>    . . .
>    format(x, _('8.1f'))
> 
> Here's another approach using string concatenation:
> 
>     DEB = '(,_)'        # style for debugging
>     EXT = '(, )'        # style for external display
>     . . .
>     format(x, DEB + '8.1f')
>     format(y, EXT + '8d')
> 
> There are probably many ways to factor-out the decision.  We don't need to
> decide which is best, we just need to make it possible.
> 
> One other thought, this approach makes it possible to customize all of the
> characters that are currently hardwired (including zero and space padding
> characters and the 'E' or 'e' exponent symbols).

Any "augment the format with affixes" method should do the same.
I prefer at most a separator (;) between affixes rather than fences 
around them.

I also prefer mnemonic key letters to mark the start of each affix, 
such as in Guido's quick suggestion: Thousands, Decimal_point, Exponent, 
Grouping, Pad_char, Money, and so on.  But I do not think '=' is needed.
Since the replacement will almost always be a single character that is 
not a capital letter, I am not sure a separator is even needed, but it 
would make parsing much easier.  G would be followed by one or more 
digits indicating group sizes from Decimal_point leftward, with the last 
repeated.  If grouping by 9s is not large enough, allow a-f to get 
grouping up to 15 ;-).  Example above would be

format(1234, '8.1f;T.;P,')
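
A rough sketch of how such suffix affixes might be handled follows.  All
names are hypothetical, and the key-letter meanings are assumptions taken
from the list above: T replaces the thousands separator, D replaces the
decimal point, and the digits after G are group sizes read as hex digits.
Grouping is shown as a separate helper rather than wired into the format
call.

```python
def format_with_affixes(value, spec):
    """Sketch: 'base;Kc;Kc' where K is a key letter and c its replacement."""
    base, *affixes = spec.split(";")
    options = {affix[0]: affix[1:] for affix in affixes if affix}
    table = {}
    if "T" in options:                  # assumed: Thousands separator
        table[ord(",")] = options["T"]
    if "D" in options:                  # assumed: Decimal_point
        table[ord(".")] = options["D"]
    return format(value, base).translate(table)

def parse_group_sizes(text):
    """'32' -> [3, 2]; a-f allow group sizes up to 15."""
    return [int(ch, 16) for ch in text]

def group_digits(digits, sizes, sep=","):
    """Group a digit string from the right; the last size is repeated."""
    groups = []
    i = 0
    while digits:
        size = sizes[min(i, len(sizes) - 1)]
        groups.append(digits[-size:])
        digits = digits[:-size]
        i += 1
    return sep.join(reversed(groups))

print(repr(format_with_affixes(1234, "8,.1f;T_;D,")))    # ' 1_234,0'
print(group_digits("1234567", parse_group_sizes("3")))   # 1,234,567
print(group_digits("1234567", parse_group_sizes("32")))  # 12,34,567
```

Splitting on ';' keeps the parsing trivial, at the cost of not allowing
';' itself as a replacement character.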

> II.  Javaesque version -- FormatSymbols object
> 
> This is essentially the same idea as previous one but involves modifying 
> the format() builtin to accept a symbols object and pass it to 
> __format__ methods. This moves the work outside of the format string 
> itself:
> 
>      DEB = FormatSymbols(comma='_')
>      EXT = FormatSymbols(comma=' ')
>      . . .
>      format(x, '8.1f', DEB)
>      format(y, '8d', EXT)
> 
> The advantage is that this technique is easily extendable beyond simple
> symbol translations and could possibly allow specification of grouping
> sizes in hundreds and whatnot.  It also looks more like a real program
> as opposed to a formatting mini-language.  The disadvantage is that
> it is likely slower and it requires mucking with the currently dirt simple
> format() / __format__() protocol.  It may also be harder to integrate
> with existing __format__ methods which are currently very string oriented.
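
To make the comparison concrete, here is a small sketch of what a symbols
object could look like.  FormatSymbols is not an existing class; this
version, its point parameter, and the wrapper format_with are all
hypothetical.  The sketch post-processes the output of the unmodified
format() builtin rather than changing the __format__ protocol, so it only
approximates the proposal.

```python
class FormatSymbols:
    """Hypothetical container for replacement symbols (not a real builtin)."""

    def __init__(self, comma=",", point="."):
        self.comma = comma
        self.point = point

    def apply(self, text):
        # Replace both symbols in a single pass over the formatted text.
        return text.translate({ord(","): self.comma, ord("."): self.point})


def format_with(value, spec, symbols=None):
    """Format with the builtin, then substitute the requested symbols."""
    text = format(value, spec)
    return text if symbols is None else symbols.apply(text)


DEB = FormatSymbols(comma="_")
EXT = FormatSymbols(comma=" ")
print(repr(format_with(1234.5, "8,.1f", DEB)))   # ' 1_234.5'
print(repr(format_with(1234567, "10,d", EXT)))   # ' 1 234 567'
```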

I suggested in the thread on exposing the format parse result that the 
resulting structure (dict or named tuple) could become an alternative, 
wordy interface to the format functions.  I think the mini-language 
itself should stay mini.
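
As an illustration only, a parse result exposed as a named tuple might
look something like this.  The names FormatSpec and parse_spec are
hypothetical, and the regular expression assumes the grammar
[[fill]align][sign][#][0][width][,][.precision][type]; it is not the
parser Python actually uses.

```python
import re
from collections import namedtuple

# Hypothetical parse result; field names are illustrative.
FormatSpec = namedtuple(
    "FormatSpec", "fill align sign alt zero width comma precision type"
)

_SPEC = re.compile(
    r"(?:(?P<fill>.)?(?P<align>[<>=^]))?"
    r"(?P<sign>[-+ ])?"
    r"(?P<alt>#)?"
    r"(?P<zero>0)?"
    r"(?P<width>\d+)?"
    r"(?P<comma>,)?"
    r"(?:\.(?P<precision>\d+))?"
    r"(?P<type>[bcdeEfFgGnosxX%])?"
    r"\Z",
    re.DOTALL,
)

def parse_spec(spec):
    """Split a format spec into its fields; absent fields are None."""
    m = _SPEC.match(spec)
    if m is None:
        raise ValueError("invalid format spec: %r" % (spec,))
    return FormatSpec(**m.groupdict())

parsed = parse_spec("8,.1f")
print(parsed.width, parsed.comma, parsed.precision, parsed.type)  # 8 , 1 f
```

A wordy interface could then accept such a structure (or the equivalent
keyword arguments) directly, leaving the string form as shorthand.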

Terry Jan Reedy