[Python-ideas] Customizing format()

Raymond Hettinger python at rcn.com
Tue Mar 17 21:40:51 CET 2009


I've been exploring how to customize our thousands separators and decimal
separators and wanted to offer up an idea.  It arose when I was looking at
Java's DecimalFormat class and its customization tool DecimalFormatSymbols
http://java.sun.com/javase/6/docs/api/java/text/DecimalFormat.html .
Also, I looked at how regular expression patterns provide options to change
the meaning of their special characters using (?iLmsux).

I.  Simplest version -- Translation pairs

    format(1234, "8,.1f")         -->   ' 1,234.0'
    format(1234, "(,_)8,.1f")     -->   ' 1_234.0'
    format(1234, "(,_)(.,)8,.1f") -->   ' 1_234,0'

This approach is very easy to implement and it doesn't make life difficult
for the parser which can continue to look for just a comma and period
with their standardized meaning.  It also fits nicely in our current framework
and doesn't require any changes to the format() builtin.  Of all the options,
I find this one to be the easiest to read.
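
To make the idea concrete, here is a minimal sketch of how translation pairs
could be emulated today in pure Python.  The helper name format_with_pairs is
made up for illustration; the builtin format() does not understand "(xy)"
prefixes, and this is only one way the proposal might be realized.  It assumes
a Python version whose format() supports the "," thousands separator.

```python
import re

# Hypothetical helper, not part of Python: emulates the proposed "(xy)"
# translation-pair prefixes on top of the existing format() builtin.
_PAIR = re.compile(r"\((.)(.)\)")

def format_with_pairs(value, spec):
    """Strip leading "(xy)" pairs from spec, format, then map x -> y."""
    table = {}
    pos = 0
    while True:
        match = _PAIR.match(spec, pos)
        if match is None:
            break
        # Each pair says: replace the first character with the second.
        table[ord(match.group(1))] = match.group(2)
        pos = match.end()
    # The remaining spec keeps the standard meaning of comma and period.
    # str.translate applies all pairs in one pass, so "(,.)(.,)" swaps
    # the two symbols instead of collapsing them.
    return format(value, spec[pos:]).translate(table)

print(repr(format_with_pairs(1234, "8,.1f")))          # ' 1,234.0'
print(repr(format_with_pairs(1234, "(,_)8,.1f")))      # ' 1_234.0'
print(repr(format_with_pairs(1234, "(,_)(.,)8,.1f")))  # ' 1_234,0'
```

Because the translation happens after ordinary formatting, the parser for the
rest of the spec is untouched, which is the property described above.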

Also, this version makes it easy to employ a couple of techniques to factor out
formatting decisions.  Here's a gettext() style approach:

    def _(s):
        return '(,.)(.,)' + s
    . . .
    format(x, _('8,.1f'))

Here's another approach using named style prefixes and string concatenation:

     DEB = '(,_)'        # style for debugging
     EXT = '(, )'        # style for external display
     . . .
     format(x, DEB + '8,.1f')
     format(y, EXT + '8,d')

There are probably many ways to factor out the decision.  We don't need to
decide which is best; we just need to make it possible.

One other thought: this approach makes it possible to customize all of the
characters that are currently hardwired (including the zero and space padding
characters and the 'E' or 'e' exponent symbols).


II.  Javaesque version -- FormatSymbols object

This is essentially the same idea as the previous one but involves modifying
the format() builtin to accept a symbols object and pass it to __format__
methods.  This moves the work outside of the format string itself:

      DEB = FormatSymbols(comma='_')
      EXT = FormatSymbols(comma=' ')
      . . .
      format(x, '8,.1f', DEB)
      format(y, '8,d', EXT)

The advantage is that this technique is easily extensible beyond simple
symbol translations and could possibly allow specifying other grouping
sizes (hundreds, say).  It also looks more like a real program
as opposed to a formatting mini-language.  The disadvantage is that
it is likely slower and it requires mucking with the currently dirt-simple
format() / __format__() protocol.  It may also be harder to integrate
with existing __format__ methods, which are currently very string-oriented.
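
As a rough illustration of the symbols-object idea, here is a sketch that
avoids touching the builtin by using a wrapper function.  Both FormatSymbols
and format_with_symbols are hypothetical names; the real proposal would pass
the object through format() to __format__, which this sketch does not attempt.
Only the comma and point symbols are modeled here.

```python
class FormatSymbols:
    """Hypothetical symbols object; only 'comma' and 'point' are modeled."""

    def __init__(self, comma=",", point="."):
        self.comma = comma
        self.point = point

    def translation_table(self):
        # Map the standard symbols to the customized ones in one pass.
        return {ord(","): self.comma, ord("."): self.point}


def format_with_symbols(value, spec, symbols=None):
    """Stand-in for a three-argument format(); not the real builtin."""
    text = format(value, spec)
    if symbols is None:
        return text
    return text.translate(symbols.translation_table())


DEB = FormatSymbols(comma="_")   # style for debugging
EXT = FormatSymbols(comma=" ")   # style for external display

print(repr(format_with_symbols(1234.5, "8,.1f", DEB)))   # ' 1_234.5'
print(repr(format_with_symbols(1234567, "12,d", EXT)))   # '   1 234 567'
```

An object like this has room to grow attributes (a grouping size, for
instance), which a pure character translation cannot express.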


Raymond

   


