[Python-ideas] Customizing format()
tjreedy at udel.edu
Tue Mar 17 22:43:10 CET 2009
Raymond Hettinger wrote:
> I've been exploring how to customize our thousands separators and decimal
> separators and wanted to offer-up an idea. It arose when I was looking
> at Java's DecimalFormat class and its customization tool
> http://java.sun.com/javase/6/docs/api/java/text/DecimalFormat.html .
> Also, I looked at how regular expression patterns provide options to change
> the meaning of its special characters using (?iLmsux).
> I. Simplest version -- Translation pairs
> format(1234, "8,.1f") --> ' 1,234.0'
> format(1234, "(,_)8,.1f") --> ' 1_234.0'
> format(1234, "(,_)(.,)8,.1f") --> ' 1_234,0'
> This approach is very easy to implement and it doesn't make life difficult
> for the parser which can continue to look for just a comma and period
> with their standardized meaning. It also fits nicely in our current framework
> and doesn't require any changes to the format() builtin. Of all the options,
> I find this one to be the easiest to read.
I strongly prefer suffix to prefix modification. The format gives the
overall structure of the output, the rest are details, which a reader
may not care so much about.
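For concreteness, here is a minimal sketch of how the translation pairs quoted above could be emulated today on top of the existing format() builtin. The name format_pairs and the regular expressions are assumptions made for illustration; nothing here is an existing or proposed API, and it only shows the prefix form from the quoted examples.

```python
import re

# Hypothetical helper: zero or more leading "(xy)" pairs, then an ordinary spec.
_PAIR_PREFIX = re.compile(r'((?:\(..\))*)(.*)', re.DOTALL)


def format_pairs(value, spec):
    """Format with the plain spec, then swap characters as the pairs request."""
    prefix, plain_spec = _PAIR_PREFIX.fullmatch(spec).groups()
    # Each "(xy)" pair means: wherever x appears in the output, emit y instead.
    pairs = dict(re.findall(r'\((.)(.)\)', prefix, re.DOTALL))
    # str.translate applies all replacements in one pass, so "(,.)(.,)"
    # swaps the two characters instead of collapsing them into one.
    return format(value, plain_spec).translate(str.maketrans(pairs))


print(repr(format_pairs(1234, "(,_)(.,)8,.1f")))
```

Because the translation is applied to the finished string, a pair that maps the space character would also rewrite the padding. A real implementation inside __format__ could be more selective.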
> Also, this version makes it easy to employ a couple of techniques to factor-out
These techniques apply to any "augment the basic format with an affix" method.
> formatting decisions. Here's a gettext() style approach.
> def _(s):
>     return '(,.)(.,)' + s
> . . .
> format(x, _('8.1f'))
> Here's another approach using implicit string concatenation:
> DEB = '(,_)' # style for debugging
> EXT = '(, )' # style for external display
> . . .
> format(x, DEB '8.1f')
> format(y, EXT '8d')
> There are probably many ways to factor-out the decision. We don't need to
> decide which is best, we just need to make it possible.
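One practical note on the second technique: in Python, implicit concatenation joins only adjacent string literals, so a named prefix has to be combined with the spec explicitly. The short check below illustrates this. The variable names are just for the example, and the prefix syntax itself remains hypothetical.

```python
DEB = '(,_)'  # style for debugging, as in the quoted example

# Adjacent string *literals* are joined by the compiler.
assert '(,_)' '8.1f' == '(,_)8.1f'

# A name followed by a literal is not implicit concatenation;
# the expression fails to compile.
try:
    compile("DEB '8.1f'", '<example>', 'eval')
except SyntaxError:
    name_then_literal_ok = False
else:
    name_then_literal_ok = True

# With a named prefix, explicit concatenation builds the same spec string.
debug_spec = DEB + '8.1f'
print(name_then_literal_ok, debug_spec)
```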
> One other thought, this approach makes it possible to customize all of the
> characters that are currently hardwired (including zero and space padding
> characters and the 'E' or 'e' exponent symbols).
Any "augment the format with affixes" method should do the same.
I prefer at most a separator (;) between affixes rather than fences around them.
I also prefer mnemonic key letters to mark the start of each affix,
such as in Guido's quick suggestion: Thousands, Decimal_point, Exponent,
Grouping, Pad_char, Money, and so on. But I do not think '=' is needed.
Since the replacement will almost always be a single non-capital
letter char, I am not sure a separator is even needed, but it would make
parsing much easier. G would be followed by one or more digits
indicating grouping from Decimal_point leftward, with the last repeated.
If grouping by 9s is not large enough, allow a-f to get grouping up to
15 ;-). Example above would be
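As a rough illustration of the G rule only (not of the affix syntax itself, whose exact spelling is not settled here), the grouping could be sketched like this. The function name, its signature, and the choice to read each size as one hexadecimal digit are assumptions for the example.

```python
def group_digits(digits, sizes, sep=','):
    """Group integer digits from the decimal point leftward.

    sizes is the text that would follow G: one character per group,
    '1'-'9' or 'a'-'f' (10-15), with the last size repeated as needed.
    """
    group_sizes = [int(ch, 16) for ch in sizes]
    if not group_sizes or 0 in group_sizes:
        raise ValueError('need at least one non-zero group size')
    groups = []
    end = len(digits)
    index = 0
    while end > 0:
        # Past the end of the list, keep reusing the last size.
        size = group_sizes[min(index, len(group_sizes) - 1)]
        start = max(end - size, 0)
        groups.append(digits[start:end])
        end = start
        index += 1
    return sep.join(reversed(groups))


print(group_digits('1234567', '3'))   # thousands-style grouping
print(group_digits('1234567', '32'))  # first group of 3, then groups of 2
```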
> II. Javaesque version -- FormatSymbols object
> This is essentially the same idea as previous one but involves modifying
> the format() builtin to accept a symbols object and pass it to
> __format__ methods. This moves the work outside of the format string.
> DEB = FormatSymbols(comma='_')
> EXT = FormatSymbols(comma=' ')
> . . .
> format(x, '8.1f', DEB)
> format(y, '8d', EXT)
> The advantage is that this technique is easily extendable beyond simple
> symbol translations and could possibly allow specification of grouping
> sizes in hundreds and whatnot. It also looks more like a real program
> as opposed to a formatting mini-language. The disadvantage is that
> it is likely slower and it requires mucking with the currently dirt simple
> format() / __format__() protocol. It may also be harder to integrate
> with existing __format__ methods which are currently very string oriented.
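To make the comparison concrete, here is a minimal sketch of the symbols-object idea that works without changing the format() builtin. FormatSymbols is only mocked up here as a small dataclass. The point field and the format_with_symbols helper are assumptions for illustration and are not part of the proposal as quoted.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class FormatSymbols:
    """Hypothetical stand-in for the proposed symbols object."""
    comma: str = ','
    point: str = '.'  # assumed field; the quoted examples only use comma


def format_with_symbols(value, spec, symbols=FormatSymbols()):
    # Format with the standard meanings, then substitute the symbols.
    table = str.maketrans({',': symbols.comma, '.': symbols.point})
    return format(value, spec).translate(table)


DEB = FormatSymbols(comma='_')  # style for debugging
EXT = FormatSymbols(comma=' ')  # style for external display

print(repr(format_with_symbols(1234, '8,.1f', DEB)))
print(repr(format_with_symbols(1234567, '11,d', EXT)))
```

The format string stays a plain spec and the style decision lives in an ordinary object, which is the trade-off described in the quoted text.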
I suggested in the thread on exposing the format parse result that the
resulting structure (dict or named tuple) could become an alternative,
wordy interface to the format functions. I think the mini-language
itself should stay mini.
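A rough sketch of what such a wordy interface might look like, assuming a named tuple whose fields mirror a subset of the current format spec. The field names, FormatFields, and wordy_format are all hypothetical, and only the spec-building direction is shown, not the parser.

```python
from typing import NamedTuple, Optional


class FormatFields(NamedTuple):
    """Hypothetical structured form of a subset of the format spec."""
    fill: str = ''
    align: str = ''
    sign: str = ''
    width: int = 0
    thousands: bool = False
    precision: Optional[int] = None
    type: str = ''

    def to_spec(self):
        # Rebuild the compact mini-language string from the named fields.
        parts = [
            self.fill + self.align if self.align else '',
            self.sign,
            str(self.width) if self.width else '',
            ',' if self.thousands else '',
            '.%d' % self.precision if self.precision is not None else '',
            self.type,
        ]
        return ''.join(parts)


def wordy_format(value, **fields):
    # Keyword arguments in, ordinary format() call out.
    return format(value, FormatFields(**fields).to_spec())


print(repr(wordy_format(1234, width=8, thousands=True, precision=1, type='f')))
```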
Terry Jan Reedy