I've been exploring how to customize our thousands separators and decimal separators, and I wanted to offer up an idea. It arose when I was looking at Java's DecimalFormat class and its customization tool, DecimalFormatSymbols: http://java.sun.com/javase/6/docs/api/java/text/DecimalFormat.html . I also looked at how regular expression patterns let you change the meaning of their special characters using inline flags such as (?iLmsux).

I. Simplest version -- Translation pairs

    format(1234, "8,.1f")          --> ' 1,234.0'
    format(1234, "(,_)8,.1f")      --> ' 1_234.0'
    format(1234, "(,_)(.,)8,.1f")  --> ' 1_234,0'

Each leading pair "(xy)" means "render x as y": "(,_)" turns the thousands separator into an underscore, and "(.,)" turns the decimal point into a comma.

This approach is very easy to implement, and it doesn't make life difficult for the parser, which can continue to look for just a comma and a period with their standardized meanings. It also fits nicely into our current framework and doesn't require any changes to the format() builtin. Of all the options, I find this one the easiest to read.

This version also makes it easy to factor out formatting decisions. Here's a gettext()-style approach:

    def _(s):
        return '(,.)(.,)' + s
    . . .
    format(x, _('8.1f'))

Here's another approach using string concatenation:

    DEB = '(,_)'    # style for debugging
    EXT = '(, )'    # style for external display
    . . .
    format(x, DEB + '8,.1f')
    format(y, EXT + '8,d')

There are probably many ways to factor out the decision. We don't need to decide which is best; we just need to make it possible.

One other thought: this approach makes it possible to customize all of the characters that are currently hardwired (including the zero and space padding characters and the 'E' or 'e' exponent symbols).

II. Javaesque version -- FormatSymbols object

This is essentially the same idea as the previous one, but it involves modifying the format() builtin to accept a symbols object and pass it to __format__ methods. This moves the work outside of the format string itself:

    DEB = FormatSymbols(comma='_')
    EXT = FormatSymbols(comma=' ')
    . . .
    format(x, '8,.1f', DEB)
    format(y, '8,d', EXT)

The advantage is that this technique is easily extended beyond simple symbol translations and could possibly allow specification of other grouping sizes (such as grouping by hundreds) and whatnot. It also looks more like a real program than a formatting mini-language.

The disadvantage is that it is likely slower and it requires mucking with the currently dirt-simple format() / __format__() protocol. It may also be harder to integrate with existing __format__ methods, which are currently very string-oriented.

Raymond
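The translation-pair idea from section I can be emulated today as a small wrapper, which gives a feel for how little machinery it needs. This is a minimal sketch, assuming Python 3.1 or later (for the ',' option); format_with_pairs is a hypothetical name, not a builtin, and it applies the pairs to the finished string rather than inside __format__, so a pair such as "(,_)" affects every comma in the output.

```python
import re

# Hypothetical syntax from the proposal: zero or more "(xy)" pairs in front of
# an ordinary format spec, each meaning "render character x as character y".
_PAIR = re.compile(r"\((.)(.)\)")


def format_with_pairs(value, spec):
    """Format value with spec, honoring leading (xy) translation pairs.

    Sketch only: the pairs are stripped off, the remaining spec is passed
    unchanged to the standard format(), and the pairs are then applied to
    the result in a single str.translate() pass.
    """
    table = {}
    pos = 0
    while True:
        m = _PAIR.match(spec, pos)
        if m is None:
            break
        table[ord(m.group(1))] = m.group(2)
        pos = m.end()
    result = format(value, spec[pos:])
    # translate() maps every character at once, so "(,.)(.,)" swaps the two
    # separators instead of turning both into the same character.
    return result.translate(table)


print(format_with_pairs(1234, "(,_)(.,)8,.1f"))
```

Doing the translation in one str.translate() pass is what keeps "(,.)(.,)" a swap; two successive str.replace() calls would collapse both separators into the same character.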
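For comparison, the FormatSymbols idea from section II can be sketched without touching the real format() builtin. FormatSymbols, its point field, and format_sym are all hypothetical names chosen for illustration; a real implementation would pass the object through to __format__ rather than post-processing the output string as this sketch does.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class FormatSymbols:
    """Hypothetical symbols object in the spirit of Java's DecimalFormatSymbols."""
    comma: str = ","   # thousands (grouping) separator
    point: str = "."   # decimal separator (hypothetical field name)


def format_sym(value, spec="", symbols=None):
    """Hypothetical three-argument format(); symbols=None keeps today's output."""
    text = format(value, spec)
    if symbols is None:
        return text
    # One simultaneous mapping, so comma and point can be swapped safely.
    table = str.maketrans({",": symbols.comma, ".": symbols.point})
    return text.translate(table)


DEB = FormatSymbols(comma='_')
EXT = FormatSymbols(comma=' ')
print(format_sym(1234.5, '8,.1f', DEB))
print(format_sym(1234, '8,d', EXT))
```

Because the symbols travel as an object rather than as characters in the spec, further fields (grouping sizes, say) could be added later without changing the mini-language.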