Rough draft: Proposed format specifier for a thousands separator

pruebauno at latinmail.com pruebauno at latinmail.com
Thu Mar 12 10:51:16 EDT 2009


On Mar 12, 3:30 am, Raymond Hettinger <pyt... at rcn.com> wrote:
> If anyone here is interested, here is a proposal I posted on the
> python-ideas list.
>
> The idea is to make number formatting a little easier with the new
> format() builtin in Py2.6 and Py3.0:
> http://docs.python.org/library/string.html#formatspec
>
> -------------------------------------------------------------
>
> Motivation:
>
>     Provide a simple, non-locale aware way to format a number
>     with a thousands separator.
>
>     Adding thousands separators is one of the simplest ways to
>     improve the professional appearance and readability of
>     output exposed to end users.
>
>     In the finance world, output with commas is the norm.  Finance
>     users and non-professional programmers find the locale approach
>     to be frustrating, arcane and non-obvious.
>
>     It is not the goal to replace locale or to accommodate every
>     possible convention.  The goal is to make a common task easier
>     for many users.
>
> Research so far:
>
>     Scanning the web, I've found that thousands separators are
>     usually one of COMMA, PERIOD, SPACE, or UNDERSCORE.  The
>     COMMA is used when a PERIOD is the decimal separator.
>
>     James Knight observed that Indian/Pakistani numbering systems
>     group by hundreds.   Ben Finney noted that Chinese group by
>     ten-thousands.
>
>     Visual Basic and its brethren (like MS Excel) use a completely
>     different style and have ultra-flexible custom format specifiers
>     like: "_($* #,##0_)".
>
> Proposal I (from Nick Coghlan):
>
>     A comma will be added to the format() specifier mini-language:
>
>     [[fill]align][sign][#][0][minimumwidth][,][.precision][type]
>
>     The ',' option indicates that commas should be included in the
>     output as a thousands separator. As with locales which do not use
>     a period as the decimal point, locales which use a different
>     convention for digit separation will need to use the locale
>     module to obtain appropriate formatting.
>
>     The proposal works well with floats, ints, and decimals.  It also
>     allows easy substitution for other separators.  For example:
>
>         format(n, "6,f").replace(",", "_")
>
>     This technique is completely general but it is awkward in the one
>     case where the commas and periods need to be swapped.
>
>         format(n, "6,f").replace(",", "X").replace(".", ",").replace("X", ".")
>
> Proposal II (to meet Antoine Pitrou's request):
>
>     Make both the thousands separator and decimal separator user
>     specifiable but not locale aware.  For simplicity, limit the
>     choices to a comma, period, space, or underscore.
>
>     [[fill]align][sign][#][0][minimumwidth][T[tsep]][dsep precision][type]
>
>     Examples:
>
>         format(1234, "8.1f")    -->     '  1234.0'
>         format(1234, "8,1f")    -->     '  1234,0'
>         format(1234, "8T.,1f")  -->     ' 1.234,0'
>         format(1234, "8T ,1f")  -->     ' 1 234,0'
>         format(1234, "8d")      -->     '    1234'
>         format(1234, "8T,d")    -->     '   1,234'
>
>     This proposal meets most needs (except for people wanting
>     grouping for hundreds or ten-thousands), but it comes at the
>     expense of being a little more complicated to learn and remember.
>     Also, it makes it more challenging to write custom __format__
>     methods that follow the format specification mini-language.
>
>     For the locale module, just the "T" is necessary in a formatting
>     string since the tool already has procedures for figuring out the
>     actual separators from the local context.
>
> Comments and suggestions are welcome but I draw the line at supporting
> Mayan numbering conventions ;-)
>
> Raymond
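
A minimal Python sketch of the separator substitutions discussed in
the two proposals above. It assumes an interpreter in which the ','
option is available (it was later adopted in Python 2.7 and 3.1).
format_sep and ALLOWED are hypothetical names, not part of either
proposal; the helper only emulates Proposal II's choice of separators
by post-processing and does not implement the "T" syntax.

```python
ALLOWED = {",", ".", " ", "_"}  # the four separators named in Proposal II


def format_sep(value, spec, tsep=",", dsep="."):
    """Hypothetical helper: format with ',' grouping, then substitute."""
    if tsep not in ALLOWED or dsep not in ALLOWED:
        raise ValueError("separator must be comma, period, space or underscore")
    text = format(value, spec)  # spec is expected to include the ',' option
    # translate() maps both characters in one pass, so swapping ',' and '.'
    # needs no temporary placeholder such as "X".
    return text.translate(str.maketrans({",": tsep, ".": dsep}))


n = 1234567.891
grouped = format(n, ",.2f")              # Proposal I: comma as separator
underscored = grouped.replace(",", "_")  # easy substitution
# The awkward case from Proposal I: swapping commas and periods.
swapped = grouped.replace(",", "X").replace(".", ",").replace("X", ".")

print(grouped, underscored, swapped)
print(repr(format_sep(1234, "8,.1f", tsep=".", dsep=",")))
```

Note that the substitution also touches a fill character that happens
to be a comma or period, so the sketch is only safe with the default
space padding.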

As far as I am concerned, the simplest version plus a way to swap
commas and periods is all that is needed. The rest can be done
with a single replace (because the decimal separator is always one of
two options). This should cover everywhere but the Far East: 80% of
the cases for 20% of the implementation complexity.

For example:

[[fill]align][sign][#][0][,|.][minimumwidth][.precision][type]

        format(1234, ".8.1f")  -->     ' 1.234,0'
        format(1234, ",8.1f")  -->     ' 1,234.0'
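
A rough Python sketch of how the [,|.] specifier above could be
emulated, assuming an interpreter that already supports the ','
option (Python 2.7 / 3.1 and later). format_swapped and _SPEC are
hypothetical names made up for illustration, and the regular
expression is only my reading of the grammar line above, not a
complete parser.

```python
import re

# Hypothetical parser for:
# [[fill]align][sign][#][0][,|.][minimumwidth][.precision][type]
_SPEC = re.compile(
    r"(?P<head>(?:.?[<>=^])?[-+ ]?#?0?)"  # fill/align, sign, '#', '0'
    r"(?P<sep>[,.])?"                      # proposed thousands separator
    r"(?P<width>\d*)"                      # minimum width
    r"(?P<tail>(?:\.\d+)?[a-zA-Z%]?)"      # precision and type
)


def format_swapped(value, spec):
    """Format value; a leading ',' or '.' selects the thousands separator."""
    m = _SPEC.fullmatch(spec)
    if m is None:
        raise ValueError("unsupported format spec: %r" % (spec,))
    head, sep, width, tail = m.group("head", "sep", "width", "tail")
    if sep is None:
        return format(value, spec)  # no grouping requested
    # Re-express the spec with the comma after the width, which is
    # where the ',' option sits in the syntax Python ended up with.
    text = format(value, head + width + "," + tail)
    if sep == ".":
        # '.' groups thousands, so ',' becomes the decimal separator.
        text = text.translate(str.maketrans(",.", ".,"))
    return text


print(repr(format_swapped(1234, ".8.1f")))
print(repr(format_swapped(1234, ",8.1f")))
```

One limitation the sketch makes visible: a spec such as ".2f" is read
here as "period separator, width 2", so a real implementation would
need a rule to tell the separator apart from a bare precision.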



