Rough draft: Proposed format specifier for a thousands separator

Raymond Hettinger python at rcn.com
Thu Mar 12 04:41:59 EDT 2009


> If anyone here is interested, here is a proposal I posted on the
> python-ideas list.
>
> The idea is to make number formatting a little easier with
> the new format() builtin:
> http://docs.python.org/library/string.html#formatspec

Here's a re-post (hopefully without the line wrapping problems
in the previous post).

Raymond

-------------------------------------------------------------



Motivation:
-----------

Provide a simple, non-locale aware way to format a number
with a thousands separator.

Adding thousands separators is one of the simplest ways to
improve the professional appearance and readability of output
exposed to end users.

In the finance world, output with commas is the norm.  Finance
users and non-professional programmers find the locale
approach to be frustrating, arcane and non-obvious.

It is not the goal to replace locale or to accommodate every
possible convention.  The goal is to make a common task easier
for many users.


Research so far:
----------------

Scanning the web, I've found that thousands separators are
usually one of COMMA, PERIOD, SPACE, or UNDERSCORE.  The
COMMA is used when a PERIOD is the decimal separator.

James Knight observed that Indian/Pakistani numbering systems
group by hundreds.   Ben Finney noted that Chinese group by
ten-thousands.
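
To make the grouping conventions concrete, here is a minimal
sketch.  The helper name group_digits and its "sizes" argument are
made up for illustration and are not part of either proposal.  It
assumes the Indian convention of one group of three digits followed
by groups of two, and the Chinese convention of groups of four:

```python
def group_digits(digits, sizes, sep=","):
    """Group a string of digits, working from the right.

    sizes lists the group widths starting with the rightmost group;
    the last width is reused for all remaining digits.
    """
    groups = []
    i = 0
    while digits:
        size = sizes[min(i, len(sizes) - 1)]
        groups.append(digits[-size:])
        digits = digits[:-size]
        i += 1
    return sep.join(reversed(groups))


western = group_digits("1234567", [3])     # groups of three
indian = group_digits("1234567", [3, 2])   # three, then twos
chinese = group_digits("12345678", [4])    # groups of four
```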

Visual Basic and its brethren (like MS Excel) use a completely
different style and have ultra-flexible custom format
specifiers like: "_($* #,##0_)".



Proposal I (from Nick Coghlan):
-------------------------------

A comma will be added to the format() specifier mini-language:

[[fill]align][sign][#][0][minimumwidth][,][.precision][type]

The ',' option indicates that commas should be included in the
output as a thousands separator. As with locales which do not
use a period as the decimal point, locales which use a
different convention for digit separation will need to use the
locale module to obtain appropriate formatting.

The proposal works well with floats, ints, and decimals.
It also allows easy substitution for other separators.
For example:

  format(n, "6,f").replace(",", "_")

This technique is completely general but it is awkward in the
one case where the commas and periods need to be swapped:

  format(n, "6,f").replace(",", "X").replace(".", ",").replace("X", ".")
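
As a sketch of how Proposal I reads in practice, assuming a
Python 3 interpreter whose format() accepts the ',' option as
described above.  The use of str.translate() for the swap is my own
suggestion, not part of the proposal; it exchanges both separators
in one pass and avoids the temporary "X" placeholder:

```python
n = 1234567.891

# Plain thousands separator, two decimal places.
with_commas = format(n, ",.2f")

# Substitute another separator after formatting.
with_underscores = format(n, ",.2f").replace(",", "_")

# Swap commas and periods in a single pass.
swap = str.maketrans(",.", ".,")
swapped = format(n, ",.2f").translate(swap)
```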


Proposal II (to meet Antoine Pitrou's request):
-----------------------------------------------

Make both the thousands separator and decimal separator user
specifiable but not locale aware.  For simplicity, limit the
choices to a comma, period, space, or underscore.

[[fill]align][sign][#][0][minimumwidth][T[tsep]][dsep precision][type]

Examples:

  format(1234, "8.1f")    -->     '  1234.0'
  format(1234, "8,1f")    -->     '  1234,0'
  format(1234, "8T.,1f")  -->     ' 1.234,0'
  format(1234, "8T ,1f")  -->     ' 1 234,0'
  format(1234, "8d")      -->     '    1234'
  format(1234, "8T,d")    -->     '   1,234'

This proposal meets most needs (except for people wanting
grouping for hundreds or ten-thousands), but it comes at the
expense of being a little more complicated to learn and
remember.  Also, it makes it more challenging to write custom
__format__ methods that follow the format specification
mini-language.
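
The behavior in the examples above can be approximated without the
"T" syntax.  The following is only a sketch: format_sep is a
hypothetical helper, not part of the proposal, and it assumes a
Python 3 interpreter where the ',' option from Proposal I is
available.  It formats with commas and periods first, then swaps in
the requested separators:

```python
def format_sep(value, width=0, tsep="", dsep=".", precision=None):
    """Hypothetical helper emulating the Proposal II examples.

    tsep is the thousands separator ("" for none), dsep is the
    decimal separator, and precision=None means integer output.
    """
    if precision is None:
        text = format(value, ",d")
    else:
        text = format(value, ",.%df" % precision)
    # Replace both separators in one pass so they cannot collide.
    table = {ord(","): tsep, ord("."): dsep}
    return text.translate(table).rjust(width)


plain = format_sep(1234, 8, precision=1)
comma_decimal = format_sep(1234, 8, dsep=",", precision=1)
european = format_sep(1234, 8, tsep=".", dsep=",", precision=1)
spaced = format_sep(1234, 8, tsep=" ", dsep=",", precision=1)
integer = format_sep(1234, 8)
grouped_int = format_sep(1234, 8, tsep=",")
```

Keeping the separators as ordinary function arguments sidesteps the
parsing burden on custom __format__ methods, at the cost of not
being usable inside a format string.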

For the locale module, just the "T" is necessary in a
formatting string since the tool already has procedures for
figuring out the actual separators from the local context.
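
For reference, this is roughly how the locale module exposes those
separators today.  The sketch pins the portable "C" locale so the
result is predictable; under that locale there is no thousands
separator, so grouping leaves the digits untouched.  Other locale
names vary by platform and are not assumed here:

```python
import locale

# The "C" locale is always available; it defines no grouping.
locale.setlocale(locale.LC_ALL, "C")

conv = locale.localeconv()
decimal_point = conv["decimal_point"]
thousands_sep = conv["thousands_sep"]

# grouping=True asks for the locale's own separators.
grouped = locale.format_string("%d", 1234567, grouping=True)
```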



