Rough draft: Proposed format specifier for a thousands separator

Lie Ryan lie.1296 at gmail.com
Fri Mar 13 14:16:38 EDT 2009


Raymond Hettinger wrote:
> [andrew cooke]
>> would it break anything to also allow
>>
>>>>> format(1234567, 'd')       # what we have now
>>  '1234567'
>>>>> format(1234567, '.d')      # proposed new option
>>  '1.234.567'
>>>>> format(1234.5, ',2f')      # proposed new option
>>  '1234,50'
>>>>> format(1234.5, '.,2f')     # proposed new option
> 
> Yes, that's allowed too!  The separators can be any one of COMMA,
> SPACE, DOT, UNDERSCORE, or NON-BREAKING-SPACE.

What if I want other separators?

How about this idea: give the format a "long" form, which is a bit 
more verbose, flexible, and unambiguous, and treat the current proposal 
as a "short" form, which is more concise.

The "long" format would look like this (it is much, much more featureful 
than the current proposal; I think I might have crossed far beyond 
the Mayan line):

[n|sign <signnegative>[[, <signzero>], <signpositive>] | ]
[w|min <minwidth>[, <align>[, <alignfill>]] | ]
[x|max <maxwidth>[, <overflowsign>[, <overflowalign>]] | ]
[s|sep [[...]<sep><sepwidth>]<sep><sepwidth> | ]
[dp|decpoint <decpoint> | ]
[ds|decsep <width><sep>[, <width><sep>[...]] | ]
[b|base <base-n>[, <charset>] | ]
[p|prec <prec> | ]
t|type <type>

The feel of the "long" format:

fmt_string: 'type f'

   number: 876543213456.98765445
   result: 876543213456.98765445

fmt_string: 'decpoint ^ | type f'

   number: 876543213456.98765445
   result: 876543213456^98765445

fmt_string: 'sep 2>1:3.4 | decpoint , | prec 3 | type f'

   number: 876543213456.98765445
   result: 87>65>4:321.3456,988
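
To make the sep/decpoint/prec example above concrete, here is a rough 
Python sketch that reproduces its result. The names group_digits and 
format_float are made up for illustration, and reading the sep pattern 
as alternating widths and one-character separators is my assumption 
about the proposal, not a settled rule.

```python
import re

def group_digits(digits, pattern):
    """Insert separators into a digit string, working right to left.

    Assumption: `pattern` alternates widths and single-character
    separators, e.g. '2>1:3.4' -> widths [2, 1, 3, 4], seps ['>', ':', '.'].
    Once the pattern runs out, the leftmost width and separator repeat.
    """
    tokens = re.findall(r'\d+|\D', pattern)
    widths = [int(t) for t in tokens[0::2]]
    seps = tokens[1::2]
    if not seps:
        return digits                      # nothing to insert
    if any(w <= 0 for w in widths):
        raise ValueError('group widths must be positive')

    pieces = []                            # collected right to left
    pos = len(digits)
    i = len(widths) - 1                    # start with the rightmost width
    while pos > 0:
        start = max(pos - widths[max(i, 0)], 0)
        pieces.append(digits[start:pos])
        pos = start
        if pos > 0:
            # separator sitting to the left of the group just taken
            pieces.append(seps[max(i - 1, 0)])
        i -= 1
    return ''.join(reversed(pieces))

def format_float(number, sep_pattern, decpoint, prec):
    """Apply prec, then sep, then decpoint to a float."""
    text = format(number, '.{}f'.format(prec))
    sign = '-' if text.startswith('-') else ''
    whole, _, frac = text.lstrip('-').partition('.')
    result = sign + group_digits(whole, sep_pattern)
    return result + decpoint + frac if frac else result

print(format_float(876543213456.98765445, '2>1:3.4', ',', 3))
# 87>65>4:321.3456,988
```

With the pattern '3.3' the same helper gives the usual grouping, so 
group_digits('1234567', '3.3') returns '1.234.567'.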

General Rules:
- every field except type is optional
- fields are separated by | (this may change); escape a literal | with ||
- every field starts with an identifier followed by mandatory whitespace
- subfields are separated by commas. Each identifier has a long and a 
short form.
- Processing precedence is: type, base, prec, sep/decsep, decpoint, sign, 
min, max
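
As a rough illustration of the general rules, a format string could be 
split into fields along these lines. This is only a sketch: 
parse_long_format and the ALIASES table are hypothetical names, the 
short/long pairs are taken from the grammar above, and subfields are 
left unsplit because a bare comma can itself be an argument (as in 
'decpoint ,').

```python
import re

# short identifier -> long identifier, as listed in the grammar
ALIASES = {
    'n': 'sign', 'w': 'min', 'x': 'max', 's': 'sep', 'dp': 'decpoint',
    'ds': 'decsep', 'b': 'base', 'p': 'prec', 't': 'type',
}
LONG_NAMES = set(ALIASES.values())

def parse_long_format(spec):
    """Split a long format string into a {field name: raw argument} dict."""
    fields = {}
    # Split on a single '|' only; '||' is an escaped literal '|'.
    for raw in re.split(r'(?<!\|)\|(?!\|)', spec):
        field = raw.strip().replace('||', '|')
        if not field:
            continue
        parts = field.split(None, 1)       # identifier, whitespace, argument
        if len(parts) != 2:
            raise ValueError('field needs an identifier and an argument: %r'
                             % field)
        name, arg = parts
        name = ALIASES.get(name, name)     # accept short or long identifier
        if name not in LONG_NAMES:
            raise ValueError('unknown field: %r' % name)
        fields[name] = arg
    if 'type' not in fields:
        raise ValueError('the type field is mandatory')
    return fields

print(parse_long_format('sep 2>1:3.4 | decpoint , | prec 3 | type f'))
```

One limitation of this sketch: because each field is stripped, an 
argument consisting only of whitespace (a space as decpoint, say) would 
be lost and would need its own escape.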

Specific rules:
- min and max determine the width: min sets the rule applied when the 
resulting string is shorter than minwidth, max sets the rule applied when 
the resulting string is longer than maxwidth (basically trimming). 
alignfill is the character or sequence of characters used to pad the 
resulting string to minwidth; overflowsign is the character added when 
maxwidth is exceeded and trimming occurs.
- sep is basically a list of separators, one delimiting each group 
width. The regular Latin number system would be represented as sep 3.3; 
the leftmost number and separator are repeated.
- decsep works similarly to sep
- base is the number base; charset is the mapping of digits used to 
represent the output number in that base.
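
For the base/charset rule, something like the following is what I have 
in mind. It is a sketch only: to_base and DEFAULT_CHARSET are made-up 
names, and treating charset as a string indexed by digit value is an 
assumption.

```python
import string

# Assumed default: 0-9 then a-z, enough for bases up to 36.
DEFAULT_CHARSET = string.digits + string.ascii_lowercase

def to_base(value, base, charset=DEFAULT_CHARSET):
    """Render an integer in `base`, drawing digit symbols from `charset`."""
    if not 2 <= base <= len(charset):
        raise ValueError('base must be between 2 and len(charset)')
    if value == 0:
        return charset[0]
    sign = '-' if value < 0 else ''
    value = abs(value)
    out = []
    while value:
        value, digit = divmod(value, base)   # peel off the lowest digit
        out.append(charset[digit])
    return sign + ''.join(reversed(out))

print(to_base(255, 16))        # ff
print(to_base(5, 2, 'ab'))     # bab
```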

PS: It is not designed to be written by hand, but is meant to be fairly readable
PPS: It is fairly modular too
