[Python-ideas] Format mini-language for lakh and crore

David Mertz mertz at gnosis.cx
Sun Jan 28 01:25:09 EST 2018


South Asia uses a different style of digit delimiters for large numbers
than Europe, North America, Australia, etc.  With some minor spelling
differences, the term lakh is used for one hundred thousand, and it is
generally written as '1,00,000'.

In turn, a crore is 100 lakh, and is written as '1,00,00,000'.  Extending
this pattern, larger numbers continue to use groups of two digits (other
than the smallest grouping of three digits).  So, e.g., 1e12 is written
as '10,00,00,00,00,000'.

It's nice that we now have the optional underscore in numeric literals.  So
we can write a number as either `12_34_56_78_00_000` or
`1_234_567_800_000`, depending on which region's convention is more
familiar.

However, in *formatting* those numbers, the format mini-language only
allows the European convention.  So e.g.

In [1]: x = 12_34_56_78_00_000
In [2]: "{:,d}".format(x)
Out[2]: '1,234,567,800,000'
In [3]: f"{x:,d}"
Out[3]: '1,234,567,800,000'


In order to get Indian number delimiters, you'd have to write a custom
formatting function, notwithstanding that something like 1.5 billion people
use the three-then-two delimiting convention.
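
For illustration, such a custom function might look like the sketch below.
The name `indian_grouping` and its `sep` parameter are hypothetical, not
anything in the standard library, and it only handles integers:

```python
def indian_grouping(n: int, sep: str = ",") -> str:
    """Group digits three-then-two (lakh/crore style). Hypothetical helper."""
    sign = "-" if n < 0 else ""
    digits = str(abs(n))
    # The rightmost group has three digits; every group to its left has two.
    head, tail = digits[:-3], digits[-3:]
    groups = []
    while head:
        groups.insert(0, head[-2:])
        head = head[:-2]
    groups.append(tail)
    return sign + sep.join(groups)


x = 12_34_56_78_00_000
print(indian_grouping(x))        # 12,34,56,78,00,000
print(indian_grouping(100_000))  # 1,00,000
```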

I propose that Python should have an additional grouping option, or some
other way to specify this grouping convention.  Oddly, the '_' grouping
symbol is available, even though no one actually uses that grouper outside
of programming languages like Python, e.g.:

In [4]: f"{x:_d}"
Out[4]: '1_234_567_800_000'


I guess this is nice for something like round-tripping numbers used in
code, but it's not a symbol anyone uses "natively" (I understand why comma
or period cannot be used in numeric literals since they mean something else
in Python already).
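
As a small sketch of that round trip (the variable names here are only
illustrative), the underscore-formatted string can be read back both as a
literal and by `int()`:

```python
import ast

x = 12_34_56_78_00_000
text = f"{x:_d}"

# The formatted text is itself a valid Python integer literal ...
assert text == "1_234_567_800_000"
assert ast.literal_eval(text) == x
# ... and int() accepts underscores between digits as well.
assert int(text) == x
```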

I'm not sure what symbol or combination I would recommend, but finding
something suitable shouldn't be so hard.  Perhaps now that backtick no
longer has any other meaning in Python, it could be used since it looks
similar to a comma.  E.g. in Python 3.8 we might have:

>>> f"{x:`d}"
'12,34,56,78,00,000'

(actually, this probably wouldn't be a parser issue even in Python 2, since
it's already inside quotes; but the point is moot).

Or maybe a two character version like:

>>> f"{x:2,d}"
'12,34,56,78,00,000'


Or:

>>> f"{x:,,d}"
'12,34,56,78,00,000'


Even if `2,` were used, that wouldn't preclude giving a precision
specifier after it.  Today we can write:

>>> f"{x:,.2f}"
'1,234,567,800,000.00'

Perhaps in the future this would work:

>>> f"{x:2,.2f}"
'12,34,56,78,00,000.00'
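
Until something like that exists, the same output can be approximated with
a helper.  This is only a sketch under my own assumptions: `indian_format`
and its `precision` and `sep` parameters are hypothetical names, and it
ignores width, alignment, and other parts of the format spec:

```python
def indian_format(value, precision: int = 2, sep: str = ",") -> str:
    """Fixed-point formatting with three-then-two grouping. Hypothetical helper."""
    sign = "-" if value < 0 else ""
    # Let the existing mini-language do the rounding, then regroup the digits.
    fixed = f"{abs(value):.{precision}f}"
    whole, _, frac = fixed.partition(".")
    head, tail = whole[:-3], whole[-3:]
    groups = []
    while head:
        groups.insert(0, head[-2:])
        head = head[:-2]
    groups.append(tail)
    grouped = sep.join(groups)
    return sign + (f"{grouped}.{frac}" if frac else grouped)


x = 12_34_56_78_00_000
print(indian_format(x, 2))         # 12,34,56,78,00,000.00
print(indian_format(150000.5, 1))  # 1,50,000.5
```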


-- 
Keeping medicines from the bloodstreams of the sick; food
from the bellies of the hungry; books from the hands of the
uneducated; technology from the underdeveloped; and putting
advocates of freedom in prisons.  Intellectual property is
to the 21st century what the slave trade was to the 16th.