Improve readability of long numeric literals

Hi everyone! Sometimes it's hard to read long numbers. For example:
opts.write_buffer_size = 67108864
Some languages (Ruby, Perl, Swift) allow the use of underscores in numeric literals, which are ignored. They are typically used as thousands separators. The example above would look like this:
opts.write_buffer_size = 67_108_864
Which helps to quickly identify that this is around 67 million. Another option is to use spaces instead of underscores:
opts.write_buffer_size = 67 108 864
This has two advantages: 1. It is analogous to the way string literals work, which are concatenated if put next to each other. 2. Spaces are already used as thousands separators in many European languages [1]. The disadvantage is that, as far as I know, no other languages do this. I have seen some old discussions around this, but nothing on this list or a PEP. With Python being used more and more for scientific and numeric computation, this is a small change that will help with readability a lot. And, as far as I can tell, it doesn't break compatibility in any way. Thoughts? Manuel. [1] https://docs.oracle.com/cd/E19455-01/806-0169/overview-9/index.html
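For readers trying the two spellings today: the underscore form was later specified in PEP 515 and is accepted from Python 3.6 on, while the space-separated form is still rejected. The following is a minimal sketch, assuming Python 3.6 or newer; the variable names are only illustrative.

```python
# Assumes Python 3.6+, where underscores in numeric literals are accepted.
plain = 67108864
grouped = 67_108_864  # the underscores are ignored by the compiler

# The space-separated spelling is still not valid syntax.
try:
    compile("x = 67 108 864", "<example>", "exec")
    space_form_compiles = True
except SyntaxError:
    space_form_compiles = False

print(plain == grouped, space_form_compiles)
```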

Ian Kelly <ian.g.kelly@gmail.com> writes:
The exact same fact – that a proposed new syntax was previously a syntax error – is commonly presented as a positive. We know there are no valid Python programs already using the construct to mean something else. So I don't think it's reasonable to present that now as though it were negative. (good sigmonster, have a cookie) -- “… correct code is great, code that crashes could use improvement, but incorrect code that doesn’t crash is a horrible nightmare.” —Chris Smith, 2008-08-22. Ben Finney

On Tue, Feb 9, 2016 at 4:06 PM, Ben Finney <ben+python@benfinney.id.au> wrote:
I think you misunderstand. The argument (which I agree with) is that the syntax error was considered beneficial, since it would catch common typos -- so that we're loath to make that valid code. -- --Guido van Rossum (python.org/~guido)

On 02/09/2016 04:06 PM, Ben Finney wrote:
Ian Kelly writes:
If the SyntaxError is also a common mistake, then suddenly having it be working, but wrong, code is a bad thing.
(good sigmonster, have a cookie)
“… correct code is great, code that crashes could use improvement, but incorrect code that doesn’t crash is a horrible nightmare.” —Chris Smith, 2008-08-22
Yes, good sigmonster - "incorrect code that doesn't crash is a horrible nightmare" -- which is what could happen when a SyntaxError suddenly becomes syntactically correct. -- ~Ethan~

On 02/09/2016 01:40 PM, Manuel Cerón wrote:
As I recall, a number of years ago we had this discussion and Guido approved the idea. The only email I could locate at the moment, though, shows his support of the idea, but not outright approval. [1] I dare say if somebody submitted a patch it would fare well. (As in: be accepted, not gone forever.) -- ~Ethan~ [1] https://mail.python.org/pipermail/python-ideas/2011-May/010157.html

On 9 Feb 2016 23:18, "Guido van Rossum" <guido@python.org> wrote:
Indeed, "123 456" is a no-no, but "123_456" sounds good. (Not sure about "12_34_56" but there are probably use cases for that too.)
It would be useful for hex literals. There are other more confusing possibilities such as 1_._0_e_-_1_0. -- Oscar

On Feb 9, 2016, at 16:45, MRAB <python@mrabarnett.plus.com> wrote:
The Ada programming language allows underscores in numerals, but requires there to be a digit on both sides of the underscore.
I think Swift, Ruby, and most other languages allow runs of multiple underscores, and even trailing underscores. It seems like it's a lot easier to lex "digit (digit-or-underscore)*", "0x (hex-digit-or-underscore)+", etc. than to try to add restrictions. And not just for Python itself, but for anyone who wants to write a Python tokenizer or parser. And it's a shorter rule to document, and easier to remember. So, unless there's a really compelling reason for an extra restriction, I think it's better to leave the restrictions out (and make them style issues).
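The two lexing rules quoted above translate almost directly into regular expressions. This is only a sketch of the permissive rule as worded in this message, not the grammar of any real tokenizer; the names `DECIMAL`, `HEX` and `is_permissive_literal` are made up for illustration.

```python
import re

# "digit (digit-or-underscore)*"
DECIMAL = re.compile(r"[0-9][0-9_]*")
# "0x (hex-digit-or-underscore)+"
HEX = re.compile(r"0[xX][0-9a-fA-F_]+")


def is_permissive_literal(text):
    """Return True if text matches one of the two permissive rules."""
    return bool(DECIMAL.fullmatch(text) or HEX.fullmatch(text))


# Runs of underscores and trailing underscores pass; a leading one does not.
for candidate in ["123_456", "12__34_", "_123", "0x_dead_beef", "0x"]:
    print(candidate, is_permissive_literal(candidate))
```

Under this rule the only syntactic restriction is the first character; everything else is left to style guides.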

On 10 February 2016 at 00:45, MRAB <python@mrabarnett.plus.com> wrote:
The Ada programming language allows underscores in numerals, but requires there to be a digit on both sides of the underscore.
+1 to this. Nobody's given an important use-case for any of the odd cases (doubled or trailing underscores, or those oddly placed in floats) but there are legitimate reasons to want groups of different sizes, especially with non-decimal bases. Saying a digit is needed either side is both obvious and sufficient.
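For comparison, the stricter "a digit on both sides of every underscore" rule is also short to write down. Again this is an illustrative sketch with hypothetical names, covering only decimal and hexadecimal integers.

```python
import re

# Groups of digits joined by single underscores, so every underscore
# has a digit immediately before and after it.
STRICT_DECIMAL = re.compile(r"[0-9]+(?:_[0-9]+)*")
STRICT_HEX = re.compile(r"0[xX][0-9a-fA-F]+(?:_[0-9a-fA-F]+)*")


def is_strict_literal(text):
    """Return True if every underscore in text sits between two digits."""
    return bool(STRICT_DECIMAL.fullmatch(text) or STRICT_HEX.fullmatch(text))


# Groups of different sizes are fine; doubled, trailing, or
# prefix-adjacent underscores are rejected.
for candidate in ["1_23_45_678", "0x1234_ABCD", "123__456", "123_", "0x_1234"]:
    print(candidate, is_strict_literal(candidate))
```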

On 10 February 2016 at 03:32, Joshua Landau <joshua@landau.ws> wrote:
It's possible to get it right, but I think keeping the grammar simple and making the rest a style issue is the best approach. We don't disallow 0x6AfEbbC for example, but mixing case like that is ugly to read too. (I was originally going to say "Under that change, "23" becomes invalid" but then I realised I'd misread the grammar. Which sort of makes my point that we want to keep the rules simple :-)) Paul

On 9 February 2016 at 23:51, Guido van Rossum <guido@python.org> wrote:
I don't know what a bank sort code is (maybe a UK thing?)
It is a UK thing. It identifies the bank you opened your account with.
FWIW there are some edge cases to be decided: is _123 valid? or 123_? or 123__456?
_123 is currently a valid identifier:
123_ is not. There's no good reason to allow either though. If the purpose is to separate the digits for clarity then the underscore doesn't need to be at the beginning or the end. -- Oscar

On Wed, Feb 10, 2016 at 12:16:43AM +0000, Oscar Benjamin wrote:
Agreed. Disallow leading and trailing underscores, otherwise allow and ignore any number of underscores in integer literals so that all of these are legal:
123_456_789
0x1234_ABCD
0b1111_0000_1010_0101
0o12_34
For avoidance of doubt, there must be at least one digit before the first underscore. These are not allowed:
-_123_456
+_123_456
(actually, they are allowed, since they're legal identifiers). Consecutive underscores will be allowed:
1234____5678
but the docs (PEP 8?) should say "don't do that". Likewise for excessive underscores:
1_2_3_4_5_6_7_8_9_0
These sorts of abuses are a style issue, not a syntax issue. Floats are more complex:
123_456.000_001e-23
looks okay to me, but what about this?
123_456_._000_001_e_-_23
I think that's ugly. Should we restrict underscores to being only between digits? Or just call that a style issue too?
-- Steve

On Wed, 10 Feb 2016 11:50:41 +1100, Steven D'Aprano wrote:
Floats are more complex:
123_456.000_001e-23
Floats are less complex. Complexes are more complex: 22j 123_456.000_001e-23j :-)
looks okay to me, but what about this?
123_456_._000_001_e_-_23
Agreed: that is no longer more readable. All I can think of for a use case for leading underscores would be to line up values of different lengths:
x4 = __4
x5 = _33
x6 = __4
x7 = 220
milli = 1e__-3
micro = 1e__-6
nano = 1e__-9
pico = 1e_-12
femto = 1e_-15
But PEP 8 already suggests otherwise:
milli = 1e-3
micro = 1e-6
nano = 1e-9
pico = 1e-12
femto = 1e-15

On Wed, Feb 10, 2016 at 12:51 AM, Guido van Rossum <guido@python.org> wrote:
_123 is a valid identifier name, so no. For consistency, I think the leading underscore should be out too. Multiple underscores in the middle might be useful for separating millions and thousands: 700__000_000 but perhaps it's too much.

On 10 February 2016 at 01:07, Ethan Furman <ethan@stoneleaf.us> wrote:
and in talking about int() accepting underscored inputs:
It seems entirely harmless here. Also for float().
I don't agree with either of those. Syntax accepted by int() is less permissive than for int literals (e.g. int('0x1')) which is good because int is often used to process data from external sources. In this vein I'm not sure how I feel about int accepting non-ASCII characters - perhaps there should be a separate int.from_ascii function for this purpose but that's a different subject. Having float() accept underscored inputs violates IEEE 754. That doesn't mean it's impossible but why bother? -- Oscar
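The int('0x1') point can be checked directly: with the default base, int() rejects a string that would be a perfectly good literal in source code. A small sketch, where `try_int` is a hypothetical helper written for this example:

```python
def try_int(text, base=10):
    """Return int(text, base), or None if int() rejects the string."""
    try:
        return int(text, base)
    except ValueError:
        return None


default_parse = try_int("0x1")      # base 10: the "0x" prefix is not accepted
explicit_hex = try_int("0x1", 16)   # base 16: the prefix is allowed
literal_rules = try_int("0x1", 0)   # base 0: follow integer-literal prefix rules

print(default_parse, explicit_hex, literal_rules)
```

Callers who want literal-style parsing of external data have to opt in with an explicit base, which is the stricter default described above.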

On Wed, Feb 10, 2016 at 12:35 PM, Oscar Benjamin <oscar.j.benjamin@gmail.com> wrote:
+1. Keep int() as it is, so we don't get weird stuff happening and causing confusion. But I'm +1 on allowing underscores between digits (not straight after a decimal point in a float, though, as it looks like attribute access). ChrisA

On 10 February 2016 at 21:57, Greg Ewing <greg.ewing@canterbury.ac.nz> wrote:
As others have suggested, I like the idea of keeping the grammar simple (i.e. numeric literals must start with a base appropriate digit, but may subsequently contain digits or underscores). I'd even apply that to float literals, with the avoidance of putting an underscore just before the floating point being a style issue, rather than a syntactic one. What kind of numeric grouping to use is also a style question - if it's an English-language project or an international project using metric values, then it would make sense to group by thousands. If it's a project written assuming maintainers can follow Chinese or Japanese, then it would make sense to group according to the conventions of those language communities, just as folks may already decide to do with variable names and comments. Cheers, Nick. -- Nick Coghlan | ncoghlan@gmail.com | Brisbane, Australia

On Tue, Feb 9, 2016, at 18:17, Guido van Rossum wrote:
Indeed, "123 456" is a no-no, but "123_456" sounds good. (Not sure about "12_34_56" but there are probably use cases for that too.)
For one, not all languages use the same divisions. In many Indian languages, the natural divisions would be like 1_23_45_678, whereas in Chinese and Japanese some people may want to use 1_2345_6789, though the western system is also common. Also, dare I suggest, 0x_0123_4567_89AB_CDEF? Arbitrary grouping may be useful in binary constants representing bit fields. For that matter, for an integer representing a fixed-point fractional quantity, one may want to use 1_234_567__0000, where the last separator represents the decimal place. Mathematical publications sometimes group digits after a decimal place into groups of five, e.g. Decimal("3.14159_26535_89793_84626") or combine these ideas for Fraction(3__14159_26535_89793_84626, 10**20) I don't know that there's any reason to build restrictions into the language, any more than to require certain integer constants to be in decimal and others to be in hexadecimal.
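To make the different grouping conventions concrete, here is a small sketch that inserts underscores into a digit string using arbitrary group sizes. `group_digits` is a hypothetical helper written for this illustration, not part of any proposal in the thread.

```python
def group_digits(digits, sizes):
    """Group a digit string from the right with underscores.

    sizes lists the group widths from right to left; the last width
    repeats for all remaining groups. Widths must be positive.
    """
    groups = []
    end = len(digits)
    index = 0
    while end > 0:
        size = sizes[min(index, len(sizes) - 1)]
        start = max(0, end - size)
        groups.append(digits[start:end])
        end = start
        index += 1
    return "_".join(reversed(groups))


western = group_digits("67108864", [3])      # thousands
indian = group_digits("12345678", [3, 2])    # one group of three, then twos
east_asian = group_digits("123456789", [4])  # groups of four
print(western, indian, east_asian)
```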

On Wed, Feb 10, 2016, at 12:42, Random832 wrote:
3__14159_26535_89793_84626
...And before anyone else points it out, I copied the fourth digit group from the wrong position. (I happen to have the first 17 digits memorized, so I typed the first three groups, and then mistakenly copied from after the last one I had memorized rather than the last one I'd typed).

One possible objection that nobody's raised: Separating groups of three is all well and good; to my western eyes, 10_000_000_000 is obviously 10 billion.* But someone from China is likely to use groups of four, and 100_0000_0000 is not obviously anything--my first thought is around 100 billion, but that can't be right, so I have to count up the digits. I still think this is a good suggestion, because 100000000000 is even more useless to me than 100_0000_0000, and far more likely to be concealing a typo. I just wanted to make sure everyone knew the issue. * If you're going to say "no, it's a milliard, you stupid American", go back to the early 70s, and bring your non-decimal currency with you. It's billion in English, and has been for 40+ years. Leave the fighting to the languages where it's ambiguous, like Portuguese or Finnish. Sent from my iPhone

Andrew Barnert via Python-ideas writes:
Not a problem for me any more, that's quite obviously "100-oku" (and if the unit is "yen", that roughly converts to USD100 million, very convenient). I suspect others will learn equally quickly if this becomes at all common and they need to read "Chinese" or "Japanese" code where it's used. Anyway, for me (YMMV) this really is a for-the-writer readability problem. Personally I can't imagine using it except interactively. If I think a number needs checking, I make it a named value, and often computed. (Eg, the OP's example would be 1 << 26).
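The "named, computed value" alternative mentioned above looks like this for the original example. The constant name is an assumption made for the sketch.

```python
# 2**26 bytes, i.e. 64 MiB, written so the magnitude is visible
# without counting digits.
WRITE_BUFFER_SIZE = 1 << 26

print(WRITE_BUFFER_SIZE)
```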

2016-02-09 22:40 GMT+01:00 Manuel Cerón <ceronman@gmail.com>:
Yeah, I saw this (in Perl) and I think that it's a good idea.
Another option is to use spaces instead of underscores:
opts.write_buffer_size = 67 108 864
It sounds error-prone to me. It's common that I forget a comma in a tuple like:
x = (1,
     2
     3)
I expect a SyntaxError, not x = (1, 23). It can also occur on a single line:
x = (1, 2 3)
I'm not sure that a PEP is required. We just have to clarify where underscores are allowed exactly. See the discussion above: there are corner cases on float and complex numbers. Victor
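The forgotten-comma hazard described in this message already exists for string literals, where adjacent literals are silently concatenated. A minimal sketch of that existing behaviour, with illustrative variable names:

```python
# The comma between "b" and "c" was forgotten. Instead of an error,
# the two adjacent string literals are joined into one element.
letters = ("a", "b" "c")
print(letters, len(letters))

# The same mistake with numbers is caught at compile time.
try:
    compile("x = (1, 2 3)", "<example>", "exec")
    numeric_typo_compiles = True
except SyntaxError:
    numeric_typo_compiles = False
print(numeric_typo_compiles)
```

Allowing spaces inside numbers would move the numeric case from the second behaviour to the first.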

opts.write_buffer_size = 67108864
The disadvantage is that, as far as I know, no other languages do this.
This is not true. It is absolutely legal in FORTRAN:
program f
print*, 123 456
end
will just print the number 123456. Hence for me, as a FORTRAN user, this would seem a natural thing to do. Better than the underscores ... I would associate them with the LaTeX maths mode index operator, and then I would read 100_002 as the number 4. -Alexander

Well, as you know, even more strictly speaking, in a LaTeX text you'd have to write $100_{002}$. But in more colloquial use of LaTeX syntax, e.g., in plain text abstracts, the braces are often dropped for the sake of readability. Besides, base 0 or base 1 make little sense. -Alexander Sent from my Pixel C.

Alexander Heger wrote:
Well, as you know, even more strictly speaking, in a LaTeX text you'd have to write $100_{002}$.
+1 on allowing Python expressions to be written in LaTeX math mode. This is obviously why Guido has reserved $ for such a long time. He's just popped back in the time machine and told himself about this thread! -- Greg

On 10 February 2016 at 11:37, Alexander Heger <python@2sn.net> wrote:
But to be fair, in older fortrans at least (I'd like to hope it's got more sane these days) wasn't pro gr a m f p r i n t*,1 2 3 4 5 6 e nd just as valid? (IIRC, whitespace was ignored everywhere, even within keywords...) Paul

On 02/09/2016 10:40 PM, Manuel Cerón wrote:
Thoughts?
I like it, and for everybody to try it out, I posted a draft patch here: http://bugs.python.org/issue26331
Underscores are allowed anywhere in numeric literals, except:
* at the beginning of a literal (obviously)
* at the end of a literal
* directly after a dot (since the underscore could start an attribute name)
* directly after a sign in exponents (for consistency with leading signs)
* in the middle of the "0x", "0o" or "0b" base specifiers
Reviewers welcome! cheers, Georg
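The five exceptions listed above can be expressed as simple text checks. The sketch below is not the draft patch (which changes the tokenizer); it is a hypothetical helper that takes the text of one candidate literal and reports which of the listed restrictions it breaks.

```python
import re


def underscore_violations(literal):
    """Return the listed restrictions that a candidate literal breaks.

    Assumes `literal` is the text of a single numeric literal candidate.
    """
    problems = []
    if literal.startswith("_"):
        problems.append("at the beginning")
    if literal.endswith("_"):
        problems.append("at the end")
    if "._" in literal:
        problems.append("directly after a dot")
    # An exponent marker, optional underscores, a sign, then an underscore.
    if re.search(r"[eE]_*[+-]_", literal):
        problems.append("directly after an exponent sign")
    if re.match(r"0_[xXoObB]", literal):
        problems.append("inside the base specifier")
    return problems


for candidate in ["123_456.000_001e-23", "123_", "1_._0", "1e-_10", "0_x1F"]:
    print(candidate, underscore_violations(candidate))
```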

On Wed, Feb 10, 2016, at 12:51, Georg Brandl wrote:
I don't think it's particularly important to support this case, but the sequence digit/dot/name with no spaces between is a syntax error now, because the digit/dot is interpreted as a floating point constant.
* directly after a sign in exponents (for consistency with leading signs)
* in the middle of the "0x", "0o" or "0b" base specifiers
Do you allow multiple underscores in a row? I mentioned a couple possible use cases for that.
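The digit/dot/name observation in this message can be checked with compile(). A minimal sketch, where `compiles` is a helper defined only for this example:

```python
def compiles(source):
    """Return True if source compiles as a Python expression."""
    try:
        compile(source, "<example>", "eval")
        return True
    except SyntaxError:
        return False


# "1." is consumed as a float literal, so the name that follows is stray.
no_space = compiles("1.real")
# With a space or parentheses the dot is an attribute access instead.
with_space = compiles("1 .real")
parenthesised = compiles("(1).real")

print(no_space, with_space, parenthesised)
```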

On Wed, Feb 10, 2016, at 16:42, Chris Angelico wrote:
But only the first dot is part of the numeric literal and therefore even theoretically eligible to have an underscore before or after it accepted as part of the literal, so that goes without saying. I was just pointing out that the underscore this rule disallows can't actually start an attribute name.

participants (23)
- Alexander Heger
- Andrew Barnert
- Ben Finney
- Chris Angelico
- Dan Sommers
- Ethan Furman
- Georg Brandl
- Greg Ewing
- Guido van Rossum
- Ian Kelly
- Joshua Landau
- Manuel Cerón
- MRAB
- Nick Coghlan
- Oscar Benjamin
- Paul Moore
- Random832
- Rob Cliffe
- Serhiy Storchaka
- Stephen J. Turnbull
- Steven D'Aprano
- Sven R. Kunze
- Victor Stinner