PEP 515: Underscores in Numeric Literals

This came up in python-ideas, and has met mostly positive comments, although the exact syntax rules are up for discussion.

cheers,
Georg

--------------------------------------------------------------------------------

PEP: 515
Title: Underscores in Numeric Literals
Version: $Revision$
Last-Modified: $Date$
Author: Georg Brandl
Status: Draft
Type: Standards Track
Content-Type: text/x-rst
Created: 10-Feb-2016
Python-Version: 3.6

Abstract and Rationale
======================

This PEP proposes to extend Python's syntax so that underscores can be used in integral and floating-point number literals.

This is a common feature of other modern languages, and can aid readability of long literals, or literals whose value should clearly separate into parts, such as bytes or words in hexadecimal notation.

Examples::

    # grouping decimal numbers by thousands
    amount = 10_000_000.0

    # grouping hexadecimal addresses by words
    addr = 0xDEAD_BEEF

    # grouping bits into bytes in a binary literal
    flags = 0b_0011_1111_0100_1110

Specification
=============

The current proposal is to allow underscores anywhere in numeric literals, with these exceptions:

* Leading underscores cannot be allowed, since they already introduce identifiers.
* Trailing underscores are not allowed, because they look confusing and don't contribute much to readability.
* The number base prefixes ``0x``, ``0o``, and ``0b`` cannot be split up, because they are fixed strings and not logically part of the number.
* No underscore is allowed after a sign in an exponent (``1e-_5``), because underscores cannot be used after the sign in front of a number (``-1e5``) either.
* No underscore is allowed after a decimal point, because this leads to ambiguity with attribute access (the lexer cannot know that there is no number literal in ``foo._5``).

There appears to be no reason to restrict the use of underscores otherwise.
The production list for integer literals would therefore look like this::

    integer: decimalinteger | octinteger | hexinteger | bininteger
    decimalinteger: nonzerodigit [decimalrest] | "0" [("0" | "_")* "0"]
    nonzerodigit: "1"..."9"
    decimalrest: (digit | "_")* digit
    digit: "0"..."9"
    octinteger: "0" ("o" | "O") (octdigit | "_")* octdigit
    hexinteger: "0" ("x" | "X") (hexdigit | "_")* hexdigit
    bininteger: "0" ("b" | "B") (bindigit | "_")* bindigit
    octdigit: "0"..."7"
    hexdigit: digit | "a"..."f" | "A"..."F"
    bindigit: "0" | "1"

For floating-point literals::

    floatnumber: pointfloat | exponentfloat
    pointfloat: [intpart] fraction | intpart "."
    exponentfloat: (intpart | pointfloat) exponent
    intpart: digit (digit | "_")*
    fraction: "." intpart
    exponent: ("e" | "E") "_"* ["+" | "-"] digit [decimalrest]

Alternative Syntax
==================

Underscore Placement Rules
--------------------------

Instead of the liberal rule specified above, the use of underscores could be limited. Common rules are (see the "Behavior in Other Languages" section):

* Only one consecutive underscore allowed, and only between digits.
* Multiple consecutive underscores allowed, but only between digits.

Different Separators
--------------------

A proposed alternate syntax was to use whitespace for grouping. Although strings are a precedent for combining adjoining literals, the behavior can lead to unexpected effects which are not possible with underscores. Also, no other language is known to use this rule, except for languages that generally disregard any whitespace.

C++14 introduces apostrophes for grouping; this is not considered because it would conflict with Python's string literals. [1]_

Behavior in Other Languages
===========================

Those languages that do allow underscore grouping implement a large variety of rules for allowed placement of underscores. This is a listing placing the known rules into three major groups.
In cases where the language spec contradicts the actual behavior, the actual behavior is listed.

**Group 1: liberal (like this PEP)**

* D [2]_
* Perl 5 (although docs say it's more restricted) [3]_
* Rust [4]_
* Swift (although textual description says "between digits") [5]_

**Group 2: only between digits, multiple consecutive underscores**

* C# (open proposal for 7.0) [6]_
* Java [7]_

**Group 3: only between digits, only one underscore**

* Ada [8]_
* Julia (but not in the exponent part of floats) [9]_
* Ruby (docs say "anywhere", in reality only between digits) [10]_

Implementation
==============

A preliminary patch that implements the specification given above has been posted to the issue tracker. [11]_

References
==========

.. [1] http://www.open-std.org/jtc1/sc22/wg21/docs/papers/2013/n3499.html
.. [2] http://dlang.org/spec/lex.html#integerliteral
.. [3] http://perldoc.perl.org/perldata.html#Scalar-value-constructors
.. [4] http://doc.rust-lang.org/reference.html#number-literals
.. [5] https://developer.apple.com/library/ios/documentation/Swift/Conceptual/Swift...
.. [6] https://github.com/dotnet/roslyn/issues/216
.. [7] https://docs.oracle.com/javase/7/docs/technotes/guides/language/underscores-...
.. [8] http://archive.adaic.com/standards/83lrm/html/lrm-02-04.html#2.4
.. [9] http://docs.julialang.org/en/release-0.4/manual/integers-and-floating-point-...
.. [10] http://ruby-doc.org/core-2.3.0/doc/syntax/literals_rdoc.html#label-Numbers
.. [11] http://bugs.python.org/issue26331

Copyright
=========

This document has been placed in the public domain.
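The integer and floating-point productions in the draft above can be transcribed into regular expressions to see which spellings they admit. This is a minimal sketch under stated assumptions: the helper names `is_int_literal` and `is_float_literal` are hypothetical, they check a whole string rather than tokenizing source code, and they follow the draft grammar literally, which is more permissive than the rules Python 3.6 eventually adopted.

```python
import re

# Hypothetical helpers: a one-to-one regex transcription of the draft grammar.
# Integer productions.
_DECIMAL = r"[1-9](?:[0-9_]*[0-9])?|0(?:[0_]*0)?"  # nonzerodigit [decimalrest] | "0" [("0" | "_")* "0"]
_OCT = r"0[oO][0-7_]*[0-7]"                        # "0" ("o" | "O") (octdigit | "_")* octdigit
_HEX = r"0[xX][0-9a-fA-F_]*[0-9a-fA-F]"            # "0" ("x" | "X") (hexdigit | "_")* hexdigit
_BIN = r"0[bB][01_]*[01]"                          # "0" ("b" | "B") (bindigit | "_")* bindigit
_INTEGER = re.compile(rf"(?:{_DECIMAL}|{_OCT}|{_HEX}|{_BIN})")

# Floating-point productions.
_INTPART = r"[0-9][0-9_]*"                         # digit (digit | "_")*
_FRACTION = rf"\.{_INTPART}"                       # "." intpart
_POINTFLOAT = rf"(?:(?:{_INTPART})?{_FRACTION}|{_INTPART}\.)"
_EXPONENT = r"[eE]_*[+-]?[0-9](?:[0-9_]*[0-9])?"   # ("e" | "E") "_"* ["+" | "-"] digit [decimalrest]
_EXPONENTFLOAT = rf"(?:{_INTPART}|{_POINTFLOAT}){_EXPONENT}"
_FLOAT = re.compile(rf"(?:{_POINTFLOAT}|{_EXPONENTFLOAT})")


def is_int_literal(text: str) -> bool:
    """Return True if the whole of text matches the draft's integer production."""
    return _INTEGER.fullmatch(text) is not None


def is_float_literal(text: str) -> bool:
    """Return True if the whole of text matches the draft's floatnumber production."""
    return _FLOAT.fullmatch(text) is not None


for literal in ["10_000_000", "0xDEAD_BEEF", "0b_0011_1111_0100_1110", "_1", "1_"]:
    print(literal, is_int_literal(literal))
for literal in ["10_000_000.0", "1e-_5", "1._5", "1.5_"]:
    print(literal, is_float_literal(literal))
```

Read literally, `intpart` (`digit (digit | "_")*`) is also used for the fraction, so `is_float_literal("1.5_")` returns True even though the prose rules out trailing underscores; the sketch does reject `1e-_5` and `1._5`, as the bullet list requires.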

On Wed, 10 Feb 2016 at 14:21 Georg Brandl <g.brandl@gmx.net> wrote:
This came up in python-ideas, and has met mostly positive comments, although the exact syntax rules are up for discussion.
cheers, Georg
Examples::

    # grouping decimal numbers by thousands
    amount = 10_000_000.0

    # grouping hexadecimal addresses by words
    addr = 0xDEAD_BEEF

    # grouping bits into bytes in a binary literal
    flags = 0b_0011_1111_0100_1110

I assume all of these examples are possible in either the liberal or restrictive approaches?
Implementation
==============
A preliminary patch that implements the specification given above has been posted to the issue tracker. [11]_
Is the implementation made easier or harder if we went with the Group 2 or 3 approaches? Are there any reasonable examples that the Group 1 approach allows that Group 3 doesn't that people have used in other languages? I'm +1 on the idea, but which approach I prefer is going to be partially dependent on the difficulty of implementing (else I say Group 3 to make it easier to explain the rules). -Brett

On 2016-02-10 22:35, Brett Cannon wrote: [snip]
Examples::

    # grouping decimal numbers by thousands
    amount = 10_000_000.0

    # grouping hexadecimal addresses by words
    addr = 0xDEAD_BEEF

    # grouping bits into bytes in a binary literal
    flags = 0b_0011_1111_0100_1110

I assume all of these examples are possible in either the liberal or restrictive approaches?
[snip] Strictly speaking, "0b_0011_1111_0100_1110" wouldn't be valid if an underscore was allowed only between digits because the "b" isn't a digit. Similarly, "0x_FF_FF" wouldn't be valid, but "0xFF_FF" would.
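The "only between digits" rules can be checked mechanically for the hex case. This is a minimal sketch, assuming that "between digits" means every run of underscores has a hex digit on both sides (so the "x" of the prefix does not count); the pattern names and the `accepts` helper are hypothetical. Group 2 allows a run of one or more underscores, Group 3 exactly one.

```python
import re

# Group 2: underscores only between digits, consecutive underscores allowed.
GROUP2_HEX = re.compile(r"0[xX][0-9a-fA-F]+(?:_+[0-9a-fA-F]+)*")
# Group 3: underscores only between digits, one at a time.
GROUP3_HEX = re.compile(r"0[xX][0-9a-fA-F]+(?:_[0-9a-fA-F]+)*")


def accepts(pattern, text: str) -> bool:
    """Return True if the whole of text matches the compiled pattern."""
    return pattern.fullmatch(text) is not None


for literal in ["0xFF_FF", "0x_FF_FF", "0xFF__FF"]:
    print(literal, accepts(GROUP2_HEX, literal), accepts(GROUP3_HEX, literal))
```

In this regular-expression form the two rules differ only in `_+` versus `_`; how much extra work Group 3 would mean for CPython's hand-written tokenizer is a separate question the sketch does not answer.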

On 02/10/2016 11:35 PM, Brett Cannon wrote:
Examples::

    # grouping decimal numbers by thousands
    amount = 10_000_000.0

    # grouping hexadecimal addresses by words
    addr = 0xDEAD_BEEF

    # grouping bits into bytes in a binary literal
    flags = 0b_0011_1111_0100_1110

I assume all of these examples are possible in either the liberal or restrictive approaches?
The last one isn't for restrictive -- its first underscore isn't between digits.
Implementation
==============
A preliminary patch that implements the specification given above has been posted to the issue tracker. [11]_
Is the implementation made easier or harder if we went with the Group 2 or 3 approaches? Are there any reasonable examples that the Group 1 approach allows that Group 3 doesn't that people have used in other languages?
Group 3 is probably a little more work than group 2, since you have to make sure only one consecutive underscore is present. I don't see a point to that.
I'm +1 on the idea, but which approach I prefer is going to be partially dependent on the difficulty of implementing (else I say Group 3 to make it easier to explain the rules).
Based on the feedback so far, I have an easier rule in mind that I will base the next PEP revision on. It's basically "One or more underscores allowed anywhere after a digit or a base specifier." This preserves my preferred non-restrictive cases (0b_1111_0000, 1.5_j) and disallows more controversial versions like "1.5e_+_2".

cheers,
Georg
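The revised rule can be read as a check on underscore placement alone. The sketch below assumes that reading: the function name is hypothetical, "digit" means a digit valid for the literal's base, and everything other than underscore placement (prefixes, exponents, the `j` suffix) is left to the rest of the grammar. Because the rule as quoted says nothing about trailing underscores, this sketch accepts `123_`, the case Terry Reedy objects to below.

```python
# Hypothetical checker for "one or more underscores allowed anywhere after a
# digit or a base specifier". Only underscore placement is validated.
_BASE_DIGITS = {
    "0x": set("0123456789abcdefABCDEF"),
    "0o": set("01234567"),
    "0b": set("01"),
}
_DECIMAL_DIGITS = set("0123456789")


def underscores_follow_digit_or_prefix(literal: str) -> bool:
    prefix = literal[:2].lower()
    if prefix in _BASE_DIGITS:
        digits, prefix_len = _BASE_DIGITS[prefix], 2
    else:
        digits, prefix_len = _DECIMAL_DIGITS, 0

    for i, ch in enumerate(literal):
        # Only the first underscore of each run needs checking.
        if ch != "_" or (i > 0 and literal[i - 1] == "_"):
            continue
        if i == 0:
            return False  # a leading underscore would start an identifier
        if i == prefix_len or literal[i - 1] in digits:
            continue  # directly after the base prefix or after a digit
        return False  # e.g. after ".", "e", "+" or "-"
    return True


for literal in ["0b_1111_0000", "1.5_j", "1.5e_+_2", "1._5", "123_"]:
    print(literal, underscores_follow_digit_or_prefix(literal))
```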

On 2/11/2016 2:45 AM, Georg Brandl wrote:
Thanks for grabbing this issue and moving it forward. I will like being able to write or read 200_000_000 and be sure I am right without counting 0s.
Based on the feedback so far, I have an easier rule in mind that I will base the next PEP revision on. It's basically
"One or more underscores allowed anywhere after a digit or a base specifier."
This preserves my preferred non-restrictive cases (0b_1111_0000, 1.5_j) and disallows more controversial versions like "1.5e_+_2".
I like both choices above.

I don't like trailing underscores, for two reasons:

1. The stated purpose of adding '_'s is to visually separate. Trailing underscores do not do that. They serve no purpose.
2. Trailing _s are used to turn keywords (class) into identifiers (class_). To me, 123_ mentally clashes with this usage.

If trailing _ is allowed, to simplify the implementation, I would like PEP 8, while on the subject, to say something like "While trailing _s on numbers are allowed, to simplify the implementation, they serve no purpose and are strongly discouraged".

--
Terry Jan Reedy

On Feb 11, 2016, at 09:39, Terry Reedy <tjreedy@udel.edu> wrote:
If trailing _ is allowed, to simplify the implementation, I would like PEP 8, while on the subject, to say something like "While trailing _s on numbers are allowed, to simplify the implementation, they serve no purpose and are strongly discouraged".
That's a good point: we need style rules for PEP 8.

But I think everything that's just obviously pointless (like putting an underscore between every pair of digits, or sprinkling underscores all over a huge number to make ASCII art), or already handled by other guidelines (e.g., using a ton of underscores to "line up a table" is the same as using a ton of spaces, which is already discouraged) doesn't really need to be covered. And I think trailing underscores probably fall into that category.

It might be simpler to write a "whitelist" than a "blacklist" of all the ugly things people might come up with, and then just give a bunch of examples instead of a bunch of rules. Something like this:

While underscores can legally appear anywhere in the digit string, you should never use them for purposes other than visually separating meaningful digit groups like thousands, bytes, and the like.

* 123456_789012: ok (millions are groups, but thousands are more common, and 6-digit groups are readable, but on the edge)
* 123_456_789_012: better
* 123_456_789_012_: bad (trailing)
* 1_2_3_4_5_6: bad (too many)
* 1234_5678: ok if code is intended to deal with East Asian numerals (where 10000 is a standard grouping), bad otherwise
* 3__141_592_654: ok if this represents a fixed-point fraction (obviously bad otherwise)
* 123.456_789e123: good
* 123.456_789e1_23: bad (never useful in exponent)
* 0x1234_5678: good
* 0o123_456: good
* 0x123_456_789: bad (3 hex digits is usually not a meaningful group)

The one case that seems contentious is "123_456_j". Honestly, I don't care which way that goes, and I'd be fine if the PEP left out any mention of it, but if people feel strongly one way or the other, the PEP could just give it as a good or a bad example and that would be enough to clarify the intention.
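Underscores used this way are purely visual: the value is the same however the digits are grouped. The sketch below assumes Python 3.6 or later, where a stricter version of this proposal was eventually released (single underscores between digits, none trailing). It uses only the examples from the list above that also satisfy those stricter rules, and separately confirms that the doubled, trailing, and `_j` spellings are rejected there.

```python
# Requires Python 3.6+: each grouped literal equals its ungrouped spelling.
pairs = [
    (123456_789012, 123456789012),
    (123_456_789_012, 123456789012),
    (1234_5678, 12345678),
    (123.456_789e123, 123.456789e123),
    (0x1234_5678, 0x12345678),
    (0o123_456, 0o123456),
    (0x123_456_789, 0x123456789),
]
for grouped, plain in pairs:
    assert grouped == plain

# Spellings from the list that the released (stricter) rules reject outright.
rejected = []
for source in ["3__141_592_654", "123_456_789_012_", "123_456_j"]:
    try:
        compile(source, "<literal>", "eval")
    except SyntaxError:
        rejected.append(source)

print(len(pairs), "groupings leave the value unchanged")
print("rejected:", rejected)
```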

On Feb 11, 2016, at 10:15, Andrew Barnert via Python-Dev <python-dev@python.org> wrote:
That's a good point: we need style rules for PEP 8.
One more point: should the tutorial mention underscores? It looks like the intro docs for a lot of the other languages do. And it would only take one short sentence in 3.1.1 Numbers to say that you can use underscores to make large numbers like 123_456.789_012 more readable.

On Thu, Feb 11, 2016 at 10:15 AM, Andrew Barnert via Python-Dev < python-dev@python.org> wrote:
On Feb 11, 2016, at 09:39, Terry Reedy <tjreedy@udel.edu> wrote:
If trailing _ is allowed, to simplify the implementation, I would like
PEP 8, while on the subject, to say something like "While trailing _s on numbers are allowed, to simplify the implementation, they serve no purpose and are strongly discouraged".
That's a good point: we need style rules for PEP 8.
But I think everything that's just obviously pointless (like putting an underscore between every pair of digits, or sprinkling underscores all over a huge number to make ASCII art), or already handled by other guidelines (e.g., using a ton of underscores to "line up a table" is the same as using a ton of spaces, which is already discouraged) doesn't really need to be covered. And I think trailing underscores probably fall into that category.
It might be simpler to write a "whitelist" than a "blacklist" of all the ugly things people might come up with, and then just give a bunch of examples instead of a bunch of rules. Something like this:
While underscores can legally appear anywhere in the digit string, you should never use them for purposes other than visually separating meaningful digit groups like thousands, bytes, and the like.
* 123456_789012: ok (millions are groups, but thousands are more common, and 6-digit groups are readable, but on the edge)
* 123_456_789_012: better
* 123_456_789_012_: bad (trailing)
* 1_2_3_4_5_6: bad (too many)
* 1234_5678: ok if code is intended to deal with East Asian numerals (where 10000 is a standard grouping), bad otherwise
* 3__141_592_654: ok if this represents a fixed-point fraction (obviously bad otherwise)
* 123.456_789e123: good
* 123.456_789e1_23: bad (never useful in exponent)
* 0x1234_5678: good
* 0o123_456: good
* 0x123_456_789: bad (3 hex digits is usually not a meaningful group)
The one case that seems contentious is "123_456_j". Honestly, I don't care which way that goes, and I'd be fine if the PEP left out any mention of it, but if people feel strongly one way or the other, the PEP could just give it as a good or a bad example and that would be enough to clarify the intention.
I imagine that for whatever "bad" grouping you can suggest, someone, somewhere, has a legitimate reason to use it. Any rule more complex than "Use underscores in numeric literals only when they improve clarity" is unnecessarily prescriptive.

- Jeff

On Thursday, February 11, 2016 10:35 AM, Jeff Hardy <jdhardy@gmail.com> wrote:
On Thu, Feb 11, 2016 at 10:15 AM, Andrew Barnert via Python-Dev <python-dev@python.org> wrote:
That's a good point: we need style rules for PEP 8.
...
I imagine that for whatever "bad" grouping you can suggest, someone, somewhere, has a legitimate reason to use it.
That's exactly why we should just have bad examples in the style guide, rather than coming up with style rules that try to strongly discourage them (or making them syntax errors).
Any rule more complex than "Use underscores in numeric literals only when they improve clarity" is unnecessarily prescriptive.
Your rule doesn't need to be stated at all. It's already a given that you shouldn't add semantically-meaningless characters anywhere unless they improve clarity.... I don't think saying that they're for "visually separating meaningful digit groups like thousands, bytes, and the like" is unnecessarily prescriptive. If someone comes up with a legitimate use for something we've never anticipated, it will almost certainly just be a way of grouping digits that's meaningful in a way we didn't anticipate. And, if not, it's just a style guideline, so it doesn't have to apply 100% of the time. If someone really comes up with something that has nothing to do with grouping digits, all the style guideline will do is make them stop and think about whether it really is a good use of underscores--and, if it is, they'll go ahead and do it.

On 2/10/2016 2:20 PM, Georg Brandl wrote:
This came up in python-ideas, and has met mostly positive comments, although the exact syntax rules are up for discussion.
cheers, Georg
Examples::

    # grouping decimal numbers by thousands
    amount = 10_000_000.0

    # grouping hexadecimal addresses by words
    addr = 0xDEAD_BEEF

    # grouping bits into bytes in a binary literal
    flags = 0b_0011_1111_0100_1110

+1

You don't mention potential restrictions that decimal numbers should permit them only every three places, or hex ones only every 2 or 4, and your binary example mentions grouping into bytes, but actually groups into nybbles.

But such restrictions would be annoying: if it is useful to the coder to use them, that is fine. But different situations may find other placements more useful... particularly in binary, as it might want to match widths of various bitfields.

Adding that as a rejected consideration, with justifications, would be helpful.
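The bitfield case can be illustrated with a made-up register layout. The field widths (a 3-bit mode, a 4-bit channel, a 1-bit enable flag) and all names are hypothetical, and the literals use Python 3.6+ syntax, where an underscore may also follow the base prefix. Grouping by field rather than by nybble or byte leaves the value unchanged while making each field visible; the last check also confirms that the PEP's example groups by nybbles.

```python
# Hypothetical 8-bit register: mode (3 bits) | channel (4 bits) | enable (1 bit).
REG_BY_FIELD = 0b_101_0110_1   # grouped to match the field widths
REG_BY_NYBBLE = 0b_1010_1101   # the same bits grouped by nybble
REG_BY_BYTE = 0b_10101101      # one 8-bit group

assert REG_BY_FIELD == REG_BY_NYBBLE == REG_BY_BYTE == 0xAD


def unpack(reg: int) -> tuple:
    """Split the hypothetical register into (mode, channel, enable)."""
    mode = (reg >> 5) & 0b111
    channel = (reg >> 1) & 0b1111
    enable = reg & 0b1
    return mode, channel, enable


print(unpack(REG_BY_FIELD))  # (5, 6, 1)

# The PEP's "bytes" example is grouped by nybble; byte grouping is the same value.
assert 0b_0011_1111_0100_1110 == 0b_00111111_01001110 == 0x3F4E
```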

On 02/10/2016 11:42 PM, Glenn Linderman wrote:
+1
You don't mention potential restrictions that decimal numbers should permit them only every three places, or hex ones only every 2 or 4, and your binary example mentions grouping into bytes, but actually groups into nybbles.
But such restrictions would be annoying: if it is useful to the coder to use them, that is fine. But different situations may find other placements more useful... particularly in binary, as it might want to match widths of various bitfields.
Adding that as a rejected consideration, with justifications, would be helpful.
I added a short paragraph. Thanks for the feedback, Georg

On 10 February 2016 at 22:20, Georg Brandl <g.brandl@gmx.net> wrote:
This came up in python-ideas, and has met mostly positive comments, although the exact syntax rules are up for discussion.
+1 on the PEP. Is there any value in allowing underscores in strings passed to the Decimal constructor as well? The same sorts of justifications would seem to apply. It's perfectly arguable that the change for Decimal would be so rarely used as to not be worth it, though, so I don't mind either way in practice. Paul

On Wed, Feb 10, 2016 at 10:53:09PM +0000, Paul Moore wrote:
On 10 February 2016 at 22:20, Georg Brandl <g.brandl@gmx.net> wrote:
This came up in python-ideas, and has met mostly positive comments, although the exact syntax rules are up for discussion.
+1 on the PEP. Is there any value in allowing underscores in strings passed to the Decimal constructor as well? The same sorts of justifications would seem to apply. It's perfectly arguable that the change for Decimal would be so rarely used as to not be worth it, though, so I don't mind either way in practice.
Let's delay making any change to string conversions for now, and that includes Decimal. We can also do this::

    Decimal("123_456_789.00000_12345_67890".replace("_", ""))

for those who absolutely must include underscores in their numeric strings. The big win is for numeric literals, not numeric string conversions.

--
Steve
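The workaround above runs as-is once `Decimal` is imported; the sketch below just spells that out. Stripping the underscores first means it does not depend on `Decimal` itself understanding them.

```python
from decimal import Decimal

# Remove the grouping underscores, then convert; works on any Python 3 version.
text = "123_456_789.00000_12345_67890"
value = Decimal(text.replace("_", ""))
print(value)  # 123456789.000001234567890
```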

On 10 February 2016 at 23:14, Steven D'Aprano <steve@pearwood.info> wrote:
On Wed, Feb 10, 2016 at 10:53:09PM +0000, Paul Moore wrote:
On 10 February 2016 at 22:20, Georg Brandl <g.brandl@gmx.net> wrote:
This came up in python-ideas, and has met mostly positive comments, although the exact syntax rules are up for discussion.
+1 on the PEP. Is there any value in allowing underscores in strings passed to the Decimal constructor as well? The same sorts of justifications would seem to apply. It's perfectly arguable that the change for Decimal would be so rarely used as to not be worth it, though, so I don't mind either way in practice.
Let's delay making any change to string conversions for now, and that includes Decimal. We can also do this:
Decimal("123_456_789.00000_12345_67890".replace("_", ""))
for those who absolutely must include underscores in their numeric strings. The big win is for numeric literals, not numeric string conversions.
Good point. Maybe add this as an example in the PEP to explain why conversions are excluded. But I did only mean the Decimal constructor, which I think of more as a "decimal literal" - whereas int() and float() are (in my mind at least) conversion functions and as such should not be coupled to literal format (for example, 0x0001 notation isn't supported by int()) Paul

On 02/11/2016 10:10 AM, Paul Moore wrote:
Good point. Maybe add this as an example in the PEP to explain why conversions are excluded. But I did only mean the Decimal constructor, which I think of more as a "decimal literal" - whereas int() and float() are (in my mind at least) conversion functions and as such should not be coupled to literal format (for example, 0x0001 notation isn't supported by int())
Actually, it is. Just not without a base argument, because the default base is 10. But both with base 0 and base 16, '0x' prefixes are allowed. That's why I'm leaning towards supporting the underscores. In any case I'm preparing the implementation. Georg
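The point about `int()` and prefixes can be checked directly. This sketch uses only long-standing `int()` behavior (no underscores involved): the default base 10 rejects a "0x" prefix, an explicit base 16 tolerates it, and base 0 infers the base from the prefix exactly as for a source literal.

```python
# With the default base of 10, a "0x" prefix is rejected.
try:
    int("0x1F")
    default_base_accepts_prefix = True
except ValueError:
    default_base_accepts_prefix = False

print(default_base_accepts_prefix)  # False
print(int("0x1F", 16))  # 31: an explicit base 16 tolerates the prefix
print(int("0x1F", 0))   # 31: base 0 infers the base from the prefix
print(int("0o17", 0))   # 15
print(int("0b101", 0))  # 5
```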

It looks like the implementation https://bugs.python.org/issue26331 only changes the Python parser. What about other functions converting strings to numbers at runtime like int(str) and float(str)? Paul also asked for Decimal(str). Victor

On 02/11/2016 12:04 AM, Victor Stinner wrote:
It looks like the implementation https://bugs.python.org/issue26331 only changes the Python parser.
What about other functions converting strings to numbers at runtime like int(str) and float(str)? Paul also asked for Decimal(str).
I added these as "Open Questions" to the PEP. For Decimal, it's probably a good idea. For int(), it should only be allowed with base argument = 0. For float() and complex(), probably. Georg
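For comparison with these open questions, this is how the string conversions behave in Python 3.6 and later, where the accepted version of the PEP shipped. The sketch assumes such an interpreter and uses only single underscores between digits (plus one directly after a base prefix). Note that the released behavior accepts underscores in `int()` with the default base as well, not only with base 0.

```python
from decimal import Decimal

# Python 3.6+: string conversions accept the same underscore grouping as literals.
print(int("1_000"))         # 1000 (default base 10)
print(int("0x_ff", 0))      # 255 (base inferred from the prefix)
print(float("1_000.5"))     # 1000.5
print(Decimal("1_000.5"))   # 1000.5
```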

2016-02-11 9:11 GMT+01:00 Georg Brandl <g.brandl@gmx.net>:
On 02/11/2016 12:04 AM, Victor Stinner wrote:
It looks like the implementation https://bugs.python.org/issue26331 only changes the Python parser.
What about other functions converting strings to numbers at runtime like int(str) and float(str)? Paul also asked for Decimal(str).
I added these as "Open Questions" to the PEP.
Ok nice. Now another question :-) Would it be useful to add an option to repr(int) and repr(float), or a formatter to int.__format__() and float.__format__(), to add an underscore for thousands? Currently, we have the "n" format which depends on the current LC_NUMERIC locale:
>>> '{:n}'.format(1234)
'1234'
>>> import locale; locale.setlocale(locale.LC_ALL, '')
'fr_FR.UTF-8'
>>> '{:n}'.format(1234)
'1 234'
My idea:
>>> (1234).__repr__(pep515=True)
'1_234'
>>> (1234.0).__repr__(pep515=True)
'1_234.0'
or maybe:
>>> '{:pep515}'.format(1234)
'1_234'
>>> '{:pep515}'.format(1234.0)
'1_234.0'
I don't think that it would be a good idea to modify repr() default behaviour, it would likely break a lot of applications. Victor

On 11 February 2016 at 19:59, Victor Stinner <victor.stinner@gmail.com> wrote:
Ok nice. Now another question :-)
Would it be useful to add an option to repr(int) and repr(float), or a formatter to int.__format__() and float.__format__(), to add an underscore for thousands?
Given that str.format supports a thousands separator:

>>> "{:,d}".format(100000000)
'100,000,000'
it might be reasonable to permit "_" in place of "," in the format specifier.

However, I'm not sure when you'd use it aside from code generation, and you can already insert the thousands separator and then replace "," with "_".

Cheers,
Nick.

--
Nick Coghlan | ncoghlan@gmail.com | Brisbane, Australia
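The existing workaround (insert the comma separator, then swap it for underscores) looks like this; the values of `n` and the negative example are arbitrary.

```python
# Format with the comma thousands separator, then swap commas for underscores.
n = 100000000
grouped = "{:,d}".format(n).replace(",", "_")
print(grouped)  # 100_000_000

# The sign is unaffected by the swap.
negative = "{:,d}".format(-1234567).replace(",", "_")
print(negative)  # -1_234_567
```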

On 02/11/2016 11:07 AM, Nick Coghlan wrote:
On 11 February 2016 at 19:59, Victor Stinner <victor.stinner@gmail.com> wrote:
2016-02-11 9:11 GMT+01:00 Georg Brandl <g.brandl@gmx.net>:
On 02/11/2016 12:04 AM, Victor Stinner wrote:
It looks like the implementation https://bugs.python.org/issue26331 only changes the Python parser.
What about other functions converting strings to numbers at runtime like int(str) and float(str)? Paul also asked for Decimal(str).
I added these as "Open Questions" to the PEP.
Ok nice. Now another question :-)
Would it be useful to add an option to repr(int) and repr(float), or a formatter to int.__format__() and float.__format__() to add an underscore for thousands?
Given that str.format supports a thousands separator:
>>> "{:,d}".format(100000000)
'100,000,000'
it might be reasonable to permit "_" in place of "," in the format specifier.
However, I'm not sure when you'd use it aside from code generation, and you can already insert the thousands separator and then replace "," with "_".
It would make "SI style" [0] numbers a little bit more straightforward to generate, since the order of operations wouldn't matter. Currently it's:
"{:,}".format(1234.5678).replace(',', ' ').replace('.', ',')
It would also make numbers with a decimal comma and a dot as the thousands separator a bit easier to generate. Currently, that's (from PEP 378):
format(n, "6,f").replace(",", "X").replace(".", ",").replace("X", ".")
[0] https://en.wikipedia.org/wiki/Decimal_mark#Examples_of_use
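Georg's two replace chains can be wrapped as functions to make the ordering constraint visible. The function names are illustrative only, and the plain space separator follows his SI-style example.

```python
def si_style(x: float) -> str:
    # Commas must become spaces *before* the decimal point becomes a
    # comma; swapping the two replace() calls would corrupt the output.
    return format(x, ",").replace(",", " ").replace(".", ",")


def dot_grouped_decimal_comma(x: float) -> str:
    # PEP 378 recipe: the placeholder "X" keeps the grouping commas and
    # the decimal point from clobbering each other during the swap.
    return format(x, "6,f").replace(",", "X").replace(".", ",").replace("X", ".")


print(si_style(1234.5678))                   # 1 234,5678
print(dot_grouped_decimal_comma(1234.5678))  # 1.234,567800
```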

On Thu, Feb 11, 2016 at 08:07:56PM +1000, Nick Coghlan wrote:
Given that str.format supports a thousands separator:
>>> "{:,d}".format(100000000)
'100,000,000'
it might be reasonable to permit "_" in place of "," in the format specifier.
+1
However, I'm not sure when you'd use it aside from code generation, and you can already insert the thousands separator and then replace "," with "_".
It's not always easy or convenient to call .replace(",", "_") on the output of format:
"With my help, the {} caught {:,d} ants.".format("aardvark", 100000000)
would need to be re-written as something like:
py> "With my help, the {} caught {} ants.".format("aardvark", "{:,d}".format(100000000).replace(",", "_"))
'With my help, the aardvark caught 100_000_000 ants.'
-- Steve
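For comparison, the "_" grouping option that Python 3.6 eventually added to the format mini-language removes the nested call Steve describes. This sketch assumes Python 3.6 or newer and checks that both spellings agree.

```python
n = 100000000

# Steve's workaround: format the number separately, then splice it in.
workaround = "With my help, the {} caught {} ants.".format(
    "aardvark", "{:,d}".format(n).replace(",", "_"))

# Python 3.6+ only: "_" used directly as the grouping character.
direct = "With my help, the {} caught {:_d} ants.".format("aardvark", n)

assert workaround == direct
print(direct)  # With my help, the aardvark caught 100_000_000 ants.
```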

On Feb 10, 2016, at 14:20, Georg Brandl <g.brandl@gmx.net> wrote: First, general questions: should the PEP mention the Decimal constructor? What about int and float (I'd assume int(s) continues to work as always, while int(s, 0) gets the new behavior, but if that isn't obviously true, it may be worth saying explicitly).
* Trailing underscores are not allowed, because they look confusing and don't contribute much to readability.
Why is "123_456_" so ugly that we have to catch it, when "1___2_345______6" is just fine, or "123e__+456"? More to the point, if we really need an extra rule, and more complicated BNF, to outlaw this case, I don't think we want a liberal design at all. Also, notice that Swift, Rust, and D all show examples with trailing underscores in their references, and they don't look particularly out of place with the other examples.
There appears to be no reason to restrict the use of underscores otherwise.
What other restrictions are there? I think the only place you've left that's not between digits is between the e and the sign. A dead-simple rule like Swift's seems better than five separate rules that I have to learn and remember that make lexing more complicated and that ultimately amount to the conservative rule plus one other place I can put underscores where I'd never want to.
**Group 1: liberal (like this PEP)**
* D [2]_
* Perl 5 (although docs say it's more restricted) [3]_
* Rust [4]_
* Swift (although textual description says "between digits") [5]_
I don't think any of these are liberal like this PEP. For example, Swift's actual grammar rule allows underscores anywhere but leading in the "digits" part of int literals and all three potential digit parts of float literals. That's the whole rule. It's more conservative than this PEP in not allowing them outside of digit parts (like between E and +), more liberal in allowing them to be trailing, but I'm pretty sure the reason behind the design wasn't specifically about how liberal or conservative they wanted to be, but about being as simple as possible. Rust's rule seems to be equivalent to Swift's, except that they forgot to define exponents anywhere. I don't think either of them was trying to be more liberal or more conservative; rather, they were both trying to be as simple as possible. D does go out of its way to be as liberal as possible, e.g., allowing things like "0x_1_" that the others wouldn't (they'd treat the "_1_" as a digit part, which can't have leading underscores), but it's also more conservative than this spec in not allowing underscores between e and the sign. I think Perl is the only language that allows them anywhere but in the digits part.
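Andrew's opening question about int(s), int(s, 0) and Decimal was left open in the draft. For comparison only, this is how the runtime constructors behave in Python 3.6 and later, after PEP 515 was accepted; the draft under discussion here may differ.

```python
from decimal import Decimal

# Runtime parsing in Python 3.6+ (post-acceptance behaviour).
print(int("1_000"))         # 1000
print(int("1_000", 0))      # 1000 (base 0 follows literal syntax)
print(int("ff_ff", 16))     # 65535
print(float("1_000.5"))     # 1000.5
print(Decimal("1_000"))     # 1000

# Doubled underscores are still rejected at runtime.
try:
    int("1__000")
except ValueError as exc:
    print("rejected:", exc)
```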

On Wed, Feb 10, 2016 at 03:45:48PM -0800, Andrew Barnert via Python-Dev wrote:
On Feb 10, 2016, at 14:20, Georg Brandl <g.brandl@gmx.net> wrote:
First, general questions: should the PEP mention the Decimal constructor? What about int and float (I'd assume int(s) continues to work as always, while int(s, 0) gets the new behavior, but if that isn't obviously true, it may be worth saying explicitly).
* Trailing underscores are not allowed, because they look confusing and don't contribute much to readability.
Why is "123_456_" so ugly that we have to catch it, when "1___2_345______6" is just fine,
It's not just fine, it's ugly as sin, but it shouldn't be a matter for the parser to decide a style-issue. Just as we allow people to write ugly tuples: t = ( 1 , 2, 3 ,4, 5, ) so we should allow people to write ugly ints rather than try to enforce good taste in the parser. There are uses for allowing multiple underscores, and odd groupings, so rather than a blanket ban, we trust that people won't do stupid things.
or "123e__+456"?
That I would prohibit. I think that the decimal point and exponent sign provide sufficient visual distinctiveness that putting underscores around them doesn't gain you anything. In some cases it looks like you might have missed a group of digits: 1.234_e-89 hints that perhaps there ought to be more digits after the 4. I'd be okay with a rule "no underscores in the exponent at all", but I don't particularly see the need for it since that's pretty much covered by the style guide saying "don't use underscores unnecessarily". For floats, exponents have a practical limitation of three digits, so there's not much need for grouping them.
+1 on allowing underscores between digits
+0 on prohibiting underscores in the exponent
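Steve's point that float exponents never need more than three digits can be checked against `sys.float_info`. The values in the comments assume IEEE 754 double precision, which CPython uses on mainstream platforms.

```python
import sys

# Decimal exponent range of a normalized float (IEEE 754 doubles).
print(sys.float_info.max_10_exp)   # 308
print(sys.float_info.min_10_exp)   # -307

# Past that range a literal overflows to inf or underflows to 0.0,
# so a meaningful exponent never needs more than three digits.
print(1e308, 1e309)   # 1e+308 inf
print(1e-400)         # 0.0
```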
More to the point, if we really need an extra rule, and more complicated BNF, to outlaw this case, I don't think we want a liberal design at all.
I think "underscores can occur between any two digits" is pretty liberal, since it allows multiple underscores, and allows grouping in any size group (including mixed sizes, and stupid sizes like 1). To me, the opposite of a liberal rule is something like "underscores may only occur between groups of three digits".
Also, notice that Swift, Rust, and D all show examples with trailing underscores in their references, and they don't look particularly out of place with the other examples.
That's a matter of opinion. -- Steve

On Feb 10, 2016, at 16:21, Steven D'Aprano <steve@pearwood.info> wrote:
On Wed, Feb 10, 2016 at 03:45:48PM -0800, Andrew Barnert via Python-Dev wrote: On Feb 10, 2016, at 14:20, Georg Brandl <g.brandl@gmx.net> wrote:
First, general questions: should the PEP mention the Decimal constructor? What about int and float (I'd assume int(s) continues to work as always, while int(s, 0) gets the new behavior, but if that isn't obviously true, it may be worth saying explicitly).
* Trailing underscores are not allowed, because they look confusing and don't contribute much to readability.
Why is "123_456_" so ugly that we have to catch it, when "1___2_345______6" is just fine,
It's not just fine, it's ugly as sin, but it shouldn't be a matter for the parser to decide a style-issue.
Exactly. So why should it be any more of a matter for the parser to decide that "123_456_" is illegal? Leave that in the style guide, and keep the parser, and the reference documentation, as simple as possible.
or "123e__+456"?
That I would prohibit.
The PEP allows that. The simpler rule used by Swift and Rust prohibits it.
More to the point, if we really need an extra rule, and more complicated BNF, to outlaw this case, I don't think we want a liberal design at all.
I think "underscores can occur between any two digits" is pretty liberal, since it allows multiple underscores, and allows grouping in any size group (including mixed sizes, and stupid sizes like 1).
The PEP calls that a type-2 conservative proposal, and uses "liberal" to mean that underscores can appear in places that aren't between digits. I don't think we want that liberalism, especially if it requires 5 rules instead of 1 to get it right. Again, Swift and Rust only allow underscores in the digit part of integers, and the up to three digit parts of floats, and the only rule they impose is no leading underscore. (In some cases they lead to ambiguity, in others they don't, but it's easier to just always ban them.) I don't see anything wrong with that rule. The fact that it doesn't allow "1.2e_+3" seems fine. The fact that it doesn't prevent "123_" seems fine also. It's not about being as liberal as possible, or as restrictive as possible, because those edge cases just don't matter, so being as simple as possible seems like an obvious win.
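A minimal sketch of the rule as Andrew describes it (underscores only inside digit parts, no leading underscore) for decimal literals, written as a regular expression. This approximates his description; it is not a transcription of the Swift or Rust grammar, and `simple_rule_ok` is a hypothetical name.

```python
import re

# A digit part: a digit, then any mix of digits and underscores.
# Leading underscores are rejected; trailing or doubled ones are not.
DIGIT_PART = r"[0-9][0-9_]*"

# Decimal integer or float with up to three digit parts; underscores
# may appear nowhere else (not after "e", not next to the sign).
SIMPLE_RULE = re.compile(
    rf"{DIGIT_PART}(?:\.{DIGIT_PART})?(?:[eE][+-]?{DIGIT_PART})?"
)


def simple_rule_ok(text: str) -> bool:
    return SIMPLE_RULE.fullmatch(text) is not None


for lit in ["123_456_", "1_23__4", "1.2e_+3", "123e__+456", "_123"]:
    print(lit, simple_rule_ok(lit))
```

As the thread notes, this accepts "123_" and rejects "1.2e_+3".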
Also, notice that Swift, Rust, and D all show examples with trailing underscores in their references, and they don't look particularly out of place with the other examples.
That's a matter of opinion.
Sure, but it's apparently the opinion of the people who designed and/or documented this feature in three out of the four languages I looked at (aka every language but Perl), not mine. And honestly, are you really claiming that in your opinion, "123_456_" is worse than all of their other examples, like "1_23__4"? They're both presented as something the syntax allows, and neither one looks like something I'd ever want to write, much less promote in a style guide or something, but neither one screams out as something that's so heinous we need to complicate the language to ensure it raises a SyntaxError. Yes, that's my opinion, but do you really have a different opinion about any part of that?

On Wed, Feb 10, 2016 at 08:41:27PM -0800, Andrew Barnert wrote:
And honestly, are you really claiming that in your opinion, "123_456_" is worse than all of their other examples, like "1_23__4"?
Yes I am, because 123_456_ looks like you've forgotten to finish typing the last group of digits, while 1_23__4 merely looks like you have no taste.
They're both presented as something the syntax allows, and neither one looks like something I'd ever want to write, much less promote in a style guide or something, but neither one screams out as something that's so heinous we need to complicate the language to ensure it raises a SyntaxError. Yes, that's my opinion, but do you really have a different opinion about any part of that?
I don't think the rule "underscores must occur between digits" is complicating the specification. It is *less* complicated to explain this rule than to give a whole lot of special cases:
- can you use a leading or trailing underscore?
- can an underscore follow the base prefix 0b 0o 0x?
- can an underscore precede or follow the decimal point?
- can an underscore precede or follow a + or - sign?
- can an underscore precede or follow the e|E exponent symbol?
- can an underscore precede or follow the j suffix for complex numbers?
versus
- underscores can only appear between (hex)digits.
I'm not sure why you seem to think that "only between digits" is more complex than the alternative -- to me it is less complex, with no special cases to memorise, just one general rule. Of course, if (generic) you think that it is a feature to be able to put underscores before the decimal point, after the E exponent, etc. then you will dislike my suggested rule. That's okay, but in that case, it is not because of "simplicity|complexity" but because (generic) you want to be able to write things which my rule would prohibit. -- Steve
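Steve's single rule can be sketched as a regular expression for decimal literals: a run of one or more underscores is allowed only with a digit on both sides, so nothing may touch the point, the exponent marker, the sign or the "j" suffix. `between_digits_ok` is a hypothetical name, and forms such as "1." or ".5" are left out for brevity.

```python
import re

# A run of one or more underscores is allowed only between two digits.
DIGITS = r"[0-9](?:_*[0-9])*"

# Decimal int/float/imaginary literal under the "between digits" rule.
BETWEEN_DIGITS = re.compile(
    rf"{DIGITS}(?:\.{DIGITS})?(?:[eE][+-]?{DIGITS})?[jJ]?"
)


def between_digits_ok(text: str) -> bool:
    return BETWEEN_DIGITS.fullmatch(text) is not None


for lit in ["1_23__4", "123_456_", "123_j", "1.23_e99", "10_000_000.0"]:
    print(lit, between_digits_ok(lit))
```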

On Feb 11, 2016, at 02:13, Steven D'Aprano <steve@pearwood.info> wrote:
On Wed, Feb 10, 2016 at 08:41:27PM -0800, Andrew Barnert wrote:
They're both presented as something the syntax allows, and neither one looks like something I'd ever want to write, much less promote in a style guide or something, but neither one screams out as something that's so heinous we need to complicate the language to ensure it raises a SyntaxError. Yes, that's my opinion, but do you really have a different opinion about any part of that?
I don't think the rule "underscores must occur between digits" is complicating the specification.
That rule isn't in the specification in the PEP, except as one of the alternatives rejected for being "too restrictive". It's also not the rule you were suggesting in your previous email, where you insisted that you wanted something "more liberal". I also don't understand why you're presenting this whole thing as an argument against my response, which was suggesting that whatever rule we choose should be simpler than what's in the PEP, when that's also (apparently, now) your position.
It is *less* complicated to explain this rule than to give a whole lot of special cases
Sure. Your rule is about as complicated as the Swift rule, and both are much less complicated than the PEP. I'm fine with either one, because, as I said, the edge cases don't matter to me nearly as much as having a rule that's easy to keep in my head and easy to lex. The only reason I specifically proposed the Swift rule instead of one of the other simple rules is that it seemed the most "liberal", which the PEP was in favor of, and it has precedent in more languages. But, in favor of your version, almost every language uses some variation of "you can put underscores between digits" as the "tutorial-level" explanation and rationale.

On Thu, 11 Feb 2016 at 02:13 Steven D'Aprano <steve@pearwood.info> wrote:
On Wed, Feb 10, 2016 at 08:41:27PM -0800, Andrew Barnert wrote:
And honestly, are you really claiming that in your opinion, "123_456_" is worse than all of their other examples, like "1_23__4"?
Yes I am, because 123_456_ looks like you've forgotten to finish typing the last group of digits, while 1_23__4 merely looks like you have no taste.
OK, but the keyword in your sentence is "taste". If we update PEP 8 for our needs to say "Numerical literals should not have multiple underscores in a row or have a trailing underscore" then this is taken care of. We get a dead-simple rule for when underscores can be used, the implementation is simple, and we get to have more tasteful usage in the stdlib w/o forcing our tastes upon everyone or complicating the rules or implementation.

On Thu, Feb 11, 2016 at 06:03:34PM +0000, Brett Cannon wrote:
On Thu, 11 Feb 2016 at 02:13 Steven D'Aprano <steve@pearwood.info> wrote:
On Wed, Feb 10, 2016 at 08:41:27PM -0800, Andrew Barnert wrote:
And honestly, are you really claiming that in your opinion, "123_456_" is worse than all of their other examples, like "1_23__4"?
Yes I am, because 123_456_ looks like you've forgotten to finish typing the last group of digits, while 1_23__4 merely looks like you have no taste.
OK, but the keyword in your sentence is "taste".
I disagree. The key *idea* in my sentence is that the trailing underscore looks like a programming error. In my opinion, avoiding that impression is important enough to make trailing underscores a syntax error. I've seen a few people vote +1 for things like 123_j and 1.23_e99, but I haven't seen anyone in favour of trailing underscores. Does anyone think there is a good case for allowing trailing underscores?
If we update PEP 8 for our needs to say "Numerical literals should not have multiple underscores in a row or have a trailing underscore" then this is taken care of. We get a dead-simple rule for when underscores can be used, the implementation is simple, and we get to have more tasteful usage in the stdlib w/o forcing our tastes upon everyone or complicating the rules or implementation.
I think this is a misrepresentation of the alternative. As I see it, we have two alternatives:
- one or more underscores can appear AFTER the base specifier or any digit;
- one or more underscores can appear BETWEEN two digits.
To describe the second alternative as "complicating the rules" is, I think, grossly unfair. And if Serhiy's proposal is correct, the implementation is also no more complicated:
# underscores after digits
octinteger: "0" ("o" | "O") "_"* octdigit (octdigit | "_")*
hexinteger: "0" ("x" | "X") "_"* hexdigit (hexdigit | "_")*
bininteger: "0" ("b" | "B") "_"* bindigit (bindigit | "_")*
# underscores between digits
octinteger: "0" ("o" | "O") octdigit (["_"] octdigit)*
hexinteger: "0" ("x" | "X") hexdigit (["_"] hexdigit)*
bininteger: "0" ("b" | "B") bindigit (["_"] bindigit)*
The idea that the second alternative "forc[es] our tastes on everyone" while the first does not is bogus. The first alternative also prohibits things which are a matter of taste:
# prohibited in both alternatives
0_xDEADBEEF 0._1234 1.2e_99 -_1 1j_
I think that there is broad agreement that:
- the basic idea is sound
- leading underscores followed by digits are currently legal identifiers and this will not change
- underscores should not follow the sign - +
- underscores should not follow the decimal point .
- underscores should not follow the exponent e|E
- underscores will not be permitted inside the exponent (even if it is harmless, it's silly to write 1.2e9_9)
- underscores should not follow the complex suffix j
and only minor disagreement about:
- whether or not underscores will be allowed after the base specifier 0x 0o 0b
- whether or not underscores will be allowed before the decimal point, exponent and complex suffix.
Can we have a show of hands, in favour or against the above two? And then perhaps Guido can rule on this one way or the other and we can get back to arguing about more important matters? :-)
In case it isn't obvious, I prefer to say No to allowing underscores after the base specifier, or before the decimal point, exponent and complex suffix. -- Steve
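The two hexadecimal productions Steve quotes can be transcribed into regular expressions to see exactly where they differ. The transcription is ours: `["_"]` is read as an optional single underscore, and `AFTER`, `BETWEEN` and `accepts` are illustrative names.

```python
import re

HEX = "[0-9a-fA-F]"

# 'underscores after digits': underscores may follow the prefix and any
# digit, so underscore-after-prefix, doubled and trailing "_" all pass.
AFTER = re.compile(rf"0[xX]_*{HEX}(?:{HEX}|_)*")

# 'underscores between digits': each "_" must sit between two hex digits.
BETWEEN = re.compile(rf"0[xX]{HEX}(?:_?{HEX})*")


def accepts(pattern, text: str) -> bool:
    return pattern.fullmatch(text) is not None


for lit in ["0xDEAD_BEEF", "0x_DEAD_BEEF", "0xDEAD_BEEF_", "0xDE__AD"]:
    print(lit, accepts(AFTER, lit), accepts(BETWEEN, lit))
```

All four literals match the first production; the second accepts only 0xDEAD_BEEF.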

On 12 February 2016 at 00:16, Steven D'Aprano <steve@pearwood.info> wrote:
On Thu, Feb 11, 2016 at 06:03:34PM +0000, Brett Cannon wrote:
On Thu, 11 Feb 2016 at 02:13 Steven D'Aprano <steve@pearwood.info> wrote:
On Wed, Feb 10, 2016 at 08:41:27PM -0800, Andrew Barnert wrote:
And honestly, are you really claiming that in your opinion, "123_456_" is worse than all of their other examples, like "1_23__4"?
Yes I am, because 123_456_ looks like you've forgotten to finish typing the last group of digits, while 1_23__4 merely looks like you have no taste.
OK, but the keyword in your sentence is "taste".
I disagree. The key *idea* in my sentence is that the trailing underscore looks like a programming error. In my opinion, avoiding that impression is important enough to make trailing underscores a syntax error.
I've seen a few people vote +1 for things like 123_j and 1.23_e99, but I haven't seen anyone in favour of trailing underscores. Does anyone think there is a good case for allowing trailing underscores?
If we update PEP 8 for our needs to say "Numerical literals should not have multiple underscores in a row or have a trailing underscore" then this is taken care of. We get a dead-simple rule for when underscores can be used, the implementation is simple, and we get to have more tasteful usage in the stdlib w/o forcing our tastes upon everyone or complicating the rules or implementation.
I think this is a misrepresentation of the alternative. As I see it, we have two alternatives:
- one or more underscores can appear AFTER the base specifier or any digit; +1
- one or more underscores can appear BETWEEN two digits. -0
Having underscores between digits is the main usage, but I don't see much harm in the more liberal version, unless that makes the specification or implementation too complex. Allowing stuff like 0x_100, 4.7_e3, and 1_j seems of slightly more benefit IMO than disallowing 1_000_.
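Martin's examples can be checked against the rules that eventually shipped in Python 3.6 by asking the compiler directly. This reflects the accepted PEP, not the draft under discussion, and `is_valid_literal` is a hypothetical helper.

```python
def is_valid_literal(source: str) -> bool:
    """Return True if `source` compiles as a Python expression."""
    try:
        compile(source, "<literal>", "eval")
    except SyntaxError:
        return False
    return True


# Python 3.6+: an underscore after the base prefix is allowed, but not
# before the exponent, before "j", or at the end of a literal.
for text in ["0x_100", "4.7_e3", "1_j", "1_000_"]:
    print(text, is_valid_literal(text))
```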
To describe the second alternative as "complicating the rules" is, I think, grossly unfair. And if Serhiy's proposal is correct, the implementation is also no more complicated:
# underscores after digits
octinteger: "0" ("o" | "O") "_"* octdigit (octdigit | "_")*
hexinteger: "0" ("x" | "X") "_"* hexdigit (hexdigit | "_")*
bininteger: "0" ("b" | "B") "_"* bindigit (bindigit | "_")*
# underscores between digits
octinteger: "0" ("o" | "O") octdigit (["_"] octdigit)*
hexinteger: "0" ("x" | "X") hexdigit (["_"] hexdigit)*
bininteger: "0" ("b" | "B") bindigit (["_"] bindigit)*
The idea that the second alternative "forc[es] our tastes on everyone" while the first does not is bogus. The first alternative also prohibits things which are a matter of taste:
# prohibited in both alternatives
0_xDEADBEEF 0._1234 1.2e_99 -_1
This one is already a valid variable identifier name (_1), so -_1 is legal today.
1j_
I think that there is broad agreement that:
- the basic idea is sound
- leading underscores followed by digits are currently legal identifiers and this will not change
+1 to both
- underscores should not follow the sign - +
- underscores should not follow the decimal point .
- underscores should not follow the exponent e|E
No strong opinion on these from me
- underscores will not be permitted inside the exponent (even if it is harmless, it's silly to write 1.2e9_9)
-0, it seems like a needless inconsistency, unless it somehow hurts the implementation
- underscores should not follow the complex suffix j No opinion
and only minor disagreement about:
- whether or not underscores will be allowed after the base specifier 0x 0o 0b +0
- whether or not underscores will be allowed before the decimal point, exponent and complex suffix. No opinion about directly before decimal point; +0 before exponent or imaginary (complex) suffix.
Can we have a show of hands, in favour or against the above two? And then perhaps Guido can rule on this one way or the other and we can get back to arguing about more important matters? :-)
In case it isn't obvious, I prefer to say No to allowing underscores after the base specifier, or before the decimal point, exponent and complex suffix.

On 2/11/2016 4:16 PM, Steven D'Aprano wrote:
On Thu, Feb 11, 2016 at 06:03:34PM +0000, Brett Cannon wrote:
On Thu, 11 Feb 2016 at 02:13 Steven D'Aprano <steve@pearwood.info> wrote:
On Wed, Feb 10, 2016 at 08:41:27PM -0800, Andrew Barnert wrote:
And honestly, are you really claiming that in your opinion, "123_456_" is worse than all of their other examples, like "1_23__4"?
Yes I am, because 123_456_ looks like you've forgotten to finish typing the last group of digits, while 1_23__4 merely looks like you have no taste.
OK, but the keyword in your sentence is "taste".
I disagree. The key *idea* in my sentence is that the trailing underscore looks like a programming error. In my opinion, avoiding that impression is important enough to make trailing underscores a syntax error.
I've seen a few people vote +1 for things like 123_j and 1.23_e99, but I haven't seen anyone in favour of trailing underscores. Does anyone think there is a good case for allowing trailing underscores?
If we update PEP 8 for our needs to say "Numerical literals should not have multiple underscores in a row or have a trailing underscore" then this is taken care of. We get a dead-simple rule for when underscores can be used, the implementation is simple, and we get to have more tasteful usage in the stdlib w/o forcing our tastes upon everyone or complicating the rules or implementation.
I think this is a misrepresentation of the alternative. As I see it, we have two alternatives:
- one or more underscores can appear AFTER the base specifier or any digit;
- one or more underscores can appear BETWEEN two digits.
To describe the second alternative as "complicating the rules" is, I think, grossly unfair. And if Serhiy's proposal is correct, the implementation is also no more complicated:
# underscores after digits
octinteger: "0" ("o" | "O") "_"* octdigit (octdigit | "_")*
hexinteger: "0" ("x" | "X") "_"* hexdigit (hexdigit | "_")*
bininteger: "0" ("b" | "B") "_"* bindigit (bindigit | "_")*
# underscores after digits
octinteger: "0" ("o" | "O") (octdigit | "_")*
hexinteger: "0" ("x" | "X") (hexdigit | "_")*
bininteger: "0" ("b" | "B") (bindigit | "_")*
An extra side effect is that there are more ways to write zero: 0x, 0b, 0o, 0X, 0B, 0O, 0x_, 0b_, 0o_, etc. Most people write 0 anyway, so those would be bad style, but it makes the implementation simpler.
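The side effect Glenn mentions can be sketched for the hexadecimal case: under his simplified production the digit part may be empty or consist only of underscores, and such spellings would denote zero. `GLENN_HEX` and `glenn_hex_value` are hypothetical names; this is an illustration of the proposed grammar, not of any Python release.

```python
import re

# Glenn's simplified hex production: any run of hex digits and
# underscores after the prefix, including an empty one.
GLENN_HEX = re.compile(r"0[xX][0-9a-fA-F_]*")


def glenn_hex_value(text: str) -> int:
    """Hypothetical evaluator: an empty digit part is read as zero."""
    if GLENN_HEX.fullmatch(text) is None:
        raise ValueError(f"not a hex literal under this rule: {text!r}")
    digits = text[2:].replace("_", "")
    return int(digits, 16) if digits else 0


for lit in ["0x", "0x_", "0X__", "0xFF_"]:
    print(lit, glenn_hex_value(lit))
```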
# underscores between digits
octinteger: "0" ("o" | "O") octdigit (["_"] octdigit)*
hexinteger: "0" ("x" | "X") hexdigit (["_"] hexdigit)*
bininteger: "0" ("b" | "B") bindigit (["_"] bindigit)*
The idea that the second alternative "forc[es] our tastes on everyone" while the first does not is bogus. The first alternative also prohibits things which are a matter of taste:
# prohibited in both alternatives
0_xDEADBEEF 0._1234 1.2e_99 -_1 1j_
I think that there is broad agreement that:
- the basic idea is sound
- leading underscores followed by digits are currently legal identifiers and this will not change
- underscores should not follow the sign - +
- underscores should not follow the decimal point .
- underscores should not follow the exponent e|E
- underscores will not be permitted inside the exponent (even if it is harmless, it's silly to write 1.2e9_9)
- underscores should not follow the complex suffix j
and only minor disagreement about:
- whether or not underscores will be allowed after the base specifier 0x 0o 0b
+1 to allow underscores after the base specifier.
- whether or not underscores will be allowed before the decimal point, exponent and complex suffix.
+1 to allow them. There may be cases where they are useful, and if it is not useful, it would not be used. I really liked someone's style guide proposal: use of underscore within numeric constants should only be done to aid readability. However, pre-judging what aids readability to one person's particular taste is inappropriate.
Can we have a show of hands, in favour or against the above two? And then perhaps Guido can rule on this one way or the other and we can get back to arguing about more important matters? :-)
In case it isn't obvious, I prefer to say No to allowing underscores after the base specifier, or before the decimal point, exponent and complex suffix.
I think it was obvious :) And I think we disagree. And yes, there are more important matters. But it was just a couple of days ago that I wrote a big constant in some new code and thought how nice it would be if I could put a delimiter in there... so I'll be glad for the feature when it is available.

Great PEP overall. We definitely don't want the restriction to grouping numbers only in threes. South Asian crore use grouping in twos. https://en.m.wikipedia.org/wiki/Crore

On 2/11/2016 7:56 PM, David Mertz wrote:
Great PEP overall. We definitely don't want the restriction to grouping numbers only in threes. South Asian crore use grouping in twos.
Interesting... 3 digits in the least significant group, and _then_ by twos. Wouldn't have predicted that one! Never bumped into that notation before!
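The grouping Glenn describes (three digits in the least significant group, then groups of two) can be turned into a formatter whose output is also a valid Python 3.6+ literal. `group_south_asian` is a hypothetical helper written for illustration, assuming the lakh/crore grouping from the linked article.

```python
def group_south_asian(n: int) -> str:
    """Group digits as three in the lowest group, then in twos."""
    sign = "-" if n < 0 else ""
    digits = str(abs(n))
    head, tail = digits[:-3], digits[-3:]
    groups = []
    # Peel two-digit groups off the right of everything above the
    # lowest three digits.
    while len(head) > 2:
        groups.insert(0, head[-2:])
        head = head[:-2]
    if head:
        groups.insert(0, head)
    return sign + "_".join(groups + [tail])


print(group_south_asian(10000000))   # 1_00_00_000 (one crore)
print(group_south_asian(12345678))   # 1_23_45_678
```

Since underscores never change the value, `int()` on the result (Python 3.6+) returns the original number.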

On Thursday, February 11, 2016 8:10 PM, Glenn Linderman <v+python@g.nevcal.com> wrote:
On 2/11/2016 7:56 PM, David Mertz wrote:
Great PEP overall. We definitely don't want the restriction to grouping numbers only in threes. South Asian crore notation uses grouping in twos.
Interesting... 3 digits in the least significant group, and _then_ by twos. Wouldn't have predicted that one! Never bumped into that notation before!
The first time I used underscore separators in any language, it was a test script for a server that wanted social security numbers as integers instead of strings, like 123_45_6789.[^1]
Which is why I suggested the style guideline should just say "meaningful grouping of digits", rather than try to predict what counts as "meaningful" for every program.
[^1] Of course in Python, it's usually trivial to stick a shim in between the database and the model thingy so I could just pass in "123-45-6789", so I don't expect to ever need this specific example.
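Because the tokenizer only checks where underscores sit, not how many digits each group holds, the crore and SSN-style groupings discussed here are equally expressible. A small sketch, assuming Python 3.6+; the variable names are made up for illustration. The `_` option of the format mini-language, by contrast, only emits fixed-size groups (thousands for decimal output, four digits for hex).

```python
# Grouping is free-form: only underscore positions are checked, not group sizes.
# Requires Python 3.6+.
one_crore = 1_00_00_000      # lakh/crore style: 3 digits, then pairs
ten_million = 10_000_000     # the same value grouped by thousands
one_yi = 1_0000_0000         # grouping by 10**4, as in "if yuan > 9999_9999"
fake_ssn = 123_45_6789       # a made-up SSN-shaped number

assert one_crore == ten_million == 10**7
assert one_yi == 10**8
assert fake_ssn == 123456789

# Output formatting knows only fixed-size groups.
print(format(one_crore, "_"))     # 10_000_000
print(format(0xDEADBEEF, "_x"))   # dead_beef
```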

On 2/11/2016 8:22 PM, Andrew Barnert wrote:
On Thursday, February 11, 2016 8:10 PM, Glenn Linderman <v+python@g.nevcal.com> wrote:
On 2/11/2016 7:56 PM, David Mertz wrote:
Great PEP overall. We definitely don't want the restriction to grouping numbers only in threes. South Asian crore notation uses grouping in twos.
Interesting... 3 digits in the least significant group, and _then_ by twos. Wouldn't have predicted that one! Never bumped into that notation before!
The first time I used underscore separators in any language, it was a test script for a server that wanted social security numbers as integers instead of strings, like 123_45_6789.[^1]
Which is why I suggested the style guideline should just say "meaningful grouping of digits", rather than try to predict what counts as "meaningful" for every program.
[^1] Of course in Python, it's usually trivial to stick a shim in between the database and the model thingy so I could just pass in "123-45-6789", so I don't expect to ever need this specific example.
Yes, I had thought of the Social Security Number possibility also, although having them as constants in a program seems a bit unusual. Test script, fake numbers, yeah, I guess so.

On 12 February 2016 at 00:16, Steven D'Aprano <steve@pearwood.info> wrote:
I think that there is broad agreement that:
- the basic idea is sound
- leading underscores followed by digits are currently legal identifiers and this will not change
- underscores should not follow the sign - +
- underscores should not follow the decimal point .
- underscores should not follow the exponent e|E
- underscores will not be permitted inside the exponent (even if it is harmless, it's silly to write 1.2e9_9)
- underscores should not follow the complex suffix j
and only minor disagreement about:
- whether or not underscores will be allowed after the base specifier 0x 0o 0b
- whether or not underscores will be allowed before the decimal point, exponent and complex suffix.
Can we have a show of hands, in favour or against the above two? And then perhaps Guido can rule on this one way or the other and we can get back to arguing about more important matters? :-)
In case it isn't obvious, I prefer to say No to allowing underscores after the base specifier, or before the decimal point, exponent and complex suffix.
I have no opinion on anything other than that whatever syntax is implemented as long as it allows single underscores between digits, such as
1_000_000
Everything else is irrelevant to me, and if I read code that uses anything else, I'd judge it based on readability and style, and wouldn't care about arguments that "it's allowed by the grammar".
Paul

On Fri, Feb 12, 2016 at 1:00 AM, Paul Moore <p.f.moore@gmail.com> wrote:
I have no opinion on anything other than that whatever syntax is implemented as long as it allows single underscores between digits, such as
1_000_000
Everything else is irrelevant to me, and if I read code that uses anything else, I'd judge it based on readability and style, and wouldn't care about arguments that "it's allowed by the grammar".
I totally agree -- and it's clear that other cultures group digits differently, so we should allow that, but while I'll live with it either way, I'd rather have it be as restrictive as possible rather than as unrestricted as possible. As in:
* no double underscores
* no underscore right before or after a period
* no underscore at the beginning or end
....
As Paul said, as long as I can do the above, I'll be fine, but I think everyone's source code will be a lot cleaner in the long run if you don't have the option of doing who knows what weird arrangement....
As for the SS# example -- it seems a bad idea to me to store a SS# number as an integer anyway -- so all the weird IDs etc. formats aren't really relevant...
-CHB
--
Christopher Barker, Ph.D.
Oceanographer
Emergency Response Division
NOAA/NOS/OR&R            (206) 526-6959   voice
7600 Sand Point Way NE   (206) 526-6329   fax
Seattle, WA  98115       (206) 526-6317   main reception
Chris.Barker@noaa.gov

On 2016-02-12 20:06, Chris Barker wrote:
On Fri, Feb 12, 2016 at 1:00 AM, Paul Moore <p.f.moore@gmail.com <mailto:p.f.moore@gmail.com>> wrote:
I have no opinion on anything other than that whatever syntax is implemented as long as it allows single underscores between digits, such as
1_000_000
Everything else is irrelevant to me, and if I read code that uses anything else, I'd judge it based on readability and style, and wouldn't care about arguments that "it's allowed by the grammar".
I totally agree -- and it's clear that other cultures group digits differently, so we should allow that, but while I'll live with it either way, I'd rather have it be as restrictive as possible rather than as unrestricted as possible. As in:
* no double underscores
* no underscore right before or after a period
* no underscore at the beginning or end
....
As Paul said, as long as I can do the above, I'll be fine, but I think everyone's source code will be a lot cleaner in the long run if you don't have the option of doing who knows what weird arrangement....
As for the SS# example -- it seems a bad idea to me to store a SS# number as an integer anyway -- so all the weird IDs etc. formats aren't really relevant...
That also applies to telephone numbers, account numbers, etc. They aren't really numbers (you wouldn't do arithmetic on them) and might have leading zeros.

On 2/12/2016 12:06 PM, Chris Barker wrote:
On Fri, Feb 12, 2016 at 1:00 AM, Paul Moore <p.f.moore@gmail.com <mailto:p.f.moore@gmail.com>> wrote:
I have no opinion on anything other than that whatever syntax is implemented as long as it allows single underscores between digits, such as
1_000_000
Everything else is irrelevant to me, and if I read code that uses anything else, I'd judge it based on readability and style, and wouldn't care about arguments that "it's allowed by the grammar".
I totally agree -- and it's clear that other cultures group digits differently, so we should allow that, but while I'll live with it either way, I'd rather have it be as restrictive as possible rather than as unrestricted as possible. As in:
no double underscores
Useful for really long binary constants... one _ for nybble or field divisions, two __ for byte divisions. Of course, really long binary constants might be a bad idea.
no underscore right before or after a period no underscore at the beginning or end.
You get your wish for the beginning... it would be ambiguous with identifiers. And your style guide can include whatever restrictions you like, for your code.
....
As Paul said, as long as I can do the above, I'll be fine, but I think everyone's source code will be a lot cleaner in the long run if you don't have the option of doing who knows what weird arrangement....
As for the SS# example -- it seems a bad idea to me to store a SS# number as an integer anyway -- so all the weird IDs etc. formats aren't really relevant...
SS#... why not integer? Phone#... why not integer? There's a lot of nice digit-division conventions for phone#s in different parts of the world. The only ambiguity is if such numbers have leading zeros, you have to "know" (or record) how many total digits are expected.
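The leading-zero caveat is easy to see in code: converting to `int` drops the zeros, and Python 3 rejects a decimal literal such as `02134` outright, so the expected width has to be recorded somewhere. A minimal sketch; `zip_code` is a hypothetical example value.

```python
# Leading zeros are not part of an integer's value, so a code like this
# (a hypothetical ZIP code) cannot round-trip through int() on its own.
zip_code = "02134"
as_int = int(zip_code)            # 2134: the leading zero is gone
width = len(zip_code)             # the digit count must be recorded separately
restored = f"{as_int:0{width}d}"  # zero-pad back to the recorded width

# A decimal literal with a leading zero is not even valid Python 3 source.
try:
    compile("02134", "<literal>", "eval")
    literal_ok = True
except SyntaxError:
    literal_ok = False

print(as_int, restored, literal_ok)  # 2134 02134 False
```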

On Feb 12, 2016, at 12:58, Glenn Linderman <v+python@g.nevcal.com> wrote:
On 2/12/2016 12:06 PM, Chris Barker wrote: As for the SS# example -- it seems a bad idea to me to store a SS# number as an integer anyway -- so all the weird IDs etc. formats aren't really relevant...
SS#... why not integer? Phone#... why not integer? There's a lot of nice digit-division conventions for phone#s in different parts of the world.
I'm the one who brought up the SSN example--and, as I said at the time, I almost certainly wouldn't have done that in Python. I was maintaining tests for a service that stored SSNs as integers (which I think is a mistake, but I couldn't change it), with an automatically generated strongly-typed interface to that service (which is good) and no easy way to wrap or hook that interface (which is bad). In Python, it's hard to imagine how I'd end up with a situation where I couldn't wrap or hook the interface and treat SSNs as strings in my test code. (In fact, for complicated tests, I did exactly that in Python to make sure they were correct, then ported them over to integrate with the test suite...)
And anyway, the only point was that I've actually used a grouping that isn't "every 3 digits" and it didn't end the world. I think everyone agrees that some such groupings will come up--even if not every specific example is good, there are some that are. Even the people who want something more conservative than the PEP don't seem to be taking that position--they may not want double underscores, or "123_456_j", but they're fine with "if yuan > 9999_9999:".
So, either we try to anticipate every possible way people might want to group numbers and decide which ones are good or bad, or we just let the style guide say "meaningful group of digits" and let each developer decide what counts as "meaningful" for their application. Does anyone really want to argue for the former? If not, why not just settle that and go back to bikeshedding the cases that *are* contended, like "123_456_j"? (I'm happy either way, as long as the grammar rule is dead simple and the PEP 8 rule is pretty simple, but I know others have strong, and conflicting, opinions on that.)

On 12 February 2016 at 20:06, Chris Barker <chris.barker@noaa.gov> wrote:
As Paul said, as long as I can do the above, I'll be fine, but I think everyone's source code will be a lot cleaner in the long run if you don't have the option of doing who knows what weird arrangement....
Just to be clear, I'm personally in favour of fewer restrictions rather than more (as a general principle) - consenting adults and all that. But I'm also in favour of less debate rather than more on this issue, so I'll shut up at this point :-)
Paul

On 02/11/2016 12:45 AM, Andrew Barnert via Python-Dev wrote:
On Feb 10, 2016, at 14:20, Georg Brandl <g.brandl@gmx.net> wrote:
First, general questions: should the PEP mention the Decimal constructor? What about int and float (I'd assume int(s) continues to work as always, while int(s, 0) gets the new behavior, but if that isn't obviously true, it may be worth saying explicitly).
* Trailing underscores are not allowed, because they look confusing and don't contribute much to readability.
Why is "123_456_" so ugly that we have to catch it, when "1___2_345______6" is just fine, or "123e__+456"? More to the point, if we really need an extra rule, and more complicated BNF, to outlaw this case, I don't think we want a liberal design at all.
Also, notice that Swift, Rust, and D all show examples with trailing underscores in their references, and they don't look particularly out of place with the other examples.
That's a point. I'll look into the implementation.
There appears to be no reason to restrict the use of underscores otherwise.
What other restrictions are there? I think the only place you've left that's not between digits is between the e and the sign.
There are other places left:
* between 0x and the digits
* between the digits and "j"
* before and after the decimal point
A dead-simple rule like Swift's seems better than five separate rules that I have to learn and remember that make lexing more complicated and that ultimately amount to the conservative rule plus one other place I can put underscores where I'd never want to.
Not quite, see above.
**Group 1: liberal (like this PEP)**
* D [2]_
* Perl 5 (although docs say it's more restricted) [3]_
* Rust [4]_
* Swift (although textual description says "between digits") [5]_
I don't think any of these are liberal like this PEP.
For example, Swift's actual grammar rule allows underscores anywhere but leading in the "digits" part of int literals and all three potential digit parts of float literals. That's the whole rule. It's more conservative than this PEP in not allowing them outside of digit parts (like between E and +), more liberal in allowing them to be trailing, but I'm pretty sure the reason behind the design wasn't specifically about how liberal or conservative they wanted to be, but about being as simple as possible. Rust's rule seems to be equivalent to Swift's, except that they forgot to define exponents anywhere. I don't think either of them was trying to be more liberal or more conservative; rather, they were both trying to be as simple as possible.
I actually modelled this PEP closely on Rust. It has restrictions as in this PEP, except that trailing underscores are allowed, and that "1.0e_+5" is not allowed (allowed by the PEP), and "1.0e+_5" is (not allowed by the PEP). I don't think you can argue that it's simpler. (If the PEP and our lexical reference were as loosely worded as Rust's, one could probably say it's "simple", too.) Also, both Swift and Rust don't have the baggage of allowing ".5" style literals, which makes the grammar simpler in Swift's case.
D does go out of its way to be as liberal as possible, e.g., allowing things like "0x_1_" that the others wouldn't (they'd treat the "_1_" as a digit part, which can't have leading underscores), but it's also more conservative than this spec in not allowing underscores between e and the sign.
I think Perl is the only language that allows them anywhere but in the digits part.
Thanks for the feedback! Georg
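On Andrew's first question above (whether the PEP should say anything about the `Decimal`, `int` and `float` constructors): as eventually implemented in CPython 3.6, `int()`, `float()`, `complex()` and `decimal.Decimal()` accept underscores under the same placement rules as literals, and `int(s, 0)` also honours a base prefix. The draft quoted here did not yet specify this, so the following sketches the released behaviour rather than the draft; `accepts` is a hypothetical helper.

```python
from decimal import Decimal


def accepts(constructor, text, *args):
    """Return True if constructor(text, *args) succeeds, False on ValueError."""
    try:
        constructor(text, *args)
    except ValueError:
        return False
    return True


# Valid groupings follow the same placement rules as literals (Python 3.6+).
print(int("1_000"))          # 1000
print(int("0x_ff", 0))       # 255: base 0 honours the prefix and the underscore
print(float("1_000.5"))      # 1000.5
print(complex("1_000j"))     # 1000j
print(Decimal("1_000.5"))    # 1000.5

# Invalid placements raise ValueError at run time instead of SyntaxError.
for bad in ("_1000", "1000_", "1__000"):
    print(bad, accepts(int, bad))  # False for each
```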

On Wed, Feb 10, 2016 at 11:20:38PM +0100, Georg Brandl wrote:
This came up in python-ideas, and has met mostly positive comments, although the exact syntax rules are up for discussion.
Nicely done. But I would change the restrictions to a simpler version. Instead of five rules to learn:
The current proposal is to allow underscores anywhere in numeric literals, with these exceptions:
* Leading underscores cannot be allowed, since they already introduce identifiers.
* Trailing underscores are not allowed, because they look confusing and don't contribute much to readability.
* The number base prefixes ``0x``, ``0o``, and ``0b`` cannot be split up, because they are fixed strings and not logically part of the number.
* No underscore allowed after a sign in an exponent (``1e-_5``), because underscores can also not be used after the signs in front of the number (``-1e5``).
* No underscore allowed after a decimal point, because this leads to ambiguity with attribute access (the lexer cannot know that there is no number literal in ``foo._5``).
change to a single rule "one or more underscores may appear between two (hex)digits, but otherwise nowhere else". That's much simpler to understand than a series of restrictions as given above. That would be your second restrictive rule: "Multiple consecutive underscore allowed, but only between digits." That forbids leading and trailing underscores, underscores inside or immediately after the leading number base (since x, o and b aren't digits), and immediately before or after the sign, decimal point or e|E exponent symbol.
There appears to be no reason to restrict the use of underscores otherwise.
I don't like underscores immediately before the . or e|E in floats either: 123_.000_456 The dot is already visually distinctive enough, as is the e|E, and placing an underscore immediately before them doesn't aid in grouping the digits.
Instead of the liberal rule specified above, the use of underscores could be limited. Common rules are (see the "other languages" section):
* Only one consecutive underscore allowed, and only between digits.
* Multiple consecutive underscore allowed, but only between digits.
I don't think there is any need to restrict it to only a single underscore. There are uses for more than one: Fraction(3__141_592_654, 1_000_000_000) hints that the 3 is somewhat special (for obvious reasons). -- Steve
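Steven's single rule ("one or more underscores between two digits") and the stricter one-underscore variant differ only in the separator, which a regular expression makes concrete. This is a hypothetical transcription covering integer literals only: floats and imaginary literals are omitted, and leading zeros in decimal literals are not policed. `integer_rule`, `MULTI`, `SINGLE` and `check` are invented names.

```python
import re

# Digit classes for each base.
DEC, OCT, HEX, BIN = "[0-9]", "[0-7]", "[0-9a-fA-F]", "[01]"


def integer_rule(separator):
    """Compile a pattern for integer literals in which `separator` may only
    appear between two digits of the literal's base."""
    def body(digit):
        return f"{digit}+(?:{separator}{digit}+)*"
    # The base prefix is matched on its own, so "0x_1f" fails: "x" is not a digit.
    return re.compile(
        f"0[xX]{body(HEX)}|0[oO]{body(OCT)}|0[bB]{body(BIN)}|{body(DEC)}"
    )


MULTI = integer_rule("_+")  # Steven's rule: one or more underscores between digits
SINGLE = integer_rule("_")  # stricter variant: exactly one underscore between digits


def check(text):
    """Return (accepted by MULTI, accepted by SINGLE) for one candidate."""
    return (MULTI.fullmatch(text) is not None, SINGLE.fullmatch(text) is not None)


for text in ("1_000_000", "3__141_592_654", "0x1ef9_ab22", "0x_1ef9_ab22", "123_", "_123"):
    print(f"{text:<16} {check(text)}")
```

Neither pattern is exactly what eventually shipped: CPython 3.6 settled on the single-underscore rule but also accepts one underscore right after the base prefix, as in `0x_1ef9_ab22`.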

On 02/10/2016 04:04 PM, Steven D'Aprano wrote:
change to a single rule "one or more underscores may appear between two (hex)digits, but otherwise nowhere else". That's much simpler to understand than a series of restrictions as given above.
I like the simpler rule, but I would also allow for an underscore between the base and the first digit: 0x_1ef9_ab22 is easier (at least, for me ;) to parse than 0x1ef9_ab22 However, since Georg is doing the work, I'm not going to argue too hard. -- ~Ethan~

The Mersenne Twister is no longer regarded as quite state-of-the-art because it can get into states that produce long sequences that are not very random. There is a variation on MT called WELL that has better properties in this regard. Does anyone think it would be a good idea to replace MT with WELL as Python's default rng? https://en.wikipedia.org/wiki/Well_equidistributed_long-period_linear -- Greg

On Thu, Feb 11, 2016 at 01:08:41PM +1300, Greg Ewing wrote:
The Mersenne Twister is no longer regarded as quite state-of-the-art because it can get into states that produce long sequences that are not very random.
There is a variation on MT called WELL that has better properties in this regard. Does anyone think it would be a good idea to replace MT with WELL as Python's default rng?
https://en.wikipedia.org/wiki/Well_equidistributed_long-period_linear
I'm not able to judge the claims about which PRNG is better (perhaps Tim Peters has an opinion?) but if we do change, I'd like to see the existing random.Random moved to random.MT_Random for backwards compatibility and compatibility with other software which uses MT. Not necessarily saying that we have to keep it around forever (after all, we did dump the Wichmann-Hill PRNG some time ago) but we ought to keep it for at least a couple of releases. -- Steve

Steven D'Aprano writes:
Peters has an opinion?) but if we do change, I'd like to see the existing random.Random moved to random.MT_Random for backwards compatibility and compatibility with other software which uses MT. Not necessarily saying that we have to keep it around forever (after all, we did dump the Wichmann-Hill PRNG some time ago) but we ought to keep it for at least a couple of releases.
I think we should keep it around forever. Even my slowest colleagues are learning that they should record their seeds and PRNG algorithms for reproducibility's sake. :-) For that matter, restore Wichmann-Hill. Both should be clearly marked as "use only for reproducing previous bitstreams" (eg, in a package random.deprecated_generators).
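The point about recording seeds and generators can be illustrated with the standard library: a `random.Random` instance seeded with the same value replays the same sequence. `SEED` and `simulate` are made-up names, and the sketch only demonstrates replay on a single Python installation; it makes no claim about stability across versions.

```python
import random

SEED = 20160212  # arbitrary example seed; in practice, log it with the results


def simulate(seed, n=5):
    """Draw n floats from a private Mersenne Twister instance."""
    rng = random.Random(seed)  # independent of the module-level generator
    return [rng.random() for _ in range(n)]


first_run = simulate(SEED)
replay = simulate(SEED)
print(first_run == replay)  # True: same generator and same seed give the same stream
```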

On Thursday, February 11, 2016 7:20 PM, Stephen J. Turnbull <stephen@xemacs.org> wrote:
I think we should keep it around forever. Even my slowest colleagues are learning that they should record their seeds and PRNG algorithms for reproducibility's sake. :-)
+1
For that matter, restore Wichmann-Hill.
So you can write code that works on 2.3 and 3.6, but not 3.5? I agree that it shouldn't have gone away, but I think it may be too late for adding it back to help too much.
Both should be clearly marked as "use only for reproducing previous bitstreams" (eg, in a package random.deprecated_generators).
I like the random.deprecated_generators idea.

On Fri, Feb 12, 2016 at 3:12 PM, Andrew Barnert via Python-Dev <python-dev@python.org> wrote:
On Thursday, February 11, 2016 7:20 PM, Stephen J. Turnbull <stephen@xemacs.org> wrote:
I think we should keep it around forever. Even my slowest colleagues are learning that they should record their seeds and PRNG algorithms for reproducibility's sake. :-)
+1
For that matter, restore Wichmann-Hill.
So you can write code that works on 2.3 and 3.6, but not 3.5?
I agree that it shouldn't have gone away, but I think it may be too late for adding it back to help too much.
You're probably right, but the point isn't to make the same code run, necessarily. It's to make things verifiable. Suppose I do some scientific research that involves a pseudo-random number component, and I publish my results ("Monte Carlo analysis produced these results, blah blah, using this seed, etc, etc"). If you want to come back later and say "I think there was a bug in your code", you need to be able to generate the exact same PRNG sequence. I published my algorithm and my seed, so you should in theory be able to recreate that sequence; but if you have to reimplement the same algorithm, that's a lot of unnecessary work that could have been replaced with "from random.deprecated_generators import WichmannHill as Random". (Plus there's the whole question of "was your reimplemented PRNG buggy" - or, for that matter, "was the original PRNG buggy". Using the exact same code eliminates even that.) So I'm +1 on keeping Mersenne Twister even after it's been replaced as the default PRNG, -0 on reinstating something that hasn't been used in well over a decade, and -1 on replacing MT today - I'm not seeing strong arguments in favour of changing. ChrisA
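To make the reproducibility argument concrete, here is a hypothetical stand-in for the `WichmannHill` class imagined in `random.deprecated_generators` (no such module exists). It implements the published Wichmann-Hill recurrence: three small multiplicative congruential generators whose scaled outputs are summed modulo 1. Its seeding is a deliberate simplification, so it will not reproduce the bitstreams of the `random.WichmannHill` class that Python 2 shipped.

```python
class WichmannHill:
    """Hypothetical Wichmann-Hill generator (AS 183 recurrence).

    Seeding is simplified to three explicit seeds, so this does NOT
    reproduce Python 2's random.WichmannHill streams.
    """

    M1, M2, M3 = 30269, 30307, 30323  # the three prime moduli

    def __init__(self, x=1, y=1, z=1):
        # Each component must be non-zero modulo its prime, or it stays 0 forever.
        self.x = x % self.M1 or 1
        self.y = y % self.M2 or 1
        self.z = z % self.M3 or 1

    def random(self):
        """Return the next float in [0.0, 1.0)."""
        self.x = (171 * self.x) % self.M1
        self.y = (172 * self.y) % self.M2
        self.z = (170 * self.z) % self.M3
        return (self.x / self.M1 + self.y / self.M2 + self.z / self.M3) % 1.0


# Anyone who knows the algorithm and the seeds can replay the stream.
a = WichmannHill(123, 456, 789)
b = WichmannHill(123, 456, 789)
print([a.random() for _ in range(3)] == [b.random() for _ in range(3)])  # True
```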

On 2016-02-11 00:08, Greg Ewing wrote:
The Mersenne Twister is no longer regarded as quite state-of-the-art because it can get into states that produce long sequences that are not very random.
There is a variation on MT called WELL that has better properties in this regard. Does anyone think it would be a good idea to replace MT with WELL as Python's default rng?
https://en.wikipedia.org/wiki/Well_equidistributed_long-period_linear
There was a side-discussion about this during the secrets module proposal discussion.
WELL would not be my first choice. It escapes the excess-0 islands faster than MT, but still suffers from them. More troubling to me is that it is a linear feedback shift register, like MT, and all LFSRs quickly fail the linear complexity test in BigCrush.
xorshift* shares some of these flaws, but is significantly stronger and dominates WELL in most (all?) relevant dimensions.
http://xorshift.di.unimi.it/
I'm favorable to the PCG family these days, though xorshift* and Random123 are reasonable alternatives.
http://www.pcg-random.org/
https://www.deshawresearch.com/resources_random123.html
-- Robert Kern
"I have come to believe that the whole world is an enigma, a harmless enigma that is made terrible by our own mad attempt to interpret it as though it had an underlying truth." -- Umberto Eco
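For readers unfamiliar with PCG, the core of its 32-bit variant (PCG32, "XSH RR") is small enough to sketch in pure Python: a 64-bit linear congruential step, followed by an xorshift and a data-dependent rotation of the output. This is modelled on the minimal C implementation from the PCG site but has not been checked against its published test vectors, so treat it as an illustration rather than a drop-in generator. The `stream` argument (the odd LCG increment) shows how PCG selects among distinct sequences, the property Robert Kern mentions wanting later in the thread.

```python
MASK64 = (1 << 64) - 1
MASK32 = (1 << 32) - 1
MULT = 6364136223846793005  # 64-bit LCG multiplier used by PCG


class PCG32:
    """Minimal PCG32 (XSH RR 64/32) sketch; not a vetted implementation."""

    def __init__(self, seed, stream=0):
        self.state = 0
        self.inc = ((stream << 1) | 1) & MASK64  # the increment must be odd
        self.next_u32()
        self.state = (self.state + seed) & MASK64
        self.next_u32()

    def next_u32(self):
        """Advance the LCG and return a permuted 32-bit output of the old state."""
        old = self.state
        self.state = (old * MULT + self.inc) & MASK64
        xorshifted = (((old >> 18) ^ old) >> 27) & MASK32
        rot = old >> 59  # top 5 bits choose the rotation
        return ((xorshifted >> rot) | (xorshifted << ((-rot) & 31))) & MASK32

    def random(self):
        """Return a float in [0.0, 1.0) built from one 32-bit output."""
        return self.next_u32() / 2**32


rng = PCG32(seed=42, stream=54)
print([hex(rng.next_u32()) for _ in range(3)])
```

The whole state is two 64-bit integers, which is the "much smaller state" contrast with the Twister's 624 words drawn elsewhere in the thread.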

[Greg Ewing <greg.ewing@canterbury.ac.nz>]
The Mersenne Twister is no longer regarded as quite state-of-the-art because it can get into states that produce long sequences that are not very random.
There is a variation on MT called WELL that has better properties in this regard. Does anyone think it would be a good idea to replace MT with WELL as Python's default rng?
I don't think so, because I've seen no groundswell of discontent about the Twister among Python users. Perhaps I'm missing some?
Changes are disruptive and people argue about RNGs with religious zeal, so I favor making a change in this area only when it's compelling. It was compelling to move away from Wichmann-Hill when the Twister was introduced: WH was waaaaaay behind the state of the art at the time, its limitations were causing real problems, and there was near-universal adoption of the Twister around the world. The Twister was a game changer.
When the time comes for a change, I'd be more inclined to (as Robert Kern already said) look at PCG and Random123. Like the Twister, WELL requires massive internal state, and fails the same kinds of randomness tests (while the suggested alternatives fail none to date). WELL does escape "zeroland" faster, but still much slower than PCG or Random123 (which appear to have no systematic attractors). The alternatives require much smaller state, and PCG, at least, much simpler code.
Note that the seeding function used by Python doesn't take the user-supplied seed as-is (only __setstate__ does): it runs rounds of pseudo-random bit dispersion, to make it highly unlikely that an initial state with lots of zeroes is produced. While the Twister escapes zeroland very slowly, the flip side is that it also transitions _to_ zeroland very slowly. It's quite possible that nobody has ever fallen into such a state (short of contriving to via __setstate__).
Falling into zeroland was a very real problem in the Twister's very early days, which is why its authors added the bit-dispersal code to the seeding function. Python was wise to wait until they did. It's prudent to wait for someone else to find the early surprises in PCG and Random123 too ;-)
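The remark about seeding can be checked through the public `getstate()`/`setstate()` methods: `Random(0)` does not yield a state made mostly of zero words, because the seed is mixed before it is installed, whereas `setstate()` installs whatever it is handed. The sketch assumes CPython's Mersenne Twister, whose state tuple holds 624 32-bit words followed by a position index.

```python
import random

rng = random.Random(0)  # a small, "zero-ish" seed
version, internal, gauss_next = rng.getstate()
words = internal[:-1]   # 624 state words; the last entry is the position index
zero_words = sum(1 for w in words if w == 0)
print(len(words), zero_words)  # 624 words; expect zero_words to be 0 or close to it

# setstate() installs a state verbatim, so a copy replays the same stream.
clone = random.Random()
clone.setstate(rng.getstate())
print([rng.random() for _ in range(3)] == [clone.random() for _ in range(3)])  # True
```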

On 2016-02-12 04:15, Tim Peters wrote:
[Greg Ewing <greg.ewing@canterbury.ac.nz>]
The Mersenne Twister is no longer regarded as quite state-of-the-art because it can get into states that produce long sequences that are not very random.
There is a variation on MT called WELL that has better properties in this regard. Does anyone think it would be a good idea to replace MT with WELL as Python's default rng?
I don't think so, because I've seen no groundswell of discontent about the Twister among Python users. Perhaps I'm missing some?
Well me, but I'm mostly focused on numpy's PRNG, which is proceeding apace. https://github.com/bashtage/ng-numpy-randomstate While I am concerned about MT's BigCrush failures, what makes me most discontented is not having multiple guaranteed-independent streams.
It's prudent to wait for someone else to find the early surprises in PCG and Random123 too ;-)
Quite so! -- Robert Kern

I have occasionally wondered about this missing feature.
On 10 February 2016 at 22:20, Georg Brandl <g.brandl@gmx.net> wrote:
Abstract and Rationale
======================
This PEP proposes to extend Python's syntax so that underscores can be used in integral and floating-point number literals.
This should extend complex or imaginary literals like 10_000j for consistency.
Specification
=============
* Trailing underscores are not allowed, because they look confusing and don't contribute much to readability.
* No underscore allowed after a sign in an exponent (``1e-_5``), because underscores can also not be used after the signs in front of the number (``-1e5``).
[. . .]
The production list for integer literals would therefore look like this::
    integer: decimalinteger | octinteger | hexinteger | bininteger
    decimalinteger: nonzerodigit [decimalrest] | "0" [("0" | "_")* "0"]
    nonzerodigit: "1"..."9"
    decimalrest: (digit | "_")* digit
    digit: "0"..."9"
    octinteger: "0" ("o" | "O") (octdigit | "_")* octdigit
    hexinteger: "0" ("x" | "X") (hexdigit | "_")* hexdigit
    bininteger: "0" ("b" | "B") (bindigit | "_")* bindigit
    octdigit: "0"..."7"
    hexdigit: digit | "a"..."f" | "A"..."F"
    bindigit: "0" | "1"
For floating-point literals::
    floatnumber: pointfloat | exponentfloat
    pointfloat: [intpart] fraction | intpart "."
    exponentfloat: (intpart | pointfloat) exponent
    intpart: digit (digit | "_")*
This allows trailing underscores such as 1_.2, 1.2_, 1.2_e-5. Your bullet point above suggests at least some of these are not desired.
    fraction: "." intpart
    exponent: ("e" | "E") "_"* ["+" | "-"] digit [decimalrest]
This allows underscores in the exponent (1e-5_0), contradicting the other bullet point.
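These observations (and the comparison with Rust's exponent rule earlier in the thread) can be checked mechanically by transcribing the draft float productions quoted above into regular expressions. This is a literal, hypothetical transcription of the draft grammar, not of the rules Python finally adopted; `draft_accepts` is an invented name.

```python
import re

# Literal transcription of the DRAFT float productions quoted above.
DIGIT = "[0-9]"
INTPART = DIGIT + "[0-9_]*"                  # intpart: digit (digit | "_")*
FRACTION = r"\." + INTPART                   # fraction: "." intpart
POINTFLOAT = rf"(?:(?:{INTPART})?{FRACTION}|{INTPART}\.)"
DECIMALREST = "[0-9_]*" + DIGIT              # decimalrest: (digit | "_")* digit
EXPONENT = f"[eE]_*[+-]?{DIGIT}(?:{DECIMALREST})?"
EXPONENTFLOAT = f"(?:{POINTFLOAT}|{INTPART}){EXPONENT}"
FLOATNUMBER = re.compile(f"{POINTFLOAT}|{EXPONENTFLOAT}")


def draft_accepts(text: str) -> bool:
    """Return True if the draft floatnumber production matches all of `text`."""
    return FLOATNUMBER.fullmatch(text) is not None


for text in ("1_.2", "1.2_", "1.2_e-5", "1e-5_0", "1e_+5", "1e+_5", "1._5"):
    print(f"{text:<8} {draft_accepts(text)}")
```

Under this transcription the first five spellings match (trailing underscores before "." and "e", underscores inside the exponent, and "1e_+5"), while "1e+_5" and "1._5" do not, which is consistent with the points raised here.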

On 02/11/2016 02:16 AM, Martin Panter wrote:
I have occasionally wondered about this missing feature.
On 10 February 2016 at 22:20, Georg Brandl <g.brandl@gmx.net> wrote:
Abstract and Rationale
======================
This PEP proposes to extend Python's syntax so that underscores can be used in integral and floating-point number literals.
This should extend complex or imaginary literals like 10_000j for consistency.
Yes, that was always the case, but I guess it should be explicit.
Specification
=============
* Trailing underscores are not allowed, because they look confusing and don't contribute much to readability.
* No underscore allowed after a sign in an exponent (``1e-_5``), because underscores can also not be used after the signs in front of the number (``-1e5``).
[. . .]
The production list for integer literals would therefore look like this::
    integer: decimalinteger | octinteger | hexinteger | bininteger
    decimalinteger: nonzerodigit [decimalrest] | "0" [("0" | "_")* "0"]
    nonzerodigit: "1"..."9"
    decimalrest: (digit | "_")* digit
    digit: "0"..."9"
    octinteger: "0" ("o" | "O") (octdigit | "_")* octdigit
    hexinteger: "0" ("x" | "X") (hexdigit | "_")* hexdigit
    bininteger: "0" ("b" | "B") (bindigit | "_")* bindigit
    octdigit: "0"..."7"
    hexdigit: digit | "a"..."f" | "A"..."F"
    bindigit: "0" | "1"
For floating-point literals::
    floatnumber: pointfloat | exponentfloat
    pointfloat: [intpart] fraction | intpart "."
    exponentfloat: (intpart | pointfloat) exponent
    intpart: digit (digit | "_")*
This allows trailing underscores such as 1_.2, 1.2_, 1.2_e-5. Your bullet point above suggests at least some of these are not desired.
The middle one isn't, indeed. I updated the grammar accordingly.
    fraction: "." intpart
    exponent: ("e" | "E") "_"* ["+" | "-"] digit [decimalrest]
This allows underscores in the exponent (1e-5_0), contradicting the other bullet point.
I clarified the bullet points. An "immediately" was missing. Thanks for the feedback! Georg

On 11.02.16 00:20, Georg Brandl wrote:
**Group 1: liberal (like this PEP)**
* D [2]_
* Perl 5 (although docs say it's more restricted) [3]_
* Rust [4]_
* Swift (although textual description says "between digits") [5]_
**Group 2: only between digits, multiple consecutive underscores**
* C# (open proposal for 7.0) [6]_
* Java [7]_
**Group 3: only between digits, only one underscore**
* Ada [8]_
* Julia (but not in the exponent part of floats) [9]_
* Ruby (docs say "anywhere", in reality only between digits) [10]_
C++ is in this group too.
The documentation of Perl explicitly says that Perl is in this group too (23__500 is not legal). Perhaps there is a bug in the Perl implementation. And maybe Swift is intended to be in this group.
I think we should follow the majority of languages and use the simple rule: "only between digits".
I have provided an implementation.

On 02/11/2016 11:17 AM, Serhiy Storchaka wrote:
**Group 3: only between digits, only one underscore**
* Ada [8]_
* Julia (but not in the exponent part of floats) [9]_
* Ruby (docs say "anywhere", in reality only between digits) [10]_
C++ is in this group too.
The documentation of Perl explicitly says that Perl is in this group too (23__500 is not legal). Perhaps there is a bug in the Perl implementation. And maybe Swift is intended to be in this group.
I think we should follow the majority of languages and use the simple rule: "only between digits".
I have provided an implementation.
Thanks for the alternate patch. I used the two-function approach you took in ast.c for my latest revision. I still think that some cases (like two of the examples in the PEP, 0b_1111_0000 and 1.5_j) are worth having, and therefore a more relaxed rule is preferable. cheers, Georg

On 11.02.16 14:14, Georg Brandl wrote:
On 02/11/2016 11:17 AM, Serhiy Storchaka wrote:
**Group 3: only between digits, only one underscore**
* Ada [8]_
* Julia (but not in the exponent part of floats) [9]_
* Ruby (docs say "anywhere", in reality only between digits) [10]_
C++ is in this group too.
The documentation of Perl explicitly says that Perl is in this group too (23__500 is not legal). Perhaps there is a bug in Perl implementation. And may be Swift is intended to be in this group.
I think we should follow the majority of languages and use simple rule: "only between digits".
I have provided an implementation.
Thanks for the alternate patch. I used the two-function approach you took in ast.c for my latest revision.
> I still think that some cases (like two of the examples in the PEP, 0b_1111_0000 and 1.5_j) are worth having, and therefore a more relaxed rule is preferable.

Should I write an alternative PEP for the strict rule?

On 02/11/2016 06:19 PM, Serhiy Storchaka wrote:
> Should I write an alternative PEP for the strict rule?

That seems excessive for a minor point. Let's collect feedback for a few days, and we can also collect some informal votes. In the end, I suspect that Guido will let us know about his preference for one of the possibilities, and when he does, I will update the PEP accordingly.

cheers,
Georg

On 11.02.16 19:40, Georg Brandl wrote:
> That seems excessive for a minor point. Let's collect feedback for a few days, and we can also collect some informal votes.

I suspect that my arguments can be lost otherwise.

Serhiy Storchaka writes:
> I suspect that my arguments can be lost [without a competing PEP].

Send Georg a patch for his PEP; that's where they belong, since only one of the two PEPs could be approved, and they would be 95% the same otherwise. If he doesn't apply it (he's allowed to move it to the "rejected arguments" section, though), or the decision silently goes against you, speak up then -- that would be a problem IMO. Or you could offer to BD1P! (If you're selected, I hope you change your mind! :-)

On 02/11/2016 09:19 AM, Serhiy Storchaka wrote:
> Should I write an alternative PEP for the strict rule?

Please don't. A style guide recommendation which allows for variations when necessary is much better -- consenting adults, remember?

--
~Ethan~

Participants (22): Andrew Barnert, Brett Cannon, Chris Angelico, Chris Barker, David Mertz, Ethan Furman, Georg Brandl, Glenn Linderman, Greg Ewing, Jeff Hardy, Martin Panter, MRAB, Nick Coghlan, Paul Moore, Petr Viktorin, Robert Kern, Serhiy Storchaka, Stephen J. Turnbull, Steven D'Aprano, Terry Reedy, Tim Peters, Victor Stinner