[Python-Dev] PEP 515: Underscores in Numeric Literals

Georg Brandl g.brandl at gmx.net
Wed Feb 10 17:20:38 EST 2016


This came up on python-ideas and has met with mostly positive comments,
although the exact syntax rules are still up for discussion.

cheers,
Georg

--------------------------------------------------------------------------------

PEP: 515
Title: Underscores in Numeric Literals
Version: $Revision$
Last-Modified: $Date$
Author: Georg Brandl
Status: Draft
Type: Standards Track
Content-Type: text/x-rst
Created: 10-Feb-2016
Python-Version: 3.6

Abstract and Rationale
======================

This PEP proposes to extend Python's syntax so that underscores can be used in
integral and floating-point number literals.

This is a common feature of other modern languages, and can aid readability of
long literals, or literals whose value should clearly separate into parts, such
as bytes or words in hexadecimal notation.

Examples::

    # grouping decimal numbers by thousands
    amount = 10_000_000.0

    # grouping hexadecimal addresses by words
    addr = 0xDEAD_BEEF

    # grouping bits into bytes in a binary literal
    flags = 0b_0011_1111_0100_1110


Specification
=============

The current proposal is to allow underscores anywhere in numeric literals, with
these exceptions:

* Leading underscores cannot be allowed, since they already introduce
  identifiers.
* Trailing underscores are not allowed, because they look confusing and don't
  contribute much to readability.
* The number base prefixes ``0x``, ``0o``, and ``0b`` cannot be split up,
  because they are fixed strings and not logically part of the number.
* No underscore is allowed after a sign in an exponent (``1e-_5``), because
  underscores also cannot be used after a sign in front of the number
  (``-_1e5``).
* No underscore is allowed after a decimal point, because this leads to
  ambiguity with attribute access (the lexer cannot know that there is no
  number literal in ``foo._5``).

There appears to be no reason to restrict the use of underscores otherwise.
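
The identifier and attribute-access conflicts behind these exceptions can be
observed with the standard library tokenizer.  The following is only an
illustrative sketch; the helper name ``token_kinds`` is made up for this
example and is not part of the proposal.

```python
import io
import tokenize


def token_kinds(source):
    """Return (token name, text) pairs for names, numbers and operators."""
    tokens = tokenize.generate_tokens(io.StringIO(source).readline)
    wanted = (tokenize.NAME, tokenize.NUMBER, tokenize.OP)
    return [(tokenize.tok_name[tok.type], tok.string)
            for tok in tokens
            if tok.type in wanted]


# "_5" after a dot is already a valid attribute name, not a fraction.
print(token_kinds("foo._5"))

# A leading underscore after a sign starts an identifier, not a number.
print(token_kinds("-_1e5"))
```

In both cases the tokenizer reports a ``NAME`` token where a reader might
expect part of a number, which is why these positions cannot carry an
underscore.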

The production list for integer literals would therefore look like this::

   integer: decimalinteger | octinteger | hexinteger | bininteger
   decimalinteger: nonzerodigit [decimalrest] | "0" [("0" | "_")* "0"]
   nonzerodigit: "1"..."9"
   decimalrest: (digit | "_")* digit
   digit: "0"..."9"
   octinteger: "0" ("o" | "O") (octdigit | "_")* octdigit
   hexinteger: "0" ("x" | "X") (hexdigit | "_")* hexdigit
   bininteger: "0" ("b" | "B") (bindigit | "_")* bindigit
   octdigit: "0"..."7"
   hexdigit: digit | "a"..."f" | "A"..."F"
   bindigit: "0" | "1"
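
As a rough check of these productions, they can be transcribed into regular
expressions.  This is a minimal sketch, not the implementation from the
patch; the names ``is_draft_integer`` and ``draft_int_value`` are invented
for this example.

```python
import re

# Literal transcription of the draft productions for integer literals.
DECIMAL = r"(?:[1-9](?:[0-9_]*[0-9])?|0(?:[0_]*0)?)"
OCT = r"0[oO][0-7_]*[0-7]"
HEX = r"0[xX][0-9a-fA-F_]*[0-9a-fA-F]"
BIN = r"0[bB][01_]*[01]"
INTEGER = re.compile(rf"(?:{OCT}|{HEX}|{BIN}|{DECIMAL})")


def is_draft_integer(text):
    """True if text is an integer literal under the draft grammar."""
    return INTEGER.fullmatch(text) is not None


def draft_int_value(text):
    """Validate text against the draft grammar, then compute its value."""
    if not is_draft_integer(text):
        raise ValueError("not a valid integer literal: %r" % (text,))
    # Underscores carry no meaning; base 0 lets int() honour the prefix.
    return int(text.replace("_", ""), 0)


print(draft_int_value("0xDEAD_BEEF") == 0xDEADBEEF)
print(is_draft_integer("1_"))      # trailing underscore
print(is_draft_integer("0_x1F"))   # split base prefix
```

Under this transcription consecutive underscores (``1__000``) and an
underscore directly after the base prefix (``0b_0011``) are accepted, which
matches the liberal rule described above.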

For floating-point literals::

   floatnumber: pointfloat | exponentfloat
   pointfloat: [intpart] fraction | intpart "."
   exponentfloat: (intpart | pointfloat) exponent
   intpart: digit (digit | "_")*
   fraction: "." intpart
   exponent: ("e" | "E") "_"* ["+" | "-"] digit [decimalrest]
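
The floating-point productions can be transcribed in the same way.  Again
this is only a sketch under the assumption that the grammar is taken
literally; ``is_draft_float`` and ``draft_float_value`` are hypothetical
helper names.

```python
import re

# Literal transcription of the draft productions for float literals.
INTPART = r"[0-9][0-9_]*"
FRACTION = rf"\.{INTPART}"
POINTFLOAT = rf"(?:(?:{INTPART})?{FRACTION}|{INTPART}\.)"
EXPONENT = r"[eE]_*[+-]?[0-9](?:[0-9_]*[0-9])?"
EXPONENTFLOAT = rf"(?:{INTPART}|{POINTFLOAT}){EXPONENT}"
FLOATNUMBER = re.compile(rf"(?:{EXPONENTFLOAT}|{POINTFLOAT})")


def is_draft_float(text):
    """True if text is a float literal under the draft grammar."""
    return FLOATNUMBER.fullmatch(text) is not None


def draft_float_value(text):
    """Validate text against the draft grammar, then compute its value."""
    if not is_draft_float(text):
        raise ValueError("not a valid float literal: %r" % (text,))
    return float(text.replace("_", ""))


print(draft_float_value("10_000_000.0"))
print(is_draft_float("1e-_5"))   # underscore after the exponent sign
print(is_draft_float("1._5"))    # underscore after the decimal point
```

Because ``intpart`` is transcribed literally, this sketch also accepts an
underscore at the end of the fractional part (``1.5_``), which the prose
rules above exclude; a stricter variant could reuse ``decimalrest`` in
``fraction``.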


Alternative Syntax
==================

Underscore Placement Rules
--------------------------

Instead of the liberal rule specified above, the use of underscores could be
limited.  Common rules (see the "Behavior in Other Languages" section) are:

* Only a single underscore at a time is allowed, and only between digits.
* Multiple consecutive underscores are allowed, but only between digits.
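
For comparison, the two stricter rules can be sketched as regular
expressions over plain digit strings.  This sketch ignores base prefixes,
leading-zero rules and floats, and the names used are invented for the
example.

```python
import re

# One underscore at a time, and only between digits.
SINGLE = re.compile(r"[0-9]+(?:_[0-9]+)*")

# Runs of underscores are allowed, but still only between digits.
MULTIPLE = re.compile(r"[0-9]+(?:_+[0-9]+)*")


def allowed_by(pattern, digits):
    """True if the whole digit string satisfies the given placement rule."""
    return pattern.fullmatch(digits) is not None


for text in ("1_000", "1__000", "1_", "_1"):
    print(text, allowed_by(SINGLE, text), allowed_by(MULTIPLE, text))
```

Only ``1__000`` separates the two rules: the first rejects it and the second
accepts it.  Both reject leading and trailing underscores.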

Different Separators
--------------------

A proposed alternate syntax was to use whitespace for grouping.  Although
strings are a precedent for combining adjoining literals, the behavior can lead
to unexpected effects which are not possible with underscores.  Also, no other
language is known to use this rule, except for languages that generally
disregard any whitespace.
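
One well-known surprise with adjoining string literals is the missing comma
in a sequence.  It is shown here as an illustration of the kind of effect
whitespace grouping could reproduce for numbers, not necessarily the exact
case the proposal had in mind.

```python
# The comma after "beta" is missing, so the two adjacent string literals
# are silently joined into one element.
names = ["alpha", "beta" "gamma"]
print(names)
print(len(names))

# With whitespace grouping (hypothetical syntax, not valid Python), a list
# such as [1 000, 2 000 3 000] would hide the same mistake in a number.
```

With underscores a missing comma cannot merge two numbers, because two
adjacent numeric literals remain a syntax error.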

C++14 introduces apostrophes for grouping; this option is not considered
because apostrophes conflict with Python's string literals. [1]_


Behavior in Other Languages
===========================

Languages that allow underscore grouping implement a wide variety of rules
for where underscores may be placed.  The listing below sorts the known rules
into three major groups.  Where a language's specification contradicts its
actual behavior, the actual behavior is listed.

**Group 1: liberal (like this PEP)**

* D [2]_
* Perl 5 (although docs say it's more restricted) [3]_
* Rust [4]_
* Swift (although textual description says "between digits") [5]_

**Group 2: only between digits, multiple consecutive underscores**

* C# (open proposal for 7.0) [6]_
* Java [7]_

**Group 3: only between digits, only one underscore**

* Ada [8]_
* Julia (but not in the exponent part of floats) [9]_
* Ruby (docs say "anywhere", in reality only between digits) [10]_


Implementation
==============

A preliminary patch that implements the specification given above has been
posted to the issue tracker. [11]_


References
==========

.. [1] http://www.open-std.org/jtc1/sc22/wg21/docs/papers/2013/n3499.html

.. [2] http://dlang.org/spec/lex.html#integerliteral

.. [3] http://perldoc.perl.org/perldata.html#Scalar-value-constructors

.. [4] http://doc.rust-lang.org/reference.html#number-literals

.. [5] https://developer.apple.com/library/ios/documentation/Swift/Conceptual/Swift_Programming_Language/LexicalStructure.html

.. [6] https://github.com/dotnet/roslyn/issues/216

.. [7] https://docs.oracle.com/javase/7/docs/technotes/guides/language/underscores-literals.html

.. [8] http://archive.adaic.com/standards/83lrm/html/lrm-02-04.html#2.4

.. [9] http://docs.julialang.org/en/release-0.4/manual/integers-and-floating-point-numbers/

.. [10] http://ruby-doc.org/core-2.3.0/doc/syntax/literals_rdoc.html#label-Numbers

.. [11] http://bugs.python.org/issue26331


Copyright
=========

This document has been placed in the public domain.



