[Python-3000] Draft PEP for Requiring Lowercase Literal Modifiers

Andrew Karem McCollum mccollum at fas.harvard.edu
Mon Mar 19 16:39:57 CET 2007


This is my first PEP, and one of my first postings to this list, so I 
apologize in advance for any glaring errors.  I wrote this up because I 
feel like it is a good companion to the recent octal and binary 
discussions/PEP.  If nothing else, this should at least provide a jumping 
off point for discussion and someone more experienced could use it as a 
basis for a more rigorous PEP if they so desired.  If it is supported, I 
am happy to work on an implementation, though I imagine someone else could 
produce one much more expediently.

 	-Andrew McCollum


------------------------------------------------------------------- 
PEP: XXX
Title: Requiring Lowercase Characters in Literal Modifiers
Version: $Revision$
Last-Modified: $Date$
Author: Andrew McCollum <mccollum at fas.harvard.edu>
Status: Draft
Type: Standards Track
Content-Type: text/x-rst
Created: 19-Mar-2007
Python-Version: 3.0
Post-History:

Abstract
========

This PEP proposes to change the syntax for declaring literals with prefixes or 
modifiers by requiring the prefix or modifier to be in lowercase. Affected 
modifiers include:

     * The 'b' and 'r' prefixes for string and bytes literals
     * The 'b', 'o', and 'x' modifiers for integer literals in other bases
     * The 'j' suffix for imaginary literals
     * The 'e' exponent notation for floating point literals


Motivation
==========

The primary motivation for this change is to avoid confusion and improve the
appearance of these literals. The most obvious example is the proposed 'o'
modifier for declaring an octal number, such as '0o755', which represents the
decimal value '493'. When the 'o' character is written in uppercase, the number
appears as '0O755', which can be confusing in many fonts because it resembles
the string of digits '00755'.

In other cases, the lowercase version of the modifier serves to visually 
separate the two components of the literal, or to make the modifier stand out 
against the neighboring literal, such as in the following examples::

     0x5A            0b0110          0xab
     1.92e21         3.13j           3.14e5j
     r'\d+\.'        b"Hello world"

With uppercase modifiers, these literals appear as::

     0X5A            0B0110          0Xab
     1.92E21         3.13J           3.14E5J
     R'\d+\.'        B"Hello world"

These forms are more difficult to parse visually, especially at first glance.

There is also an argument for uniformity, so that There's Only One Way To Do
It (TOOWTDI). Unlike string literals, where ', ", and """ all behave
differently and are each useful in different situations, the case of a literal
modifier is purely cosmetic and does not change the behavior of the literal.
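
As a small illustration of the "purely cosmetic" point, the sketch below
compares each lowercase literal from the examples above with its uppercase
spelling. It assumes an interpreter that still accepts both cases; the names
``pairs`` and ``all_equal`` are chosen here for illustration only.

```python
# Each pair holds the lowercase and uppercase spelling of the same literal.
# On an interpreter that accepts both cases, the values are identical and
# only the appearance in source code differs.
pairs = [
    (0x5A, 0X5A),
    (0b0110, 0B0110),
    (0o755, 0O755),
    (1.92e21, 1.92E21),
    (3.13j, 3.13J),
    (3.14e5j, 3.14E5J),
    (r'\d+\.', R'\d+\.'),
    (b"Hello world", B"Hello world"),
]

all_equal = all(lower == upper for lower, upper in pairs)
```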


Grammar Changes
===============

The new grammar for string prefixes [1]_ (with the bytes literal) will be::

     stringprefix ::= "b" | "r" | "br"

Integer literals [2]_ will have the following grammar changes::

     bininteger ::= "0b" bindigit+
     octinteger ::= "0o" octdigit+
     hexinteger ::= "0x" hexdigit+

Exponents in floating point literals [3]_ will now be defined by::

     exponent ::= "e" ["+" | "-"] digit+

Imaginary numbers [4]_ will now be defined with the syntax::

     imagnumber ::= (floatnumber | intpart) "j"

The grammar of these literals will be otherwise unchanged. For example, the 
specification of 'hexdigit' will continue to allow both uppercase and lowercase 
digits.
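
The integer and exponent productions can be approximated with regular
expressions, as in the rough sketch below. The pattern names and the helper
``is_valid_prefixed_integer`` are hypothetical and not part of any proposed
implementation; the sketch only shows that the modifier must be lowercase
while hexadecimal digits may be in either case.

```python
import re

# Hypothetical patterns mirroring the proposed productions. Only the
# modifier character is restricted to lowercase; the digit classes are
# unchanged, so 'hexdigit' still accepts A-F as well as a-f.
BININTEGER = re.compile(r"0b[01]+")
OCTINTEGER = re.compile(r"0o[0-7]+")
HEXINTEGER = re.compile(r"0x[0-9a-fA-F]+")
EXPONENT = re.compile(r"e[+-]?[0-9]+")


def is_valid_prefixed_integer(text):
    """Return True if text matches one of the proposed prefixed forms."""
    patterns = (BININTEGER, OCTINTEGER, HEXINTEGER)
    return any(p.fullmatch(text) is not None for p in patterns)
```

Under these patterns, ``is_valid_prefixed_integer("0x5A")`` is accepted
because only the digits are uppercase, while ``"0X5A"`` is rejected.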

Since this PEP is targeted at Python 3000, it does not address the suffix for
long integer literals ('l' or 'L') or the prefix for unicode strings ('u' or
'U'). Both will disappear as these types are merged with int and str,
respectively.


Semantic Changes
================

The behavior of the 'int' builtin when passed a radix of 0 will be changed to 
follow the above grammar. This change is to maintain the specified behavior 
[5]_ that a radix of 0 mirrors the literal syntax. The behavior of this 
function will otherwise not be altered. In particular, the behavior of 
accepting the prefix '0X' when a radix of 16 is specified will be kept for 
backwards compatibility and easier parsing of data files.
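
One way to picture the intended behavior is the wrapper below, which
approximates the proposed rules on top of the existing 'int' builtin. The name
``proposed_int`` is hypothetical, and the sketch assumes that only the
uppercase prefixes need to be rejected when the radix is 0; it does not model
any other part of the proposal.

```python
def proposed_int(text, base=10):
    """Hypothetical approximation of the proposed int() semantics."""
    # Ignore surrounding whitespace and a leading sign when inspecting
    # the prefix; the builtin performs its own full validation below.
    unsigned = text.strip().lstrip("+-")
    if base == 0 and unsigned[:2] in ("0X", "0O", "0B"):
        # A radix of 0 mirrors the literal syntax, which would no longer
        # allow an uppercase modifier.
        raise ValueError("uppercase modifier not allowed with radix 0: %r"
                         % text)
    # With an explicit radix the existing behavior is kept, so '0X' is
    # still accepted when base is 16.
    return int(text, base)
```

With this sketch, ``proposed_int("0X1F", 16)`` still parses, while
``proposed_int("0X1F", 0)`` raises ``ValueError``.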


Automatic Conversion
====================

It should be trivial for the '2to3' conversion tool to convert literals to the 
new syntax in all cases.  The only possible incompatibility will be from the 
subtle changes to the 'int' builtin.
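
To give a feel for how mechanical the conversion is, here is a minimal sketch
built on the standard library 'tokenize' module. It does not use the '2to3'
fixer framework, and the function names are hypothetical; it simply lowercases
the modifier in each number and string token and leaves everything else,
including uppercase hexadecimal digits, untouched.

```python
import io
import re
import tokenize

# Matches an uppercase base modifier at the start of an integer literal.
_BASE_PREFIX = re.compile(r"0[XOB]")


def lowercase_number(literal):
    """Lowercase the modifier of a numeric literal."""
    if _BASE_PREFIX.match(literal):
        # Lowercase only the two-character prefix so hex digits keep
        # their original case.
        return literal[:2].lower() + literal[2:]
    # Otherwise the only letters are the exponent 'e' and the
    # imaginary suffix 'j', so lowercasing the whole literal is safe.
    return literal.lower()


def lowercase_string_prefix(literal):
    """Lowercase the prefix that precedes the opening quote."""
    quote_at = min(i for i, ch in enumerate(literal) if ch in "'\"")
    return literal[:quote_at].lower() + literal[quote_at:]


def lowercase_modifiers(source):
    """Return source with all literal modifiers in lowercase."""
    tokens = []
    for tok in tokenize.generate_tokens(io.StringIO(source).readline):
        if tok.type == tokenize.NUMBER:
            tok = tok._replace(string=lowercase_number(tok.string))
        elif tok.type == tokenize.STRING:
            tok = tok._replace(string=lowercase_string_prefix(tok.string))
        tokens.append(tok)
    # Lowercasing never changes a token's length, so the recorded
    # positions stay valid and the original spacing is reproduced.
    return tokenize.untokenize(tokens)
```

Because each rewrite keeps the token length, the rest of the source text is
left as it was.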


Open Issues
===========

The main issue involves the treatment of hexadecimal values employing the 
legacy '0X' prefix when passed to the 'int' builtin. Several people showed a 
desire to maintain parity with the literal syntax and the 'eval' function when 
0 was passed in as the radix. The argument against this behavior is that it 
breaks backwards compatibility and makes parsing integers from arbitrary 
sources more difficult. This PEP makes the compromise of allowing the use of
the prefix '0X' only when the radix is explicitly specified. The rationale for
this choice is that when parsing integers from data files, the radix is often
known ahead of time and can be supplied as a second argument to preserve the
previous behavior, while the 0 radix form stays symmetric with the literal
syntax.


BDFL Pronouncements
===================

The BDFL supports the disallowing of leading zeros in the syntax for integer 
literals, and was leaning towards maintaining this behavior when a radix of 0 
was passed to the 'int' builtin [6]_. This would break backwards compatibility 
for automatically parsing octal literals.

Later, the BDFL expressed a preference that '0X' be an allowable prefix for 
hexadecimal numbers when a radix of 0 was passed to the 'int' builtin [7]_. The 
PEP currently only allows this prefix when the radix is explicitly specified.


Reference Implementation
========================

A reference implementation is not yet provided. Since this PEP proposes no new
behavior, only the removal of previously allowed forms, the required changes
should be minimal.


References
==========

.. [1] http://www.python.org/doc/current/ref/strings.html

.. [2] http://www.python.org/doc/current/ref/integers.html

.. [3] http://www.python.org/doc/current/ref/floating.html

.. [4] http://www.python.org/doc/current/ref/imaginary.html

.. [5] http://docs.python.org/lib/built-in-funcs.html

.. [6] http://mail.python.org/pipermail/python-3000/2007-March/006325.html

.. [7] http://mail.python.org/pipermail/python-3000/2007-March/006423.html


Copyright
=========

This document has been placed in the public domain.




..
    Local Variables:
    mode: indented-text
    indent-tabs-mode: nil
    sentence-end-double-space: t
    fill-column: 70
    coding: utf-8
    End:


