[Python-Dev] Octal literals

Nick Coghlan ncoghlan at gmail.com
Fri Feb 3 11:07:12 CET 2006

Bengt Richter wrote:
> On Fri, 3 Feb 2006 10:16:17 +1100, "Delaney, Timothy (Tim)" <tdelaney at avaya.com> wrote:
>> Andrew Koenig wrote:
>>>> I definately agree with the 0c664 octal literal. Seems rather more
>>>> intuitive.
>>> I still prefer 8r664.
>> The more I look at this, the worse it gets. Something beginning with
>> zero (like 0xFF, 0c664) immediately stands out as "unusual". Something
>> beginning with any other digit doesn't. This just looks like noise to
>> me.
>> I found the suffix version even worse, but they're blown out of the
>> water anyway by the fact that FFr16 is a valid identifier.
> Are you sure you aren't just used to the x in 0xff? I.e., if the leading
> 0 were just an alias for 16, we could use 8x664 instead of 8r664.

No, I'm with Tim - it's definitely the distinctive shape of the '0' that helps 
the non-standard base stand out. '0c' creates a similar shape, also helping it 
to stand out. More on distinctive shapes below, though.

That said, I'm still trying to figure out exactly what problem is being solved 
here. Thinking out loud. . .

The full syntax for writing integers in any base is:

   int("LITERAL", RADIX)
   int("LITERAL", base=RADIX)

5 prefix chars, 3 or 8 in the middle (counting the space, and depending on 
whether the keyword is used or not), one on the end, and one or two to specify 
the radix. That's quite verbose, so its unsurprising that many would like 
something nicer in the toolkit when they need to write multiple numeric 
literals in a base other than ten. This can typically happen when writing Unix 
system admin scripts, bitbashing to control a piece of hardware or some other 
low-level task.

The genuine use cases we have for integer literals are:
   - decimal (normal numbers)
   - hex (compact bitmasks)
   - octal (unix file permissions)
   - binary (explicit bitmasks for those that don't speak fluent hex)

Currently, there is no syntax for binary literals, and the syntax for octal 
literals is both magical (where else in integer mathematics does a leading 
zero matter?) and somewhat error prone (int and eval will give different 
answers for a numeric literal with a leading zero - int ignores the leading 
zero, eval treats it as signifying that the value is in octal. The charming 
result is that the following statement fails: assert int('0123') == 0123).

Looking at existing precedent in the language, a prefix is currently used when 
the parsing of the subsequent literal may be affected (that is, the elements 
that make up the literal may be interpreted differently depending on the 
prefix). This is the case for hex and octal literals, and also for raw and 
unicode strings.

Suffixes are currently used when the literal as a whole is affected, but the 
meaning of the individual elements remains the same. This is the case for both 
long integer and imaginary number literals. A suffix also makes sense for 
decimal float literals, as the individual elements would still be interpreted 
as base 10 digits.

So, since we want to affect the parsing process, this means we want a prefix. 
The convention of using '0x' to denote hex extends far beyond Python, and 
doesn't seem to provoke much in the way of objection.

This suggests options like '0o' or '0c' for octal literals. Given that '0x' 
matches the '%x' in string formatting, the least magical option would be '0o' 
(to match the existing '%o' output format). While '0c' is cute and quite 
suggestive, it creates a significant potential for confusion , as it most 
emphatically does *not* align with the meaning of the '%c' format specifier.

I'd be +0 on changing the octal literal prefix from '0' to '0o', and also +0 
on adding an '0b' prefix and '%b' format specifier for binary numbers.

Whether anyone will actually care enough to implement a patch to change the 
syntax for any of these is an entirely different question ;)


Nick Coghlan   |   ncoghlan at gmail.com   |   Brisbane, Australia

More information about the Python-Dev mailing list