Annoying octal notation

Steven D'Aprano steve at REMOVE-THIS-cybersource.com.au
Sun Aug 23 02:13:31 EDT 2009


On Sat, 22 Aug 2009 22:19:01 -0500, Derek Martin wrote:

> On Sat, Aug 22, 2009 at 02:55:51AM +0000, Steven D'Aprano wrote:
>> > I can see how 012 can
>> > be confusing to new programmers, but at least it's legible, and the
>> > great thing about humans is that they can be taught (usually).
>> 
>> And the great thing is that now you get to teach yourself to stop
>> writing octal numbers implicitly and write them explicitly with a
>> leading 0o instead :)
> 
> Sorry, I don't write them implicitly.  A leading zero explicitly states
> that the numeric constant that follows is octal.

That is incorrect.

Decimal numbers implicitly use base 10, because there's nothing in the 
literal 12340 (say) to indicate the base is ten, rather than 16 or 9 or 
23. Although implicit is usually bad, when it's as common and expected as 
decimal notation, it's acceptable.

Hex literals explicitly use base 16, because the leading 0x is defined to 
mean "base 16". 0x is otherwise not a legal decimal number, or hex number 
for that matter. (It would be legal in base 34 or greater, but that's 
rare enough that we can ignore this.) For the bases we care about, a 
leading 0x can't have any other meaning -- there's no ambiguity, so we 
can treat it as a synonym for "base 16".

(Explicitness isn't a binary state, and it would be even more explicit if 
the base was stated in full, as in e.g. Ada where 16#FF# = decimal 255.)
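
A quick way to check the claims about 0x is Python's int(text, base), which 
only gives a leading 0x special treatment for base 16 (or base 0). This is a 
small sketch assuming a Python 3 interpreter; parse_or_none is a helper made 
up for the illustration, not a library function.

```python
def parse_or_none(text, base):
    """Return int(text, base), or None if text is not valid in that base."""
    try:
        return int(text, base)
    except ValueError:
        return None

print(parse_or_none("0x", 10))    # None: 'x' is not a decimal digit
print(parse_or_none("0x", 16))    # None: a bare prefix with no digits after it
print(parse_or_none("0x", 33))    # None: base 33 digits stop at 'w'
print(parse_or_none("0x", 34))    # 33: in base 34, 'x' is an ordinary digit
print(parse_or_none("0xFF", 16))  # 255, the value Ada spells 16#FF#
```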

However, octal numbers are defined implicitly: 012 is a legal base 10 
number, or base 3, or base 9, or base 16. There's nothing about a leading 
zero that says "base 8" apart from familiarity. We can see the difference 
between leading 0x and leading 0 if you repeat it: repeating an explicit 
0x, as in 0x0xFF, is a syntax error, while repeating an implicit 0 
silently does nothing different:

>>> 0x0xFF
  File "<stdin>", line 1
    0x0xFF
         ^
SyntaxError: invalid syntax
>>> 0077
63
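
The ambiguity of 012 can be made concrete by reading the same three 
characters in each of the bases mentioned above. The sketch below assumes 
Python 3, where the bare leading-zero form is rejected by the compiler; 
is_valid_literal is a hypothetical helper written for this example.

```python
# The same text, read in several bases: nothing in "012" picks one of them.
readings = {base: int("012", base) for base in (3, 8, 9, 10, 16)}
print(readings)  # {3: 5, 8: 10, 9: 11, 10: 12, 16: 18}

def is_valid_literal(source):
    """Return True if source compiles as a Python expression."""
    try:
        compile(source, "<example>", "eval")
        return True
    except SyntaxError:
        return False

print(is_valid_literal("0o12"))  # True: explicit octal prefix
print(is_valid_literal("012"))   # False under Python 3 (accepted by Python 2)
```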


> It is so in 6 out of 7
> computer languages I have more than a passing familiarity with (the 7th
> being scheme, which is a thing unto itself), including Python.  It's
> that way on Bourne-compatible and POSIX-compatible Unix shells (though
> it requires a leading backslash before the leading zero there).  I'm
> quite certain it can not be the case on only those 6 languages that I
> happen to be familiar with...

No, of course not. There are a bunch of languages, pretty much all 
heavily influenced by C, which treat integer literals with leading 0s as 
oct: C++, Javascript, Python 2.x, Ruby, Perl, Java. As so often is the 
case, C's design mistakes become common practice. Sigh.

However, there are many, many languages that don't, or otherwise do 
things differently to C. Even some modern C-derived languages reject the 
convention:

C# doesn't have octal literals at all.

As far as I can tell, Objective-C and Cocoa require you to explicitly 
enable support for octal literals before you use them.

In D, at least some people want to follow Python's lead and either drop 
support for oct literals completely, or require a 0o prefix:
http://d.puremagic.com/issues/show_bug.cgi?id=2656

E makes a leading 0 a syntax error.


As far as other, non-C languages go, leading 0 = octal seems to be rare 
or non-existent:

Basic and VB use a leading &O for octal.

FORTRAN 90 uses a leading O (uppercase o) for octal, and surrounds the 
literal in quotation marks: O"12" would be ten in octal. 012 would be 
decimal 12.

As far as I can tell, COBOL also ignores leading zeroes.

Forth interprets literals according to the current value of BASE (which 
defaults to 10). There's no special syntax for it. To enter ten in octal, 
you might say:

8 BASE ! 12

or if your system provides it:

OCT 12

Standard Pascal ignores leading 0s in integers, and doesn't support octal 
at all. A leading $ is used for hex. At least one non-standard Pascal 
uses leading zero for octal.

Haskell requires an explicit 0o:
http://www.haskell.org/onlinereport/lexemes.html#lexemes-numeric

So does OCaml.

Ada uses decimal unless you explicitly give the base:
http://archive.adaic.com/standards/83lrm/html/lrm-02-04.html

Leading zeroes are insignificant in bc:

[steve at sylar ~]$ bc
bc 1.06
Copyright 1991-1994, 1997, 1998, 2000 Free Software Foundation, Inc.
This is free software with ABSOLUTELY NO WARRANTY.
For details type `warranty'.
012 + 011
23

Leading zeroes are also insignificant in Hewlett-Packard RPN language 
(e.g. HP-48GX calculators), Hypertalk and languages derived from it.

I'm not sure, but it looks to me like Boo doesn't support octal literals, 
although it supports hex with 0x and binary with 0b.

Algol uses an explicit base: 8r12 is ten written in octal.

Common Lisp and Scheme use a #o prefix.

As far as *languages* go, leading-0 octal literals are in a tiny 
minority. As far as *programmers* go, they may be in a plurality, perhaps 
even a small majority, but remember there are still millions of VB 
programmers out there who are just as unfamiliar with C conventions.

> While it may be true that people commonly write decimal numbers with
> leading zeros (I dispute even this
[...]

Leading zeroes in decimal numbers are *very* common in dates and times.
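
As an illustration, here is a minimal sketch that parses the timestamp from 
the header of this very message. The zero-padded fields are plain decimal, 
and two of the most common ones (08 and 09) aren't valid octal at all.

```python
from datetime import datetime

stamp = "2009-08-23 02:13:31"
parsed = datetime.strptime(stamp, "%Y-%m-%d %H:%M:%S")
print(parsed.month, parsed.hour)  # 8 2

# Zero-padded fields read as ordinary decimal numbers.
print(int("08"), int("09"))       # 8 9

# Read as octal, the same fields are simply invalid.
invalid_as_octal = []
for field in ("08", "09"):
    try:
        int(field, 8)
    except ValueError:
        invalid_as_octal.append(field)
print(invalid_as_octal)           # ['08', '09']
```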


[...]
> Given that Python has an ncurses interface, I'm
> guessing it's used there too.  In fact if the Python source had no octal
> in it, I would find that very surprising.

I can't see any oct literals in the standard library, not even in the 
ncurses interface, but then my grep-fu is weak and I may have made a 
mistake. I encourage you to look for yourself.
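
For anyone who wants to try, here is one rough way to search, sketched in 
Python rather than grep. The pattern and the name find_legacy_octal are 
assumptions made up for this example, and the regex knows nothing about 
strings or comments, so expect false positives on real source.

```python
import re

# Hypothetical pattern: a 0 followed by one or more octal digits, not glued
# to a preceding identifier character or dot, and not followed by one.
LEGACY_OCTAL = re.compile(r"(?<![\w.])0[0-7]+(?![\w.])")

def find_legacy_octal(source):
    """Return (line_number, literal) pairs for C-style octal literals."""
    hits = []
    for lineno, line in enumerate(source.splitlines(), start=1):
        for match in LEGACY_OCTAL.finditer(line):
            hits.append((lineno, match.group()))
    return hits

sample = "os.chmod(path, 0755)\nmask = 0x1F\nhalf = 0.5\ncount = 100\nzero = 0\n"
print(find_legacy_octal(sample))  # [(1, '0755')]
```

To scan a real file, pass its text to find_legacy_octal and review each hit 
by hand.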


>> It's no hardship to write 0o12 instead of 012.
> 
> Computer languages are not write-only, excepting maybe Perl. ;-) Writing
> 0o12 presents no hardship; but I assert, with at least some support from
> others here, that *reading* it does.

No more so than 0x or 0b literals. If anything, 0o12 stands out as "not 
twelve" far more than 012 does.



-- 
Steven


