Disallow "00000" as a synonym for "0"

As per this question: http://stackoverflow.com/questions/31447694/why-does-python-3-allow-00-as-a-... It seems like Python accepts "000000000" to mean "0". Whatever the historical reason, should this be deprecated?

On Fri, Jul 17, 2015 at 10:28:19AM -0400, Eric V. Smith wrote:
I wonder what working code uses 00 when 0 is wanted? Do you have any examples? I believe that anyone writing 00 is more likely to have made a typo than to actually intend to get 0. In Python 2, 00 has an obvious and correct interpretation: it is zero in octal. But in Python 3, octal is written with the prefix 0o not 0. py> 0o10 8 py> 010 File "<stdin>", line 1 010 ^ SyntaxError: invalid token (The 0o prefix also works in Python 2.7.) In Python 3, 00 has no sensible meaning. It's not octal, binary or hex, and it shouldn't be decimal. Decimal integers are explicitly prohibited from beginning with a leading zero: https://docs.python.org/3/reference/lexical_analysis.html#integers so the mystery is why *zero* is a special case permitted to have leading zeroes. The lexical definition of "decimal integer" is: decimalinteger ::= nonzerodigit digit* | "0"+ Why was it defined that way? The more obvious: decimalinteger ::= nonzerodigit digit* | "0" was the definition in Python 2. As the Stackoverflow post above points out, the definition of decimalinteger actually in use seems to violate PEP 3127, and supporting "0"+ was added as a special case by Georg Brandl. Since leading 0 digits in decimal int literals are prohibited, we cannot write 0001, 0023 etc. Why would we write 0000 to get zero? Unless somebody can give a good explanation for why leading zeroes are permitted for zero, I think it was a mistake to allow them, and is an ugly wart on the language. I think that should be deprecated and eventually removed. Since it only affects int literals, any deprecation warning will occur at compile-time, so it shouldn't have any impact on runtime performance. -- Steve

On 07/18/2015 11:17 PM, Steven D'Aprano wrote:
The parsing rules for floats are not the same as int,
Umm.. yes. But are they intentional different?
Me either. It seems to me int's should be relaxed to allow for leading zero's instead. The most common use of leading zero's is when numbers in strings are used and those numbers are sorted. Without the leading zero's the sort order may not be in numerical order. The int class supports using leading zero's in strings, but not in literal form. (Although it may use a C string to int function when parsing it.)
int(0000) 0
int(007.0) 7
float("0001") 1.0
It's common to cut and paste numerical data from text data. If you are cutting numerical data from a table, then you may also need to remove all the leading zeros. A macro can do that, but it complicates a simple cut and paste operation. Note that this does not effect the internal representation of ints, only how python interprets the string literal during the compiling process. A social reason for this limitation is that a number of other languages do use a leading digit 0 to define octal numbers. And this helps catch silent errors due to copying numbers directly in that case. (Python uses "0o" and not just "0".) In python, it just catches a possible silent error, when a 0onnn is mistyped as 00nnn. So this looks like it's a quick BDFL judgement call to me. (or other core developer number specialist call.) I think the status quo wins by default otherwise. Cheers, Ron

On Mon, Jul 20, 2015 at 2:22 AM, Ron Adam <ron3200@gmail.com> wrote:
Yes, because int() can take an extra parameter to specify the base. Source code can't.
Python 2 also supported C-style "0777" to mean 511, and it's all through Unix software (eg "0777" is a common notation for a world-writable directory's permission bits; since there are three bits (r/w/x) for each of the three categories (u/g/o), it makes sense to use octal). Supporting "0777" in source code and having it mean 777 decimal would be extremely confusing, which is why it's a straight-up error in Python 3. ChrisA

On 20 Jul 2015 2:38 am, "Chris Angelico" <rosuav@gmail.com> wrote:
On Mon, Jul 20, 2015 at 2:22 AM, Ron Adam <ron3200@gmail.com> wrote:
A social reason for this limitation is that a number of other languages
do
Exactly - the special case here is *dis*allowing leading zeroes for non-zero integer literals, since we can't know if they're supposed to be octal values or not, or if the "b" or "x" was left out of a binary or hex literal that has been padded out to a fixed number of bits, or if the decimal point was left out of a floating point literal. The one integer literal case that *could* be reliably kept consistent with the general mathematical notation of "leading zeroes are permitted, but have no significance" was zero itself, which is what was done. Cheers, Nick.

On Mon, Jul 20, 2015 at 10:38:02AM +1000, Nick Coghlan wrote:
Exactly - the special case here is *dis*allowing leading zeroes for non-zero integer literals,
That's a ... different ... definition of "special case". You're saying that, out of the literally infinite number of possible ints one might attempt to write, *all of them except zero* are the special case, while zero itself is the non-special case. O-kay. I get the argument that allowing people to write 000000000 when they want a int zero is harmless, and the code churn required to prevent that outweighs any benefit gained. The task I created on the tracker has been closed as a "won't fix" or "rejected" (I forget which one), and I'm not going to argue with that. But I did find your perspective above funny enough that I had to reply. But for the record, and this will be my last word on the subject:
Right. So if you see somebody has written 00 as a literal in Python 3, it's *more likely to be an error than deliberate*. E.g. we can align octal or hex numbers using a fixed width string with leading zeroes where needed: nums = [0x00001234, 0x000000A3, 0x00000005, 0x000027F3, 0x00000000, # this is okay 00000000, # breaks the alignment 0000000000, # aligned but where's the X? 0x0000C21D, ] but if you see a plain, unprefixed 0 mixed in there, that smells of an possible error. Even if it isn't an error, to me it hints of an error enough that I'd want to question the code author's intention. "Did you really mean zero, or is that a typo?" Regardless of everything else, to me this is an aesthetic question. Allowing 00 when 01, 02, 03, ... are (rightly!) forbidden feels ugly and a wart. But until and unless somebody actually gets bitten by this in real code, and a typo hides in plain view disguised as zero, I can't honestly say it is outrightly harmful. -- Steve

On Sun, Jul 19, 2015 at 10:01 PM, Steven D'Aprano <steve@pearwood.info> wrote:
Allowing 00 when 01, 02, 03, ... are (rightly!) forbidden feels ugly and a wart.
I agree in general, but there is one case where I am on the fence: dates = [ date(2005, 07, 01), date(2005, 11, 15), ..] looks marginally better than the valid alternative. I often see this form written by 2.7 users, and it requires a medium size lecture to explain why they should not write code like this even if it seemingly works.

On Mon, Jul 20, 2015 at 12:14 PM, Alexander Belopolsky <alexander.belopolsky@gmail.com> wrote:
The lecture should be fairly simple. Just put 08 in there and you'll see why you shouldn't do it. :) If they then ask "Why is 07 allowed but 08 not?", then you can go into detail about octal, why C uses 0755 to mean 493 (and why it makes a LOT more sense to key some things in using octal - nobody would understand what permissions 493 would mean), and then it becomes pretty clear that there are two viable interpretations for "0755". ChrisA

On 20 July 2015 at 12:01, Steven D'Aprano <steve@pearwood.info> wrote:
The integer literal zero only looks like the special case if you first exclude all the *other* ways of entering numeric values into Python applications and only consider "integer literal zero" and "non-zero integer literals". The fact that "0", "0.0" and "0j" produce values of different types is an implementation detail of Python's representation of abstract mathematical concepts, so the general case is actually "you can write numbers in Python the same way you write them in mathematical notation, including with leading zeroes if you want" (with the caveat that we use the electrical engineering "j" for imaginary numbers, rather than the more confusable mathematical "i") The Python 2 special case is then "unlike mathematical notation, using leading zeroes on an integer literal will result in it being interpreted in base 8 rather than base 10". The corresponding Python 3 special case is "unlike mathematical notation, you can't use leading zeroes on non-zero integers in Python 3, because of historical reasons relating to the syntax previously used for octal numbers in Python 2, and still used in C/C++, and various other languages". Regards, Nick. -- Nick Coghlan | ncoghlan@gmail.com | Brisbane, Australia

On Sat, Jul 18, 2015 at 01:35:49PM -0400, random832@fastmail.us wrote:
Do you mean up to 007, since they are the same in oct and dec? In either case, that introduces even more special cases. Whether it is one special case, 000, or eight, or ten, 000 through 009, the questions remain: - why does (let's say) `n = 02` work, but `n = 012` fail? - why would you intentionally write `n = 00' when you could simply write `n = 0`? Backwards compatibilty aside, I don't think there's any reason to keep this feature. I suspect that it can only mask typos, not be used for any sensible reason. Back when 000 meant octal zero, it might have made sense to write it that way to align with a bunch of other octal numbers, but now you would surely write 0o00 to align with 0o10. http://bugs.python.org/issue24668 -- Steve

On Fri, Jul 17, 2015 at 10:28:19AM -0400, Eric V. Smith wrote:
I wonder what working code uses 00 when 0 is wanted? Do you have any examples? I believe that anyone writing 00 is more likely to have made a typo than to actually intend to get 0. In Python 2, 00 has an obvious and correct interpretation: it is zero in octal. But in Python 3, octal is written with the prefix 0o not 0. py> 0o10 8 py> 010 File "<stdin>", line 1 010 ^ SyntaxError: invalid token (The 0o prefix also works in Python 2.7.) In Python 3, 00 has no sensible meaning. It's not octal, binary or hex, and it shouldn't be decimal. Decimal integers are explicitly prohibited from beginning with a leading zero: https://docs.python.org/3/reference/lexical_analysis.html#integers so the mystery is why *zero* is a special case permitted to have leading zeroes. The lexical definition of "decimal integer" is: decimalinteger ::= nonzerodigit digit* | "0"+ Why was it defined that way? The more obvious: decimalinteger ::= nonzerodigit digit* | "0" was the definition in Python 2. As the Stackoverflow post above points out, the definition of decimalinteger actually in use seems to violate PEP 3127, and supporting "0"+ was added as a special case by Georg Brandl. Since leading 0 digits in decimal int literals are prohibited, we cannot write 0001, 0023 etc. Why would we write 0000 to get zero? Unless somebody can give a good explanation for why leading zeroes are permitted for zero, I think it was a mistake to allow them, and is an ugly wart on the language. I think that should be deprecated and eventually removed. Since it only affects int literals, any deprecation warning will occur at compile-time, so it shouldn't have any impact on runtime performance. -- Steve

On 07/18/2015 11:17 PM, Steven D'Aprano wrote:
The parsing rules for floats are not the same as int,
Umm.. yes. But are they intentional different?
Me either. It seems to me int's should be relaxed to allow for leading zero's instead. The most common use of leading zero's is when numbers in strings are used and those numbers are sorted. Without the leading zero's the sort order may not be in numerical order. The int class supports using leading zero's in strings, but not in literal form. (Although it may use a C string to int function when parsing it.)
int(0000) 0
int(007.0) 7
float("0001") 1.0
It's common to cut and paste numerical data from text data. If you are cutting numerical data from a table, then you may also need to remove all the leading zeros. A macro can do that, but it complicates a simple cut and paste operation. Note that this does not effect the internal representation of ints, only how python interprets the string literal during the compiling process. A social reason for this limitation is that a number of other languages do use a leading digit 0 to define octal numbers. And this helps catch silent errors due to copying numbers directly in that case. (Python uses "0o" and not just "0".) In python, it just catches a possible silent error, when a 0onnn is mistyped as 00nnn. So this looks like it's a quick BDFL judgement call to me. (or other core developer number specialist call.) I think the status quo wins by default otherwise. Cheers, Ron

On Mon, Jul 20, 2015 at 2:22 AM, Ron Adam <ron3200@gmail.com> wrote:
Yes, because int() can take an extra parameter to specify the base. Source code can't.
Python 2 also supported C-style "0777" to mean 511, and it's all through Unix software (eg "0777" is a common notation for a world-writable directory's permission bits; since there are three bits (r/w/x) for each of the three categories (u/g/o), it makes sense to use octal). Supporting "0777" in source code and having it mean 777 decimal would be extremely confusing, which is why it's a straight-up error in Python 3. ChrisA

On 20 Jul 2015 2:38 am, "Chris Angelico" <rosuav@gmail.com> wrote:
On Mon, Jul 20, 2015 at 2:22 AM, Ron Adam <ron3200@gmail.com> wrote:
A social reason for this limitation is that a number of other languages
do
Exactly - the special case here is *dis*allowing leading zeroes for non-zero integer literals, since we can't know if they're supposed to be octal values or not, or if the "b" or "x" was left out of a binary or hex literal that has been padded out to a fixed number of bits, or if the decimal point was left out of a floating point literal. The one integer literal case that *could* be reliably kept consistent with the general mathematical notation of "leading zeroes are permitted, but have no significance" was zero itself, which is what was done. Cheers, Nick.

On Mon, Jul 20, 2015 at 10:38:02AM +1000, Nick Coghlan wrote:
Exactly - the special case here is *dis*allowing leading zeroes for non-zero integer literals,
That's a ... different ... definition of "special case". You're saying that, out of the literally infinite number of possible ints one might attempt to write, *all of them except zero* are the special case, while zero itself is the non-special case. O-kay. I get the argument that allowing people to write 000000000 when they want a int zero is harmless, and the code churn required to prevent that outweighs any benefit gained. The task I created on the tracker has been closed as a "won't fix" or "rejected" (I forget which one), and I'm not going to argue with that. But I did find your perspective above funny enough that I had to reply. But for the record, and this will be my last word on the subject:
Right. So if you see somebody has written 00 as a literal in Python 3, it's *more likely to be an error than deliberate*. E.g. we can align octal or hex numbers using a fixed width string with leading zeroes where needed: nums = [0x00001234, 0x000000A3, 0x00000005, 0x000027F3, 0x00000000, # this is okay 00000000, # breaks the alignment 0000000000, # aligned but where's the X? 0x0000C21D, ] but if you see a plain, unprefixed 0 mixed in there, that smells of an possible error. Even if it isn't an error, to me it hints of an error enough that I'd want to question the code author's intention. "Did you really mean zero, or is that a typo?" Regardless of everything else, to me this is an aesthetic question. Allowing 00 when 01, 02, 03, ... are (rightly!) forbidden feels ugly and a wart. But until and unless somebody actually gets bitten by this in real code, and a typo hides in plain view disguised as zero, I can't honestly say it is outrightly harmful. -- Steve

On Sun, Jul 19, 2015 at 10:01 PM, Steven D'Aprano <steve@pearwood.info> wrote:
Allowing 00 when 01, 02, 03, ... are (rightly!) forbidden feels ugly and a wart.
I agree in general, but there is one case where I am on the fence: dates = [ date(2005, 07, 01), date(2005, 11, 15), ..] looks marginally better than the valid alternative. I often see this form written by 2.7 users, and it requires a medium size lecture to explain why they should not write code like this even if it seemingly works.

On Mon, Jul 20, 2015 at 12:14 PM, Alexander Belopolsky <alexander.belopolsky@gmail.com> wrote:
The lecture should be fairly simple. Just put 08 in there and you'll see why you shouldn't do it. :) If they then ask "Why is 07 allowed but 08 not?", then you can go into detail about octal, why C uses 0755 to mean 493 (and why it makes a LOT more sense to key some things in using octal - nobody would understand what permissions 493 would mean), and then it becomes pretty clear that there are two viable interpretations for "0755". ChrisA

On 20 July 2015 at 12:01, Steven D'Aprano <steve@pearwood.info> wrote:
The integer literal zero only looks like the special case if you first exclude all the *other* ways of entering numeric values into Python applications and only consider "integer literal zero" and "non-zero integer literals". The fact that "0", "0.0" and "0j" produce values of different types is an implementation detail of Python's representation of abstract mathematical concepts, so the general case is actually "you can write numbers in Python the same way you write them in mathematical notation, including with leading zeroes if you want" (with the caveat that we use the electrical engineering "j" for imaginary numbers, rather than the more confusable mathematical "i") The Python 2 special case is then "unlike mathematical notation, using leading zeroes on an integer literal will result in it being interpreted in base 8 rather than base 10". The corresponding Python 3 special case is "unlike mathematical notation, you can't use leading zeroes on non-zero integers in Python 3, because of historical reasons relating to the syntax previously used for octal numbers in Python 2, and still used in C/C++, and various other languages". Regards, Nick. -- Nick Coghlan | ncoghlan@gmail.com | Brisbane, Australia

On Sat, Jul 18, 2015 at 01:35:49PM -0400, random832@fastmail.us wrote:
Do you mean up to 007, since they are the same in oct and dec? In either case, that introduces even more special cases. Whether it is one special case, 000, or eight, or ten, 000 through 009, the questions remain: - why does (let's say) `n = 02` work, but `n = 012` fail? - why would you intentionally write `n = 00' when you could simply write `n = 0`? Backwards compatibilty aside, I don't think there's any reason to keep this feature. I suspect that it can only mask typos, not be used for any sensible reason. Back when 000 meant octal zero, it might have made sense to write it that way to align with a bunch of other octal numbers, but now you would surely write 0o00 to align with 0o10. http://bugs.python.org/issue24668 -- Steve
participants (9)
-
Alexander Belopolsky
-
Chris Angelico
-
Eric V. Smith
-
Georg Brandl
-
Neil Girdhar
-
Nick Coghlan
-
random832@fastmail.us
-
Ron Adam
-
Steven D'Aprano