Mailman 3 Disallow "00000" as a synonym for "0" - Python-ideas - python.org

Disallow "00000" as a synonym for "0"

Neil Girdhar

July 16, 2015

12:15 p.m.

As per this question: http://stackoverflow.com/questions/31447694/why-does-python-3-allow-00-as-a-... It seems like Python accepts "000000000" to mean "0". Whatever the historical reason, should this be deprecated?

Eric V. Smith

July 2015

4:28 p.m.

On 07/16/2015 06:15 AM, Neil Girdhar wrote:

No. It would needlessly break working code. Eric.

Steven D'Aprano

5:22 p.m.

On Fri, Jul 17, 2015 at 10:28:19AM -0400, Eric V. Smith wrote:

I wonder what working code uses 00 when 0 is wanted? Do you have any examples? I believe that anyone writing 00 is more likely to have made a typo than to actually intend to get 0.

In Python 2, 00 has an obvious and correct interpretation: it is zero in octal. But in Python 3, octal is written with the prefix 0o not 0.

    py> 0o10
    8
    py> 010
      File "<stdin>", line 1
        010
          ^
    SyntaxError: invalid token

(The 0o prefix also works in Python 2.7.)

In Python 3, 00 has no sensible meaning. It's not octal, binary or hex, and it shouldn't be decimal. Decimal integers are explicitly prohibited from beginning with a leading zero:

https://docs.python.org/3/reference/lexical_analysis.html#integers

so the mystery is why *zero* is a special case permitted to have leading zeroes. The lexical definition of "decimal integer" is:

    decimalinteger ::= nonzerodigit digit* | "0"+

Why was it defined that way? The more obvious:

    decimalinteger ::= nonzerodigit digit* | "0"

was the definition in Python 2. As the Stackoverflow post above points out, the definition of decimalinteger actually in use seems to violate PEP 3127, and supporting "0"+ was added as a special case by Georg Brandl.

Since leading 0 digits in decimal int literals are prohibited, we cannot write 0001, 0023 etc. Why would we write 0000 to get zero?

Unless somebody can give a good explanation for why leading zeroes are permitted for zero, I think it was a mistake to allow them, and is an ugly wart on the language. I think that should be deprecated and eventually removed. Since it only affects int literals, any deprecation warning will occur at compile-time, so it shouldn't have any impact on runtime performance.

-- Steve
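The difference between the two grammar rules quoted above can be sketched with regular expressions. This is only an illustration, not how the tokenizer is implemented: the names `PY3_DECIMAL`, `STRICT_DECIMAL` and `accepts` are made up here, and the digit-group underscores allowed by later Python versions are deliberately ignored.

```python
import re

# Regex renderings of the two rules (illustrative names, not from CPython).
PY3_DECIMAL = re.compile(r"[1-9][0-9]*|0+")    # nonzerodigit digit* | "0"+
STRICT_DECIMAL = re.compile(r"[1-9][0-9]*|0")  # nonzerodigit digit* | "0"


def accepts(pattern, text):
    """Return True if the whole text matches the given rule."""
    return pattern.fullmatch(text) is not None


for literal in ["0", "0000", "0001", "10"]:
    print(literal, accepts(PY3_DECIMAL, literal), accepts(STRICT_DECIMAL, literal))
```

Only "0000" is treated differently by the two rules; "0001" is rejected by both.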

Georg Brandl

8:33 p.m.

On 07/17/2015 05:22 PM, Steven D'Aprano wrote:

I could tell you, but then I'd have to kill you. Georg

Ron Adam

9:40 p.m.

On 07/17/2015 11:22 AM, Steven D'Aprano wrote:

And then there is this...

Cheers, Ron

Steven D'Aprano

5:17 a.m.

On Sat, Jul 18, 2015 at 03:40:28PM -0400, Ron Adam wrote:

The parsing rules for floats are not the same as int, and since floats always use decimal, never octal, there's no ambiguity or confusion from writing "0000.0000". I don't propose changing floats. -- Steve

Ron Adam

6:22 p.m.

On 07/18/2015 11:17 PM, Steven D'Aprano wrote:

> On Sat, Jul 18, 2015 at 03:40:28PM -0400, Ron Adam wrote:
>> And then there is this...
>>
>>     >>> 000.0
>>     0.0
>>     >>> 000.1
>>     0.1
>
> The parsing rules for floats are not the same as int,

Umm.. yes. But are they intentionally different?

> and since floats always use decimal, never octal, there's no ambiguity or confusion from writing "0000.0000". I don't propose changing floats.

Me either. It seems to me ints should be relaxed to allow for leading zeros instead.

The most common use of leading zeros is when numbers in strings are used and those numbers are sorted. Without the leading zeros the sort order may not be in numerical order.

The int class supports using leading zeros in strings, but not in literal form. (Although it may use a C string to int function when parsing it.)

    >>> int("0000")
    0
    >>> int("007")
    7
    >>> int(0000)
    0
    >>> int(007)
      File "<stdin>", line 1
        int(007)
              ^
    SyntaxError: invalid token
    >>> int(007.0)
    7
    >>> float("0001")
    1.0

It's common to cut and paste numerical data from text data. If you are cutting numerical data from a table, then you may also need to remove all the leading zeros. A macro can do that, but it complicates a simple cut and paste operation.

Note that this does not affect the internal representation of ints, only how Python interprets the string literal during the compiling process.

A social reason for this limitation is that a number of other languages do use a leading digit 0 to define octal numbers. And this helps catch silent errors due to copying numbers directly in that case. (Python uses "0o" and not just "0".)

In Python, it just catches a possible silent error, when a 0onnn is mistyped as 00nnn.

So this looks like it's a quick BDFL judgement call to me (or other core developer number specialist call). I think the status quo wins by default otherwise.

Cheers, Ron
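The point about zero-padded strings and sort order can be illustrated with a short sketch. The sample data is made up for illustration; only the built-ins `sorted`, `str.zfill` and `int` are used.

```python
# Hypothetical sample data: numbers stored as text, as in a pasted table.
raw = ["9", "10", "100", "7"]

# Plain string sorting compares character by character, so the result
# is in lexicographic rather than numeric order.
print(sorted(raw))      # ['10', '100', '7', '9']

# Padding to a fixed width makes string order agree with numeric order.
padded = [s.zfill(3) for s in raw]
print(sorted(padded))   # ['007', '009', '010', '100']

# int() accepts the padded strings, even though 007 is not a valid literal.
values = [int(s) for s in sorted(padded)]
print(values)           # [7, 9, 10, 100]
```

Converting with `int()` at the point of use is one way to keep padded text data without needing padded literals in source code.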

Chris Angelico

6:37 p.m.

On Mon, Jul 20, 2015 at 2:22 AM, Ron Adam <ron3200@gmail.com> wrote:

Yes, because int() can take an extra parameter to specify the base. Source code can't.

Python 2 also supported C-style "0777" to mean 511, and it's all through Unix software (eg "0777" is a common notation for a world-writable directory's permission bits; since there are three bits (r/w/x) for each of the three categories (u/g/o), it makes sense to use octal). Supporting "0777" in source code and having it mean 777 decimal would be extremely confusing, which is why it's a straight-up error in Python 3. ChrisA
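A small sketch of the two readings of "0777", using only the built-in `int()`. The variable names are illustrative.

```python
# A string can be parsed in any base because int() takes the base explicitly.
as_decimal = int("0777")      # base 10 by default; leading zeros are ignored
as_octal = int("0777", 8)     # the same digits read as octal
print(as_decimal, as_octal)   # 777 511

# Python 3 spells octal literals with the 0o prefix.
print(0o777, 0o755)           # 511 493

# base=0 asks int() to apply the rules for source-code literals, so the
# ambiguous spelling is rejected there too.
try:
    int("0777", 0)
    rejected = False
except ValueError:
    rejected = True
print(rejected)               # True
```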

Nick Coghlan

2:38 a.m.

On 20 Jul 2015 2:38 am, "Chris Angelico" <rosuav@gmail.com> wrote:

> On Mon, Jul 20, 2015 at 2:22 AM, Ron Adam <ron3200@gmail.com> wrote:
>> A social reason for this limitation is that a number of other languages do use a leading digit 0 to define octal numbers. And this helps catch silent errors due to copying numbers directly in that case. (Python uses "0o" and not just "0".)
>
> Python 2 also supported C-style "0777" to mean 511, and it's all through Unix software (eg "0777" is a common notation for a world-writable directory's permission bits; since there are three bits (r/w/x) for each of the three categories (u/g/o), it makes sense to use octal). Supporting "0777" in source code and having it mean 777 decimal would be extremely confusing, which is why it's a straight-up error in Python 3.

Exactly - the special case here is *dis*allowing leading zeroes for non-zero integer literals, since we can't know if they're supposed to be octal values or not, or if the "b" or "x" was left out of a binary or hex literal that has been padded out to a fixed number of bits, or if the decimal point was left out of a floating point literal. The one integer literal case that *could* be reliably kept consistent with the general mathematical notation of "leading zeroes are permitted, but have no significance" was zero itself, which is what was done. Cheers, Nick.

Steven D'Aprano

4:01 a.m.

On Mon, Jul 20, 2015 at 10:38:02AM +1000, Nick Coghlan wrote:

> Exactly - the special case here is *dis*allowing leading zeroes for non-zero integer literals,

That's a ... different ... definition of "special case". You're saying that, out of the literally infinite number of possible ints one might attempt to write, *all of them except zero* are the special case, while zero itself is the non-special case. O-kay.

I get the argument that allowing people to write 000000000 when they want an int zero is harmless, and the code churn required to prevent that outweighs any benefit gained. The task I created on the tracker has been closed as a "won't fix" or "rejected" (I forget which one), and I'm not going to argue with that. But I did find your perspective above funny enough that I had to reply.

But for the record, and this will be my last word on the subject:

Right. So if you see somebody has written 00 as a literal in Python 3, it's *more likely to be an error than deliberate*. E.g. we can align octal or hex numbers using a fixed width string with leading zeroes where needed:

    nums = [0x00001234,
            0x000000A3,
            0x00000005,
            0x000027F3,
            0x00000000,  # this is okay
            00000000,    # breaks the alignment
            0000000000,  # aligned but where's the X?
            0x0000C21D,
            ]

but if you see a plain, unprefixed 0 mixed in there, that smells of a possible error. Even if it isn't an error, to me it hints of an error enough that I'd want to question the code author's intention. "Did you really mean zero, or is that a typo?"

Regardless of everything else, to me this is an aesthetic question. Allowing 00 when 01, 02, 03, ... are (rightly!) forbidden feels ugly and a wart. But until and unless somebody actually gets bitten by this in real code, and a typo hides in plain view disguised as zero, I can't honestly say it is outrightly harmful.

-- Steve
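Since the language accepts these literals, one way to catch the "is that a typo?" case is a small source check. The following is a minimal sketch built on the standard `tokenize` module; `find_padded_zero_literals`, `REDUNDANT_ZERO` and `SAMPLE` are hypothetical names introduced here, not part of any existing linter.

```python
import io
import re
import tokenize

# A decimal literal made only of zeros, with at least one redundant zero
# (the optional underscore covers spellings such as 0_0).
REDUNDANT_ZERO = re.compile(r"0(?:_?0)+")


def find_padded_zero_literals(source):
    """Return (line, column, text) for each literal such as 00 or 0000."""
    hits = []
    for tok in tokenize.generate_tokens(io.StringIO(source).readline):
        if tok.type == tokenize.NUMBER and REDUNDANT_ZERO.fullmatch(tok.string):
            hits.append((tok.start[0], tok.start[1], tok.string))
    return hits


# Hypothetical input, modelled on the list in the post above.
SAMPLE = "nums = [0x00000000, 00000000, 0000000000, 0x0000C21D, 0]\n"

for line, col, text in find_padded_zero_literals(SAMPLE):
    print(f"line {line}, column {col}: suspicious zero literal {text}")
```

Hex literals and a plain `0` are left alone; only the unprefixed, padded zeros are reported.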

Alexander Belopolsky

4:14 a.m.

On Sun, Jul 19, 2015 at 10:01 PM, Steven D'Aprano <steve@pearwood.info> wrote:

> Allowing 00 when 01, 02, 03, ... are (rightly!) forbidden feels ugly and a wart.

I agree in general, but there is one case where I am on the fence:

    dates = [
        date(2005, 07, 01),
        date(2005, 11, 15),
        ..]

looks marginally better than the valid alternative. I often see this form written by 2.7 users, and it requires a medium size lecture to explain why they should not write code like this even if it seemingly works.

Chris Angelico

4:46 a.m.

On Mon, Jul 20, 2015 at 12:14 PM, Alexander Belopolsky <alexander.belopolsky@gmail.com> wrote:

The lecture should be fairly simple. Just put 08 in there and you'll see why you shouldn't do it. :) If they then ask "Why is 07 allowed but 08 not?", then you can go into detail about octal, why C uses 0755 to mean 493 (and why it makes a LOT more sense to key some things in using octal - nobody would understand what permissions 493 would mean), and then it becomes pretty clear that there are two viable interpretations for "0755". ChrisA
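The "07 works, 08 does not" lecture can be sketched from Python 3, where the padded form is rejected outright. The helper `compiles` is a made-up name; `int(..., 8)` is used only to mimic the octal reading that Python 2 applied to such literals.

```python
def compiles(source):
    """Return True if Python accepts `source` as an expression."""
    try:
        compile(source, "<example>", "eval")
        return True
    except SyntaxError:
        return False


print(compiles("date(2005, 07, 01)"))  # False on Python 3
print(compiles("date(2005, 7, 1)"))    # True

# Python 2 read 07 as octal, which happens to equal decimal 7 ...
print(int("07", 8))                    # 7

# ... but 8 is not an octal digit, so the same reading fails for 08.
try:
    int("08", 8)
    eight_is_octal = True
except ValueError:
    eight_is_octal = False
print(eight_is_octal)                  # False
```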

Nick Coghlan

6:48 a.m.

On 20 July 2015 at 12:01, Steven D'Aprano <steve@pearwood.info> wrote:

The integer literal zero only looks like the special case if you first exclude all the *other* ways of entering numeric values into Python applications and only consider "integer literal zero" and "non-zero integer literals".

The fact that "0", "0.0" and "0j" produce values of different types is an implementation detail of Python's representation of abstract mathematical concepts, so the general case is actually "you can write numbers in Python the same way you write them in mathematical notation, including with leading zeroes if you want" (with the caveat that we use the electrical engineering "j" for imaginary numbers, rather than the more confusable mathematical "i").

The Python 2 special case is then "unlike mathematical notation, using leading zeroes on an integer literal will result in it being interpreted in base 8 rather than base 10".

The corresponding Python 3 special case is "unlike mathematical notation, you can't use leading zeroes on non-zero integers in Python 3, because of historical reasons relating to the syntax previously used for octal numbers in Python 2, and still used in C/C++, and various other languages".

Regards, Nick.

-- Nick Coghlan | ncoghlan@gmail.com | Brisbane, Australia
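The "general case versus special case" framing can be checked with a short sketch that asks which numeric spellings Python 3 accepts and what type each produces. `literal_type` is a hypothetical helper; it relies only on `ast.literal_eval` from the standard library.

```python
import ast


def literal_type(text):
    """Return the type name of a numeric literal, or None if it is rejected."""
    try:
        return type(ast.literal_eval(text)).__name__
    except SyntaxError:
        return None


for text in ["0", "0.0", "0j", "000", "007.0", "007j", "007"]:
    print(text, literal_type(text))
```

Of the spellings tried here, only the non-zero integer with leading zeros (`007`) is rejected; the padded float, imaginary and zero literals are all accepted.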

random832＠fastmail.us

7:35 p.m.

On Fri, Jul 17, 2015, at 10:28, Eric V. Smith wrote:

Counter-proposal - allow 00001 through 00009 since they are equally unambiguous.

Steven D'Aprano

1:03 p.m.

On Sat, Jul 18, 2015 at 01:35:49PM -0400, random832@fastmail.us wrote:

Do you mean up to 007, since they are the same in oct and dec? In either case, that introduces even more special cases. Whether it is one special case, 000, or eight, or ten, 000 through 009, the questions remain:

- why does (let's say) `n = 02` work, but `n = 012` fail?
- why would you intentionally write `n = 00` when you could simply write `n = 0`?

Backwards compatibility aside, I don't think there's any reason to keep this feature. I suspect that it can only mask typos, not be used for any sensible reason. Back when 000 meant octal zero, it might have made sense to write it that way to align with a bunch of other octal numbers, but now you would surely write 0o00 to align with 0o10.

http://bugs.python.org/issue24668

-- Steve

participants (9)

Alexander Belopolsky
Chris Angelico
Eric V. Smith
Georg Brandl
Neil Girdhar
Nick Coghlan
random832＠fastmail.us
Ron Adam
Steven D'Aprano