[Python-Dev] Negated hex/oct constants (SF #660455)

Wed, 05 Feb 2003 14:05:01 -0500

There's a problem with certain negated hex/oct constants.  (If you're
in a hurry, skip to --- CONCLUSION --- below.)

The problem is with hex/oct constants that fit in a C unsigned long
but would be negative when seen as a C signed long.  Examples on
32-bit systems are 0x80000000 through 0xffffffff and 020000000000
through 037777777777.

Up to and including Python 2.3, these constants are considered to be
negative ints, for example:

  >>> 0x80000000
  -2147483648
  >>> 0xffffffff
  -1
  >>> 020000000000
  -2147483648
  >>> 037777777777
  -1
  >>> 
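
On a current Python 3, where these literals are simply positive, the
old wraparound can be reproduced by hand.  This is only a sketch: it
assumes a 32-bit C long, and as_signed_long is a made-up helper name,
not anything in the interpreter.

```python
import ctypes

def as_signed_long(value):
    # Hypothetical helper: reinterpret the low 32 bits of value as a
    # signed 32-bit integer (assumes a 32-bit C long, as in the examples).
    return ctypes.c_int32(value & 0xFFFFFFFF).value

print(as_signed_long(0x80000000))      # -2147483648
print(as_signed_long(0xffffffff))      # -1
print(as_signed_long(0o20000000000))   # -2147483648 (octal, 0o spelling)
print(as_signed_long(0o37777777777))   # -1
```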

But now watch what happens when you put a minus sign in front of
these:

  Python 2.1:

  >>> 0xffffffff
  -1
  >>> -0xffffffff
  1
  >>>

  Python 2.2:

  >>> 0xffffffff
  -1
  >>> -0xffffffff
  -4294967295L                          <<<-------- !!!
  >>> 

  Python 2.3:

  >>> 0xffffffff
  <stdin>:1: FutureWarning: hex/oct constants > sys.maxint will return positive values in Python 2.4 and up
  -1
  >>> -0xffffffff
  -4294967295L                          <<<-------- !!!
  >>> 

  Python 2.4:

  >>> 0xffffffff
  4294967295
  >>> -0xffffffff
  -4294967295
  >>>

(I made the last one up, but that's what I predict based on PEP
237.)

The cases that bother me are -0xffffffff in Python 2.2 and 2.3: this
is an unintended side effect of a peephole optimization!!!

It's quite unsettling:

  >>> x = 0xffffffff
  >>> x
  -1
  >>> -x
  1
  >>> -(0xffffffff)
  1
  >>> -0xffffffff 
  -4294967295L                          <<<<-------- !!!
  >>> 

The reason is that the bytecode compiler recognizes a minus sign
followed (possibly with intervening whitespace, but no other tokens)
by a numeric literal, and spits out code to store the already-negated
value in the const array and then loads the negated value, rather than
loading the original constant from the const array and negating it.
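
The same kind of folding can still be observed from Python code.  The
following is a sketch for a current CPython 3, not the 2.2 compiler:
the parser reports a unary minus applied to a constant, while the
compiled code object may already carry the negated value among its
constants.  What ends up in co_consts is an implementation detail, so
it is printed here rather than relied upon.

```python
import ast

source = "-0xffffffff"

# The parser sees a unary minus applied to a (positive) constant ...
tree = ast.parse(source, mode="eval")
node = tree.body
print(type(node).__name__, type(node.op).__name__, node.operand.value)

# ... while the compiled code object may carry the already-negated
# value in its constants (CPython implementation detail).
code = compile(source, "<example>", "eval")
print(code.co_consts)
print(eval(code))
```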

But the negated value is computed by tacking a minus sign character in
front of the string value of the literal, and then calling the
text-to-number conversion routine (strtol in this case).  

Thus, the expression 0xffffffff is negative one because it is first
interpreted as an unsigned C long, and then cast to a signed C long
(which just copies the bit pattern).  And the expression -0xffffffff is
negative 4294967295L because the value is out of range for strtol, so
the string is handed to PyLong_FromString(), which of course interprets
0xffffffff as 2**32-1, and then negates it, because of the minus sign.

On the other hand, -(0xffffffff) etc. are loading the constant
0xffffffff, which happens to be -1, and then negating it as a separate
bytecode instruction, resulting in 1.
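
To make the three cases concrete, here is a small model of the
behaviour just described, written for a current Python 3.  It assumes
a 32-bit C long, handles only hex literals that fit in an unsigned
long, and the function names are invented for the illustration.  It
mimics the observable results, not the actual code in the compiler.

```python
LONG_BITS = 32                        # assumption: 32-bit C long
LONG_MAX = (1 << (LONG_BITS - 1)) - 1
ULONG_MAX = (1 << LONG_BITS) - 1

def plain_literal(text):
    # Bare literal: read as unsigned, then cast to signed (bit pattern kept).
    value = int(text, 16)
    if value > ULONG_MAX:
        raise ValueError("outside the range this sketch models")
    return value - (1 << LONG_BITS) if value > LONG_MAX else value

def folded_negative_literal(text):
    # Folded path: '-' is glued onto the literal text before conversion,
    # so the digits keep their full magnitude and the sign is applied last.
    return int("-" + text, 16)

def negated_at_runtime(text):
    # -(literal) or -x: load the wrapped constant, then negate it separately.
    return -plain_literal(text)

print(plain_literal("0xffffffff"))            # -1
print(negated_at_runtime("0xffffffff"))       # 1
print(folded_negative_literal("0xffffffff"))  # -4294967295
```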

--- CONCLUSION ---

The bad thing is that this bug is already in all versions of Python
2.2, as well as in Python 2.3a1.

I originally thought that it absolutely had to be fixed, because the
inconsistency between -(0xffffffff) and -0xffffffff is just too bad to
bear.

On the other hand, given that it's already this way in Python 2.2, and
will be again in Python 2.4, maybe we should leave it this way?  I
find it almost cute that you can spell negative constants with an
explicit minus sign.  (It's even correct in the sense that it doesn't
issue a warning!)

Opinions?  Am I crazy?  It seems too late to use the time machine --
OTOH I could claim that this is *already* caused by time machine usage
from the Python 2.4 era.

--Guido van Rossum (home page: http://www.python.org/~guido/)