[Python-Dev] Deprecation warning on integer shifts and such

Guido van Rossum guido@python.org
Wed, 14 Aug 2002 08:53:39 -0400

> Not at all! Removing the differences between ints and longs is good. 
> My reservations are about the hexadecimal representation.
>     - Currently, the '%u', '%x', '%X' and '%o' string formatting
>       operators and the hex() and oct() built-in functions behave
>       differently for negative numbers: negative short ints are
>       formatted as unsigned C long, while negative long ints are
>       formatted with a minus sign.  This will be changed to use the
>       long int semantics in all cases (but without the trailing 'L'
>       that currently distinguishes the output of hex() and oct() for
>       long ints).  Note that this means that '%u' becomes an alias for
>       '%d'.  It will eventually be removed.
> In Python up to 2.2 it's inconsistent between ints and longs:
> >>> hex(-16711681)
> '0xff00ffff'
> >>> hex(-16711681L)
> '-0xff0001L'		# ??!?!?
> The hex representation of ints gives me useful information about their 
> bit structure. After all, it is not immediately apparent to most mortals 
> that the number above is a mask for bits 16-23.

If you want to see the bit mask, all you have to do is AND it with a
positive mask, e.g. 0xffff to see it as a 16-bit mask, 0xffffffffL for
a 32-bit mask, or 0xffffffffffffffff for a 64-bit mask.  And you can
go higher.
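A quick sketch of that masking trick, written in modern Python syntax
(no 'L' suffix, print as a function):

```python
n = -16711681  # the mask for bits 16-23, stored as a signed value

# ANDing with an all-ones mask of the desired width forces a
# non-negative result, so hex() shows the raw two's-complement bits.
print(hex(n & 0xffff))               # 16-bit view -> '0xffff'
print(hex(n & 0xffffffff))           # 32-bit view -> '0xff00ffff'
print(hex(n & 0xffffffffffffffff))   # 64-bit view -> '0xffffffffff00ffff'
```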

> The hex representation of longs is something I find quite misleading and 
> I think it's also unprecedented.  This wart has bothered me for a long 
> time now but I didn't have any use for it so I didn't mind too much. Now 
> it is proposed to extend this useless representation to ints so I do.

Just yesterday I got a proposal for a hex calculator that proposed
-0x1 to represent the mathematical value -1 in hex.  I don't think
it's unprecedented at all, although it may be unconventional.

> So we have two elements of the language that are inconsistent. One of 
> them is in widespread use and the other is... ahem... 
> Which one of them should be changed to conform to the other? 
> My proposal: 
> On 32 bit platforms:
> >>> hex(-16711681)
> '0xff00ffff'
> >>> hex(-16711681L)
> '0xff00ffff'
> On 64 bit platforms:
> >>> hex(-16711681)
> '0xffffffffff00ffffLL'
> >>> hex(-16711681L)
> '0xffffffffff00ffffLL'
> The 'LL' suffix means that this number is to be treated as a 64 bit
> *signed* number. This is consistent with the way it is interpreted by 
> GCC and other unix compilers on both 32 and 64 bit platforms.  


Python doesn't have the concept of 64-bit signed numbers.  It also
doesn't have the 'LL' syntax on input -- or do you propose to add that
too?  Why should the hex representation have to contain the conceptual
size of the number?  Do you propose to add LL to the hex
representations of positive numbers too?

> What to do about numbers from 2**31 to 2**32-1?
> >>> hex(4278255615)
> 0xff00ffffU
> The U suffix, also borrowed from C, makes it unambiguous on 32 and 64 bit 
> platforms for both Python and C. 

Another -1.  Python doesn't have this on input.

> Representation of positive numbers:
>  0x00000000   -         0x7fffffff   : unambiguous on all platforms
>  0x80000000U  -         0xffffffffU  : representation adds U suffix
> 0x100000000LL - 0x7fffffffffffffffLL : representation adds LL suffix

What does the addition of the U or LL suffix give you?  If I really
want to know how many bits there are I can count the digits, right?
And usually the app that does the printing knows in how many bits it
is interested.
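The digit-counting point can be made concrete with a small helper (the
name hex_bits is hypothetical; this is modern Python):

```python
def hex_bits(s):
    """Infer the bit width implied by a hex string: each hex digit
    carries 4 bits (the '0x' prefix and any L suffix don't count)."""
    digits = s.removeprefix('0x').removeprefix('0X').rstrip('Ll')
    return len(digits) * 4

print(hex_bits('0xff00ffff'))            # 32
print(hex_bits('0xffffffffff00ffffLL'))  # 64
```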

> Representation of negative numbers:
>  0x80000000  - 0xffffffff (-2147483648 to -1):
> 	8 digits on 32 bit platforms
>  0xffffffff80000000LL  - 0xffffffffffffffffLL  (same range):
> 	16 digits and LL suffix on 64 bit platforms
>  other negative numbers: 16 digits and LL suffix on all platforms.

And what do you suppose we do with hex(-100**100)?
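(Note that -100**100, i.e. -(100**100), needs about 665 bits, so no
fixed platform width fits it.  A sketch of what the proposal would
force on the caller: the two's-complement view only exists once an
explicit width has been chosen.  Modern Python; int.bit_length exists
since 2.7.)

```python
n = -100**100                      # far outside any 32- or 64-bit range

bits = n.bit_length() + 1          # +1 for the sign bit
width = ((bits + 63) // 64) * 64   # round up to whole 64-bit words
mask = (1 << width) - 1

# Only now does a fixed-width hex view make sense:
print(hex(n & mask))
# Round-trip: subtracting 2**width recovers the negative value.
assert (n & mask) - (1 << width) == n
```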

> This makes the hex representation of a number informative and consistent 
> between int and long on all platforms. It is also consistent with the
> C compiler on the same platform. Yes, it will produce a different text
> representation of some numbers on different platforms but this conveys
> important information about the bit structure of the number which really
> is different between platforms. eval()ing it back to a number is still 
> consistent.

Why is the bit structure so important to you?

> When converting in the other direction (hex representation to number) 
> there is an ambiguous range from 0x80000000 to 0xffffffff.  Should it be 
> treated as signed or unsigned?  The current interpretation is signed. PEP
> 237 proposes to change it to unsigned. I propose to do neither - this range
> should be deprecated and some explicit notation should be used instead.

Now that's really helpful. :-(  What is someone to do who wants to
enter a hex constant they got from some documentation?  E.g. the AIFC
magic number is 0xA2805140.  Why shouldn't I be able to write that?
What's the use of having a discontinuity in our notation?
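Under the unsigned reading that PEP 237 proposes (mentioned above),
such a literal just works as a positive value, and a signed view, when
genuinely wanted, is one explicit step away.  A sketch in modern
Python:

```python
magic = 0xA2805140          # hex constant copied from documentation

# Unsigned (PEP 237) reading: the literal is simply positive.
print(magic)                # 2726318400

# If a signed 32-bit interpretation is ever needed, it is an
# explicit, local decision rather than a property of the literal:
signed = magic - 0x100000000 if magic >= 0x80000000 else magic
print(signed)               # -1568648896
```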

(I wanted to write much stronger words but I'm trying to respond to
the proposal only.  I guess I'm -1000000 on this.)

--Guido van Rossum (home page: http://www.python.org/~guido/)