On Wed, 1 Feb 2006 13:54:49 -0500 (EST), Paul Svensson wrote:

> On Wed, 1 Feb 2006, Barry Warsaw wrote:
>
>> The proposal for something like 0xff, 0o664, and 0b1001001 seems like the right direction, although 'o' for octal literal looks kind of funky. Maybe 'c' for oCtal? (remember it's 'x' for heXadecimal).
>
> Shouldn't it be 0t644 then, and 0n1001001 for binary? That would sidestep the issue of 'b' and 'c' being valid hexadecimal digits as well.
>
> Regarding negative numbers, I think they're a red herring. If there is any need for a new literal format, it would be to express ~0x0f, not -0x10. 1xf0 has been proposed before, but I think YAGNI.

YMMV re YAGNI, but you have an excellent point re negative numbers vs ~. If you look at examples, the representation digits _are_ actually "~" ;-) I.e., I first proposed 'c' in place of 'r' for 16cf0, where "c" stands for radix _complement_, and 0 and 1 are complements wrt 2, as are hex 0 and f wrt radix 16.

So the actual notation has digits that are radix-complement, and are evaluated as such to get the integer value. So ~0x0f is represented 16r-f0, which does produce a negative number (but whose integer value BTW is -0x10, not 0x0f). I.e., -16r-f0 == 16r+10, and the sign after the 'r' is a complement-notation indicator, not an algebraic sign. (Perhaps '^' would be a better indicator, as -16r^f0 == 0x10)

Thank you for making the point that the negative value per se is a red herring. Still, that is where the problem shows up: e.g. when we want to define a hex bit mask as an int and the sign bit happens to be set. IMO it's a wart that if you want to define bit masks as integer data, you have to invoke computation for the sign bit, e.g.,

    BIT_0 = 0x1
    BIT_1 = 0x02
    ...
    BIT_30 = 0x40000000
    BIT_31 = int(-0x80000000)

instead of defining true literals all the way, e.g.,

    BIT_0 = 16r1
    BIT_1 = 16r2  # or 16r00000002 obviously
    ...
    BIT_30 = 16r+40000000
    BIT_31 = 16r-80000000

and if you wanted to define the bit-wise complement masks as literals, you could, though radix-2 is certainly easier to see (introducing '_' as transparent elision)

    CBIT_0 = 16r-e  # or 16r-fffffffe or 2r-0 or 2r-11111111_11111111_11111111_11111110
    CBIT_1 = 16r-d  # or 16r-fffffffd or 2r-01 or 2r-11111111_11111111_11111111_11111101
    ...
    CBIT_30 = 16r-bfffffff  # or 2r-10111111_11111111_11111111_11111111
    CBIT_31 = 16r+7fffffff  # or 2r+01111111_11111111_11111111_11111111

With constant-folding optimization and some kind of inference-guiding for expressions like -sys.maxint-1, perhaps computation vs true literals will become moot. And practically it already is, since a one-time computation is normally insignificant in time or space. But aren't we also targeting platforms where space is at a premium, and wouldn't it be nice to be able to define constants as literal data without resorting to workaround pre-processing?

BTW, base-complement decoding works by generalized analogy to twos complement decoding: the most significant digit is taken as a signed coefficient for base**digitpos in radix-complement form, where the upper half of the range of digits represents negative values as digit-radix, and the rest positive as digit. The rest of the digits are all positive coefficients for base powers. E.g., to decode our simple example[1] represented as a literal in base-complement form (very little tested):

 >>> def bclitval(s, digits='0123456789abcdefghijklmnopqrstuvwxyz'):
 ...     """
 ...     decode base complement literal of form <base>r<sign><digits>
 ...     where
 ...         <base> is in range(2,37) or more if digits supplied
 ...         <sign> is a mnemonic + for digits[0] and - for digits[<base>-1] or absent
 ...         <digits> are decoded as base-complement notation after <sign> if
 ...             present is changed to appropriate digit.
 ...     The first digit is taken as a signed coefficient with value
 ...     digit-<base> (negative) if the digit*2>=B and digit (positive) otherwise.
 ...     """
 ...     B, s = s.split('r', 1)
 ...     B = int(B)
 ...     if s[0] =='+': s = digits[0]+s[1:]
 ...     elif s[0] =='-': s = digits[B-1]+s[1:]
 ...     ds = digits.index(s[0])
 ...     if ds*2 >= B: acc = ds-B
 ...     else: acc = ds
 ...     for c in s[1:]: acc = acc*B + digits.index(c)
 ...     return acc
 ...
 >>> bclitval('16r80000004')
 -2147483644
 >>> bclitval('2r10000000000000000000000000000100')
 -2147483644
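As a rough cross-check of the mask tables earlier in this message, here is a sketch that restates the same decoding rule so it can run on its own. The name bc_value is hypothetical, and stripping '_' before decoding is an assumption (bclitval as posted does not handle the '_' elision). Each literal is compared with the ordinary Python expression it is meant to stand for.

```python
DIGITS = '0123456789abcdefghijklmnopqrstuvwxyz'

def bc_value(literal, digits=DIGITS):
    """Decode <base>r<sign><digits>; hypothetical variant that also drops '_'."""
    base_text, body = literal.split('r', 1)
    base = int(base_text)
    body = body.replace('_', '')             # assumption: '_' is purely visual
    if body[0] == '+':
        body = digits[0] + body[1:]          # '+' stands for the zero digit
    elif body[0] == '-':
        body = digits[base - 1] + body[1:]   # '-' stands for the top digit
    lead = digits.index(body[0])
    # Leading digit is a signed coefficient: upper half of the range is negative.
    acc = lead - base if lead * 2 >= base else lead
    for ch in body[1:]:
        acc = acc * base + digits.index(ch)  # remaining digits are positive
    return acc

# Literals taken from the tables above, paired with plain Python expressions.
checks = {
    '16r-f0': ~0x0f,
    '16r+40000000': 0x40000000,
    '16r-80000000': -0x80000000,
    '16r-d': ~0x02,
    '2r-01': ~0x02,
    '2r-10111111_11111111_11111111_11111111': ~0x40000000,
    '16r+7fffffff': 0x7fffffff,
}
for literal, expected in checks.items():
    assert bc_value(literal) == expected, literal
```

The '-' mnemonic only prepends one top digit, so any number of further top digits after it (as in 16r-fffffffd vs 16r-d) leaves the value unchanged.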

BTW, because of the decoding method, extended "sign" bits don't force promotion to a long value:

 >>> bclitval('16rffffffff80000004')
 -2147483644
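For comparison, the radix-16 case can be checked against plain two's-complement arithmetic. This is only a sketch: twos_complement_value is a hypothetical helper, not part of the proposal, and it takes the width from the number of hex digits supplied. It illustrates both the analogy to two's complement decoding and why extra leading f digits leave the value alone.

```python
def twos_complement_value(hex_digits):
    """Read a hex digit string as a two's-complement integer of that width."""
    width = 4 * len(hex_digits)        # 4 bits per hex digit
    value = int(hex_digits, 16)        # unsigned reading
    if value >= 1 << (width - 1):      # top bit set: the value is negative
        value -= 1 << width
    return value

mask = twos_complement_value('80000004')              # bits 31 and 2 set
extended = twos_complement_value('ffffffff80000004')  # same bits, sign-extended

print(mask, extended, mask == extended)
# prints: -2147483644 -2147483644 True
```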

[1] To reduce all this eye-glazing discussion to a simple example, how do people now use hex notation to define an integer bit-mask constant with bits 31 and 2 set? (assume 32-bit int for target platform, counting bit 0 as LSB and bit 31 as sign).

Regards,
Bengt Richter