hex(-5) => Futurewarning: ugh, can't we have a better hex than '-'[:n<0]+hex(abs(n)) ??

Bengt Richter bokr at oz.net
Mon Aug 18 22:57:50 CEST 2003


On Mon, 18 Aug 2003 09:16:48 +0200, "Michael Peuser" <mpeuser at web.de> wrote:

>
>"Juha Autero" <Juha.Autero at iki.fi> schrieb im Newsbeitrag
>news:mailman.1061182647.31076.python-list at python.org...
>> Freddie <oinkfreddie at oinkshlick.oinknet> writes:
>>
>> >> There is a thread from this morning ("bitwise not ...") - this should
>> >> be an excellent contribution!
>> >> I have no mercy with someone writing hex(-5)
>> >>
>> >> Kindly
>> >> Michael P
>> >>
>> >>
>> >
>> > What about crazy people like myself? If you generate a crc32 value with
>> > zlib, you occasionally get a negative number returned. If you try to
>> > convert that to hex (to test against a stored CRC32 value), it spits out
>> > a FutureWarning at me.
>>
>> Read the thread about bitwise not. Tell Python how many bits you
>> want. In case of CRC32 that is of course 32 bits:
>> hex(-5&2**32-1)
>>
>> Two questions: What is the best way to generate bitmask of n bits all
>> ones?
>
>Why do you object to 2**n-1? This is just fine I think.
>
>> And would somebody explain why hexadecimal (and octal) literals
>> behave differently from decimal literals? (see:
>> http://www.python.org/doc/current/ref/integers.html ) Why are hexadecimal
>> literals from 0x80000000 to 0xffffffff interpreted as negative
>> numbers instead of being converted to long integers?
>
>Most of all this has practical reasons because of the use most programmers
>have for stating hexadecimal literals.
>
>Of course, some hex literals are interpreted not as negative numbers but as
>memory contents, because it has become indistinguishable what the origin had
>been.
>
>One will not expect
>        print int(0xffffffff )
>to do something different from
>        x=0xffffffff
>        print int(x)
>
Unfortunately, the path to unification of integers to hardware-width independence
has backwards compatibility problems. I guess they are worse than for true division,
but has anyone really attempted to get a measure of them?

The options for hex representation seem to be (in terms of regexes)

1) signed standard: [+-]?0x[0-9a-fA-F]+
2) unprefixed standard: [0-9a-fA-F]+
which are produced by hex() and '%x' and '%X'
and interpreted by int(x, 16)
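
For reference, a small sketch of how the two standard forms round-trip. It assumes a Python 3 interpreter, where the '-' prefix behaviour under discussion is what hex() and '%x' produce; the variable names are only illustrative.

```python
# The two "standard" spellings of a negative value, and their round trip.
n = -5

signed = hex(n)     # signed standard form with the 0x prefix: '-0x5'
bare = '%x' % n     # unprefixed form: '-5'

# int(x, 16) accepts both spellings, including the optional sign and prefix.
assert int(signed, 16) == n
assert int(bare, 16) == n

print(signed, bare)
```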

There is a need for a round trip hex representation/interpretation for signed integers
of arbitrary size, but IMO a prefixed '-' does violence to the usual expectation
for hex representation (i.e., a sensible view of the bits involved in a conventional
"two's complement" representation of the number to whatever width required).

I hope it can be avoided as a default, or that at a minimum an alternative will be provided.

For hex literals, the [01]x[0-9a-fA-F]+ variation seems clean (I forgot again who came up with that as
the best emerging alternative in an old thread, but credit is due). Tim liked it too, I believe ;-)

Since hex() will change anyway, how much more breakage will hex(-1) => 1xf create vs => -0x1 ?
Not to harp, but obviously the -0x1 gives no sense of the conventional underlying bit pattern
of ...fffff. (I am talking about an abstract bit pattern that extends infinitely to the left,
not a concrete implementation. Of course it is often useful to know how the abstraction gets
implemented on a particular platform, but that is not the only use for hex. It is also handy
as a human-readable representation of an abstract bit sequence).

The other question is what to do with '%x'. The current 2.3 version doesn't seem to pay much
attention to field widths other than as a minimum, so that may offer an opportunity to control
the output. It does not generate an '0x' prefix (easy to specify for the positive case as
0x%x), yet negatives presumably will prefix a '-'. (Will 0x-abcd be legal??)
What are some other possibilities?

Looking forward to using widths, what would one hope to get from '%2.2x'%-1 ?
I would hope for 'ff', personally ;-) And ' 1' for '%2.2x'%1 and '01' for '%02.2x'%1.

Coming at it from this end, what happens if we drop the max width? What should we get
for '%02x'%1 ? That's easy: '01' as now. But '%02x'%-1 => 'ffffffff' currently, and that has
to change. Apparently to '-1' if things go as they're going? (Again, hexness is lost).
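
A minimal sketch of the width-constrained output hoped for above, assuming Python 3. The name hex_fixed is hypothetical (it is not a built-in or a proposed API), and the helper always zero-pads, so it corresponds to the '%02.2x' flavour rather than the space-padded one.

```python
def hex_fixed(n, digits):
    """Hypothetical helper: n as exactly `digits` hex characters.

    Negative values are shown as their two's complement bit pattern,
    truncated to 4 * digits bits.
    """
    mask = (1 << (4 * digits)) - 1          # e.g. 0xff for digits == 2
    return format(n & mask, 'x').zfill(digits)


print(hex_fixed(-1, 2))   # ff
print(hex_fixed(1, 2))    # 01
print(hex_fixed(-5, 8))   # fffffffb, the 32-bit view useful for CRC32 values
```

Masking with n & mask is the same idea as the hex(-5&2**32-1) suggestion quoted earlier, only with the width given in hex digits instead of bits.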

A possibility for unrestricted output width would be to print enough hex characters for an
unambiguous interpretation of sign. I.e., so that there are enough bits to include the sign
and an optional number of copies to pad to a full 4-bit hex character as necessary. This would
mean '%02x'%-1 => 'ff' since that gives the first hex character the right sign.
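
The rule just described can be sketched roughly as follows, assuming Python 3. The function signed_hex and its min_digits parameter are hypothetical names chosen for illustration. They model "print enough hex characters that the leading bit carries the sign", with min_digits standing in for the minimum field width.

```python
def signed_hex(n, min_digits=1):
    """Hypothetical helper: shortest two's complement hex string for n
    whose leading bit is the sign bit, using at least min_digits digits."""
    digits = max(1, min_digits)
    # Grow the field until n fits in a signed field of 4 * digits bits.
    while not -(1 << (4 * digits - 1)) <= n < (1 << (4 * digits - 1)):
        digits += 1
    mask = (1 << (4 * digits)) - 1
    return format(n & mask, 'x').zfill(digits)


print(signed_hex(-1, 2))    # ff
print(signed_hex(255, 2))   # 0ff  (a leading 0 keeps the sign bit clear)
print(signed_hex(-9))       # f7
```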

That has the wrong interpretation (if you want to recover the signed value) for
int(('%02x'%-1),16) so that would need a fix for easy use. Although I usually dislike passing
flag info to functions by negating nonzero parameters, it would have mnemonic value in this case.
E.g., int(('%02x'%-1), -16) or the equivalent int('ff', -16) could mean use the leading bit as sign.
This convention would translate nicely to octal and binary string representations as well.
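
Since int() does not accept a negative base, here is a sketch of what the suggested int(s, -16) might do, written as an ordinary function under Python 3. The name parse_signed_hex is hypothetical, and the sketch assumes the string holds bare hex digits with no sign, prefix, or whitespace.

```python
def parse_signed_hex(s):
    """Hypothetical stand-in for the proposed int(s, -16):
    treat the leading bit of the hex string as the sign bit."""
    value = int(s, 16)              # unsigned interpretation first
    bits = 4 * len(s)               # width implied by the digit count
    if value >= 1 << (bits - 1):    # top bit set, so the value is negative
        value -= 1 << bits
    return value


print(parse_signed_hex('ff'))    # -1
print(parse_signed_hex('0ff'))   # 255
print(parse_signed_hex('f7'))    # -9
```

The same subtraction works for octal or binary strings if the bits-per-digit factor is changed from 4 to 3 or 1, provided the leading digit was written with sign extension in mind.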

Of course '%02x'%255 could not return 'ff' as an unconstrained-width representation. It
would have to be '0ff' to provide the proper sign. Note that constraining this to a max width
of 2 would give 'ff'. This works for octal and binary too.

IMO this way of avoiding '-' in hex, octal, and binary string formats would make the strings
represent the data more clearly. These formats are mainly to communicate bit patterns IMO,
not just alternative ways to spell integer values.

If we have 1xf as a literal for spelling -0x1, I guess we don't need a literal format
for the leading-bit-as-sign convention. But the latter would be a way of reading and writing
signed arbitrary-width hex without losing the hexness of n the way you would with

    '%s%x'%('-'[:n<0],abs(n)) #yech, gak ;-/
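
For comparison, a runnable form of that sign-prefix spelling, assuming Python 3. The wrapper name sign_prefixed_hex is hypothetical. It formats the sign with %s, because '-'[:n<0] is an empty string for non-negative n and %c does not accept an empty string.

```python
def sign_prefixed_hex(n):
    """Hypothetical wrapper around the sign-and-magnitude spelling."""
    # '-'[:n < 0] is '-' when n is negative and '' otherwise,
    # because the bool n < 0 is used as the slice end (True == 1, False == 0).
    return '%s%x' % ('-'[:n < 0], abs(n))


print(sign_prefixed_hex(-255))   # -ff
print(sign_prefixed_hex(5))      # 5
```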

Regards,
Bengt Richter



