Proper Radix Formatting of Longs?

Sat Jun 5 15:59:58 EDT 1999

[Tom Vrankar]
> I use Python a lot for computer hardware testing and communications. It
> generally works great.
>
> However,
> [a number of irritations boiling down to that Python has no unsigned
>  integral type & checks for overflow on signed integer arithmetic,
>  so unsigned 32-bit int arithmetic is clumsy on a 32-bit platform]
> ...
> Is there a standard pythonic idiom for using 32-bit integers as
> if they were unsigned, so that 0x8000_0000 +0x8000_0000 =0, just
> like in C or my hardware ALU?

Curiously, I first started using Python when working for a HW vendor trying
to use 32-bit workstations to design a 64-bit CPU with a 64x64->128
multiplier.  All the same irritations applied there -- but since *no* form
of 32-bit arithmetic was adequate (in Python or C), it never occurred to us
that faking it with Python longs was a burden <wink>.

Which is the easiest all-Python answer there is:  Python doesn't have an
unsigned integral type, period.  So you convert everything to long on input,
live happily for the duration, and cut it back to whatever you need on
output.
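
A minimal sketch of that convert-in, mask-out pattern follows.  The names
MASK32, to_u32 and sum32 are made up for illustration, and it assumes a
Python whose plain ints are unbounded (on a Python with a separate long
type you'd convert to long first and spell the mask with a trailing L):

```python
MASK32 = 0xFFFFFFFF  # exactly the low 32 bits set


def to_u32(x):
    """Reduce an unbounded integer to its value modulo 2**32."""
    return x & MASK32


def sum32(words):
    """Add up 32-bit words with wraparound, like a 32-bit hardware adder."""
    total = 0
    for w in words:
        total += to_u32(w)   # input: view each word as unsigned
    return to_u32(total)     # output: cut back to 32 bits, once


wrapped = sum32([0x80000000, 0x80000000])
```

The intermediate total is allowed to grow as big as it likes; only the
final result gets masked.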

If that's not fast enough (it was for me -- in fact, I even wrapped stuff in
classes to hide the I/O conversions), you'll have to write a C extension
module.  Note that Python doesn't gripe about 32-bit bit-fiddling ops (~ & |
^ << >>), so maybe you can get away with writing just a handful of C
functions.
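
For a rough idea of what wrapping stuff in a class can look like, here's a
sketch.  U32 is a hypothetical name (not the class I actually used), only
a few operators are shown, and it assumes a Python with unbounded plain
ints:

```python
class U32:
    """Hypothetical wrapper holding a value reduced modulo 2**32."""

    MASK = 0xFFFFFFFF

    def __init__(self, value):
        # Conversion on input: keep only the low 32 bits.
        self.value = int(value) & self.MASK

    def __int__(self):
        # Conversion on output: hand back an ordinary integer.
        return self.value

    def __add__(self, other):
        return U32(self.value + int(other))

    def __sub__(self, other):
        return U32(self.value - int(other))

    def __mul__(self, other):
        return U32(self.value * int(other))

    def __repr__(self):
        return "U32(0x%08x)" % self.value


carry_out = U32(0x80000000) + U32(0x80000000)
```

Every operation builds a fresh U32, so the masking happens in exactly one
place (the constructor) and client code never sees it.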

> Is there a standard pythonic way to format longs into
> arbitrary radices (sort of like an inverse string.atol)?

str(), hex() and oct() are "it".  Feed hex a non-negative long, and the
result will always have "0x" at the front and "L" at the end, so
hex(along)[2:-1] always works to strip the decorations (a negative long
also gets a leading "-", so handle the sign separately).
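
If you need some other radix, rolling your own takes only a few lines.
This is a sketch, not anything in the standard library:  to_radix and
DIGITS are made-up names, and it assumes a Python with unbounded plain
ints:

```python
DIGITS = "0123456789abcdefghijklmnopqrstuvwxyz"


def to_radix(n, radix):
    """Format integer n as a string in the given radix (2 through 36)."""
    if not 2 <= radix <= len(DIGITS):
        raise ValueError("radix must be between 2 and 36")
    if n == 0:
        return "0"
    sign = ""
    if n < 0:
        sign = "-"
        n = -n
    out = []
    while n:
        n, r = divmod(n, radix)  # peel off the least-significant digit
        out.append(DIGITS[r])
    out.reverse()                # digits were produced low-order first
    return sign + "".join(out)


as_hex = to_radix(255, 16)
```

The output carries no "0x" or "L" decorations, so there's nothing to
strip afterwards.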

> Is there a secret handshake required to at least get string.atoi()
> to happily convert "0x8000_0000"  symmetrically with "%x" %0x8000_0000?

No.  Python allows 0x notation with the MSB set in numeric program literals
for the convenience of programmers building bit masks; atoi is stricter
because it's used to crack user (as opposed to programmer) input.

Note that there's a real distinction here:  as a *bitmask*, 0xffffffff
yields an int with exactly the least-significant 32 bits set, regardless of
whether you're running on a platform where Python's ints are 32 bits or 64
bits.  But as an *integer* (which is atoi's view of the world), 0xffffffff
would be -1 on a 32-bit platform but 4294967295 on a 64-bit one.

So the dual view of bounded signed ints as either bitmasks or integers isn't
symmetric "even in theory", and Python picked a compromise.  It's not
convenient for everyone, but so it goes.  The dual view of *un*bounded
signed ints doesn't suffer this asymmetry, and is independent of platform.
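
The width dependence is easy to see by making the width an explicit
argument.  A sketch, with as_signed a made-up name, again assuming
unbounded plain ints:

```python
def as_signed(bits, width):
    """Read the low `width` bits of `bits` as a two's-complement integer."""
    bits &= (1 << width) - 1
    if bits >> (width - 1):          # sign bit set for this width
        return bits - (1 << width)
    return bits


on_32 = as_signed(0xFFFFFFFF, 32)    # the 32-bit-int reading
on_64 = as_signed(0xFFFFFFFF, 64)    # the 64-bit-int reading
```

The same 32 one-bits come out as -1 in the first case and 4294967295 in
the second, which is exactly the asymmetry above.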

> Yes, I know I could probably write some elaborate python
> functions to manually convert back and forth,

If you can afford Python speed, "elaborate" is inaccurate -- these kinds of
functions are generally trivial to knock off; e.g.,

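>>> def last32(i, _mask32=0xffffffffL):
	# Return the low 32 bits of i as a (possibly negative) plain int.
	try:
		return int(i)
	except OverflowError:
		tail = i & _mask32 # now positive long
		# int(tail) could still overflow, so convert the top 31 bits and
		# the low bit separately; the int shift doesn't check overflow.
		return (int(tail >> 1) << 1) | int(tail & 1)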

>>> def uns32_add(i, j):
	try:
		return i + j
	except OverflowError:
		return last32(long(i) + j)

>>> uns32_add(0x80000000, 0x80000000)
0
>>>

> but when handling megabytes, the performance would be unacceptable.

That's what you get for working in a field 99.9999% of the world doesn't
care about <wink>.  Seriously, Python isn't going to grow an unsigned 32-bit
type overnight, so if it really is too slow in Python you're going to have
to avail yourself of the extension facilities.  You can write a complete
uns32 type as an extension type (a doable but non-trivial project), or just
the handful of functions you need most.

general-purpose-vs-all-purpose-ly y'rs  - tim