[pypy-dev] OT: abs(x) with 4 assembly insns
Bob Ippolito
bob at redivi.com
Mon Sep 29 04:41:56 CEST 2003
On Sunday, Sep 28, 2003, at 20:29 America/New_York, Christian Tismer
wrote:
> Hi friends,
>
> today, Armin presented me a simple brain-teaser:
>
> You have X86 assembly, you have only 4 insns,
> and you don't want to use a jump.
> You have a register, loaded with a value, and
> you should produce its abs, in another register,
> while preserving the argument register.
>
> Hmm. 4 insns.
If you are using a PowerPC with AltiVec, you can get the absolute value
of four 32-bit integers in three instructions from C using vector signed
int vec_abs(vector signed int a), a special compiler macro that turns
into:
# v1 is argument vector, v0 is result vector
vspltisw v0,0 # v0 = [0] * 4
vsubuwm v0,v0,v1 # v0 = map(int.__sub__, v0, v1)
vmaxsw v0,v1,v0 # v0 = map(max, v1, v0)
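For anyone without AltiVec hardware at hand, the three vector instructions can be modelled lane by lane in Python. This is only a rough sketch, written for Python 3. The names MASK32, to_signed and vec_abs_model are made up for illustration, and it assumes each lane is a 32-bit two's-complement value and that the subtraction wraps modulo 2**32, as the "m" in vsubuwm suggests.

```python
MASK32 = 0xFFFFFFFF

def to_signed(u):
    """Interpret the low 32 bits of u as a signed two's-complement value."""
    u &= MASK32
    return u - 0x100000000 if u & 0x80000000 else u

def vec_abs_model(v1):
    """Model of the three-instruction sequence on a list of four signed ints."""
    v0 = [0] * 4                                      # vspltisw v0,0
    v0 = [(a - b) & MASK32 for a, b in zip(v0, v1)]   # vsubuwm v0,v0,v1 (wraps)
    # vmaxsw v0,v1,v0: signed maximum of each lane and its negation
    return [max(to_signed(b), to_signed(a)) for a, b in zip(v0, v1)]

print(vec_abs_model([1000, -1000, 0, -1]))
```

Under these assumptions the most negative value, -2**31, comes back unchanged, because its negation wraps around to itself.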
With conventional PowerPC instructions (also three instructions):
# r0 is argument register, r1 is result, r2 is scratch
srawi r1,r0,31 # r1 = r0 >> 31, sign-filled: 0 or -1 (see below, python can't do this nicely)
add r2,r1,r0 # r2 = r1 + r0
xor r1,r2,r1 # r1 = r2 ^ r1
Here's a Python example of the second, mainly to demonstrate how srawi
works (working in unsigned longs because Python 2.3 is inconvenient
about bit twiddling signed integers):
def binary(v):
    return ''.join([(v & (1L << i)) and '1' or '0'
                    for i in range(32)[::-1]]) + 'b'

def srawi(v, num):
    if v & 0x80000000L:
        mask = 0xFFFFFFFFL ^ ((1L << (32L - num)) - 1)
    else:
        mask = 0x00000000L
    return (v >> num) | mask

def new_abs(r0):
    if r0 < 0:
        r0 = 0x100000000L + r0
    print 'input is %s' % (binary(r0),)
    r1 = srawi(r0, 31)
    print 'r1 = %s' % (binary(r1),)
    r2 = r1 + r0
    print 'r2 = %s' % (binary(r2),)
    r1 = (r2 ^ r1) & 0xFFFFFFFFL
    print 'r1 = %s' % (binary(r1),)
    return r1

>>> new_abs(1000)
input is 00000000000000000000001111101000b
r1 = 00000000000000000000000000000000b
r2 = 00000000000000000000001111101000b
r1 = 00000000000000000000001111101000b
1000L
>>> new_abs(-1000)
input is 11111111111111111111110000011000b
r1 = 11111111111111111111111111111111b
r2 = 11111111111111111111110000010111b
r1 = 00000000000000000000001111101000b
1000L
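
The trace shows why the trick works: for a negative input r1 is all ones (that is, -1), so r1 + r0 is r0 - 1, and xor with all ones flips every bit, giving ~(r0 - 1) = -r0. For a non-negative input r1 is zero and both steps leave the value alone. A compact way to check this against the built-in abs is sketched below. It is written for Python 3 rather than 2.3, the name abs32 is made up here, and it assumes 32-bit registers whose add wraps around.

```python
MASK32 = 0xFFFFFFFF

def srawi(v, num):
    """Arithmetic right shift of a 32-bit pattern by num bits (0 < num < 32)."""
    v &= MASK32
    if v & 0x80000000:
        # Fill the vacated high bits with copies of the sign bit.
        return (v >> num) | (MASK32 ^ ((1 << (32 - num)) - 1))
    return v >> num

def abs32(x):
    """Branch-free absolute value via shift, add, xor on 32-bit patterns."""
    r0 = x & MASK32            # two's-complement bit pattern of x
    r1 = srawi(r0, 31)         # 0 for non-negative x, 0xFFFFFFFF otherwise
    r2 = (r1 + r0) & MASK32    # add, wrapping like a 32-bit register
    return (r2 ^ r1) & MASK32  # xor with the sign mask

# Compare against the built-in for a range of small values.
for x in range(-5000, 5001):
    assert abs32(x) == abs(x)
```

The one input with no positive counterpart, -2**31, yields the bit pattern 0x80000000 again, which is still negative when read as a signed 32-bit value.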
-bob