[pypy-dev] OT: abs(x) with 4 assembly insns
Christian Tismer
tismer at tismer.com
Mon Sep 29 02:29:04 CEST 2003
Hi friends,
today, Armin presented me a simple brain-teaser:
You have X86 assembly, you have only 4 insns,
and you don't want to use a jump.
You have a register, loaded with a value, and
you should produce its abs, in another register,
while preserving the argument register.
Hmm. 4 insns.
Here is the standard C solution for doing abs without a jump,
whioch you can find on the internet:
int int_abs(int a) {
return a - ((a+a) & (a>>31));
}
This clearly computes the bas, quite nicely.
The resulting assembly code looks like so:
; 2 : return a - ((a+a) & (a>>31));
00000 8b 44 24 04 mov eax, DWORD PTR _a$[esp-4]
00004 8b c8 mov ecx, eax
00006 c1 f9 1f sar ecx, 31 ; 0000001fH
00009 8d 14 00 lea edx, DWORD PTR [eax+eax]
0000c 23 ca and ecx, edx
0000e 2b c1 sub eax, ecx
; 3 : }
This is clearly some solution. But, ignoring the first insn,
which is just moving the argument, this really takes
five instructions, one too much of course.
Actually, there is a much better solution.
X86 has the cdq insn, which does a sign extension from
32 bit signed eax to 64 bit signed edx/eax.
This in fact puts a -1 into edx, whenever the eax is negative,
or a 0, if eax is >= 0.
Now, what is taking the abs() of a value? Right, it is a two's
complement, if the argument is negative, or a do-nothing if
it is not negative.
And the name 'two's complement' comes from the fact
that you need to do two operations for the complement:
First, take the NOT of the value, then add one.
So, here comes the trick:
NOT is equivalent to XORing with -1.
ADD 1 is the same as SUB -1.
Fortunately, this can all be carried out with the result of
the CDQ sign extension, since we got the -1 from it:
Do the CDQ.
Load eax into another register
XOR it with edx (which is either 0 or -1)
SUB edx from it (which is either adding one or zero)
So, here the solution(TM):
int other_abs(int a) {
__asm {
mov eax, a;
/* from here... */
cdq;
mov ebx, eax;
xor ebx, edx;
sub ebx, edx;
/*... to here*/
xchg eax, ebx;
}
}
The meat of the program is exactly four insns, the rest
is just needed to make the C calling conventions happy.
Here the full program, you can step through it with a
debugger, to see that it really works:
int int_abs(int a) {
return a - ((a+a) & (a>>31));
}
int other_abs(int a) {
__asm {
mov eax, a; /* C: load arg */
/* from here... */
cdq; /* 1: edx has either 0 or -1 */
mov ebx, eax; /* 2: get a copy */
xor ebx, edx; /* 3: either identity, or NOT */
sub ebx, edx; /* 4: either add 0, or 1 */
/* ... to here */
xchg eax, ebx; /* C: store result */
}
}
int main(int argc, char **argv) {
int a;
a = int_abs(5);
a = int_abs(-5);
a = other_abs(5);
a = other_abs(-5);
return 0;
}
Hoping-that-everybody-trusts-me-now-that-I-can-write-a-small-OS
-- ly y'rs - chris
--
Christian Tismer :^) <mailto:tismer at tismer.com>
Mission Impossible 5oftware : Have a break! Take a ride on Python's
Johannes-Niemeyer-Weg 9a : *Starship* http://starship.python.net/
14109 Berlin : PGP key -> http://wwwkeys.pgp.net/
work +49 30 89 09 53 34 home +49 30 802 86 56 mobile +49 173 24 18 776
PGP 0x57F3BF04 9064 F4E1 D754 C2FF 1619 305B C09C 5A3B 57F3 BF04
whom do you want to sponsor today? http://www.stackless.com/
More information about the Pypy-dev
mailing list