OT: abs(x) with 4 assembly insns
![](https://secure.gravatar.com/avatar/e563240a5cb276b1eed4ebe525cf6470.jpg?s=120&d=mm&r=g)
Hi friends, today, Armin presented me a simple brain-teaser: You have X86 assembly, you have only 4 insns, and you don't want to use a jump. You have a register, loaded with a value, and you should produce its abs, in another register, while preserving the argument register. Hmm. 4 insns. Here is the standard C solution for doing abs without a jump, whioch you can find on the internet: int int_abs(int a) { return a - ((a+a) & (a>>31)); } This clearly computes the bas, quite nicely. The resulting assembly code looks like so: ; 2 : return a - ((a+a) & (a>>31)); 00000 8b 44 24 04 mov eax, DWORD PTR _a$[esp-4] 00004 8b c8 mov ecx, eax 00006 c1 f9 1f sar ecx, 31 ; 0000001fH 00009 8d 14 00 lea edx, DWORD PTR [eax+eax] 0000c 23 ca and ecx, edx 0000e 2b c1 sub eax, ecx ; 3 : } This is clearly some solution. But, ignoring the first insn, which is just moving the argument, this really takes five instructions, one too much of course. Actually, there is a much better solution. X86 has the cdq insn, which does a sign extension from 32 bit signed eax to 64 bit signed edx/eax. This in fact puts a -1 into edx, whenever the eax is negative, or a 0, if eax is >= 0. Now, what is taking the abs() of a value? Right, it is a two's complement, if the argument is negative, or a do-nothing if it is not negative. And the name 'two's complement' comes from the fact that you need to do two operations for the complement: First, take the NOT of the value, then add one. So, here comes the trick: NOT is equivalent to XORing with -1. ADD 1 is the same as SUB -1. Fortunately, this can all be carried out with the result of the CDQ sign extension, since we got the -1 from it: Do the CDQ. Load eax into another register XOR it with edx (which is either 0 or -1) SUB edx from it (which is either adding one or zero) So, here the solution(TM): int other_abs(int a) { __asm { mov eax, a; /* from here... */ cdq; mov ebx, eax; xor ebx, edx; sub ebx, edx; /*... to here*/ xchg eax, ebx; } } The meat of the program is exactly four insns, the rest is just needed to make the C calling conventions happy. Here the full program, you can step through it with a debugger, to see that it really works: int int_abs(int a) { return a - ((a+a) & (a>>31)); } int other_abs(int a) { __asm { mov eax, a; /* C: load arg */ /* from here... */ cdq; /* 1: edx has either 0 or -1 */ mov ebx, eax; /* 2: get a copy */ xor ebx, edx; /* 3: either identity, or NOT */ sub ebx, edx; /* 4: either add 0, or 1 */ /* ... to here */ xchg eax, ebx; /* C: store result */ } } int main(int argc, char **argv) { int a; a = int_abs(5); a = int_abs(-5); a = other_abs(5); a = other_abs(-5); return 0; } Hoping-that-everybody-trusts-me-now-that-I-can-write-a-small-OS -- ly y'rs - chris -- Christian Tismer :^) <mailto:tismer@tismer.com> Mission Impossible 5oftware : Have a break! Take a ride on Python's Johannes-Niemeyer-Weg 9a : *Starship* http://starship.python.net/ 14109 Berlin : PGP key -> http://wwwkeys.pgp.net/ work +49 30 89 09 53 34 home +49 30 802 86 56 mobile +49 173 24 18 776 PGP 0x57F3BF04 9064 F4E1 D754 C2FF 1619 305B C09C 5A3B 57F3 BF04 whom do you want to sponsor today? http://www.stackless.com/
![](https://secure.gravatar.com/avatar/b932b1e5a3e8299878e579f51f49b84a.jpg?s=120&d=mm&r=g)
On Sunday, Sep 28, 2003, at 20:29 America/New_York, Christian Tismer wrote:
Hi friends,
today, Armin presented me a simple brain-teaser:
You have X86 assembly, you have only 4 insns, and you don't want to use a jump. You have a register, loaded with a value, and you should produce its abs, in another register, while preserving the argument register.
Hmm. 4 insns.
If you are using a PowerPC with Altivec, you can get the absolute value of four 32bit integers in three instructions from C using vector signed int vec_abs(vector signed int a), which is a special compiler macro that turns into: # v1 is argument vector, v0 is result vector vspltisw v0,0 # v0 = [0] * 4 vsubuwm v0,v0,v1 # v0 = map(int.__sub__, v0, v1) vmaxsw v0,v1,v0 # v0 = map(max, v1, v0) With conventional PowerPC instructions (also three instructions): # r0 is argument register, r1 is result, r2 is scratch srawi r1,r0,31 # see below, python can't do this nicely add r2,r1,r0 # r2 = r1 + r0 xor r1,r2,r1 # r1 = r2 ^ r1 Here's an Python example of the second, mainly to demonstrate how srawi works (working in unsigned longs because python 2.3 is inconvenient about bit twiddling signed integers): def binary(v): return ''.join([(v & (1L << i)) and '1' or '0' for i in range(32)[::-1]])+'b' def srawi(v, num): if v & 0x80000000L: mask = 0xFFFFFFFFL ^ ((1L << (32L - num)) - 1) else: mask = 0x00000000L return (v >> num) | mask def new_abs(r0): if r0 < 0: r0 = 0x100000000L + r0 print 'input is %s' % (binary(r0),) r1 = srawi(r0, 31) print 'r1 = %s' % (binary(r1),) r2 = r1 + r0 print 'r2 = %s' % (binary(r2),) r1 = (r2 ^ r1) & 0xFFFFFFFFL print 'r1 = %s' % (binary(r1),) return r1
new_abs(1000) input is 00000000000000000000001111101000b r1 = 00000000000000000000000000000000b r2 = 00000000000000000000001111101000b r1 = 00000000000000000000001111101000b 1000L new_abs(-1000) input is 11111111111111111111110000011000b r1 = 11111111111111111111111111111111b r2 = 11111111111111111111110000010111b r1 = 00000000000000000000001111101000b 1000L
-bob
![](https://secure.gravatar.com/avatar/1efc90ff6075b7654d8a8ce6e51a2cd3.jpg?s=120&d=mm&r=g)
Christian Tismer <tismer@tismer.com> writes:
Hi friends,
today, Armin presented me a simple brain-teaser:
You have X86 assembly, you have only 4 insns, and you don't want to use a jump. You have a register, loaded with a value, and you should produce its abs, in another register, while preserving the argument register.
Hmm. 4 insns.
Or you can cheat, and remember that you saw something like this in some psyco header file somewhere... /* as you can check the following takes the absolute value of (say) EAX: \ ADD EAX, EAX \ SBB EAX, sourcecopy \ SBB EDX, EDX \ XOR EAX, EDX \ (note: although the idea is not original, the above code might be \ original as it has been found by an exhaustive search on *all* \ short codes :-) \ */ \ I don't know x86 assembler, so I have no idea which of the two code sequences are actually likely to run quicker on a modern processor. I've been reading the PPC architecture manual, though, so I can tell you that Bob's 3 instructions are all single cycle instructions on a G3... Cheers, mwh -- There are two kinds of large software systems: those that evolved from small systems and those that don't work. -- Seen on slashdot.org, then quoted by amk
![](https://secure.gravatar.com/avatar/b932b1e5a3e8299878e579f51f49b84a.jpg?s=120&d=mm&r=g)
On Monday, Sep 29, 2003, at 07:15 America/New_York, Michael Hudson wrote:
Christian Tismer <tismer@tismer.com> writes:
Hi friends,
today, Armin presented me a simple brain-teaser:
You have X86 assembly, you have only 4 insns, and you don't want to use a jump. You have a register, loaded with a value, and you should produce its abs, in another register, while preserving the argument register.
Hmm. 4 insns.
Or you can cheat, and remember that you saw something like this in some psyco header file somewhere...
/* as you can check the following takes the absolute value of (say) EAX: \ ADD EAX, EAX \ SBB EAX, sourcecopy \ SBB EDX, EDX \ XOR EAX, EDX \ (note: although the idea is not original, the above code might be \ original as it has been found by an exhaustive search on *all* \ short codes :-) \ */ \
I don't know x86 assembler, so I have no idea which of the two code sequences are actually likely to run quicker on a modern processor.
I've been reading the PPC architecture manual, though, so I can tell you that Bob's 3 instructions are all single cycle instructions on a G3...
You should also check out IBM's PowerPC Compiler Writer's Guide, if you haven't already: http://www-3.ibm.com/chips/techlib/techlib.nsf/techdocs/ 852569B20050FF7785256996007558C6 Hard copies of this document can be obtained by calling IBM at (800) 879-2755 and requesting document number SA14-2094, I hear that they used to (maybe still do) charge $1.00 for the book :) -bob
![](https://secure.gravatar.com/avatar/1efc90ff6075b7654d8a8ce6e51a2cd3.jpg?s=120&d=mm&r=g)
Bob Ippolito <bob@redivi.com> writes:
I've been reading the PPC architecture manual, though, so I can tell you that Bob's 3 instructions are all single cycle instructions on a G3...
You should also check out IBM's PowerPC Compiler Writer's Guide, if you haven't already: http://www-3.ibm.com/chips/techlib/techlib.nsf/techdocs/ 852569B20050FF7785256996007558C6
Hard copies of this document can be obtained by calling IBM at (800) 879-2755 and requesting document number SA14-2094, I hear that they used to (maybe still do) charge $1.00 for the book :)
Our Apologies You have attempted to enter a controlled access area that you are not currently entitled to view. IBM Microelectronics uses a entitlement process to protect potentially sensitive data. ... oh well. I'll dig around for it, though. And I guess s&h might be more that $1 to the UK... Cheers, mwh -- if-you-need-your-own-xxx.py-you-know-where-to-shove-it<wink>-ly y'rs - tim -- Tim Peters dishes out versioning advice on python-dev
![](https://secure.gravatar.com/avatar/b932b1e5a3e8299878e579f51f49b84a.jpg?s=120&d=mm&r=g)
On Monday, Sep 29, 2003, at 08:23 America/New_York, Michael Hudson wrote:
Bob Ippolito <bob@redivi.com> writes:
I've been reading the PPC architecture manual, though, so I can tell you that Bob's 3 instructions are all single cycle instructions on a G3...
You should also check out IBM's PowerPC Compiler Writer's Guide, if you haven't already: http://www-3.ibm.com/chips/techlib/techlib.nsf/techdocs/ 852569B20050FF7785256996007558C6
Hard copies of this document can be obtained by calling IBM at (800) 879-2755 and requesting document number SA14-2094, I hear that they used to (maybe still do) charge $1.00 for the book :)
Our Apologies
You have attempted to enter a controlled access area that you are not currently entitled to view. IBM Microelectronics uses a entitlement process to protect potentially sensitive data.
... oh well. I'll dig around for it, though. And I guess s&h might be more that $1 to the UK...
Maybe it's restricted to US citizens only? That would be odd.. I tried it in another browser just to make sure it wasn't a baked cookie from when I downloaded IBM's PowerPC compilers, and it worked just fine. As for the 800 number not working outside of the US.. I have no idea, I don't work at IBM. I don't think I can mirror the pdf publically on an archived mailing list according to the copyright of the book, but you all know what my email address is if you are interested in it ;) -bob
![](https://secure.gravatar.com/avatar/1efc90ff6075b7654d8a8ce6e51a2cd3.jpg?s=120&d=mm&r=g)
Bob Ippolito <bob@redivi.com> writes:
On Monday, Sep 29, 2003, at 08:23 America/New_York, Michael Hudson wrote:
Bob Ippolito <bob@redivi.com> writes:
I've been reading the PPC architecture manual, though, so I can tell you that Bob's 3 instructions are all single cycle instructions on a G3...
You should also check out IBM's PowerPC Compiler Writer's Guide, if you haven't already: http://www-3.ibm.com/chips/techlib/techlib.nsf/techdocs/ 852569B20050FF7785256996007558C6
Hard copies of this document can be obtained by calling IBM at (800) 879-2755 and requesting document number SA14-2094, I hear that they used to (maybe still do) charge $1.00 for the book :)
Our Apologies
You have attempted to enter a controlled access area that you are not currently entitled to view. IBM Microelectronics uses a entitlement process to protect potentially sensitive data.
... oh well. I'll dig around for it, though. And I guess s&h might be more that $1 to the UK...
Maybe it's restricted to US citizens only? That would be odd.. I tried it in another browser just to make sure it wasn't a baked cookie from when I downloaded IBM's PowerPC compilers, and it worked just fine.
Using the site search got me the PDF in the end. Maybe it checks Referrer headers? It's not like I care now... Goody, more reading. Cheers, mwh -- Of the four project development variables - scope, cost, time and quality - quality isn't really a free variable. The only possible values are "excellent" and "insanely excellent", depending on whether lives are at stake. -- Kent Beck, XP Explained
![](https://secure.gravatar.com/avatar/6ad1811de8100dc2f561d597023c8542.jpg?s=120&d=mm&r=g)
http://www-3.ibm.com/chips/techlib/techlib.nsf/techdocs/852569B20050FF77 85256996007558C6/$file/cwg.pdf seems to work. -- Robin Becker
![](https://secure.gravatar.com/avatar/e563240a5cb276b1eed4ebe525cf6470.jpg?s=120&d=mm&r=g)
Michael Hudson wrote:
Christian Tismer <tismer@tismer.com> writes: ...
Hmm. 4 insns.
Or you can cheat, and remember that you saw something like this in some psyco header file somewhere...
No, I couldn't. Armin was referring to it, and I wanted to find out by myself. He told me about the exhaustive search, which I found funny.
/* as you can check the following takes the absolute value of (say) EAX: \ ADD EAX, EAX \ SBB EAX, sourcecopy \ SBB EDX, EDX \ XOR EAX, EDX \ (note: although the idea is not original, the above code might be \ original as it has been found by an exhaustive search on *all* \ short codes :-) \ */ \
Actually, the above is not correct since it needs 5 insns. Armin didn't count the sourcecopy. My solution really has four insns, but it is cheating since it *needs* the register eax.
I don't know x86 assembler, so I have no idea which of the two code sequences are actually likely to run quicker on a modern processor.
I guess they are almost the same speed. ciao - chris -- Christian Tismer :^) <mailto:tismer@tismer.com> Mission Impossible 5oftware : Have a break! Take a ride on Python's Johannes-Niemeyer-Weg 9a : *Starship* http://starship.python.net/ 14109 Berlin : PGP key -> http://wwwkeys.pgp.net/ work +49 30 89 09 53 34 home +49 30 802 86 56 pager +49 173 24 18 776 PGP 0x57F3BF04 9064 F4E1 D754 C2FF 1619 305B C09C 5A3B 57F3 BF04 whom do you want to sponsor today? http://www.stackless.com/
![](https://secure.gravatar.com/avatar/b932b1e5a3e8299878e579f51f49b84a.jpg?s=120&d=mm&r=g)
On Monday, Sep 29, 2003, at 09:13 America/New_York, Christian Tismer wrote:
Michael Hudson wrote:
Christian Tismer <tismer@tismer.com> writes: ...
Hmm. 4 insns. Or you can cheat, and remember that you saw something like this in some psyco header file somewhere...
No, I couldn't. Armin was referring to it, and I wanted to find out by myself. He told me about the exhaustive search, which I found funny.
/* as you can check the following takes the absolute value of (say) EAX: \ ADD EAX, EAX \ SBB EAX, sourcecopy \ SBB EDX, EDX \ XOR EAX, EDX \ (note: although the idea is not original, the above code might be \ original as it has been found by an exhaustive search on *all* \ short codes :-) \ */ \
Actually, the above is not correct since it needs 5 insns. Armin didn't count the sourcecopy.
My solution really has four insns, but it is cheating since it *needs* the register eax.
I don't know x86 assembler, so I have no idea which of the two code sequences are actually likely to run quicker on a modern processor.
I guess they are almost the same speed.
Ah, the joys of x86. It's amazing that so many people still use such a braindead ISA :) -bob
participants (4)
-
Bob Ippolito
-
Christian Tismer
-
Michael Hudson
-
Robin Becker