
Tim Peters wrote:
> [Damien Morton]
>> In the BINARY_ADD opcode, and in most arithmetic opcodes,
>
> Aren't add and subtract the whole story here?
>
>> there is a line that checks for overflow that looks like this:
>>
>> if ((i^a) < 0 && (i^b) < 0) goto slow_add;
>>
>> I got a small speedup by replacing this with a macro defined thusly:
>>
>> #if defined(_MSC_VER) and defined(_M_IX86)
>
> "and" isn't C, so I assume you were very lucky <wink>.
>
>> #define IF_OVERFLOW_GOTO(X) __asm { jo X };
>> #else
>> #define IF_OVERFLOW_GOTO(X) if ((i^a) < 0 && (i^b) < 0) goto X;
>> #endif
>>
>> Would this case be an acceptable use of snippets of inline assembler?
>
> If you had said "a huge speedup, on all programs", on the weak end of maybe. "Small speedup" isn't worth the obscurity. Note that Python contains no assembler now.
Just to add my 0.02 EUR.

You know that I'm not reluctant to use assembly for platform-specific speedups. But first, I'm with Tim: I would not go down this path for such a small win.

Second, I'd like to point out that using assembly in a function as huge as eval_frame is rather dangerous: every compiler handles the appearance of assembly differently. Believe me, MS C's behavior is one of the worst, which is why I was very careful to put this in a clean-room for Stackless, for instance. When ASM code appears in a function, the calling sequence and the optimization strategy change drastically: register allocation is changed, the optimization level is reduced, and the function is *never* compiled without a stack frame.

This might not have changed eval_frame's behavior too much, just because it is too big to benefit from certain optimizations now, but I remember that I once changed it to use about two fewer registers, and I might re-apply these changes to give the eval loop a boost of about 10 percent. The existence of a single asm statement would void this effect!

Hint: Write a small, understandable function twice, once using assembly and once without. Compile both, and set the listing option to everything. Then look at the .cod file and see how different the two versions are. This will make you very reluctant to use any asm statement at all, unless you want to rewrite the whole function in assembly, including the "naked" option.

Doing the latter for eval_frame would be worthwhile, but then I'd suggest doing it as an external .asm file. If you do this right, taking cache lines and probabilities into account, you can surely get an overall gain of up to 20 percent. But even this remarkable gain wouldn't be enough, even for me, to go down this hard path for a single platform.

sincerely -- chris

-- 
Christian Tismer :^) <mailto:tismer@tismer.com>
Mission Impossible 5oftware : Have a break! Take a ride on Python's *Starship* http://starship.python.net/
whom do you want to sponsor today? http://www.stackless.com/