further optimising the micro-optimisations for cache locality (fwd)

FYI. It would be interesting to collect some feedback on these ideas for popular combos. Who knows... Forwarded message:
-- Vladimir MARANGOZOV | Vladimir.Marangozov@inrialpes.fr http://sirac.inrialpes.fr/~marangoz | tel:(+33-4)76615277 fax:76615252

On Fri, Jul 28, 2000 at 09:24:36PM +0200, Vladimir Marangozov wrote:
It would be interesting to collect some feedback on these ideas for popular combos. Who knows...
I like these ideas, though I think anything beyond 'further folded' requires a separate switch for the non-common operators and for those that do a tad more than call a function with a certain number of arguments and push the result on the stack. That means re-numbering the ops into fast-ops and slow-ops, as well as argument-ops and non-argument-ops. (I hope all non-argument ops fall in the 'fast' category, or it might get tricky ;-P) I'm also wondering whether they really speed things up. The confusion might force the compiler to generate *less* efficient code. Then again, it also removes some of the burden from the compiler, so whether this has a positive effect probably depends very heavily on the compiler.
I'm not so sure about any of these comments, given that we do jump to a function right after accessing these tables. I suggest heavy testing, and I can offer only two architectures myself (linux-i386 and solaris-sparc-gcc).
-- Thomas Wouters <thomas@xs4all.net>
Hi! I'm a .signature virus! copy me into your .signature file to help me spread!
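A minimal C sketch of the numbering Thomas describes, under loud assumptions: the opcodes, the toy stack machine and the `run` function are all hypothetical, and the `HAVE_ARGUMENT` threshold only borrows the name CPython uses for "opcodes at or above this carry an argument". Non-argument ops are numbered first, then fast argument ops, then slow ones, so one comparison decides whether to decode an argument and a second picks which of two switches handles the op.

```c
/* Hypothetical opcode numbering (NOT CPython's real table):
   [fast, no argument] [fast, with argument] [slow, with argument] */
enum {
    OP_POP_TOP = 0,
    OP_BINARY_ADD,
    OP_RETURN_VALUE,
    HAVE_ARGUMENT,                  /* opcodes >= this carry a 1-byte argument */
    OP_LOAD_CONST = HAVE_ARGUMENT,
    SLOW_OPS,                       /* opcodes >= this go to the slow switch */
    OP_SUM_N = SLOW_OPS             /* pops `arg` values, pushes their sum */
};

/* Toy evaluation loop; no stack bounds checking. */
long run(const unsigned char *code, const long *consts)
{
    long stack[64];
    long *sp = stack;               /* points at the next free slot */
    const unsigned char *pc = code;

    for (;;) {
        int op = *pc++;
        int arg = 0;
        if (op >= HAVE_ARGUMENT)
            arg = *pc++;            /* decode the argument once, up front */

        if (op < SLOW_OPS) {
            /* Small, dense switch for the common, cheap ops. */
            switch (op) {
            case OP_POP_TOP:
                --sp;
                continue;
            case OP_BINARY_ADD: {
                long right = *--sp;
                sp[-1] += right;
                continue;
            }
            case OP_RETURN_VALUE:
                return *--sp;
            case OP_LOAD_CONST:
                *sp++ = consts[arg];
                continue;
            }
        } else {
            /* Separate switch for ops that do more than one quick step. */
            switch (op) {
            case OP_SUM_N: {
                long sum = 0;
                for (int i = 0; i < arg; i++)
                    sum += *--sp;
                *sp++ = sum;
                continue;
            }
            }
        }
        return -1;                  /* unknown opcode (error handling elided) */
    }
}

/* 2 + 3, then sum the top three stack entries (5, 4, 4). */
const long demo_consts[] = {2, 3, 4};
const unsigned char demo_code[] = {
    OP_LOAD_CONST, 0, OP_LOAD_CONST, 1, OP_BINARY_ADD,
    OP_LOAD_CONST, 2, OP_LOAD_CONST, 2, OP_SUM_N, 3,
    OP_RETURN_VALUE
};
```

Calling `run(demo_code, demo_consts)` computes 2 + 3 and then sums 5, 4 and 4, returning 13. The sketch only shows the shape of the split; whether it is faster is exactly what Thomas suggests measuring per compiler. A slow non-argument op would need a fourth range, which is the "tricky" case in his aside.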

Thomas Wouters wrote:
I agree. Note that all we can get through this game is a few dubious percent that we shouldn't ignore, but whose persistence in the long term is quite questionable. There's no way that, to pick a guy at random, Tim won't confirm this! And just like Guido, I'm not ready to trade code cleanliness for dubious speed. However, I take the opportunity to invite you again to "heavy test" the object allocator's candidate, obmalloc.c, which, in my educated understanding of the current C implementation, is the only piece of code capable of increasing Python's overall performance by more than 10%, assuming that your script involves object allocations. It is my strong belief that this performance comes for free, by instrumenting Python at its innermost internals, not on top of it! Of course, improvements to this code are welcome and I'd be happy to discuss them in this forum.
-- Vladimir MARANGOZOV | Vladimir.Marangozov@inrialpes.fr http://sirac.inrialpes.fr/~marangoz | tel:(+33-4)76615277 fax:76615252
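For readers who haven't looked at obmalloc.c, here is a minimal C sketch of the general technique it belongs to: rounding small requests up to a few size classes and serving each class from pools with its own free list, so the common case is a pointer pop rather than a trip through the system `malloc`. This is not obmalloc.c: `small_alloc`, `small_free` and all constants are hypothetical and chosen for illustration, the caller must pass the size to `small_free` (a real allocator can find the block's pool from its address instead), and pools are never returned to the system.

```c
#include <stddef.h>
#include <stdlib.h>

/* Illustrative constants, not obmalloc's. */
#define ALIGNMENT    8
#define SMALL_LIMIT  256                  /* larger requests go to malloc */
#define NUM_CLASSES  (SMALL_LIMIT / ALIGNMENT)
#define POOL_SIZE    4096

/* A free block stores the link to the next free block of its class. */
typedef union block { union block *next; } block;

static block *freelist[NUM_CLASSES];      /* per-class free lists */
static char *pool_cur[NUM_CLASSES];       /* bump pointer into current pool */
static char *pool_end[NUM_CLASSES];

/* Map a request of 1..SMALL_LIMIT bytes to its size class. */
static size_t class_of(size_t n) { return (n - 1) / ALIGNMENT; }

void *small_alloc(size_t n)
{
    if (n == 0) n = 1;
    if (n > SMALL_LIMIT) return malloc(n);
    size_t c = class_of(n);
    size_t bsize = (c + 1) * ALIGNMENT;

    /* Fast path: reuse a previously freed block of the same class. */
    if (freelist[c] != NULL) {
        block *b = freelist[c];
        freelist[c] = b->next;
        return b;
    }
    /* Otherwise carve the next block from this class's pool. */
    if (pool_cur[c] == NULL || pool_cur[c] + bsize > pool_end[c]) {
        char *p = malloc(POOL_SIZE);
        if (p == NULL) return NULL;
        pool_cur[c] = p;
        pool_end[c] = p + POOL_SIZE;
    }
    void *result = pool_cur[c];
    pool_cur[c] += bsize;
    return result;
}

/* The caller passes the original request size (a simplification). */
void small_free(void *p, size_t n)
{
    if (p == NULL) return;
    if (n == 0) n = 1;
    if (n > SMALL_LIMIT) { free(p); return; }
    block *b = p;
    size_t c = class_of(n);
    b->next = freelist[c];
    freelist[c] = b;
}
```

Freeing a 24-byte block and then requesting 20 bytes hands back the same address, since both sizes round up to the 24-byte class.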

participants (2)
- Thomas Wouters
- Vladimir.Marangozov@inrialpes.fr