donn at u.washington.edu
Wed Nov 30 23:19:41 CET 2005
In article <86sltdn6ma.fsf at bhuda.mired.org>, Mike Meyer <mwm at mired.org>
> "Harald Armin Massa" <haraldarminmassa at gmail.com> writes:
> >>Faster than assembly? LOL... :)
> > why not? Of course, a simple script like "copy 200 bytes from left to
> > right" can be handoptimized in assembler and run at optimum speed.
> > Maybe there is even a special processor command to do that.
> Chances are, version 1 of the system doesn't have the command. Version
> 2 does, but it's no better than the obvious hand-coded loop. Version 3
> finally makes it faster than the hand-coded loop, if you assume you
> have the instruction. If you have to test to see if you can use it,
> the hand-coded version is equally fast. Version 4 makes it faster even
> if you do the test, so you want to use it if you can. Of course, by
> then there'll be a *different* command that can do the same thing,j and
> is faster in some conditions.
> Dealing with this in assembler is a PITA. If you're generating code on
> the fly, you generate the correct version for the CPU you're running
> on, and that's that. It'll run at least as fast as hand-coded
> assembler on every CPU, and faster on some.
Actually I think the post you quote went on to make a similar
I read yesterday morning in the paper that the Goto Basic Linear
Algebra Subroutines, by a Mr. Kazushige Goto, are still the most
efficient library of functions for their purpose for use in
supercomputing applications. Apparently hand-optimized assembler
for specific processors.
(actually from the NY Times, apparently)
Donn Cave, donn at u.washington.edu
More information about the Python-list