I just browsed over the archives for Numpy-discussion and saw this, and decided to sign up.
I work on the ATLAS project (http://math-atlas.sourceforge.net/) and we have had similar problems with gcc 3.0. Gcc 3.0 has a completely new backend which produces much slower floating point code on i386 machines. It is most visible on the Athlon, but it also shows up on P4 and PIII machines. We haven't yet figured out whether there are some optimizations that can make this go away, but if you need performance, stick with the old 2.95 release for now.
By the way, if you would like to use ATLAS in NumPy (I don't know whether you already do), I might be of some help. There are C interfaces to the BLAS bundled with ATLAS, supporting both row-major and column-major storage.
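
In case the row-major versus column-major distinction is unfamiliar, here is a small sketch in plain Python of what the two layouts mean for a flat buffer. The CBLAS routines take an order argument (CblasRowMajor or CblasColMajor) to select between them. The helper names below (flat_index, flatten, matvec) are made up for illustration and are not part of ATLAS or NumPy.

```python
# Hypothetical helpers for illustration only; not the ATLAS or NumPy API.

def flat_index(i, j, rows, cols, order):
    """Offset of element (i, j) in a flat buffer holding a rows x cols matrix."""
    if order == "row":   # C convention: each row is contiguous
        return i * cols + j
    if order == "col":   # Fortran convention: each column is contiguous
        return j * rows + i
    raise ValueError("order must be 'row' or 'col'")

def flatten(matrix, order):
    """Copy a list-of-lists matrix into a flat list in the given storage order."""
    rows, cols = len(matrix), len(matrix[0])
    buf = [0.0] * (rows * cols)
    for i in range(rows):
        for j in range(cols):
            buf[flat_index(i, j, rows, cols, order)] = matrix[i][j]
    return buf

def matvec(buf, rows, cols, x, order):
    """Compute y = A x, reading A from a flat buffer in the given storage order."""
    return [
        sum(buf[flat_index(i, j, rows, cols, order)] * x[j] for j in range(cols))
        for i in range(rows)
    ]

A = [[1.0, 2.0, 3.0],
     [4.0, 5.0, 6.0]]
x = [1.0, 0.0, -1.0]

row_buf = flatten(A, "row")   # rows laid out one after another
col_buf = flatten(A, "col")   # columns laid out one after another

# Same matrix, different memory layout, same product once the order is known.
y_row = matvec(row_buf, 2, 3, x, "row")
y_col = matvec(col_buf, 2, 3, x, "col")
```

The point is only that a routine which is told the storage order can work on either buffer without a copy, which is what makes the C interface convenient for row-major arrays.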