[Numpy-discussion] parallel compilation of numpy

Michael Abshoff michael.abshoff at googlemail.com
Thu Feb 19 01:43:40 EST 2009


David Cournapeau wrote:
> Michael Abshoff wrote:

Hi David,

>> Sure, it also works for incremental builds, and I do that many, many
>> times a day, i.e. for each patch I merge into the Sage library. What
>> gets recompiled is decided by our own dependency tracking code, which we
>> want to push into Cython itself. Figuring out dependencies on the fly,
>> without caching, takes about 1s for the whole Sage library, and that
>> includes parsing every Cython file.
>>   
> 
> Hm, I think I would have to look at what Sage does internally to really
> understand the implications. But surely, if you can figure out the whole
> dependency graph for scipy in one second, I would be more than impressed:
> you would beat waf and make at their own game.

I didn't write the code, but thanks :)

We used to cache the dependency tree by pickling it, and if no time
stamps had changed we would reuse it, so that phase would be instant.
Alas, there was one unfixed bug in it that hit you if you removed an
extension or file. But the main author of that code (Craig Citro - to
give credit where credit is due) has an idea of how to fix it, and once
his PhD thesis is handed in he will stomp that bug out.

>> We used to use threads for the "parallel stuff" and it is indeed racy,
>> but we mostly observed that when running doctests, since we only had one
>> current directory. All those problems went away once we started to use
>> pyprocessing, and while there is some overhead for the forks, it is
>> drowned out by the build time when using 2 cores.
>>   
> 
> Does pyprocessing work well on Windows as well? I have zero experience
> with it.

It should, but I haven't tested it. The last official, stand-alone
pyprocessing we ship in Sage segfaults on FreeBSD 7, so we will likely
update to the multiprocessing backport from Python 2.6 soon. For now we
are stuck at Python 2.5 until numpy/scipy and a bunch of other Python
projects like NetworkX officially support 2.6 ;).

>> Ouch. Is that without the dependencies, i.e. ATLAS?
>>   
> 
> Yes - but I need to build scipy three times, once for each ATLAS (if I
> could use numscons it would be much better, since a library change is
> handled as a dependency in scons; with distutils, the only safe way is
> to rebuild from scratch for every configuration).

In Sage, if we link a static lib into an extension, we add a dependency
on its header. You could do the same for libatlas.a, so dropping in a
new version and touching it should rebuild just the extensions that
depend on ATLAS. I have tested this extensively and have not found any
problems with that approach.
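With plain distutils the same trick is the depends= argument of
Extension; a minimal sketch (paths and names are hypothetical):

    # Sketch: rebuild the extension when the static lib it links changes.
    from distutils.core import setup, Extension

    ext = Extension('fast_linalg',
                    sources=['fast_linalg.c'],
                    extra_objects=['local/lib/libatlas.a'],
                    # touching libatlas.a marks the extension out of date
                    depends=['local/lib/libatlas.a'])

    setup(name='demo', ext_modules=[ext])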

>> I was curious how you build the various versions of ATLAS, i.e. no SSE,
>> SSE, SSE2, etc. Do you just set the arch via -A and build them all on
>> the same box? [sorry for getting slightly OT here :)]
>>   
> 
> This does not work: ATLAS will still use SSE if your CPU supports it,
> even if you force an arch without SSE. I tried two different things:
> first, using a patched qemu with options to emulate a P4 without SSE,
> with SSE2, and with SSE3, but this did not work so well (the generated
> versions are too slow, and handling virtual machines in qemu is a bit
> of a pain). Now I just build on different machines and hope I won't
> need to rebuild them too often.

OK, I now remember what I did about this problem two days ago, since I
want to build SSE2-only binary releases of Sage. Apparently there are
people out there who aren't using Intel/AMD CPUs with SSE3 :)

To make ATLAS build without, say, SSE3, go into the config system and
make the SSE3 probe return "FAILURE" unconditionally. That way ATLAS
will only pick SSE2, even if the CPU supports more. I verified with
objdump that the resulting lib no longer contains any PNI (== SSE3)
instructions; see

  http://trac.sagemath.org/sage_trac/ticket/5219
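That check is easy to script; a rough sketch (the mnemonic list covers
the usual SSE3 instructions, and the lib path is hypothetical):

    # Sketch: grep objdump's disassembly for SSE3 (PNI) mnemonics.
    import subprocess

    SSE3 = ('addsubpd', 'addsubps', 'haddpd', 'haddps', 'hsubpd',
            'hsubps', 'lddqu', 'movddup', 'movshdup', 'movsldup', 'fisttp')

    def sse3_hits(lib):
        asm = subprocess.Popen(['objdump', '-d', lib],
                               stdout=subprocess.PIPE).communicate()[0]
        asm = asm.decode('ascii', 'replace')
        return [op for op in SSE3 if op in asm]

    print(sse3_hits('local/lib/libatlas.a'))  # expect [] after the patch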

A problem here is that you get odd arches without tuning info, e.g.
P464SSE2, so one has to build the tuning info once and drop it into
subsequent builds of ATLAS. You will also have the problem of Hammer
vs. P4 ATLAS kernels, so one day I will measure the performance
difference.

I meant to ask Clint about adding a configure switch for a maximum SSE
level to ATLAS itself, but since I only got the problem solved two days
ago I haven't gotten around to it yet. Given that everything else is
configurable, it seems like something he would welcome.

If you want more details on where to poke around in the config system 
let me know.

> cheers,
> 
> David

Cheers,

Michael
