[Neuroimaging] parallel computation of bundle_distances_mam/mdf ?

Paolo Avesani avesani at fbk.eu
Wed Dec 14 12:21:32 EST 2016


Just for reference, I tried two approaches on Mac OS 10.11.5:
1. compile with clang
2. compile with gcc provided by anaconda
In both cases compilation failed.

1. compile with clang
----------------------------
$ python setup.py build_ext --inplace
Compiling test.pyx because it changed.
Cythonizing test.pyx
running build_ext
building 'test' extension
creating build
creating build/temp.macosx-10.6-x86_64-2.7
gcc -fno-strict-aliasing -I/Users/paolo/Software/miniconda/include -arch
x86_64 -DNDEBUG -g -fwrapv -O3 -Wall -Wstrict-prototypes
-I/Users/paolo/Software/miniconda/include/python2.7 -c test.c -o
build/temp.macosx-10.6-x86_64-2.7/test.o -fopenmp
clang: error: unsupported option '-fopenmp'
error: command 'gcc' failed with exit status 1


2. compile with gcc provided by anaconda
---------------------------------------------------------
$ python setup.py build_ext --inplace
Compiling test.pyx because it changed.
Cythonizing test.pyx
running build_ext
building 'test' extension
gcc -fno-strict-aliasing -I/Users/paolo/Software/miniconda/include -arch
x86_64 -DNDEBUG -g -fwrapv -O3 -Wall -Wstrict-prototypes
-I/Users/paolo/Software/miniconda/include/python2.7 -c test.c -o
build/temp.macosx-10.6-x86_64-2.7/test.o -fopenmp
In file included from
/Users/paolo/Software/miniconda/lib/gcc/x86_64-apple-darwin11.4.2/4.8.5/include-fixed/syslimits.h:7:0,
                 from
/Users/paolo/Software/miniconda/lib/gcc/x86_64-apple-darwin11.4.2/4.8.5/include-fixed/limits.h:34,
                 from
/Users/paolo/Software/miniconda/include/python2.7/Python.h:19,
                 from test.c:16:
/Users/paolo/Software/miniconda/lib/gcc/x86_64-apple-darwin11.4.2/4.8.5/include-fixed/limits.h:168:61:
fatal error: limits.h: No such file or directory
 #include_next <limits.h>  /* recurse down to the real one */
compilation terminated.
error: command 'gcc' failed with exit status 1
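

For what it is worth, a possible way to avoid hard failures like the two
above is to probe the toolchain at build time and only pass -fopenmp when
it is actually supported, falling back to a serial build otherwise. A
minimal sketch (untested here; the helper name is made up):

---- check_openmp.py ----
import os
import tempfile
from distutils.ccompiler import new_compiler
from distutils.errors import CompileError, LinkError

OMP_TEST_C = b"""
#include <omp.h>
int main(void) { return omp_get_max_threads(); }
"""

def has_openmp():
    """Try to compile and link a trivial OpenMP program."""
    cc = new_compiler()
    tmpdir = tempfile.mkdtemp()
    src = os.path.join(tmpdir, "omp_test.c")
    with open(src, "wb") as f:
        f.write(OMP_TEST_C)
    try:
        objects = cc.compile([src], output_dir=tmpdir,
                             extra_postargs=["-fopenmp"])
        cc.link_executable(objects, os.path.join(tmpdir, "omp_test"),
                           extra_postargs=["-fopenmp"])
    except (CompileError, LinkError):
        return False  # e.g. Apple clang: unsupported option '-fopenmp'
    return True

# in setup.py one would then do, e.g.:
# omp_flags = ["-fopenmp"] if has_openmp() else []
----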



On Wed, Dec 14, 2016 at 4:51 PM, Eleftherios Garyfallidis <elef at indiana.edu>
wrote:

>
> Hi Emanuele,
>
> My understanding is that openmp was only temporarily unavailable when
> clang replaced gcc in osx.
>
> So, I would suggest going ahead with openmp. Any current installation
> issues are only temporary, and only on osx.
> Openmp gives us a lot of capability to play with shared memory, and it
> is a standard that will be around for a very long time. Also, the great
> integration with cython makes the algorithms really easy to read.
> So, especially for this project, my recommendation is to use openmp
> rather than multiprocessing. All the way! :)
>
> I am CC'ing Stephan, who wrote the instructions for osx. I am sure he can
> help you with this. I would also suggest checking whether xcode provides
> any new guis for enabling openmp. I remember there was something for that.
>
> Laterz!
> Eleftherios
>
>
>
>
> On Wed, Dec 14, 2016 at 6:29 AM Emanuele Olivetti <olivetti at fbk.eu> wrote:
>
>> Hi Eleftherios,
>>
>> Thank you for pointing me to the MDF example. From what I see, the Cython
>> syntax is not complex, which is good.
>>
>> My only concern is the availability of OpenMP on the systems where DiPy
>> is used. On a reasonably recent GNU/Linux machine it seems straightforward
>> to get libgomp and a suitable version of gcc. On other systems - say OSX -
>> the situation is less clear to me. According to what I read here
>>   http://nipy.org/dipy/installation.html#openmp-with-osx
>> the OSX installation steps are not meant for standard end users. Are
>> those instructions up to date?
>> As a test, we just tried skipping the steps described there and instead
>> installing gcc with conda on OSX ("conda install gcc"). In the process,
>> conda installed gcc 4.8 together with libgomp, which seemed like good
>> news. Unfortunately, when we tried to compile a simple example of Cython
>> code using parallelization (see below), the process failed (fatal error:
>> limits.h: No such file or directory)...
>>
>> For the reasons above, I am wondering whether the very simple solution of
>> using the "multiprocessing" module, available in the standard Python
>> library, may be an acceptable first step towards the more efficient
>> multithreading of Cython/libgomp. With "multiprocessing" there is no
>> extra dependency on libgomp, a recent gcc, or anything else. Moreover,
>> multiprocessing does not require Cython code at all, because it works on
>> plain Python too.
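>>
>> To make this concrete, below is a minimal sketch of the chunked approach
>> I have in mind (the wrapper name is made up, and I assume
>> bundle_distances_mam accepts any sub-list of streamlines):
>>
>> ---- parallel_mam.py (sketch) ----
>> import numpy as np
>> from multiprocessing import Pool, cpu_count
>> from dipy.tracking.distances import bundle_distances_mam
>>
>> def _rows(args):
>>     # one block of rows of the full distance matrix
>>     chunk, tracksB = args
>>     return bundle_distances_mam(chunk, tracksB)
>>
>> def parallel_bundle_distances_mam(tracksA, tracksB, n_jobs=None):
>>     n_jobs = n_jobs or cpu_count()
>>     size = max(1, len(tracksA) // n_jobs)
>>     chunks = [tracksA[i:i + size] for i in range(0, len(tracksA), size)]
>>     pool = Pool(n_jobs)
>>     try:
>>         blocks = pool.map(_rows, [(chunk, tracksB) for chunk in chunks])
>>     finally:
>>         pool.close()
>>         pool.join()
>>     return np.vstack(blocks)
>> -----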
>>
>> Best,
>>
>> Emanuele
>>
>> ---- test.pyx ----
>> # Minimal OpenMP smoke test: each of ten threads prints its own ID.
>> from cython import parallel
>> from libc.stdio cimport printf
>>
>> def test_func():
>>     cdef int thread_id = -1
>>     # release the GIL and open an OpenMP parallel region
>>     with nogil, parallel.parallel(num_threads=10):
>>         thread_id = parallel.threadid()  # thread-private in Cython
>>         printf("Thread ID: %d\n", thread_id)
>> -----
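>>
>> (If the build succeeds, calling test_func() should print ten
>> "Thread ID: ..." lines, one per OpenMP thread, in nondeterministic order.)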
>>
>> ----- setup.py -----
>> from distutils.core import setup, Extension
>> from Cython.Build import cythonize
>>
>> # -fopenmp is needed both when compiling and when linking
>> extensions = [Extension(
>>                 "test",
>>                 sources=["test.pyx"],
>>                 extra_compile_args=["-fopenmp"],
>>                 extra_link_args=["-fopenmp"]
>>             )]
>>
>> setup(
>>     ext_modules = cythonize(extensions)
>> )
>> ----
>> python setup.py build_ext --inplace
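>>
>> If everything works, this should produce a test.so in the current
>> directory that can be exercised with: python -c "import test;
>> test.test_func()"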
>>
>> On Tue, Dec 13, 2016 at 11:17 PM, Eleftherios Garyfallidis <
>> elef at indiana.edu> wrote:
>>
>> Hi Emanuele,
>>
>> Here is an example of how we calculated the distance matrix in parallel
>> (for the MDF) using OpenMP
>> https://github.com/nipy/dipy/blob/master/dipy/align/bundlemin.pyx
>>
>> You can just add another function that does the same using mam. It should
>> be really easy to implement, as we have already done it for the MDF to
>> speed up the SLR.
>>
>> Then we need to update the bundle_distances* functions to use the
>> parallel versions.
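>>
>> To give an idea of the shape of it, here is a rough sketch in the style
>> of bundlemin.pyx (names are made up and I use the plain MDF inner loop
>> as a placeholder for mam; streamlines are assumed resampled to the same
>> number of points and stacked into contiguous 3D arrays):
>>
>> ---- sketch only, not DiPy's actual code ----
>> cimport cython
>> import numpy as np
>> from cython.parallel import prange
>> from libc.math cimport sqrt
>>
>> cdef double mdf_direct(double *a, double *b, int n_pts) nogil:
>>     # average point-wise Euclidean distance between two streamlines
>>     cdef int k
>>     cdef double d = 0, dx, dy, dz
>>     for k in range(n_pts):
>>         dx = a[3 * k] - b[3 * k]
>>         dy = a[3 * k + 1] - b[3 * k + 1]
>>         dz = a[3 * k + 2] - b[3 * k + 2]
>>         d += sqrt(dx * dx + dy * dy + dz * dz)
>>     return d / n_pts
>>
>> @cython.boundscheck(False)
>> @cython.wraparound(False)
>> def distance_matrix(double[:, :, ::1] A, double[:, :, ::1] B):
>>     # A: (nA, n_pts, 3), B: (nB, n_pts, 3); rows computed in parallel
>>     cdef int i, j, nA = A.shape[0], nB = B.shape[0], n_pts = A.shape[1]
>>     cdef double[:, ::1] D = np.zeros((nA, nB))
>>     with nogil:
>>         for i in prange(nA):
>>             for j in range(nB):
>>                 D[i, j] = mdf_direct(&A[i, 0, 0], &B[j, 0, 0], n_pts)
>>     return np.asarray(D)
>> -----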
>>
>> I'll be happy to help you with this. Let's try to schedule some time to
>> look at this together.
>>
>> Best regards,
>> Eleftherios
>>
>>
>> On Mon, Dec 12, 2016 at 11:16 AM Emanuele Olivetti <olivetti at fbk.eu>
>> wrote:
>>
>> Hi,
>>
>> I usually compute the distance matrix between two lists of streamlines
>> using bundle_distances_mam() or bundle_distances_mdf(). When the lists are
>> large, it is convenient and easy to exploit the multiple cores of the CPU,
>> because such computation is intrinsically (embarrassingly) parallel. At
>> the moment I'm doing it through the multiprocessing or joblib modules,
>> because I cannot find a way to do it directly in DiPy, at least from what
>> I see in dipy/tracking/distances.pyx. But consider that I am not
>> proficient in cython.parallel.
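>>
>> For concreteness, what I do now is along these lines (a sketch with toy
>> data; the chunk size is arbitrary):
>>
>> ---- current approach with joblib ----
>> import numpy as np
>> from joblib import Parallel, delayed
>> from dipy.tracking.distances import bundle_distances_mam
>>
>> # toy data: two lists of streamlines (float32 arrays of 3D points)
>> rng = np.random.RandomState(0)
>> tracksA = [rng.rand(20, 3).astype(np.float32) for _ in range(1000)]
>> tracksB = [rng.rand(20, 3).astype(np.float32) for _ in range(1000)]
>>
>> # split tracksA into chunks; each block of rows goes to one process
>> chunks = [tracksA[i:i + 100] for i in range(0, len(tracksA), 100)]
>> blocks = Parallel(n_jobs=-1)(delayed(bundle_distances_mam)(chunk, tracksB)
>>                              for chunk in chunks)
>> dm = np.vstack(blocks)  # shape: (len(tracksA), len(tracksB))
>> -----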
>>
>> Is there a preferable way to perform such a parallel computation? I plan
>> to prepare a pull request in the future and I'd like to be on the right
>> track.
>>
>> Best,
>>
>> Emanuele
>>


-- 
-------------------------------------------------------
Paolo Avesani
Fondazione Bruno Kessler
via Sommarive 18,
38050 Povo (TN) - I
phone:   +39 0461 314336
fax:        +39 0461 302040
email:     avesani at fbk.eu
web:       avesani.fbk.eu

