[Neuroimaging] parallel computation of bundle_distances_mam/mdf ?

Ariel Rokem arokem at gmail.com
Thu Dec 15 10:09:32 EST 2016


Emanuele,

On Thu, Dec 15, 2016 at 12:41 AM, Emanuele Olivetti <olivetti at fbk.eu> wrote:

> Thank you to all of you for the many details that help to clear up the
> picture.
>
> It is my understanding that even the latest version of OSX, i.e. v10.12.*,
> does not support libomp within Apple's Xcode development framework. At the
> same time, future versions should support it at some point. Honestly, this
> does not seem an ideal scenario, because not many OSX users chase the latest
> versions of the operating system, while everyone has had multiple cores for
> many years now. For the same reason, I doubt that contacting the Xcode side
> would bring us closer to a solution.
> To Ariel: if you want to test that OpenMP works, once you build test.so,
> just import it in the Python console and execute test.test_func(). You
> should get this:
>

Was there supposed to be a file attached?

Thanks!

Ariel


> ---
> >>> import test
> >>> test.test_func()
> Thread ID: 2
> Thread ID: 8
> Thread ID: 7
> Thread ID: 3
> Thread ID: 4
> Thread ID: 5
> Thread ID: 6
> Thread ID: 0
> Thread ID: 1
> Thread ID: 9
> ---
> If you do not enable OpenMP support (just comment out the two related
> lines in setup.py and re-build; see the snippet right after this output),
> the output will differ and only one thread will show up:
> ---
> >>> import test
> >>> test.test_func()
> Thread ID: 0
> ---
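>
> For reference, the "two related lines" are the OpenMP compile and link
> flags in the setup.py further down the thread; commenting them out would
> look roughly like this:
> ---
> extensions = [Extension(
>                 "test",
>                 sources=["test.pyx"],
>                 # extra_compile_args=["-fopenmp"],  # disabled: no OpenMP
>                 # extra_link_args=["-fopenmp"]
>             )]
> ---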
>
> Anyway, what surprises me most is the failure of Anaconda's gcc+libgomp on
> OSX. In principle that toolchain is exactly the same as on GNU/Linux, so I
> see no reason for it to fail. I'll try to dig more into this one.
>
> To Eleftherios: I agree with you that OpenMP is the best technical
> solution, because using threads - which share memory - is definitely
> preferable to multiprocessing - which replicates data in each process.
> Still, the limitations I see remain:
> 1) OpenMP support seems problematic on non-GNU/Linux systems. By the way,
> what is the situation on Windows? We don't have such a system here, so we
> couldn't try.
> 2) OpenMP works only with Cython code. This means that no Python part of
> DiPy can use parallel execution. That restriction seems excessive to me,
> because I expect that not all parts of DiPy will be rewritten in Cython in
> the foreseeable future.
>
> So, again, I advocate for some use of the standard multiprocessing module
> :) as a temporary step when OpenMP cannot be used.
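>
> For concreteness, a minimal sketch of this kind of multiprocessing
> workaround could look like the following. It is illustrative only:
> parallel_bundle_distances_mam is a hypothetical helper, not existing DiPy
> code, and it assumes tracksA and tracksB are lists of float32 streamlines
> as expected by bundle_distances_mam.
> ---
> import numpy as np
> from multiprocessing import Pool
> from dipy.tracking.distances import bundle_distances_mam
>
> def _block(args):
>     # one worker computes the rows of the distance matrix for its chunk
>     chunk, tracksB = args
>     return bundle_distances_mam(chunk, tracksB)
>
> def parallel_bundle_distances_mam(tracksA, tracksB, n_jobs=4):
>     # split tracksA into n_jobs chunks of rows; note that tracksB gets
>     # pickled and copied into every worker process (the data replication
>     # mentioned above)
>     blocks = np.array_split(np.arange(len(tracksA)), n_jobs)
>     jobs = [([tracksA[i] for i in b], tracksB) for b in blocks]
>     pool = Pool(n_jobs)
>     try:
>         rows = pool.map(_block, jobs)
>     finally:
>         pool.close()
>         pool.join()
>     return np.vstack(rows)
> ---
> The same pattern would work for bundle_distances_mdf.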
>
> Best,
>
> Emanuele
>
>
> On Wed, Dec 14, 2016 at 10:04 PM, Stephan Meesters <
> stephan.meesters at gmail.com> wrote:
>
>> Hi,
>>
>> The instructions at http://nipy.org/dipy/installation.html#openmp-with-osx
>> are outdated, since the clang-omp formula does not exist anymore.
>>
>> With the release of Clang 3.8.0 (08 Mar 2016), OpenMP 3.1 support is
>> enabled in Clang by default. You will need the -fopenmp=libomp flag while
>> building.
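>>
>> For example, with a recent enough clang, the setup.py posted further down
>> the thread could presumably be adapted along these lines (untested sketch;
>> the exact flags and library paths may differ depending on how libomp was
>> installed):
>> ---
>> extensions = [Extension(
>>                 "test",
>>                 sources=["test.pyx"],
>>                 extra_compile_args=["-fopenmp=libomp"],  # clang >= 3.8
>>                 extra_link_args=["-fopenmp=libomp"]
>>             )]
>> ---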
>>
>> A while back I started on a DIPY homebrew formula to allow installation
>> via "brew install dipy", see
>> https://github.com/Homebrew/homebrew-python/pull/310. However, it was
>> based around the deprecated clang-omp and I didn't get around to fixing
>> it. I might look into it soon if I find some spare time to tweak the
>> formula.
>>
>> Regards,
>> Stephan
>>
>> 2016-12-14 17:14 GMT+01:00 Samuel St-Jean <stjeansam at gmail.com>:
>>
>>> That also depends on which version of clang ships by default with OSX;
>>> if it is too old, you have to play around to get a newer one. I think
>>> OpenMP support starts with clang 3.7 (I only ever tried Mac OSX 10.9, so
>>> anyone more experienced can chime in), but everything older has to go
>>> through the homebrew gcc install and company. It might be worthwhile to
>>> check whether OpenMP support is out of the box now, and starting from
>>> which OSX versions, since older ones could be problematic for first-time
>>> installs.
>>>
>>> I also have to admit I have no idea how old is old in the Mac world, so
>>> maybe 10.9 is already phased out by now, but building things around
>>> homebrew was a hard and time-consuming experience (and again, it was the
>>> first time I used a Mac, so I guess the average user would also have some
>>> issues).
>>>
>>> Samuel
>>>
>>> 2016-12-14 16:51 GMT+01:00 Eleftherios Garyfallidis <elef at indiana.edu>:
>>>
>>>>
>>>> Hi Emanuele,
>>>>
>>>> My understanding is that openmp was only temporarily unavailable when
>>>> clang replaced gcc on osx.
>>>>
>>>> So, I would suggest going ahead with openmp. Any current installation
>>>> issues on osx are only temporary.
>>>> Openmp gives us a lot of capability to play with shared memory, and it
>>>> is a standard that will be around for a very long time. Also, the great
>>>> integration in cython makes the algorithms really easy to read.
>>>> So, especially for this project, my recommendation is to use openmp
>>>> rather than multiprocessing. All the way! :)
>>>>
>>>> I am CC'ing Stephan, who wrote the instructions for osx. I am sure he
>>>> can help you with this. I would also suggest checking whether xcode
>>>> provides any new guis for enabling openmp. I remember there was
>>>> something for that.
>>>>
>>>> Laterz!
>>>> Eleftherios
>>>>
>>>>
>>>>
>>>>
>>>> On Wed, Dec 14, 2016 at 6:29 AM Emanuele Olivetti <olivetti at fbk.eu>
>>>> wrote:
>>>>
>>>>> Hi Eleftherios,
>>>>>
>>>>> Thank you for pointing me to the MDF example. From what I see the
>>>>> Cython syntax is not complex, which is good.
>>>>>
>>>>> My only concern is the availability of OpenMP on the systems where
>>>>> DiPy is used. On a reasonably recent GNU/Linux machine it seems
>>>>> straightforward to have libgomp and the proper version of gcc. On other
>>>>> systems - say OSX - the situation is less clear to me. According to what I
>>>>> read here
>>>>>   http://nipy.org/dipy/installation.html#openmp-with-osx
>>>>> the OSX installation steps are not meant for standard end users. Are
>>>>> those instructions up to date?
>>>>> As a test, we just tried to skip the steps described above and instead
>>>>> install gcc with conda on OSX ("conda install gcc"). In the process,
>>>>> conda installed a recent gcc-4.8 with libgomp, which seems like good
>>>>> news. Unfortunately, when we tried to compile a simple example of Cython
>>>>> code using parallelization (see below), the build failed (fatal error:
>>>>> limits.h: no such file or directory)...
>>>>>
>>>>> For the reasons above, I am wondering whether the very simple solution
>>>>> of using the "multiprocessing" module, available in the standard Python
>>>>> library, may be an acceptable first step towards the more efficient
>>>>> multithreading of Cython/libgomp. With "multiprocessing" there is no extra
>>>>> dependency on libgomp, a recent gcc, or anything else. Moreover,
>>>>> multiprocessing does not require Cython code, because it works on plain
>>>>> Python too.
>>>>>
>>>>> Best,
>>>>>
>>>>> Emanuele
>>>>>
>>>>> ---- test.pyx ----
>>>>> from cython import parallel
>>>>> from libc.stdio cimport printf
>>>>>
>>>>> def test_func():
>>>>>     cdef int thread_id = -1
>>>>>     # spawn 10 OpenMP threads; each prints its own thread id
>>>>>     with nogil, parallel.parallel(num_threads=10):
>>>>>         thread_id = parallel.threadid()
>>>>>         printf("Thread ID: %d\n", thread_id)
>>>>> -----
>>>>>
>>>>> ----- setup.py -----
>>>>> from distutils.core import setup, Extension
>>>>> from Cython.Build import cythonize
>>>>>
>>>>> extensions = [Extension(
>>>>>                 "test",
>>>>>                 sources=["test.pyx"],
>>>>>                 extra_compile_args=["-fopenmp"],  # OpenMP compile flag
>>>>>                 extra_link_args=["-fopenmp"]      # OpenMP link flag
>>>>>             )]
>>>>>
>>>>> setup(
>>>>>     ext_modules = cythonize(extensions)
>>>>> )
>>>>> ----
>>>>> python setup.py build_ext --inplace
>>>>>
>>>>> On Tue, Dec 13, 2016 at 11:17 PM, Eleftherios Garyfallidis <
>>>>> elef at indiana.edu> wrote:
>>>>>
>>>>> Hi Emanuele,
>>>>>
>>>>> Here is an example of how we calculated the distance matrix in
>>>>> parallel (for the MDF) using OpenMP
>>>>> https://github.com/nipy/dipy/blob/master/dipy/align/bundlemin.pyx
>>>>>
>>>>> You can just add another function that does the same using mam. It
>>>>> should be really easy to implement, as we have already done it for the
>>>>> MDF to speed up SLR.
>>>>>
>>>>> Then we need to update the bundle_distances* functions to use the
>>>>> parallel versions.
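>>>>>
>>>>> Schematically, a prange-based skeleton along the lines of bundlemin.pyx
>>>>> could look like the sketch below. It is only a schematic: c_pair_dist is
>>>>> a hypothetical placeholder for the C-level routine that would compute the
>>>>> MAM distance between two streamlines, not actual DiPy code, and the
>>>>> module has to be compiled with the -fopenmp flags as in the setup.py
>>>>> further down.
>>>>> ---
>>>>> # cython: boundscheck=False, wraparound=False
>>>>> import numpy as np
>>>>> from cython.parallel import prange
>>>>>
>>>>> cdef double c_pair_dist(double a, double b) nogil:
>>>>>     # stand-in for the real streamline-to-streamline distance
>>>>>     return a - b if a > b else b - a
>>>>>
>>>>> def pairwise_parallel(double[:] A, double[:] B):
>>>>>     cdef Py_ssize_t i, j, n = A.shape[0], m = B.shape[0]
>>>>>     cdef double[:, :] D = np.zeros((n, m))
>>>>>     # the outer loop is distributed over OpenMP threads, GIL released
>>>>>     for i in prange(n, nogil=True):
>>>>>         for j in range(m):
>>>>>             D[i, j] = c_pair_dist(A[i], B[j])
>>>>>     return np.asarray(D)
>>>>> ---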
>>>>>
>>>>> I'll be happy to help you with this. Let's try to schedule some time
>>>>> to look at this together.
>>>>>
>>>>> Best regards,
>>>>> Eleftherios
>>>>>
>>>>>
>>>>> On Mon, Dec 12, 2016 at 11:16 AM Emanuele Olivetti <olivetti at fbk.eu>
>>>>> wrote:
>>>>>
>>>>> Hi,
>>>>>
>>>>> I usually compute the distance matrix between two lists of streamlines
>>>>> using bundle_distances_mam() or bundle_distances_mdf(). When the lists are
>>>>> large, it is convenient and easy to exploit the multiple cores of the CPU,
>>>>> because such a computation is intrinsically (embarrassingly) parallel. At
>>>>> the moment I'm doing it through the multiprocessing or joblib modules,
>>>>> because I cannot find a way to do it directly from DiPy, at least according
>>>>> to what I see in dipy/tracking/distances.pyx. But consider that I am not
>>>>> proficient in cython.parallel.
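>>>>>
>>>>> As an illustration of the workaround just described, a joblib-based
>>>>> sketch could look something like this (assuming tracksA and tracksB are
>>>>> the two lists of streamlines, with one task per chunk of rows of the
>>>>> distance matrix):
>>>>> ---
>>>>> import numpy as np
>>>>> from joblib import Parallel, delayed
>>>>> from dipy.tracking.distances import bundle_distances_mam
>>>>>
>>>>> chunks = np.array_split(np.arange(len(tracksA)), 8)
>>>>> rows = Parallel(n_jobs=8)(
>>>>>     delayed(bundle_distances_mam)([tracksA[i] for i in c], tracksB)
>>>>>     for c in chunks)
>>>>> D = np.vstack(rows)  # full (len(tracksA), len(tracksB)) distance matrix
>>>>> ---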
>>>>>
>>>>> Is there a preferable way to perform such a parallel computation? I plan
>>>>> to prepare a pull request in the future and I'd like to be on the right track.
>>>>>
>>>>> Best,
>>>>>
>>>>> Emanuele
>>>>>
>>>
>>
>