[Neuroimaging] parallel computation of bundle_distances_mam/mdf ?
Ariel Rokem
arokem at gmail.com
Thu Dec 15 19:19:44 EST 2016
On Thu, Dec 15, 2016 at 8:02 AM, Emanuele Olivetti <olivetti at fbk.eu> wrote:
> Hi Ariel,
>
> I put the code in a previous message and, on purpose, did not attach it to
> my last message. Now I am attaching it here, for your convenience.
> Notice that it is the same test.pyx/setup.py you succeeded in compiling on
> your recent OSX, without OpenMP support. You can just import that module and
> invoke test.test_func() to see the output I mentioned:
> python -c "import test; test.test_func()"
>
> Anyway, what I now believe is even more interesting is to test something
> simpler:
> https://gist.github.com/romeroyonatan/6a380e189cee4b1290700cbc5259c837
> That is a tiny piece of C code that does basically the same as test.pyx
> but without the extra complexity of Cython. I tried it on GNU/Linux
> (gcc+libgomp) and it works as expected. Then I tried it on Paolo's laptop
> (OSX 10.11.5) and it fails both with Clang and with Anaconda's
> gcc4.8+libgomp (= "conda install gcc"). So this example narrows the
> problem down to OpenMP alone on OSX: no Python, Cython or anything else is
> involved. Could you please try to compile and run it? As indicated there,
> you just need to:
> gcc -fopenmp hello-openmp.c
> ./a.out
>
>
Hmm. I can't seem to compile either of these. I think that dipy does some
additional magic to check whether you can use openmp or not, around here:
https://github.com/nipy/dipy/blob/master/setup.py#L118-L122
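A check in that spirit can be sketched with the standard library alone: write a tiny OpenMP C program to a temporary directory and see whether the compiler accepts a given flag. This is a minimal sketch, not dipy's actual setup.py logic. The helper name `find_openmp_flag`, the default compiler name `cc`, and the probe source (similar in spirit to hello-openmp.c, not a copy of it) are assumptions. The candidate flags are `-fopenmp` and the `-fopenmp=libomp` spelling mentioned for Clang 3.8+ further down the thread.

```python
import os
import shutil
import subprocess
import tempfile

# Minimal OpenMP program: every thread prints its ID (hypothetical probe source).
OMP_PROBE_SOURCE = r"""
#include <omp.h>
#include <stdio.h>

int main(void) {
    #pragma omp parallel
    printf("Thread ID: %d\n", omp_get_thread_num());
    return 0;
}
"""


def find_openmp_flag(cc="cc", candidates=("-fopenmp", "-fopenmp=libomp")):
    """Return the first flag with which `cc` compiles and links the probe,
    or None if the compiler is missing or no candidate works.
    Hypothetical helper, not part of dipy."""
    if shutil.which(cc) is None:
        return None
    with tempfile.TemporaryDirectory() as tmpdir:
        src = os.path.join(tmpdir, "omp_probe.c")
        exe = os.path.join(tmpdir, "omp_probe")
        with open(src, "w") as f:
            f.write(OMP_PROBE_SOURCE)
        for flag in candidates:
            try:
                proc = subprocess.run(
                    [cc, flag, src, "-o", exe],
                    stdout=subprocess.DEVNULL,
                    stderr=subprocess.DEVNULL,
                )
            except OSError:
                return None
            if proc.returncode == 0:
                return flag
    return None
```

For example, `find_openmp_flag("gcc")` or `find_openmp_flag("clang")`. Note that a successful compile and link only shows the toolchain accepts the flag; running the binary and seeing more than one thread ID is the stronger test.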
> I am very surprised that such a minimal example fails to compile on OSX
> with Anaconda's gcc4.8+libgomp. If someone has an explanation of such an
> issue, I'd really like to know more about it.
>
> Best,
>
> Emanuele
>
>
>
> On Thu, Dec 15, 2016 at 4:09 PM, Ariel Rokem <arokem at gmail.com> wrote:
>
>> Emanuele,
>>
>> On Thu, Dec 15, 2016 at 12:41 AM, Emanuele Olivetti <olivetti at fbk.eu>
>> wrote:
>>
>>> Thank you to all of you for the many details that help to clear up the
>>> picture.
>>>
>>> It is my understanding that even the latest version of OSX, i.e. v10.12.*,
>>> does not support libomp within Apple's Xcode development framework. At
>>> the same time, future versions should support it at some point. Honestly,
>>> this does not seem an ideal scenario, because not many OSX users chase the
>>> latest versions of the operating system, while everyone has had multiple
>>> cores for many years. For the same reason, I doubt that contacting the
>>> Xcode side would bring us closer to the solution.
>>> To Ariel: if you want to test that OpenMP works, once you build test.so,
>>> just import it in the Python console and execute test.test_func(). You
>>> should get this:
>>>
>>
>> Was there supposed to be a file attached?
>>
>> Thanks!
>>
>> Ariel
>>
>>
>>> ---
>>> >>> import test
>>> >>> test.test_func()
>>> Thread ID: 2
>>> Thread ID: 8
>>> Thread ID: 7
>>> Thread ID: 3
>>> Thread ID: 4
>>> Thread ID: 5
>>> Thread ID: 6
>>> Thread ID: 0
>>> Thread ID: 1
>>> Thread ID: 9
>>> ---
>>> If you do not enable the OpenMP support (just comment out the two
>>> related lines in setup.py and re-build), the output will differ and only
>>> one thread will show up:
>>> ---
>>> >>> import test
>>> >>> test.test_func()
>>> Thread ID: 0
>>> ---
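A minimal sketch for telling the two outcomes above apart in a script: count the distinct IDs in the `Thread ID: <n>` lines. The helper names `count_threads` and `run_test_func` are hypothetical, and the capture step assumes the `test` extension built from test.pyx is importable from the current directory, as in the `python -c "import test; test.test_func()"` command above.

```python
import re
import subprocess
import sys

# One match per line of the form "Thread ID: <n>", as printed by test_func().
THREAD_LINE = re.compile(r"^Thread ID: (\d+)$", re.MULTILINE)


def count_threads(output):
    """Number of distinct thread IDs in the captured output."""
    return len(set(THREAD_LINE.findall(output)))


def run_test_func():
    """Run test.test_func() in a fresh interpreter and return its stdout.
    Assumes the compiled `test` module sits in the current directory."""
    proc = subprocess.run(
        [sys.executable, "-c", "import test; test.test_func()"],
        stdout=subprocess.PIPE,
        universal_newlines=True,
        check=True,
    )
    return proc.stdout
```

A result of `count_threads(run_test_func())` greater than 1 suggests the OpenMP runtime really spawned threads; a result of 1 matches the single "Thread ID: 0" case.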
>>>
>>> Anyway, what surprises me most is the failure of Anaconda's gcc+libgomp
>>> on OSX. In principle that toolchain is exactly the same as on GNU/Linux, so
>>> I see no reason for it to fail. I'll try to dig more into this one.
>>>
>>> To Eleftherios: I agree with you that OpenMP is the best technical
>>> solution, because using threads, which share memory, is definitely
>>> preferable to multiprocessing, which replicates data in each process.
>>> Anyway, the limitations I see are still there:
>>> 1) OpenMP support seems problematic on non-GNU/Linux systems. By the
>>> way, what is the situation on Windows? We don't have such a system here,
>>> so we couldn't try.
>>> 2) OpenMP works only with Cython code. This means that no Python part of
>>> DiPy can use parallel execution. This seems too restrictive to me,
>>> because I don't expect all parts of DiPy to be rewritten in Cython
>>> in the foreseeable future.
>>>
>>> So, again, I advocate for some use of the standard multiprocessing
>>> module :) as a temporary step when OpenMP cannot be used.
>>>
>>> Best,
>>>
>>> Emanuele
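As a rough illustration of that fallback, the sketch below fills a distance matrix block of rows by block of rows with the standard `multiprocessing` module. The two streamline lists are handed to each worker once through the pool initializer, so every process holds its own copy (the memory cost of processes versus threads discussed above), but the data is not re-sent with every task. The names `parallel_distance_matrix` and `mean_point_distance` are hypothetical; the toy distance only stands in for bundle_distances_mam/mdf, and any picklable module-level function of two (N, 3) arrays could replace it.

```python
import multiprocessing as mp

import numpy as np

# Worker-side state, filled once per process by the pool initializer.
_A = _B = _DIST = None


def _init_worker(streamlines_a, streamlines_b, dist):
    global _A, _B, _DIST
    _A, _B, _DIST = streamlines_a, streamlines_b, dist


def _block(row_range):
    """Rows start..stop-1 of the distance matrix."""
    start, stop = row_range
    out = np.empty((stop - start, len(_B)))
    for i in range(start, stop):
        for j, sb in enumerate(_B):
            out[i - start, j] = _DIST(_A[i], sb)
    return out


def mean_point_distance(s1, s2):
    """Toy stand-in for a streamline distance: mean Euclidean distance
    between corresponding points of two (N, 3) arrays."""
    return float(np.mean(np.linalg.norm(s1 - s2, axis=1)))


def parallel_distance_matrix(streamlines_a, streamlines_b,
                             dist=mean_point_distance, n_jobs=None):
    """Distance matrix between two lists of streamlines, split by rows."""
    n_jobs = n_jobs or mp.cpu_count()
    n_rows = len(streamlines_a)
    if n_rows == 0:
        return np.empty((0, len(streamlines_b)))
    # Several chunks per worker so rows of uneven cost balance out.
    bounds = np.linspace(0, n_rows, 4 * n_jobs + 1).astype(int)
    ranges = [(int(a), int(b)) for a, b in zip(bounds[:-1], bounds[1:]) if b > a]
    if n_jobs == 1:
        # Serial path: same code, no extra processes.
        _init_worker(streamlines_a, streamlines_b, dist)
        blocks = [_block(r) for r in ranges]
    else:
        with mp.Pool(n_jobs, initializer=_init_worker,
                     initargs=(streamlines_a, streamlines_b, dist)) as pool:
            blocks = pool.map(_block, ranges)
    return np.vstack(blocks)
```

On platforms where worker processes are spawned rather than forked (e.g. Windows, and macOS with recent Pythons), the call must sit under `if __name__ == "__main__":`.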
>>>
>>>
>>> On Wed, Dec 14, 2016 at 10:04 PM, Stephan Meesters <
>>> stephan.meesters at gmail.com> wrote:
>>>
>>>> Hi,
>>>>
>>>> The instructions at http://nipy.org/dipy/installation.html#openmp-with-osx
>>>> are outdated since the clang-omp formula does not exist anymore.
>>>>
>>>> With the release of Clang 3.8.0 (08 Mar 2016), OpenMP 3.1 support is
>>>> enabled in Clang by default. You will need the -fopenmp=libomp flag while
>>>> building.
>>>>
>>>> I started a while back on a DIPY homebrew formula to allow installation
>>>> via "brew install dipy", see
>>>> https://github.com/Homebrew/homebrew-python/pull/310. However, it was
>>>> based on the deprecated clang-omp and I didn't get around to fixing it. I
>>>> might look into it soon if I find some spare time to tweak the formula.
>>>>
>>>> Regards,
>>>> Stephan
>>>>
>>>> 2016-12-14 17:14 GMT+01:00 Samuel St-Jean <stjeansam at gmail.com>:
>>>>
>>>>> That also depends on which version of clang ships by default with osx;
>>>>> if it is too old, you have to play around to get a newer one. I think
>>>>> openmp support starts with clang 3.7 (I only ever tried Mac osx 10.9, so
>>>>> anyone more experienced can chime in), but anything older has to go
>>>>> through the homebrew gcc install and company. It might be worthwhile to
>>>>> check whether openmp support now works out of the box, and starting with
>>>>> which mac osx versions, since older ones could be problematic for
>>>>> first-time installs.
>>>>>
>>>>> I also have to admit I have no idea how old is old in the mac world,
>>>>> so maybe 10.9 is already phased out by now, but building things around
>>>>> homebrew was a hard and time-consuming experience (and again, it was the
>>>>> first time I used a mac, so I guess the average user would also have some
>>>>> issues).
>>>>>
>>>>> Samuel
>>>>>
>>>>> 2016-12-14 16:51 GMT+01:00 Eleftherios Garyfallidis <elef at indiana.edu>
>>>>> :
>>>>>
>>>>>>
>>>>>> Hi Emanuele,
>>>>>>
>>>>>> My understanding is that openmp was only temporarily not available
>>>>>> when clang replaced gcc in osx.
>>>>>>
>>>>>> So, I would suggest going ahead with openmp. Any current installation
>>>>>> issues on osx are only temporary.
>>>>>> Openmp gives us a lot of capability to play with shared memory, and it
>>>>>> is a standard that will be around for a very long time. Also, the great
>>>>>> integration in cython makes the algorithms really easy to read.
>>>>>> So, especially for this project, my recommendation is to use openmp
>>>>>> rather than multiprocessing. All the way! :)
>>>>>>
>>>>>> I am CC'ing Stephan, who wrote the instructions for osx. I am sure he
>>>>>> can help you with this. I would also suggest checking whether Xcode
>>>>>> provides any new GUIs for enabling openmp. I remember there was
>>>>>> something for that.
>>>>>>
>>>>>> Laterz!
>>>>>> Eleftherios
>>>>>>
>>>>>>
>>>>>>
>>>>>>
>>>>>> On Wed, Dec 14, 2016 at 6:29 AM Emanuele Olivetti <olivetti at fbk.eu>
>>>>>> wrote:
>>>>>>
>>>>>>> Hi Eleftherios,
>>>>>>>
>>>>>>> Thank you for pointing me to the MDF example. From what I see the
>>>>>>> Cython syntax is not complex, which is good.
>>>>>>>
>>>>>>> My only concern is the availability of OpenMP in the systems where
>>>>>>> DiPy is used. On a reasonably recent GNU/Linux machine it seems
>>>>>>> straightforward to have libgomp and the proper version of gcc. On other
>>>>>>> systems - say OSX - the situation is less clear to me. According to what I
>>>>>>> read here
>>>>>>> http://nipy.org/dipy/installation.html#openmp-with-osx
>>>>>>> the OSX installation steps are not meant for standard end users. Are
>>>>>>> those instructions up to date?
>>>>>>> As a test of that, we've just tried to skip the steps described
>>>>>>> above and instead to install gcc with conda on OSX ("conda install gcc").
>>>>>>> In the process, conda installed the recent gcc-4.8 with libgomp, which
>>>>>>> seems good news. Unfortunately, when we tried to compile a simple example
>>>>>>> of Cython code using parallelization (see below), the process failed (fatal
>>>>>>> error: limits.h : no such file or directory)...
>>>>>>>
>>>>>>> For the reasons above, I am wondering whether the very simple
>>>>>>> solution of using the "multiprocessing" module, available from the standard
>>>>>>> Python library, may be an acceptable first step towards the more efficient
>>>>>>> multithreading of Cython/libgomp. With "multiprocessing", there is no extra
>>>>>>> dependency on libgomp, on a recent gcc, or on anything else. Moreover,
>>>>>>> multiprocessing does not require Cython code, because it works on plain
>>>>>>> Python too.
>>>>>>>
>>>>>>> Best,
>>>>>>>
>>>>>>> Emanuele
>>>>>>>
>>>>>>> ---- test.pyx ----
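>>>>>>> from cython import parallel
>>>>>>> from libc.stdio cimport printf
>>>>>>>
>>>>>>> def test_func():
>>>>>>>     cdef int thread_id = -1
>>>>>>>     # Release the GIL and open an OpenMP parallel region with 10 threads.
>>>>>>>     with nogil, parallel.parallel(num_threads=10):
>>>>>>>         # Each thread prints its own ID; without OpenMP only ID 0 appears.
>>>>>>>         thread_id = parallel.threadid()
>>>>>>>         printf("Thread ID: %d\n", thread_id)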
>>>>>>> -----
>>>>>>>
>>>>>>> ----- setup.py -----
>>>>>>> from distutils.core import setup, Extension
>>>>>>> from Cython.Build import cythonize
>>>>>>>
>>>>>>> extensions = [Extension(
>>>>>>>     "test",
>>>>>>>     sources=["test.pyx"],
>>>>>>>     # The two OpenMP-related lines: comment them out to build without OpenMP.
>>>>>>>     extra_compile_args=["-fopenmp"],
>>>>>>>     extra_link_args=["-fopenmp"]
>>>>>>> )]
>>>>>>>
>>>>>>> setup(
>>>>>>>     ext_modules=cythonize(extensions)
>>>>>>> )
>>>>>>> ----
>>>>>>> python setup.py build_ext --inplace
>>>>>>>
>>>>>>> On Tue, Dec 13, 2016 at 11:17 PM, Eleftherios Garyfallidis <
>>>>>>> elef at indiana.edu> wrote:
>>>>>>>
>>>>>>> Hi Emanuele,
>>>>>>>
>>>>>>> Here is an example of how we calculated the distance matrix in
>>>>>>> parallel (for the MDF) using OpenMP
>>>>>>> https://github.com/nipy/dipy/blob/master/dipy/align/bundlemin.pyx
>>>>>>>
>>>>>>> You can just add another function that does the same using mam. It
>>>>>>> should be really easy to implement as we have
>>>>>>> already done it for the MDF for speeding up SLR.
>>>>>>>
>>>>>>> Then we need to update the bundle_distances* functions to use the
>>>>>>> parallel versions.
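For readers less familiar with the two metrics, here is a plain NumPy sketch of the per-pair distances, following their usual definitions: MDF averages point-wise distances between two equal-length streamlines and keeps the smaller of the direct and flipped orderings; MAM averages, for each point of one streamline, the distance to the closest point of the other, then combines the two directions with a min, average or max. These are reference versions for checking results, not dipy's Cython code; the function names are hypothetical and the definitions should be verified against dipy/tracking/distances.pyx.

```python
import numpy as np


def mdf(s1, s2):
    """Minimum average direct-flip distance between two (N, 3) streamlines
    with the same number of points."""
    direct = np.mean(np.linalg.norm(s1 - s2, axis=1))
    flipped = np.mean(np.linalg.norm(s1 - s2[::-1], axis=1))
    return float(min(direct, flipped))


def mam(s1, s2, metric="avg"):
    """Mean closest-point distance in both directions, combined with
    'min', 'avg' or 'max'. Streamlines may differ in length."""
    # (N1, N2) matrix of Euclidean distances between all point pairs.
    d = np.linalg.norm(s1[:, None, :] - s2[None, :, :], axis=2)
    d12 = d.min(axis=1).mean()  # each point of s1 to its nearest point in s2
    d21 = d.min(axis=0).mean()  # each point of s2 to its nearest point in s1
    if metric == "min":
        return float(min(d12, d21))
    if metric == "max":
        return float(max(d12, d21))
    if metric == "avg":
        return float((d12 + d21) / 2)
    raise ValueError("metric must be 'min', 'avg' or 'max'")


def distance_matrix(streamlines_a, streamlines_b, dist=mdf):
    """Serial reference version of a bundle distance matrix."""
    return np.array([[dist(a, b) for b in streamlines_b] for a in streamlines_a])
```

Each row of `distance_matrix` depends on a single streamline of the first list, which is what makes the computation embarrassingly parallel across rows, whichever mechanism (OpenMP threads or processes) runs them.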
>>>>>>>
>>>>>>> I'll be happy to help you with this. Let's try to schedule some time
>>>>>>> to look at this together.
>>>>>>>
>>>>>>> Best regards,
>>>>>>> Eleftherios
>>>>>>>
>>>>>>>
>>>>>>> On Mon, Dec 12, 2016 at 11:16 AM Emanuele Olivetti <olivetti at fbk.eu>
>>>>>>> wrote:
>>>>>>>
>>>>>>> Hi,
>>>>>>>
>>>>>>> I usually compute the distance matrix between two lists of
>>>>>>> streamlines using bundle_distances_mam() or bundle_distances_mdf(). When
>>>>>>> the lists are large, it is convenient and easy to exploit the multiple
>>>>>>> cores of the CPU because such computation is intrinsically (embarrassingly)
>>>>>>> parallel. At the moment I'm doing it through the multiprocessing or the
>>>>>>> joblib modules, because I cannot find a way directly from DiPy, at least
>>>>>>> according to what I see in dipy/tracking/distances.pyx. But consider that
>>>>>>> I am not proficient in cython.parallel.
>>>>>>>
>>>>>>> Is there a preferable way to perform such parallel computation? I
>>>>>>> plan to prepare a pull request in future and I'd like to be on the right
>>>>>>> track.
>>>>>>>
>>>>>>> Best,
>>>>>>>
>>>>>>> Emanuele
>>>>>>>
>>>>>>> _______________________________________________
>>>>>>> Neuroimaging mailing list
>>>>>>> Neuroimaging at python.org
>>>>>>> https://mail.python.org/mailman/listinfo/neuroimaging