[scikit-image] Image analysis pipeline improvement suggestions

simone codeluppi simone.codeluppi at gmail.com
Wed Dec 28 13:07:39 EST 2016


Hi all!

I would like to pick your brains for suggestions on how to improve my
image analysis pipeline.

I am analyzing terabytes of image stacks generated with a microscope. My
current code relies heavily on scikit-image, numpy and scipy. To speed up
the analysis, the code runs on an HPC cluster (
https://www.nsc.liu.se/systems/triolith/) with MPI (mpi4py) for
parallelization and HDF5 (h5py) for file storage. The development cycle of
the code has been pretty painful, mainly due to my unfamiliarity with MPI
and problems compiling parallel HDF5 (with many open/close bugs).
However, the big drawback is that each core has only 2 GB of RAM (no shared
RAM across nodes), and to run some of the processing steps I ended
up reserving one node (16 cores) but running only 3 cores in order to have
enough RAM (image chunking won’t work in this case). As you can imagine,
this is extremely inefficient and I end up getting low priority in the
queue system.
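
To put numbers on the inefficiency, here is a small sketch of the arithmetic. It uses only the figures quoted above and assumes the node's RAM is split evenly among the running processes; the variable names are illustrative.

```python
# Figures taken from the description above; variable names are illustrative.
cores_per_node = 16
ram_per_core_gb = 2
cores_used = 3

# Total RAM on one node, assuming 2 GB for each of the 16 cores.
node_ram_gb = cores_per_node * ram_per_core_gb
# RAM available to each process if the node's RAM is split evenly (assumption).
ram_per_process_gb = node_ram_gb / cores_used
# Fraction of the reserved cores that actually do work.
utilization = cores_used / cores_per_node

print(f"{node_ram_gb} GB per node, {ram_per_process_gb:.1f} GB per process, "
      f"{utilization:.0%} of cores used")
```

So fewer than one in five of the reserved cores is doing any work.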


Our lab recently bought a new 4-node server with shared RAM running
Hadoop. My goal is to move the parallelization of the processing to dask. I
tested it before on another system and it worked great. The drawback is that,
if I understood correctly, parallel HDF5 works only with MPI
(driver='mpio'). HDF5 gave me quite a bit of headache, but it works well for
keeping the data well structured, and I can save everything as numpy
arrays... very handy.
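
For the read side at least, the MPI driver does not seem to be required: an h5py dataset opened in plain read-only mode can be wrapped lazily in a dask array. Below is a minimal sketch of that pattern, assuming dask, h5py and numpy are installed. The file name, dataset name, array shape and chunk sizes are made up for illustration, and it does not cover parallel writes.

```python
import os
import tempfile

import dask.array as da
import h5py
import numpy as np

# Hypothetical file and dataset names, used only for illustration.
tmp_dir = tempfile.mkdtemp()
path = os.path.join(tmp_dir, "stack.h5")

# Write a small synthetic image stack (z, y, x) as a chunked HDF5 dataset.
rng = np.random.default_rng(0)
stack = rng.random((8, 64, 64)).astype(np.float32)
with h5py.File(path, "w") as f:
    f.create_dataset("raw", data=stack, chunks=(1, 64, 64))

# Open read-only with the default driver (no 'mpio') and wrap the dataset
# lazily in a dask array, one chunk per z-plane.
with h5py.File(path, "r") as f:
    lazy = da.from_array(f["raw"], chunks=(1, 64, 64))
    # Per-plane maximum intensity, evaluated chunk by chunk.
    plane_max = lazy.max(axis=(1, 2)).compute()

print(plane_max.shape)
```

Only the chunks needed for a given task are read into memory, so the chunk shape is the knob that controls RAM use per worker.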


If I move to Hadoop/dask, what do you think would be a good solution for
data storage? Do you have any additional suggestions that could improve the
layout of the pipeline? Any help will be greatly appreciated.


Simone
-- 
*Bad as he is, the Devil may be abus'd,*
*Be falsy charg'd, and causelesly accus'd,*
*When men, unwilling to be blam'd alone,*
*Shift off these Crimes on Him which are their*
*Own*

                                                      *Daniel Defoe*

simone.codeluppi at gmail.com

simone at codeluppi.org