<html>
<head>
<meta http-equiv="content-type" content="text/html; charset=ISO-8859-1">
</head>
<body bgcolor="#FFFFFF" text="#000000">
Dear all,<br>
<div class="moz-forward-container"> <br>
I need some advice on using the multiprocessing module.
The scenario is as follows:<br>
<br>
<ul>
<li>I am running gradient descent to estimate the parameters of a
pairwise grid CRF (a grid-based graphical model). There are
106 data points, and each data point can be analyzed in parallel.</li>
<li>To calculate the gradient for each data point, I need to perform
approximate inference, since this is a loopy model. I am using
Gibbs sampling. <br>
</li>
<li>My grid is 9x9 so there are 81 variables that I am sampling
in one sweep of Gibbs sampling. I perform 1000 iterations of
Gibbs sampling.</li>
<li>My laptop has a quad-core Intel i5 processor, so I thought I
could use the multiprocessing module to parallelize my code
(that is, calculate the gradients for different data points
simultaneously on multiple cores).</li>
<li>I did not use the threading module because of the GIL, which
prevents more than one thread from executing Python bytecode at a
time.</li>
<li>As a result, I end up creating a process for each data point
(rather than a thread, which I would prefer in order to
avoid process creation overhead).</li>
<li>I am using basic NumPy array functionality.</li>
</ul>
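<p>The per-data-point computation described above can be sketched
roughly as follows. This is an illustration only, not the actual
code: it assumes binary (+1/-1) variables, an Ising-style model with
a single shared coupling weight, and plain Python lists instead of
NumPy so that it has no dependencies. The names
<code>neighbour_sum</code>, <code>gibbs_sweep</code> and
<code>mean_edge_agreement</code> are hypothetical.</p>
<pre>
```python
import math
import random

GRID = 9  # 9x9 grid, i.e. 81 variables per sweep


def neighbour_sum(state, i, j):
    """Sum of the 4-connected neighbours of cell (i, j); border cells have fewer."""
    total = 0
    if i > 0:
        total += state[i - 1][j]
    if i != GRID - 1:
        total += state[i + 1][j]
    if j > 0:
        total += state[i][j - 1]
    if j != GRID - 1:
        total += state[i][j + 1]
    return total


def gibbs_sweep(state, unary, coupling, rng):
    """Resample every variable once, conditioned on its current neighbours."""
    for i in range(GRID):
        for j in range(GRID):
            field = unary[i][j] + coupling * neighbour_sum(state, i, j)
            # Conditional P(x_ij = +1 | neighbours) for the assumed Ising-style model.
            p_plus = 1.0 / (1.0 + math.exp(-2.0 * field))
            state[i][j] = 1 if p_plus > rng.random() else -1


def mean_edge_agreement(unary, coupling, n_sweeps=1000, seed=0):
    """Monte Carlo estimate of the average product over neighbouring pairs.

    A statistic of this kind is what a gradient computation would need
    from approximate inference. No burn-in is used, for brevity.
    """
    rng = random.Random(seed)
    state = [[rng.choice((-1, 1)) for _ in range(GRID)] for _ in range(GRID)]
    n_edges = 2 * GRID * (GRID - 1)
    running = 0.0
    for _ in range(n_sweeps):
        gibbs_sweep(state, unary, coupling, rng)
        agree = 0
        for i in range(GRID):
            for j in range(GRID):
                if i != GRID - 1:
                    agree += state[i][j] * state[i + 1][j]
                if j != GRID - 1:
                    agree += state[i][j] * state[i][j + 1]
        running += agree / n_edges
    return running / n_sweeps
```
</pre>
<p>With 1000 sweeps over 81 variables, each data point costs on the
order of 81,000 conditional updates in interpreted Python, so the
inner loop itself may account for a large share of the run time,
independently of how the work is distributed.</p>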
<p>Previously I was running this code in MATLAB, where it is much
faster: one iteration of gradient descent takes around 14 sec in
MATLAB using a parfor loop (a parallel loop in which the data
points are analyzed). However, the same program takes
almost 215 sec in Python.<br>
</p>
<p>I am quite amazed at the slowness of the multiprocessing module. Is
this because of the process creation overhead for each data point?<br>
</p>
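<p>One way to check whether process creation is the bottleneck would
be to time the one-process-per-data-point approach against a fixed
pool of workers that is reused for all data points. The following is
a minimal sketch under assumptions: <code>work</code> is a
hypothetical stand-in for the real gradient computation, the worker
count of 4 simply matches the quad-core laptop, and the other
function names are made up for illustration.</p>
<pre>
```python
import multiprocessing as mp
import time


def work(x):
    """Hypothetical stand-in for the per-data-point gradient computation."""
    total = 0
    for k in range(20000):
        total += (x * k) % 7
    return total


def _work_to_queue(x, queue):
    """Run work(x) in a child process and send the result back to the parent."""
    queue.put((x, work(x)))


def one_process_per_item(items):
    """Start a fresh process for every item (items must be unique)."""
    queue = mp.Queue()
    procs = [mp.Process(target=_work_to_queue, args=(x, queue)) for x in items]
    for p in procs:
        p.start()
    # Collect results before joining so no child blocks on a full pipe.
    results = dict(queue.get() for _ in procs)
    for p in procs:
        p.join()
    return [results[x] for x in items]


def fixed_pool(items, workers=4):
    """Reuse a small, fixed set of worker processes for all items."""
    with mp.Pool(processes=workers) as pool:
        return pool.map(work, items)


def main():
    items = list(range(106))  # one entry per data point
    for fn in (one_process_per_item, fixed_pool):
        start = time.perf_counter()
        fn(items)
        print(f"{fn.__name__}: {time.perf_counter() - start:.2f} s")


if __name__ == "__main__":
    main()
```
</pre>
<p>If the two timings turn out to be similar, process creation is
probably not the main cost, and the time is more likely spent inside
the per-data-point computation itself.</p>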
<p>Please keep my email in the replies as I am not a member of
this mailing list.<br>
</p>
<p>Thanks,<br>
Abhinav<br>
</p>
<br>
</div>
<br>
</body>
</html>