Multiprocessing performance question
DL Neil
PythonList at DancesWithMice.info
Wed Feb 20 20:30:20 EST 2019
George,
On 21/02/19 1:15 PM, george trojan wrote:
> def create_box(x_y):
>     return geometry.box(x_y[0] - 1, x_y[1], x_y[0], x_y[1] - 1)
>
> x_range = range(1, 1001)
> y_range = range(1, 801)
> x_y_range = list(itertools.product(x_range, y_range))
>
> grid = list(map(create_box, x_y_range))
>
> Which creates and populates an 800x1000 “grid” (represented as a flat list
> at this point) of “boxes”, where a box is a shapely.geometry.box(). This
> takes about 10 seconds to run.
>
> Looking at this, I am thinking it would lend itself well to
> parallelization. Since the box at each “coordinate” is independent of all
> others, it seems I should be able to simply split the list up into chunks
> and process each chunk in parallel on a separate core. To that end, I
> created a multiprocessing pool:
I recall a similar discussion when folk were being encouraged to move
away from monolithic, straight-line code towards modular functions:
it is more (CPU-time) efficient to run in a straight line than it is to
repeatedly call, set up, execute, and return from a function or
sub-routine! i.e. there is an overhead to many/all constructs!
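To put a rough number on that call overhead, here is a minimal sketch
using only the standard library. The names add_one, with_calls and
straight_line are mine (purely illustrative), and the timings will vary
by machine and Python version, so treat the result as indicative only:

```python
import timeit


def add_one(n):
    # Trivial body: the cost of the call itself dominates.
    return n + 1


def with_calls(count):
    # Modular version: one function call per iteration.
    total = 0
    for i in range(count):
        total += add_one(i)
    return total


def straight_line(count):
    # Same arithmetic written inline, with no call per iteration.
    total = 0
    for i in range(count):
        total += i + 1
    return total


if __name__ == "__main__":
    count = 100_000
    t_calls = timeit.timeit(lambda: with_calls(count), number=20)
    t_inline = timeit.timeit(lambda: straight_line(count), number=20)
    print(f"with calls:    {t_calls:.3f}s")
    print(f"straight line: {t_inline:.3f}s")
```

Both functions return the same total; only the time taken to get there
should differ.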
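Isn't the 'problem' that it is a 'toy example', i.e. that the amount of
computing within each parallel process is small in relation to the
inherent 'overhead' (starting the worker processes, then pickling every
argument and result to pass it between them)?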
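One way to see part of that overhead without starting any processes is
to compare the time to build the results with the time to pickle and
unpickle them, which is what a pool must do to hand results back to the
parent. This is only a sketch: make_box is a hypothetical stand-in that
returns a plain tuple rather than a shapely box (so the absolute numbers
will not match yours), and build, round_trip and timed are illustrative
names of my own:

```python
import pickle
import time


def make_box(x_y):
    # Stand-in for geometry.box(): same four arguments, kept as a tuple.
    x, y = x_y
    return (x - 1, y, x, y - 1)


def build(points):
    # The actual "work": one cheap object per coordinate.
    return [make_box(p) for p in points]


def round_trip(obj):
    # Serialise and deserialise, as happens when a worker returns results.
    return pickle.loads(pickle.dumps(obj))


def timed(func, *args):
    # Return the function's result together with its elapsed wall time.
    start = time.perf_counter()
    result = func(*args)
    return result, time.perf_counter() - start


if __name__ == "__main__":
    points = [(x, y) for x in range(1, 201) for y in range(1, 201)]
    boxes, t_build = timed(build, points)
    copy, t_pickle = timed(round_trip, boxes)
    print(f"build: {t_build:.4f}s  pickle round trip: {t_pickle:.4f}s")
```

If the round trip costs about as much as the build itself, there is
little left for extra cores to win back.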
Thus, if the code performed a reasonable analytical task within each box
after it had been defined (increased CPU load), would you then notice
the expected difference between the single- and multi-process
implementations?
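A hedged sketch of that experiment follows. The analyse_box function is
a made-up CPU-bound task (not anything from your code), 'work' is a
hypothetical knob for how much computing is done per box, and the
process count and chunksize are arbitrary choices. I make no claim about
the timings you will see; the point is only to let you vary the load and
compare:

```python
import math
import time
from multiprocessing import Pool


def analyse_box(args):
    # Hypothetical per-box analysis: 'work' controls the CPU load.
    (x, y), work = args
    total = 0.0
    for k in range(work):
        total += math.sqrt(x * y + k)
    return total


def run_serial(tasks):
    return list(map(analyse_box, tasks))


def run_parallel(tasks, processes=4):
    # Larger chunks mean fewer round trips between parent and workers.
    chunksize = max(1, len(tasks) // (processes * 4))
    with Pool(processes) as pool:
        return pool.map(analyse_box, tasks, chunksize=chunksize)


if __name__ == "__main__":
    points = [(x, y) for x in range(1, 101) for y in range(1, 81)]
    for work in (1, 2000):
        tasks = [(p, work) for p in points]

        start = time.perf_counter()
        serial = run_serial(tasks)
        t_serial = time.perf_counter() - start

        start = time.perf_counter()
        parallel = run_parallel(tasks)
        t_parallel = time.perf_counter() - start

        assert serial == parallel
        print(f"work={work}: serial {t_serial:.3f}s, pool {t_parallel:.3f}s")
```

Raising 'work' increases the computing done per box while the amount of
data passed between processes stays the same, which is the comparison
being asked about.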
From AKL to AK
--
Regards =dn