More CPUs doesn't equal more speed
Rob Gaddi
rgaddi at highlandtechnology.invalid
Fri May 24 12:28:18 EDT 2019
On 5/23/19 6:32 PM, Cameron Simpson wrote:
> On 23May2019 17:04, bvdp <bob at mellowood.ca> wrote:
>> Anyway, yes the problem is that I was naively using command.getoutput()
>> which blocks until the command is finished. So, of course, only one
>> process was being run at one time! Bad me!
>>
>> I guess I should be looking at subprocess.Popen(). Now, a more relevant
>> question ... if I do it this way I then need to poll through a list of
>> saved process IDs to see which have finished? Right? My initial thought
>> is to batch them up in small groups (say CPU_COUNT-1) and wait for that
>> batch to finish, etc. Would it be foolish to send a large number (1200
>> in this case since this is the number of files) and let the OS worry
>> about scheduling and have my program poll 1200 IDs?
>>
>> Someone mentioned the GIL. If I launch separate processes then I don't
>> encounter this issue? Right?
>
> Yes, but it becomes more painful to manage. If you're issuing distinct
> separate commands anyway, dispatch many or all and then wait for them as
> a distinct step. If the commands start thrashing the rest of the OS
> resources (such as the disc) then you may want to do some capacity
> limitation, such as a counter or semaphore to limit how many go at once.
>
> Now, waiting for a subcommand can be done in a few ways.
>
> If you're the parent of all the processes you can keep a set() of the
> issued process ids and then call os.wait() repeatedly, which returns the
> pid and exit status of a completed child process. Check the pid against
> your set. If you need to act on the specific process, use a dict to map
> pids to some record of the subprocess.
>
> Alternatively, you can spawn a Python Thread for each subcommand, have
> the Thread dispatch the subcommand _and_ wait for it (i.e. keep your
> command.getoutput() method, but in a Thread). Main programme waits for
> the Threads by join()ing them.
>
I'll just note, because no one else has brought it up yet, that rather
than manually creating threads and/or process pools for all these
things, this is exactly what the standard concurrent.futures module is
for. It's a fairly brilliant wrapper around all this stuff, and I feel
like it often doesn't get enough love.
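
As a rough sketch of what that could look like for the "run one command
per file" case: a ThreadPoolExecutor caps how many commands run at once,
and each worker thread simply blocks on its own child process, so the GIL
isn't the bottleneck. The names run_one and run_all and the stand-in child
command are made up for illustration; substitute the real program and
arguments.

```python
import os
import subprocess
import sys
from concurrent.futures import ThreadPoolExecutor, as_completed


def run_one(path):
    """Run one external command for one file and capture its output."""
    # Placeholder command (an assumption): a child Python process that
    # upper-cases its argument. Replace with the real per-file command.
    cmd = [sys.executable, "-c",
           "import sys; print(sys.argv[1].upper())", path]
    done = subprocess.run(cmd, stdout=subprocess.PIPE,
                          universal_newlines=True)
    return path, done.returncode, done.stdout


def run_all(paths, max_workers=None):
    """Run run_one() for every path, at most max_workers at a time."""
    max_workers = max_workers or os.cpu_count() or 1
    results = {}
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        futures = [pool.submit(run_one, p) for p in paths]
        # as_completed() yields each future as its command finishes,
        # so there is no manual polling of process IDs.
        for fut in as_completed(futures):
            path, returncode, output = fut.result()
            results[path] = (returncode, output)
    return results


if __name__ == "__main__":
    for path, (rc, out) in sorted(run_all(["a.txt", "b.txt"]).items()):
        print(path, rc, out.strip())
```

You can submit all 1200 jobs up front; the pool only runs max_workers of
them at a time and queues the rest.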
--
Rob Gaddi, Highland Technology -- www.highlandtechnology.com
Email address domain is currently out of order. See above to fix.