[Python-Dev] parallelizing

Chris Barker chris.barker at noaa.gov
Wed Sep 13 20:07:07 EDT 2017


On Wed, Sep 13, 2017 at 12:11 PM, Matthieu Bec <mdcb808 at gmail.com> wrote:

> Regarding your example, I think it gives the illusion to work because
> sleep() is GIL aware under the hood.
>
>
It'll work for anything -- it just may not buy you any performance.

For your example of file parsing: I don't know off the top of my head
whether file I/O holds the GIL.

> I don't think it works for process() that mainly runs bytecode, because of
> the GIL.
>
If you are trying to get around the GIL, then that is a totally different
question.

But the easy way is to use multiprocessing instead:

import time
import random
import multiprocessing


def process(infile, outfile):
    "fake function to simulate a process that takes a random amount of time"
    time.sleep(random.random())
    print("processing: {} to make {}".format(infile, outfile))

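
if __name__ == "__main__":
    # the guard keeps child processes from re-running this loop on platforms
    # that start them by re-importing the module (e.g. Windows)
    for i in range(10):
        multiprocessing.Process(
            target=process,
            args=("file%i.xml" % i, "file%i.xml" % i),
        ).start()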

More overhead creating the processes, but no more GIL issues.
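
If you have more files than cores, a pool of worker processes is usually
the tidier way to do the same thing. A minimal sketch, assuming the work is
CPU-bound; count_primes() and run_all() are made-up names standing in for
your real processing step, not anything from your code:

```python
import multiprocessing


def count_primes(limit):
    "hypothetical CPU-bound stand-in for the real processing step"
    count = 0
    for n in range(2, limit):
        # n is prime if no d in 2..sqrt(n) divides it
        if all(n % d for d in range(2, int(n ** 0.5) + 1)):
            count += 1
    return count


def run_all(limits, workers=None):
    "farm the calls out to a pool of worker processes, results in input order"
    # workers=None lets multiprocessing pick one process per CPU
    with multiprocessing.Pool(processes=workers) as pool:
        return pool.map(count_primes, limits)


if __name__ == "__main__":
    print(run_all([10000, 20000, 30000, 40000]))
```

Each worker is its own interpreter with its own GIL, so the calls can run
on separate cores. The arguments and results get pickled to cross the
process boundary, which is part of the extra overhead.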

> Sorry if I wrongly thought that was a language level discussion.
>
>
This list is for discussion of the development of the CPython interpreter.
So this kind of discussion doesn't belong here unless/until it gets to the
point of actually implementing something.

If you have an idea as to how to improve Python, then python-ideas is the
place for that discussion.

But "there should be a way to run threads without the GIL" isn't a
well-enough formed idea to get far there....

If you want to discuss further, let's take this offline.

> Can't there be a way to capture that idiom and multi thread it in the
> language itself?

>
>> Example:
>>
>> loop:
>>
>>     read an XML
>>
>>     produce a JSON like
>>
A note about this -- code like this would be using all sorts of shared
modules. The code in those modules is going to be touched by all the
threads. There is no way the Python interpreter can know which Python
objects are used by which thread, or how -- the GIL is there for good (and
complex) reasons, and avoiding it is not an easy task. It's all using the
same interpreter.

Also -- it's not easy to know what code may work OK with the GIL. Intensive
computation is bad. But Python is a poor choice for that anyway.

And code that does a lot in C -- numpy, text processing, etc. -- may not
hold the GIL. The same goes for I/O.

So for your example of parsing XML and writing JSON -- it may well do a lot
of work without holding the GIL.

No way to know but to profile it.
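
A crude way to check, as a sketch only: time the same batch of work with
one thread and with several. cpu_task(), io_task() and timed() are
hypothetical names; swap in your real parsing function. If four threads
are no faster than one, the work is probably holding the GIL.

```python
import time
from concurrent.futures import ThreadPoolExecutor


def cpu_task(n):
    "hypothetical CPU-bound work: pure bytecode, holds the GIL"
    total = 0
    for i in range(n):
        total += i * i
    return total


def io_task(delay):
    "hypothetical I/O-like work: time.sleep() releases the GIL"
    time.sleep(delay)
    return delay


def timed(func, args, workers):
    "run func over args with `workers` threads; return (seconds, results)"
    start = time.perf_counter()
    with ThreadPoolExecutor(max_workers=workers) as pool:
        results = list(pool.map(func, args))
    return time.perf_counter() - start, results


if __name__ == "__main__":
    for name, func, args in [("cpu", cpu_task, [200000] * 4),
                             ("io", io_task, [0.1] * 4)]:
        one, _ = timed(func, args, workers=1)
        four, _ = timed(func, args, workers=4)
        print("{}: 1 thread {:.2f}s, 4 threads {:.2f}s".format(name, one, four))
```

For anything finer-grained than wall-clock timing, cProfile in the
standard library is the usual next step.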

-CHB


-- 

Christopher Barker, Ph.D.
Oceanographer

Emergency Response Division
NOAA/NOS/OR&R            (206) 526-6959   voice
7600 Sand Point Way NE   (206) 526-6329   fax
Seattle, WA  98115       (206) 526-6317   main reception

Chris.Barker at noaa.gov