There are times when you deal with completely independent input/output 'pipes' - where parallelizing would really help speed things up.

Can't there be a way to capture that idiom and multi thread it in the language itself?

Example:

loop:
    read an XML
    produce a JSON like

Regards,
MB
This really isn't the place to ask this kind of question.

If you want to know how to do something with Python, try python-users, Stack Overflow, etc.

If you have an idea about a new feature you think Python could have, then the python-ideas list is the place for that. But if you want anyone to take it seriously, it should be a better-formed idea before you post there.

But:

On Tue, Sep 12, 2017 at 4:43 PM, Matthieu Bec <mdcb808@gmail.com> wrote:
There are times when you deal with completely independent input/output 'pipes' - where parallelizing would really help speed things up.
Can't there be a way to capture that idiom and multi thread it in the language itself?
Example:
loop:
read an XML
produce a JSON like
Regular old threading works fine for this:

    import time
    import random
    import threading

    def process(infile, outfile):
        "fake function to simulate a process that takes a random amount of time"
        time.sleep(random.random())
        print("processing: {} to make {}".format(infile, outfile))

    for i in range(10):
        # one thread per input file; the file names are placeholders
        threading.Thread(target=process, args=("file%i.xml" % i, "file%i.json" % i)).start()

It gets complicated if you need to pass information back and forth, or worry about race conditions, or manage a queue, or ....

But just running a nice self-contained thread-safe function in another thread is pretty straightforward.

-CHB

--
Christopher Barker, Ph.D.
Oceanographer

Emergency Response Division
NOAA/NOS/OR&R            (206) 526-6959   voice
7600 Sand Point Way NE   (206) 526-6329   fax
Seattle, WA  98115       (206) 526-6317   main reception

Chris.Barker@noaa.gov
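For the "pass information back" case mentioned above, one possible approach is a thread pool from the standard library's concurrent.futures module. This is only a sketch: process() is the same kind of fake stand-in as in the message, and run_all(), the worker count, and the file names are hypothetical, not something proposed in the thread.

```python
import random
import time
from concurrent.futures import ThreadPoolExecutor


def process(infile, outfile):
    """Stand-in for real work: sleep a short random time, then report."""
    time.sleep(random.random() / 10)
    return "processed {} to make {}".format(infile, outfile)


def run_all(count=10, workers=4):
    """Run process() for `count` hypothetical file pairs on a thread pool."""
    infiles = ["file%i.xml" % i for i in range(count)]
    outfiles = ["file%i.json" % i for i in range(count)]
    with ThreadPoolExecutor(max_workers=workers) as pool:
        # map() yields results in input order, whatever order threads finish in.
        return list(pool.map(process, infiles, outfiles))


if __name__ == "__main__":
    for line in run_all():
        print(line)
```

The pool takes care of starting, queueing, and joining the threads, and the return values come back to the caller instead of being printed from inside each thread.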
Thank you, I'll take your advice.

Regarding your example, I think it gives the illusion to work because sleep() is GIL aware under the hood.

I don't think it works for process() that mainly runs bytecode, because of the GIL.

Sorry if I wrongly thought that was a language level discussion.

Regards,
Matthieu
On Wed, Sep 13, 2017 at 12:11 PM, Matthieu Bec <mdcb808@gmail.com> wrote:
Regarding your example, I think it gives the illusion to work because sleep() is GIL aware under the hood.
It'll work for anything -- it just may not buy you any performance. I don't know off the top of my head whether file I/O holds the GIL, which is what matters for your example of file parsing.
I don't think it works for process() that mainly runs bytecode, because of the GIL.
If you are trying to get around the GIL, then that is a totally different question.

But the easy way is to use multiprocessing instead:

    import time
    import random
    import multiprocessing

    def process(infile, outfile):
        "fake function to simulate a process that takes a random amount of time"
        time.sleep(random.random())
        print("processing: {} to make {}".format(infile, outfile))

    if __name__ == "__main__":
        # guard needed where child processes re-import this module (e.g. Windows)
        for i in range(10):
            multiprocessing.Process(target=process, args=("file%i.xml" % i, "file%i.json" % i)).start()

More overhead creating the processes, but no more GIL issues.

Sorry if I wrongly thought that was a language level discussion.
This list is for discussion of the development of the CPython interpreter. So this kind of discussion doesn't belong here unless/until it gets to the point of actually implementing something.

If you have an idea as to how to improve Python, then python-ideas is the place for that discussion. But "there should be a way to run threads without the GIL" isn't a well-enough formed idea to get far there....

If you want to discuss further, let's take this offline.
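When many independent jobs need to run in separate processes, a worker pool is one common alternative to starting one process per job. The following is a rough sketch using multiprocessing.Pool; count_primes() and the workload sizes are made-up stand-ins for CPU-bound work and do not come from the thread.

```python
import multiprocessing


def count_primes(limit):
    """CPU-bound stand-in for real work: count primes below `limit`."""
    count = 0
    for n in range(2, limit):
        if all(n % d for d in range(2, int(n ** 0.5) + 1)):
            count += 1
    return count


def main():
    limits = [20000, 30000, 40000, 50000]  # hypothetical workloads
    # Each worker is a separate interpreter process with its own GIL.
    with multiprocessing.Pool(processes=4) as pool:
        results = pool.map(count_primes, limits)
    for limit, primes in zip(limits, results):
        print("{} primes below {}".format(primes, limit))


if __name__ == "__main__":
    # The guard matters on platforms that start workers by re-importing
    # this module (the "spawn" start method).
    main()
```

The worker function has to be defined at module level so it can be pickled and sent to the worker processes, and its arguments and return values must be picklable too.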
Can't there be a way to capture that idiom and multi thread it in the language itself?
Example:
loop:
read an XML
produce a JSON like
A note about this -- code like this would be using all sorts of shared modules. The code in those modules is going to be touched by all the threads. There is no way the Python interpreter can know which Python objects are used by which threads, or how. The GIL is there for good (and complex) reasons, and avoiding it is not an easy task. It's all using the same interpreter.

Also -- it's not easy to know what code may work OK with the GIL. Intensive computation is bad. But Python is a poor choice for that anyway. And code that does a lot in C (numpy, text processing, etc.) may not hold the GIL. The same goes for I/O.

So for your example of parsing XML and writing JSON -- it may well do a lot of work without holding the GIL. No way to know but to profile it.

-CHB
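One rough way to check whether a given piece of work is held back by the GIL is to time it serially and then in threads. The sketch below does that for a deliberately pure-Python loop; busy(), the repeat count, and the loop size are arbitrary choices for illustration. To test a real workload, the body of busy() would be swapped for the actual parsing code.

```python
import threading
import time


def busy(n):
    """Pure-Python loop; on a GIL build it holds the GIL while it runs."""
    total = 0
    for i in range(n):
        total += i * i
    return total


def timed_serial(n, repeats):
    """Run busy(n) `repeats` times in a row; return (results, seconds)."""
    start = time.perf_counter()
    results = [busy(n) for _ in range(repeats)]
    return results, time.perf_counter() - start


def timed_threaded(n, repeats):
    """Run busy(n) in `repeats` threads; return (results, seconds)."""
    results = [None] * repeats

    def worker(slot):
        results[slot] = busy(n)

    threads = [threading.Thread(target=worker, args=(i,)) for i in range(repeats)]
    start = time.perf_counter()
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return results, time.perf_counter() - start


if __name__ == "__main__":
    _, serial = timed_serial(500000, 4)
    _, threaded = timed_threaded(500000, 4)
    print("serial:   {:.3f}s".format(serial))
    print("threaded: {:.3f}s".format(threaded))
```

If the threaded time is about the same as the serial time, the work is most likely holding the GIL; if it is clearly shorter, the work is releasing it (as blocking I/O and some C extensions do) and plain threads may be enough.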