
Hi,

I've made a custom concurrent.futures.Executor that mixes ProcessPoolExecutor and ThreadPoolExecutor. I've published it here:

https://github.com/nilp0inter/threadedprocess

This executor is very similar to a ProcessPoolExecutor, but each process in the pool has its own ThreadPoolExecutor inside.

The motivation for this executor is to mitigate a problem we have in a project where a very large number of long-running I/O-bound tasks have to run concurrently. Those long-running tasks contain sparse CPU-bound operations.

To resolve this problem I considered several solutions:

1. Use asyncio to run the I/O part as tasks, and use a ProcessPoolExecutor to run the CPU-bound operations with "run_in_executor". Unfortunately, the CPU operations depend on a large in-memory context, and using a ProcessPoolExecutor this way forces the parent process to pickle the whole context in order to send it to the task. Because the context is so large, this operation is itself very CPU-demanding, so it doesn't work.

2. Execute the I/O- and CPU-bound operations in separate processes with multiprocessing.Process. This actually works, but the number of idle processes in the system is too large, resulting in a bad memory footprint.

3. Execute the I/O- and CPU-bound operations in threads. This doesn't work because the sum of all the CPU operations saturates the core where the Python process is running, while the other cores sit idle.

So I coded the ThreadedProcessPoolExecutor, which helped me keep the number of processes under control (I have just one process per CPU core) while allowing very high concurrency (hundreds of threads per process).

I have a couple of questions.

The first one is about the license. Given that I copied the majority of the code from the concurrent.futures library, I understand that I have to publish the code under the PSF license. Is this correct?

My second question is about the package namespace. Given that this is a concurrent.futures.Executor subclass, I understand that the most intuitive place to put it is under concurrent.futures. Is this a suitable use case for namespace packages? Is this a good idea?

Best regards,
Roberto
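
For readers who want a concrete picture of "one process per core, many threads per process", here is a minimal sketch using only the standard library. It is not the code from the threadedprocess repository: that project exposes a real Executor, whereas this sketch is batch-oriented. The helper names (`split_evenly`, `run_chunk_in_threads`, `threaded_process_map`) and the default of 8 threads are assumptions made for illustration.

```python
import os
from concurrent.futures import ProcessPoolExecutor, ThreadPoolExecutor


def split_evenly(items, parts):
    """Split items into at most `parts` contiguous chunks, preserving order."""
    items = list(items)
    size, extra = divmod(len(items), parts)
    chunks, start = [], 0
    for i in range(parts):
        end = start + size + (1 if i < extra else 0)
        if end > start:  # skip empty chunks when there are few items
            chunks.append(items[start:end])
        start = end
    return chunks


def run_chunk_in_threads(func, chunk, threads):
    """Runs inside one worker process: fan the chunk out over local threads."""
    with ThreadPoolExecutor(max_workers=threads) as pool:
        return list(pool.map(func, chunk))


def threaded_process_map(func, items, processes=None, threads=8):
    """Apply func to every item using `processes` processes x `threads` threads."""
    processes = processes or os.cpu_count() or 1
    chunks = split_evenly(items, processes)
    results = []
    with ProcessPoolExecutor(max_workers=processes) as pool:
        futures = [
            pool.submit(run_chunk_in_threads, func, chunk, threads)
            for chunk in chunks
        ]
        for future in futures:  # submission order keeps the item order
            results.extend(future.result())
    return results
```

A call such as `threaded_process_map(fetch, urls, threads=100)` would need `fetch` (a hypothetical I/O-bound function) to be defined at module level so that it can be pickled, and the call itself should sit under an `if __name__ == "__main__":` guard on platforms that start workers by spawning. Only the function and the chunk of items cross the process boundary here; how a large shared context would reach each worker is left open by this sketch.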

Roberto,

That looks like an interesting class. I presume you're intending to publish this as a pip package on PyPI.python.org?

I'm no lawyer, but I believe you can license your code under a new license (I recommend BSD) as long as you keep a copy and a mention of the PSF license in your distribution as well. (Though perhaps you could structure your code differently and inherit from the standard library modules rather than copying them?)

In terms of the package namespace, do not put it in the same namespace as standard library code! It probably won't work and will cause world-wide pain and suffering for the users of your code. Invent your project name and use that as a top-level namespace, like everyone else. :-)

Good luck with your project,

--Guido

On Wed, Mar 21, 2018 at 8:03 AM, Roberto Martínez <robertomartinezp@gmail.com> wrote:
> Hi,
> I've made a custom concurrent.futures.Executor mixing the ProcessPoolExecutor and ThreadPoolExecutor.
> [...]
--
--Guido van Rossum (python.org/~guido)
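
On the namespace question, a small check may help show why extending `concurrent.futures` from a third-party distribution is unlikely to work. PEP 420 namespace packages are directories without an `__init__.py`, while `concurrent` and `concurrent.futures` in CPython are regular packages, and a regular package found on the import path takes precedence over any namespace portions. The helper name `is_namespace_package` is an assumption for illustration, and the `origin is None` test relies on the behaviour of Python 3.7 and later.

```python
import importlib.util


def is_namespace_package(name):
    """Return True if `name` resolves to a PEP 420 namespace package."""
    spec = importlib.util.find_spec(name)
    if spec is None:
        raise ModuleNotFoundError(f"No module named {name!r}")
    # Any package has search locations; a namespace package additionally has
    # no file origin, because there is no __init__.py to load (Python 3.7+).
    return spec.submodule_search_locations is not None and spec.origin is None


print(is_namespace_package("concurrent"))          # regular package in CPython
print(is_namespace_package("concurrent.futures"))  # regular package in CPython
```

With a project-specific top-level name, as suggested above, users would import the executor from the project's own package instead (for example a package named after the repository, `threadedprocess`).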

On Wed, Mar 21, 2018 at 16:23, Guido van Rossum (<guido@python.org>) wrote:

> Roberto,
>
> That looks like an interesting class. I presume you're intending to publish this as a pip package on PyPI.python.org?

Precisely.

> I'm no lawyer, but I believe you can license your code under a new license (I recommend BSD) as long as you keep a copy and a mention of the PSF license in your distribution as well. (Though perhaps you could structure your code differently and inherit from the standard library modules rather than copying them?)

I am using inheritance as much as I can. However, because some functions live at module level instead of being Executor methods (so that they can be pickled, I suppose), I am forced to copy some of them just to modify a couple of lines.

> In terms of the package namespace, do not put it in the same namespace as standard library code! It probably won't work and will cause world-wide pain and suffering for the users of your code. Invent your project name and use that as a top-level namespace, like everyone else. :-)

OK, I don't want to cause world-wide pain (yet).

Thank you!

Best regards,
Roberto
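
The supposition about picklability can be illustrated with a short experiment. pickle serializes a function by reference to its importable qualified name, so functions reachable by a module-level name can be sent to another process, while nested functions and lambdas cannot. Whether this is the actual reason for the layout of concurrent.futures is not confirmed in this thread. The helpers `can_pickle` and `make_worker` are hypothetical names used only for this sketch.

```python
import math
import pickle


def can_pickle(obj):
    """Return True if obj survives pickle.dumps, the step a process pool needs."""
    try:
        pickle.dumps(obj)
    except (pickle.PicklingError, AttributeError, TypeError):
        # The exact exception type varies across Python versions.
        return False
    return True


def make_worker():
    def worker(x):  # nested function: not reachable by a module-level name
        return x * 2
    return worker


print(can_pickle(math.sqrt))      # importable by name
print(can_pickle(make_worker()))  # local function
print(can_pickle(lambda x: x))    # anonymous function
```

This is also why changing the behaviour of such a module-level helper from a subclass is awkward: it is looked up by name in its defining module, so there is no method to override.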