Usage of the multiprocessing API and object lifetime

Hi,
tzickel reported a reference cycle bug in multiprocessing which keeps threads and processes alive:
https://bugs.python.org/issue34172
He wrote a fix which has been merged into the 3.6, 3.7 and master branches. But Pablo Galindo noticed that the fix breaks the following code (he added: "I found the weird code in the example in several projects."):
import multiprocessing

def the_test():
    print("Begin")
    for x in multiprocessing.Pool().imap(int, ["4", "3"]):
        print(x)
    print("End")

the_test()
Pablo proposed to add a strong reference to the Pool from multiprocessing iterators: https://bugs.python.org/issue35378
I blocked his pull request because I see this change as a risk of new reference cycles. Since we are close to 3.6 and 3.7 releases, I decided to revert the multiprocessing fix instead.
Pablo's issue35378 evolved to add a weak reference in iterators to try to detect when the Pool is destroyed: raise an exception from the iterator, if possible.
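Roughly, the idea is something like this sketch (not Pablo's actual patch; the result-fetching helper is hypothetical):

import weakref

class IMapIterator:
    def __init__(self, pool):
        # Weak reference: the iterator must not keep the Pool alive.
        self._pool_ref = weakref.ref(pool)

    def __next__(self):
        pool = self._pool_ref()
        if pool is None:
            # The Pool has been garbage collected: raise instead of
            # hanging forever waiting for results that will never come.
            raise RuntimeError("the Pool backing this iterator is gone")
        return pool._fetch_next_result()  # hypothetical helper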
Then a discussion started on how the multiprocessing API is supposed to be used and about the lifetime of multiprocessing objects.
I would prefer to make the multiprocessing API stricter: Python shouldn't try to guess how long an object is going to live. The API user has to *explicitly* release resources.
tzickel noted that the documentation says:
"When the pool object is garbage collected terminate() will be called immediately."
And that multiprocessing relies on the garbage collector to release resources, especially using the multiprocessing.util.Finalize tool:
class Finalize(object):
    '''
    Class which supports object finalization using weakrefs
    '''
    def __init__(self, obj, callback, ...):
        ...
        if obj is not None:
            self._weakref = weakref.ref(obj, self)
        else:
            assert exitpriority is not None
        ...
        _finalizer_registry[self._key] = self
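For illustration, a finalizer is registered roughly like this (a made-up example, not code from multiprocessing itself):

from multiprocessing import util

def _cleanup(tempdir):
    # Placeholder callback: runs when the owning object is garbage
    # collected, or at interpreter exit if exitpriority is set.
    print("removing", tempdir)

class Owner:
    def __init__(self, tempdir):
        self._finalizer = util.Finalize(self, _cleanup, args=(tempdir,),
                                        exitpriority=10)

owner = Owner("/tmp/work")
del owner   # on CPython, the weakref callback fires and _cleanup() runs now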
I propose to start emitting ResourceWarning in Python 3.8 when objects are not released explicitly. I wrote a first change to emit ResourceWarning in the Pool object:
https://bugs.python.org/issue35424 https://github.com/python/cpython/pull/10974
By the way, I'm surprised that "with pool:" doesn't release all resources. An additional "pool.join()" is needed to ensure that all resources are released. It's a little bit surprising to have to emit a ResourceWarning if join() has not been called, even if the code uses "with pool:".
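Concretely, the pattern under discussion looks like this sketch:

from multiprocessing import Pool

if __name__ == "__main__":
    with Pool() as pool:               # __exit__ only calls pool.terminate()
        print(pool.map(int, ["4", "3"]))
    pool.join()                        # this is what waits for the workers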
I don't know the multiprocessing API well, so I'm not sure which direction we should go: make a best effort to support strange legacy code with "implicit object lifetime", or become stricter in Python 3.8?
From a technical point of view, I would prefer to become stricter.
Relying on the garbage collector means that the code is going to behave badly on PyPy, which uses a different garbage collector implementation :-(
Victor

Hi,
On Tue, 11 Dec 2018 15:21:31 +0100 Victor Stinner vstinner@redhat.com wrote:
Pablo's issue35378 evolved to add a weak reference in iterators to try to detect when the Pool is destroyed: raise an exception from the iterator, if possible.
That's an ok fix for me.
By the way, I'm surprised that "with pool:" doesn't release all resources.
That's not a problem, as long as the destructor _does_ release resources.
From a technical point of view, I would prefer to become stricter.
Using "with pool:" is fine, we shouldn't start raising a warning for it.
What you are proposing here starts to smell like an anti-pattern to me. Python _is_ a garbage-collected language, so by definition, there _are_ going to be resources that are automatically collected when an object disappears. If I'm allocating a 2GB bytes object, then PyPy may delay the deallocation much longer than CPython. Do you propose we add a release() method to bytes objects to avoid this issue (and emit a warning for people who don't call release() on bytes objects)?
You can't change the language's philosophy. We warn about open files because those have user-visible consequences (such as unflushed buffers, or not being able to delete the file on Windows). If there is no user-visible consequence to not calling join() on a Pool, then we shouldn't warn about it.
Regards
Antoine.

Le mar. 11 déc. 2018 à 16:14, Antoine Pitrou solipsis@pitrou.net a écrit :
What you are proposing here starts to smell like an anti-pattern to me. Python _is_ a garbage-collected language, so by definition, there _are_ going to be resources that are automatically collected when an object disappears. If I'm allocating a 2GB bytes object, then PyPy may delay the deallocation much longer than CPython. Do you propose we add a release() method to bytes objects to avoid this issue (and emit a warning for people who don't call release() on bytes objects)?
We are not talking about simple strings, but processes and threads.
You can't change the language's philosophy. We warn about open files because those have user-visible consequences (such as unflushed buffers, or not being able to delete the file on Windows). If there is no user-visible consequence to not calling join() on a Pool, then we shouldn't warn about it.
"user-visible consequences" are that resources are kept alive longer than I would expect. When I use a context manager, I expect that Python will magically releases everything for me.
For example, "with subprocess.Popen() as popen: ..." ensures that all pipes are closed and the process completes, before we exit the block.
Another example, "with open() as fp: ..." ensures that the file descriptor is closed before we exit the block.
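A tiny illustration of both (assuming a POSIX "sleep" command and an existing "data.txt"):

import subprocess

with subprocess.Popen(["sleep", "1"]) as proc:
    pass   # __exit__ closes the pipes and waits for the process

with open("data.txt") as fp:
    pass   # __exit__ closes the file descriptor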
I modified subprocess.Popen.__del__() in Python 3.6 to emit a ResourceWarning if the subprocess is still running, to suggest that the developer explicitly manage the resource (e.g. call .wait()).
I prefer to explicitly manage resources like processes and threads since they can exit with an error: killed by a signal, waitpid() failure (exit status already read by a different function), etc. I prefer to control where the error occurs. I hate when Python logs strange errors during shutdown. Logging errors during shutdown is too late: for example, the log triggers a new error because a stdlib module has already been cleared. That's why we need hacks like "_warn=warnings.warn" below:
class Popen(object):
    ...
    def __del__(self, _maxsize=sys.maxsize, _warn=warnings.warn):
        ...
        if self.returncode is None:
            _warn("subprocess %s is still running" % self.pid,
                  ResourceWarning, source=self)
        ...
Victor

On Tue, 11 Dec 2018 16:33:54 +0100 Victor Stinner vstinner@redhat.com wrote:
Le mar. 11 déc. 2018 à 16:14, Antoine Pitrou solipsis@pitrou.net a écrit :
What you are proposing here starts to smell like an anti-pattern to me. Python _is_ a garbage-collected language, so by definition, there _are_ going to be resources that are automatically collected when an object disappears. If I'm allocating a 2GB bytes object, then PyPy may delay the deallocation much longer than CPython. Do you propose we add a release() method to bytes objects to avoid this issue (and emit a warning for people who don't call release() on bytes objects)?
We are not talking about simple strings, but processes and threads.
Right, but do those have an impact on the program's correctness, or simply on its performance (or memory consumption)?
"user-visible consequences" are that resources are kept alive longer than I would expect. When I use a context manager, I expect that Python will magically releases everything for me.
I think there's a balancing act here: between "with pool" releasing everything, and not taking too much time to execute the __exit__ method. Currently, threads and processes may finish quietly between __exit__ and __del__, without adding significant latencies to your program's execution.
I prefer to explicitly manage resources like processes and threads since they can exit with an error: killed by a signal, waitpid() failure (exit status already read by a different function), etc.
But multiprocessing.Pool manages them implicitly _by design_. People who want to manage processes explicitly can use the Process class directly ;-)
Regards
Antoine.

Le mar. 11 déc. 2018 à 17:06, Antoine Pitrou solipsis@pitrou.net a écrit :
We are not talking about simple strings, but processes and threads.
Right, but do those have an impact on the program's correctness, or simply on its performance (or memory consumption)?
Performance.
I made a similar change in socketserver: server_close() now waits until child processes (ForkingMixIn) and threads (ThreadingMixIn) complete:
* https://bugs.python.org/issue31233 * https://bugs.python.org/issue31151
I added an opt-in option "block_on_close" to get Python 3.6 behavior on server_close():
https://bugs.python.org/issue33540
I don't know if these changes are comparable to my questions about the multiprocessing API, since socketserver doesn't expose the list of processes/threads and doesn't provide a method to wait until they complete. Well... ForkingMixIn has an *undocumented* collect_children() which is non-blocking by default (I added a keyword-only 'blocking' parameter).
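For reference, the attribute is used roughly like this (a sketch; the handler and server classes are made up):

import socketserver

class EchoHandler(socketserver.BaseRequestHandler):
    def handle(self):
        self.request.sendall(self.request.recv(1024))

class ThreadingEchoServer(socketserver.ThreadingMixIn, socketserver.TCPServer):
    # True (the default since 3.7): server_close() waits for handler threads.
    # False: restore the non-blocking Python 3.6 behavior.
    block_on_close = True

with ThreadingEchoServer(("localhost", 0), EchoHandler) as server:
    ...  # serve requests; __exit__ calls server_close()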
Another example: the close() method of an asyncio event loop doesn't wait until threads/processes complete:
"asyncio: BaseEventLoop.close() shutdowns the executor without waiting causing leak of dangling threads" https://bugs.python.org/issue34037
It's unclear to me how this issue should be fixed.
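One possible workaround on the user side (a hedged sketch, not a proposed fix for the bug) is to manage the executor explicitly so it can be shut down and waited on before the loop is closed:

import asyncio
from concurrent.futures import ThreadPoolExecutor

executor = ThreadPoolExecutor()
loop = asyncio.new_event_loop()
loop.set_default_executor(executor)
try:
    loop.run_until_complete(loop.run_in_executor(None, print, "hello"))
finally:
    executor.shutdown(wait=True)   # wait for the worker threads to finish
    loop.close()                   # close() itself does not wait for them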
Victor

On Tue, 11 Dec 2018 17:22:49 +0100 Victor Stinner vstinner@redhat.com wrote:
Le mar. 11 déc. 2018 à 17:06, Antoine Pitrou solipsis@pitrou.net a écrit :
We are not talking about simple strings, but processes and threads.
Right, but do those have an impact on the program's correctness, or simply on its performance (or memory consumption)?
Performance.
Well, at least we shouldn't emit ResourceWarning for performance issues. So if someone used "with pool:", they shouldn't get a ResourceWarning afterwards, even if some threads are still not finished running.
I don't know if these changes are comparable to my questions about the multiprocessing API, since socketserver doesn't expose the list of processes/threads and doesn't provide a method to wait until they complete. Well... ForkingMixIn has an *undocumented* collect_children() which is non-blocking by default (I added a keyword-only 'blocking' parameter).
socketserver is not the same as a multiprocessing Pool *at all*. The usage will be vastly different between the two. Just because we did this change for socketserver is not enough of a reason to do it for multiprocessing too.
Another example: the close() method of an asyncio event loop doesn't wait until threads/processes complete:
"asyncio: BaseEventLoop.close() shutdowns the executor without waiting causing leak of dangling threads" https://bugs.python.org/issue34037
It's not a leak though, it's a resource that's collected a bit later. It may annoy the test machinery (and this could be a reason to add private or public APIs to help finish the threads), but it shouldn't be an actual issue for user code.
Here as well, I think you should be careful not to introduce annoyances (unwanted latencies) in user code.
Regards
Antoine.

On Tue, 11 Dec 2018 at 15:13, Antoine Pitrou solipsis@pitrou.net wrote:
On Tue, 11 Dec 2018 15:21:31 +0100 Victor Stinner vstinner@redhat.com wrote:
Pablo's issue35378 evolved to add a weak reference in iterators to try to detect when the Pool is destroyed: raise an exception from the iterator, if possible.
That's an ok fix for me.
By the way, I'm surprised that "with pool:" doesn't release all resources.
That's not a problem, as long as the destructor _does_ release resources.
From a technical point of view, I would prefer to become stricter.
Using "with pool:" is fine, we shouldn't start raising a warning for it.
What you are proposing here starts to smell like an anti-pattern to me. Python _is_ a garbage-collected language, so by definition, there _are_ going to be resources that are automatically collected when an object disappears. If I'm allocating a 2GB bytes object, then PyPy may delay the deallocation much longer than CPython. Do you propose we add a release() method to bytes objects to avoid this issue (and emit a warning for people who don't call release() on bytes objects)?
You can't change the language's philosophy. We warn about open files because those have user-visible consequences (such as unflushed buffers, or not being able to delete the file on Windows). If there is no user-visible consequence to not calling join() on a Pool, then we shouldn't warn about it.
I agree with Antoine here.
On Tue, 11 Dec 2018 15:21:31 +0100 Victor Stinner vstinner@redhat.com wrote:
The API user has to *explicitly* release resources.
That's definitely not Python's philosophy. In Python, users should not have to worry about resource management themselves, that's the job of the language runtime. We provide the "with" construct so that when users *want* to manage resources explicitly (because there is an impact outside of the Python runtime's control, for example) then they can do so. But leaving resource management to the runtime is completely fine.
I propose to start to emit ResourceWarning in Python 3.8 when objects are not released explicitly.
Strong -1 on this.
I don't know well the multiprocessing API
Nor do I, but I'm against making fundamental design changes such as you propose *without* a deep understanding of the multiprocessing API. If you feel sufficiently strongly that the current design is wrong, then you should understand the principles and reasons behind that design before trying to change it.
Paul

On Tue, Dec 11, 2018, 07:13 Antoine Pitrou <solipsis@pitrou.net> wrote:
What you are proposing here starts to smell like an anti-pattern to me. Python _is_ a garbage-collected language, so by definition, there _are_ going to be resources that are automatically collected when an object disappears. If I'm allocating a 2GB bytes object, then PyPy may delay the deallocation much longer than CPython. Do you propose we add a release() method to bytes objects to avoid this issue (and emit a warning for people who don't call release() on bytes objects)?
I know this question is rhetorical, but it does actually have a principled answer. Memory is a special resource, because the GC has a complete picture of memory use in the program, and if there's a danger of running out of memory then the GC will detect that and quickly run a collection before anyone has a chance to notice. But it doesn't know about other resources like descriptors, threads, processes, etc., so it can't detect or prevent unbounded leaks of these resources.
Therefore, in a classic GC-ed language, bytes() doesn't need to be explicitly released, but all other kinds of resources do. And according to the language spec, Python is a classic GC-ed language.
But things are complicated, because CPython isn't a classic GC-ed language, exactly. In practice it's a sort of hybrid RAII/GC language. People regularly write programs that rely on the refcount quick-release semantics for correctness. A quick way to check: the only thing a reference cycle does is make CPython start acting like an ordinary GC-ed language, so if you're worrying about reference cycles, that's a strong sign that you're writing CPython, not Python.
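A small illustration of the difference (hypothetical Resource class, CPython semantics assumed):

import gc

class Resource:
    def __del__(self):
        print("released")

# "Writing CPython": refcounting releases the object immediately.
r = Resource()
del r            # prints "released" right away on CPython

# "Writing Python": a reference cycle defeats refcounting, so the
# release waits until the cyclic garbage collector runs.
a = Resource()
a.cycle = a
del a            # nothing printed yet...
gc.collect()     # ...now "released" is printed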
This puts libraries like multiprocessing in a tricky position, because some users are writing CPython, and some are writing Python, and the two groups have contradictory expectations for how resource management should be handled, yet somehow we have to make both groups happy.
I don't know what multiprocessing should do here, but I certainly admire the problem :-).
-n

On Tue, 11 Dec 2018 11:48:24 -0800 Nathaniel Smith njs@pobox.com wrote:
I know this question is rhetorical, but it does actually have a principled answer. Memory is a special resource, because the GC has a complete picture of memory use in the program, and if there's a danger of running out of memory then the GC will detect that and quickly run a collection before anyone has a chance to notice. But it doesn't know about other resources like descriptors, threads, processes, etc., so it can't detect or prevent unbounded leaks of these resources.
Therefore, in a classic GC-ed language, bytes() doesn't need to be explicitly released, but all other kinds of resources do.
I would disagree here. You may /like/ to release other kinds of resources explicitly, but you don't /need/ to. It is actually obvious for things such as mutexes which, while the GC doesn't know about them, are small system resources. And nobody's asking Python to add a method to deterministically destroy the mutex that's inside a Lock object (*). (also, calling Lock.release() already does something else :-))
Arguably, things are more complicated for things like threads and processes. But here we are talking not about threads and processes themselves, but about an abstraction (the Pool object) that is /designed/ to hide threads and processes in favour of higher level semantics organized around the idea of task submission. One important characteristic here is that, when the pool is idle, those threads and processes aren't holding important resources (user-allocated resources) alive (**). The idle pool just has a bookkeeping overhead.
Usually, people don't really care how exactly a Pool manages its helper threads and worker processes (it has both at the same time), and they are fine with the internal bookkeeping overhead. For the people who care (***), the Pool.join() method is there to be called.
(*) actually, it's not a mutex, it's a semaphore
(**) unless the user sets a global variable from an executing task, which I think of as an anti-pattern :-)
(***) for example because they used the anti-pattern above :-)
Regards
Antoine.