Hi,

I agree that embedding Python is an important use case and that we should try to leak less memory and better isolate multiple interpreters for this use case.

There are multiple projects to enhance the code so that it works better with multiple interpreters:

* convert C extension modules to multiphase initialization (PEP 489)
* move C extension module global variables (static ...) into a module state
* convert static types to heap types
* make free lists per interpreter
* etc.

From what I saw, the first side effect is that "suddenly", tests using subinterpreters start to report new reference leaks. Examples of issues and fixes:

* https://github.com/python/cpython/commit/18a90248fdd92b27098cc4db773686a2d10...: reference leak in the init function of the select module
* https://github.com/python/cpython/commit/310e2d25170a88ef03f6fd31efcc899fe06...: reference cycles with encodings, and _testcapi misuses PyModule_AddObject()
* https://bugs.python.org/issue40050: _weakref and importlib
* etc.

In fact, none of these bugs is new. I checked a few: the bugs were always there. It's just that previously, nobody paid attention to these leaks. Fixing subinterpreters helps to leak less memory even for the single interpreter (embedded Python) use case.

The problem is that Python never tried to clear everything at exit. One way to see the issue is the number of references at exit using a debug build, on the up-to-date master branch:

    $ ./python -X showrefcount -c pass
    [18645 refs, 6141 blocks]

Python leaks 18,645 references at exit.

Some of the work that I listed is tracked by https://bugs.python.org/issue1635741 which was created in 2007: "Py_Finalize() doesn't clear all Python objects at exit".

Another way to see the issue is:

    $ PYTHONMALLOC=malloc valgrind ./python -c pass
    (...)
    ==169747== LEAK SUMMARY:
    ==169747==    definitely lost: 48 bytes in 2 blocks
    ==169747==    indirectly lost: 136 bytes in 6 blocks
    ==169747==      possibly lost: 700,552 bytes in 5,677 blocks
    ==169747==    still reachable: 5,450 bytes in 48 blocks
    ==169747==         suppressed: 0 bytes in 0 blocks

Python leaks around 700 KB at exit.

Even if you ignore the "run multiple interpreters in parallel" and PEP 554 use cases, enhancing the code to work better with subinterpreters also makes Python a better library to embed in applications, and so is useful.

Victor

On Wed, Jun 10, 2020 at 04:46, Inada Naoki <songofacandy@gmail.com> wrote:
On Tue, Jun 9, 2020 at 10:28 PM Petr Viktorin <encukou@gmail.com> wrote:
Relatively recently, there has been an effort to expose interpreter creation & finalization from Python code, and also to allow communication between interpreters (starting with something rudimentary, sharing buffers). There is also a push to explore making the GIL per-interpreter, which ties in to moving away from process-global state. Both are interesting ideas, but (like banishing global state) not the whole motivation for changes/additions.
Some changes for a per-interpreter GIL don't help subinterpreters much. For example, isolating the memory allocator, including free lists and constants, between subinterpreters makes each subinterpreter fatter. I assume Mark is talking about such changes.
Now Victor is proposing to move the dict free list into the per-interpreter state, and the code looks good to me. This is a change for the per-interpreter GIL, but not for subinterpreters. https://github.com/python/cpython/pull/20645
Should we commit this change to the master branch? Or should we create another branch for such changes?
Regards,
-- Inada Naoki <songofacandy@gmail.com>
-- Night gathers, and now my watch begins. It shall not end until my death.
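
P.S. For anyone who wants to track the `-X showrefcount` number quoted above over time, here is a minimal sketch in Python. It assumes the summary line keeps the "[N refs, M blocks]" shape shown in this message and that it is written to stderr by a debug build. The helper names `parse_showrefcount` and `refs_at_exit` are hypothetical and not part of CPython.

```python
import re
import subprocess
import sys

# Shape of the summary line shown in this thread: "[18645 refs, 6141 blocks]".
# Assumption: other CPython versions keep the same format.
_SUMMARY_RE = re.compile(r"\[(\d+) refs, (\d+) blocks\]")


def parse_showrefcount(text):
    """Return (refs, blocks) from the last summary line in text, or None."""
    matches = _SUMMARY_RE.findall(text)
    if not matches:
        return None
    refs, blocks = matches[-1]
    return int(refs), int(blocks)


def refs_at_exit(python=sys.executable, code="pass"):
    """Run `python -X showrefcount -c code` and parse its stderr.

    Returns None when no summary line is found, which is what a
    release (non-debug) build is expected to give.
    """
    proc = subprocess.run(
        [python, "-X", "showrefcount", "-c", code],
        capture_output=True,
        text=True,
    )
    return parse_showrefcount(proc.stderr)
```

Calling `refs_at_exit("./python")` against a debug build before and after a change gives two (refs, blocks) pairs that can be compared directly. On a release build the function returns None, so there is nothing to compare.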