Re: [Python-Dev] cpython: Issue #3329: Add new APIs to customize memory allocators
On Sat, 15 Jun 2013 00:44:11 +0200 (CEST) victor.stinner <python-checkins@python.org> wrote:
http://hg.python.org/cpython/rev/6661a8154eb3
changeset: 84127:6661a8154eb3
user: Victor Stinner <victor.stinner@gmail.com>
date: Sat Jun 15 00:37:46 2013 +0200
summary: Issue #3329: Add new APIs to customize memory allocators

* Add a new PyMemAllocators structure
* New functions:

  - PyMem_RawMalloc(), PyMem_RawRealloc(), PyMem_RawFree(): GIL-free memory allocator functions
  - PyMem_GetRawAllocators(), PyMem_SetRawAllocators()
  - PyMem_GetAllocators(), PyMem_SetAllocators()
  - PyMem_SetupDebugHooks()
  - _PyObject_GetArenaAllocators(), _PyObject_SetArenaAllocators()
My two cents, but I would prefer if this whole changeset was reverted. I think it adds too much complexity in the memory allocation APIs, for a pretty specialized benefit. IMHO, we should be able to get by with less allocation APIs (why the new _Raw APIs) and less hook-setting functions. Regards Antoine.
2013/6/15 Antoine Pitrou <solipsis@pitrou.net>:
http://hg.python.org/cpython/rev/6661a8154eb3 ... Issue #3329: Add new APIs to customize memory allocators
* Add a new PyMemAllocators structure
* New functions:

  - PyMem_RawMalloc(), PyMem_RawRealloc(), PyMem_RawFree(): GIL-free memory allocator functions
  - PyMem_GetRawAllocators(), PyMem_SetRawAllocators()
  - PyMem_GetAllocators(), PyMem_SetAllocators()
  - PyMem_SetupDebugHooks()
  - _PyObject_GetArenaAllocators(), _PyObject_SetArenaAllocators()
My two cents, but I would prefer if this whole changeset was reverted. I think it adds too much complexity in the memory allocation APIs, for a pretty specialized benefit. IMHO, we should be able to get by with less allocation APIs (why the new _Raw APIs) and less hook-setting functions.
Ok, I reverted my commit.

I posted my initial patch 3 months ago on the bug tracker. I got some reviews and discussed with Kristján Valur Jónsson who heavily modified Python for his game at CCP. I started two threads on python-dev this week (ok, only two days ago). I thought that the last known issues were fixed with the addition of PyMem_SetupDebugHooks() (to avoid an environment variable, as asked by Nick) and PyMem_RawMalloc() (have a GIL-free allocator).

I will work on a PEP to explain all these new functions and their use cases.

The addition of PyMem_RawMalloc() is motivated by the issue #18203 (Replace calls to malloc() with PyMem_Malloc()). The goal is to be able to setup a custom allocator for *all* allocation made by Python, so malloc() should not be called directly. PyMem_RawMalloc() is required in places where the GIL is not held (ex: in os.getcwd() on Windows).

PyMem_Malloc() is misused (called without the GIL held) in different places. Examples: the readline modules and functions called at Python startup, including main(). Replacing PyMem_Malloc() with malloc() would not allow the custom allocator to be used everywhere, so PyMem_RawMalloc() is also required here.

The last point is an extension of the issue #18203: some external libraries like zlib or OpenSSL are also calling malloc() directly. But Python can configure these libraries to use a custom memory allocator. I plan to configure external libraries to use PyMem_GetRawAllocators() if PyMem_SetRawAllocators() was called (if PyMem_RawMalloc is not simply malloc) and if setting a custom allocator only affects a function and not the whole library.

Victor
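As a rough illustration of the idea described above (a table of allocator functions that can be read and replaced at runtime, with the runtime calling through it instead of calling malloc() directly), here is a minimal standalone C sketch. It is not the code from the changeset: every name in it (demo_allocators, demo_get_raw_allocators(), demo_set_raw_allocators(), demo_raw_malloc(), demo_install_capped()) is hypothetical, and the "capped" allocator is just an invented stand-in for whatever an embedder might install.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

/* Hypothetical table of allocator functions; "ctx" is opaque user data
   handed back to every call. */
typedef struct {
    void *ctx;
    void *(*malloc_fn)(void *ctx, size_t size);
    void *(*realloc_fn)(void *ctx, void *ptr, size_t size);
    void (*free_fn)(void *ctx, void *ptr);
} demo_allocators;

/* Defaults: thin wrappers around the C library. */
static void *libc_malloc(void *ctx, size_t size) { (void)ctx; return malloc(size); }
static void *libc_realloc(void *ctx, void *ptr, size_t size) { (void)ctx; return realloc(ptr, size); }
static void libc_free(void *ctx, void *ptr) { (void)ctx; free(ptr); }

static demo_allocators demo_raw = { NULL, libc_malloc, libc_realloc, libc_free };

void demo_get_raw_allocators(demo_allocators *out) { *out = demo_raw; }

void demo_set_raw_allocators(const demo_allocators *in) {
    assert(in != NULL && in->malloc_fn && in->realloc_fn && in->free_fn);
    demo_raw = *in;
}

/* Runtime code calls these instead of malloc()/realloc()/free(). */
void *demo_raw_malloc(size_t size) { return demo_raw.malloc_fn(demo_raw.ctx, size); }
void *demo_raw_realloc(void *p, size_t size) { return demo_raw.realloc_fn(demo_raw.ctx, p, size); }
void demo_raw_free(void *p) { demo_raw.free_fn(demo_raw.ctx, p); }

/* Invented embedder allocator: refuse any single request above a cap. */
static void *capped_malloc(void *ctx, size_t size) {
    size_t cap = *(const size_t *)ctx;
    return size > cap ? NULL : malloc(size);
}
static void *capped_realloc(void *ctx, void *ptr, size_t size) {
    size_t cap = *(const size_t *)ctx;
    return size > cap ? NULL : realloc(ptr, size);
}

/* Install the capped allocator; "*cap" must stay alive while it is in use. */
void demo_install_capped(size_t *cap) {
    demo_allocators a = { cap, capped_malloc, capped_realloc, libc_free };
    demo_set_raw_allocators(&a);
}
```

Nothing in this sketch takes a lock, which is the property the "raw" functions are meant to have: they can be called from code that does not hold the GIL, provided the installed functions are themselves thread-safe.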
On 15 June 2013 11:54, Victor Stinner <victor.stinner@gmail.com> wrote:
2013/6/15 Antoine Pitrou <solipsis@pitrou.net>:
http://hg.python.org/cpython/rev/6661a8154eb3 ... Issue #3329: Add new APIs to customize memory allocators
* Add a new PyMemAllocators structure
* New functions:

  - PyMem_RawMalloc(), PyMem_RawRealloc(), PyMem_RawFree(): GIL-free memory allocator functions
  - PyMem_GetRawAllocators(), PyMem_SetRawAllocators()
  - PyMem_GetAllocators(), PyMem_SetAllocators()
  - PyMem_SetupDebugHooks()
  - _PyObject_GetArenaAllocators(), _PyObject_SetArenaAllocators()
My two cents, but I would prefer if this whole changeset was reverted. I think it adds too much complexity in the memory allocation APIs, for a pretty specialized benefit. IMHO, we should be able to get by with less allocation APIs (why the new _Raw APIs) and less hook-setting functions.
Ok, I reverted my commit.
I posted my initial patch 3 months ago on the bug tracker. I got some reviews and discussed with Kristján Valur Jónsson who heavily modified Python for his game at CCP. I started two threads on python-dev this week (ok, only two days ago). I thought that the last known issues were fixed with the addition of PyMem_SetupDebugHooks() (to avoid an environment variable, as asked by Nick) and PyMem_RawMalloc() (have a GIL-free allocator).
I will work on a PEP to explain all these new functions and their use cases.
I think the new APIs are mostly valid and well-justified, but agree a PEP is a good idea.

Yes, it's a complex solution, but it's solving a complex problem that arises when embedding CPython inside executables that need to run on non-traditional platforms where the simple C defined malloc/realloc/free trio is inadequate.

This is a complementary effort to PEP 432 - that aims to simplify embedding CPython in general, while Victor's efforts here specifically focus on situations where it is necessary to better map CPython to an underlying memory model that differs from the traditional C one. While the "single heap" model of memory enshrined in the C standard is certainly the most common model, it's far from being the only one, and these days CPython also gets used in those other environments.

About the only simplification I can see is that PyMem_RawMalloc(), PyMem_RawRealloc(), PyMem_RawFree() could perhaps be handled with preprocessor macros instead of permitting runtime reconfiguration. Allowing the memory allocations for the CPython runtime to be handled separately from those for arbitrary C libraries loaded into the process seems reasonable, though.

Cheers, Nick.

-- Nick Coghlan | ncoghlan@gmail.com | Brisbane, Australia
fwiw i'm also supportive of adding these apis. Let's PEP away to iron out any details or document disagreements but overall I'd also like to see something a lot like these go in.

-gps

On Fri, Jun 14, 2013 at 10:50 PM, Nick Coghlan <ncoghlan@gmail.com> wrote:
On 15 June 2013 11:54, Victor Stinner <victor.stinner@gmail.com> wrote:
2013/6/15 Antoine Pitrou <solipsis@pitrou.net>:
http://hg.python.org/cpython/rev/6661a8154eb3 ... Issue #3329: Add new APIs to customize memory allocators
* Add a new PyMemAllocators structure
* New functions:

  - PyMem_RawMalloc(), PyMem_RawRealloc(), PyMem_RawFree(): GIL-free memory allocator functions
  - PyMem_GetRawAllocators(), PyMem_SetRawAllocators()
  - PyMem_GetAllocators(), PyMem_SetAllocators()
  - PyMem_SetupDebugHooks()
  - _PyObject_GetArenaAllocators(), _PyObject_SetArenaAllocators()
My two cents, but I would prefer if this whole changeset was reverted. I think it adds too much complexity in the memory allocation APIs, for a pretty specialized benefit. IMHO, we should be able to get by with less allocation APIs (why the new _Raw APIs) and less hook-setting functions.
Ok, I reverted my commit.
I posted my initial patch 3 months ago on the bug tracker. I got some reviews and discussed with Kristján Valur Jónsson who heavily modified Python for his game at CCP. I started two threads on python-dev this week (ok, only two days ago). I thought that the last known issues were fixed with the addition of PyMem_SetupDebugHooks() (to avoid an environment variable, as asked by Nick) and PyMem_RawMalloc() (have a GIL-free allocator).
I will work on a PEP to explain all these new functions and their use cases.
I think the new APIs are mostly valid and well-justified, but agree a PEP is a good idea.
Yes, it's a complex solution, but it's solving a complex problem that arises when embedding CPython inside executables that need to run on non-traditional platforms where the simple C defined malloc/realloc/free trio is inadequate.
This is a complementary effort to PEP 432 - that aims to simplify embedding CPython in general, while Victor's efforts here specifically focus on situations where it is necessary to better map CPython to an underlying memory model that differs from the traditional C one. While the "single heap" model of memory enshrined in the C standard is certainly the most common model, it's far from being the only one, and these days CPython also gets used in those other environments.
About the only simplification I can see is that PyMem_RawMalloc(), PyMem_RawRealloc(), PyMem_RawFree() could perhaps be handled with preprocessor macros instead of permitting runtime reconfiguration. Allowing the memory allocations for the CPython runtime to be handled separately from those for arbitrary C libraries loaded into the process seems reasonable, though.
Cheers, Nick.
-- Nick Coghlan | ncoghlan@gmail.com | Brisbane, Australia
On Sat, 15 Jun 2013 03:54:50 +0200 Victor Stinner <victor.stinner@gmail.com> wrote:
The addition of PyMem_RawMalloc() is motivated by the issue #18203 (Replace calls to malloc() with PyMem_Malloc()). The goal is to be able to setup a custom allocator for *all* allocation made by Python, so malloc() should not be called directly. PyMem_RawMalloc() is required in places where the GIL is not held (ex: in os.getcwd() on Windows).
We already had this discussion on IRC and this argument isn't very convincing to me. If os.getcwd() doesn't hold the GIL while allocating memory, then you should fix it to hold the GIL while allocating memory.

I don't like the idea of adding a third layer of allocation APIs. The dichotomy between PyObject_Malloc and PyMem_Malloc is already a bit gratuitous (i.e. not motivated by any actual real-world concern, as far as I can tell).

As for the debug functions you added: PyMem_GetRawAllocators(), PyMem_SetRawAllocators(), PyMem_GetAllocators(), PyMem_SetAllocators(), PyMem_SetupDebugHooks(), _PyObject_GetArenaAllocators(), _PyObject_SetArenaAllocators(). Well, do we need all *7* of them? Can't you try to make that 2 or 3?

Regards

Antoine.
On 15 June 2013 21:01, Antoine Pitrou <solipsis@pitrou.net> wrote:
On Sat, 15 Jun 2013 03:54:50 +0200 Victor Stinner <victor.stinner@gmail.com> wrote:
The addition of PyMem_RawMalloc() is motivated by the issue #18203 (Replace calls to malloc() with PyMem_Malloc()). The goal is to be able to setup a custom allocator for *all* allocation made by Python, so malloc() should not be called directly. PyMem_RawMalloc() is required in places where the GIL is not held (ex: in os.getcwd() on Windows).
We already had this discussion on IRC and this argument isn't very convincing to me. If os.getcwd() doesn't hold the GIL while allocating memory, then you should fix it to hold the GIL while allocating memory.
I don't like the idea of adding a third layer of allocation APIs. The dichotomy between PyObject_Malloc and PyMem_Malloc is already a bit gratuitous (i.e. not motivated by any actual real-world concern, as far as I can tell).
The only reason for the small object allocator to exist is because operating system allocators generally aren't optimised for frequent allocation and deallocation of small objects. You can gain a *lot* of speed from handling those inside the application. As the allocations grow in size, though, the application level allocator just becomes useless overhead, so it's better to delegate those operations directly to the OS. However, it's still desirable to be able to monitor those direct allocations in debug mode, thus it makes sense to have a GIL protected direct allocation API as well. You could try to hide the existence of the latter behaviour and treat it as a private API, but why? For custom allocators, it's useful to be able to *ensure* you can bypass CPython's small object allocator, rather than having to rely on it being bypassed for allocations above a certain size.
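The trade-off described above can be sketched as a size-threshold dispatch: small requests are served from an application-level pool, larger ones go straight to the system allocator. This is a toy model under stated assumptions, not CPython's actual small object allocator. The names demo_malloc(), demo_free() and demo_is_pooled() are hypothetical, the fixed pool of 32 blocks is arbitrary, and the 512-byte threshold is borrowed from the figure quoted elsewhere in this thread for Python 3.4.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <stdlib.h>

#define SMALL_THRESHOLD 512   /* bytes; requests above this bypass the pool */
#define POOL_BLOCKS 32        /* arbitrary pool size for the sketch */

/* One fixed-size pool block. While free, it stores the free-list link. */
typedef union block {
    union block *next;
    max_align_t align;                      /* forces suitable alignment */
    unsigned char bytes[SMALL_THRESHOLD];
} block;

static block pool[POOL_BLOCKS];
static block *free_list = NULL;
static int pool_ready = 0;

/* Thread every pool block onto the free list. */
static void pool_init(void) {
    for (size_t i = 0; i < POOL_BLOCKS; i++) {
        pool[i].next = free_list;
        free_list = &pool[i];
    }
    pool_ready = 1;
}

/* Does this pointer lie inside the pool's address range? */
static int from_pool(const void *p) {
    uintptr_t a = (uintptr_t)p;
    uintptr_t lo = (uintptr_t)pool;
    uintptr_t hi = (uintptr_t)(pool + POOL_BLOCKS);
    return a >= lo && a < hi;
}

int demo_is_pooled(const void *p) { return from_pool(p); }

/* NOT thread-safe: like the allocator it imitates, it assumes the caller
   already holds some global lock. */
void *demo_malloc(size_t size) {
    if (!pool_ready) pool_init();
    if (size <= SMALL_THRESHOLD && free_list != NULL) {
        block *b = free_list;               /* pop a block: no system call */
        free_list = b->next;
        return b;
    }
    return malloc(size);                    /* large request, or pool exhausted */
}

void demo_free(void *p) {
    if (p == NULL) return;
    if (from_pool(p)) {
        block *b = p;                       /* push the block back */
        b->next = free_list;
        free_list = b;
    } else {
        free(p);
    }
}
```

The caller cannot tell from the API which path a request took; that only depends on the size. A separate entry point that always goes to the system allocator is what makes it possible to guarantee the pool is bypassed.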
As for the debug functions you added: PyMem_GetRawAllocators(), PyMem_SetRawAllocators(), PyMem_GetAllocators(), PyMem_SetAllocators(), PyMem_SetupDebugHooks(), _PyObject_GetArenaAllocators(), _PyObject_SetArenaAllocators(). Well, do we need all *7* of them? Can't you try to make that 2 or 3?
Faux simplicity that is achieved only by failing to model a complex problem domain correctly is a bad idea (if we were satisfied with that, we could stick with the status quo). The only question mark in my mind is over the GIL-free raw allocation APIs. I think it makes sense to at least conditionally define those as macros so an embedding application can redirect *just* the allocations made by the CPython runtime (rather than having to redefine malloc/realloc/free when building Python), but I don't believe the case has been adequately made for making the raw APIs configurable at runtime. Dropping that aspect would at least eliminate the PyMem_(Get|Set)RawAllocators() APIs. Cheers, Nick. -- Nick Coghlan | ncoghlan@gmail.com | Brisbane, Australia
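The build-time alternative suggested above (raw allocation functions defined as macros that an embedder can redirect, with no runtime reconfiguration) might look roughly like the following. The macro names DEMO_RAW_MALLOC, DEMO_RAW_REALLOC and DEMO_RAW_FREE and the helper demo_strdup() are hypothetical and only serve to show the pattern.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* An embedder redirects the runtime's raw allocations by defining these
   macros before this point (for example in a project-wide config header).
   If nothing is defined, they fall back to the C library. */
#ifndef DEMO_RAW_MALLOC
#  define DEMO_RAW_MALLOC(n)     malloc(n)
#endif
#ifndef DEMO_RAW_REALLOC
#  define DEMO_RAW_REALLOC(p, n) realloc((p), (n))
#endif
#ifndef DEMO_RAW_FREE
#  define DEMO_RAW_FREE(p)       free(p)
#endif

/* Runtime code uses the macros, never malloc()/free() directly. */
char *demo_strdup(const char *s) {
    size_t n = strlen(s) + 1;
    char *copy = DEMO_RAW_MALLOC(n);
    if (copy != NULL) {
        memcpy(copy, s, n);
    }
    return copy;
}
```

With this approach the choice is fixed when the runtime is compiled: there is no getter or setter, but also no way to install a hook in an already-built binary.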
On Sat, 15 Jun 2013 22:22:33 +1000 Nick Coghlan <ncoghlan@gmail.com> wrote:
On 15 June 2013 21:01, Antoine Pitrou <solipsis@pitrou.net> wrote:
On Sat, 15 Jun 2013 03:54:50 +0200 Victor Stinner <victor.stinner@gmail.com> wrote:
The addition of PyMem_RawMalloc() is motivated by the issue #18203 (Replace calls to malloc() with PyMem_Malloc()). The goal is to be able to setup a custom allocator for *all* allocation made by Python, so malloc() should not be called directly. PyMem_RawMalloc() is required in places where the GIL is not held (ex: in os.getcwd() on Windows).
We already had this discussion on IRC and this argument isn't very convincing to me. If os.getcwd() doesn't hold the GIL while allocating memory, then you should fix it to hold the GIL while allocating memory.
I don't like the idea of adding a third layer of allocation APIs. The dichotomy between PyObject_Malloc and PyMem_Malloc is already a bit gratuitous (i.e. not motivated by any actual real-world concern, as far as I can tell).
The only reason for the small object allocator to exist is because operating system allocators generally aren't optimised for frequent allocation and deallocation of small objects. You can gain a *lot* of speed from handling those inside the application. As the allocations grow in size, though, the application level allocator just becomes useless overhead, so it's better to delegate those operations directly to the OS.
The small object allocator *already* delegates those operations directly to the OS. You don't need a separate API to do it by hand.
For custom allocators, it's useful to be able to *ensure* you can bypass CPython's small object allocator, rather than having to rely on it being bypassed for allocations above a certain size.
Which custom allocators?
As for the debug functions you added: PyMem_GetRawAllocators(), PyMem_SetRawAllocators(), PyMem_GetAllocators(), PyMem_SetAllocators(), PyMem_SetupDebugHooks(), _PyObject_GetArenaAllocators(), _PyObject_SetArenaAllocators(). Well, do we need all *7* of them? Can't you try to make that 2 or 3?
Faux simplicity that is achieved only by failing to model a complex problem domain correctly is a bad idea (if we were satisfied with that, we could stick with the status quo).
Actually, I'm sure almost everyone *is* satisfied with the status quo here (witness the total absence of bug reports on the matter). Victor's patch addresses a rare concern compared to the common use cases of CPython. And I'm not even sure what "faux simplicity" you are talking about. What is the supposed complexity that this API is supposed to address? Why do we need two different pairs of hook-setting functions, rather than letting each function set / get all hooks at once? And why the private API functions for setting arena allocators? Memory allocation APIs are a fundamental part of the C API that many extension writers have to understand and deal with. I'm opposed to gratuitous complication when the use cases are not compelling. Regards Antoine.
On 15 June 2013 22:41, Antoine Pitrou <solipsis@pitrou.net> wrote:
On Sat, 15 Jun 2013 22:22:33 +1000 Nick Coghlan <ncoghlan@gmail.com> wrote:
For custom allocators, it's useful to be able to *ensure* you can bypass CPython's small object allocator, rather than having to rely on it being bypassed for allocations above a certain size.
Which custom allocators?
Those used by companies like Dropbox to speed up frequent allocations (look up their PyCon 2011 keynote). If we don't provide suitable APIs that we can still hook in debug mode, they'll bypass our infrastructure completely and we'll miss significant memory accesses.
As for the debug functions you added: PyMem_GetRawAllocators(), PyMem_SetRawAllocators(), PyMem_GetAllocators(), PyMem_SetAllocators(), PyMem_SetupDebugHooks(), _PyObject_GetArenaAllocators(), _PyObject_SetArenaAllocators(). Well, do we need all *7* of them? Can't you try to make that 2 or 3?
Faux simplicity that is achieved only by failing to model a complex problem domain correctly is a bad idea (if we were satisfied with that, we could stick with the status quo).
Actually, I'm sure almost everyone *is* satisfied with the status quo here (witness the total absence of bug reports on the matter). Victor's patch addresses a rare concern compared to the common use cases of CPython.
Indeed, but they're use cases I care about, Victor cares about, Kristjan cares about, Greg cares about. It's OK that you don't care about them, just as 99% of the Python programmers on the planet won't care about PEP 432 or the various arcane metaclass changes we've made over the years. issue 3329 (the one where Victor implemented this) was actually filed by the folks working on the Symbian port. The last comment on that issue before Victor restarted was from you, in reply to someone asking if we had implemented it yet.
And I'm not even sure what "faux simplicity" you are talking about. What is the supposed complexity that this API is supposed to address?
The fact that there is more to the world than x86/x86_64 and the very simplistic C memory model. Python is growing more popular in non-traditional execution environments, and we finally have someone (Victor) interested in doing the work to support them properly. That should be celebrated, not blocked because it isn't meaningful for the more common systems where the C memory model is fine.
Why do we need two different pairs of hook-setting functions, rather than letting each function set / get all hooks at once?
I've already said I don't think the raw allocators should be configurable at runtime. The other is because it's likely people will only want to replace the lower level allocators and leave the small object allocator alone. However, they should be able to completely replace the small object allocator if they want. Making the more common case more complicated to avoid adding a more appropriate two level API is the kind of thing I mean by "faux simplicity" - it's almost certainly going to be harder to use in practice, so trading multiple functions for fewer functions each taking more parameters isn't actually a win.
And why the private API functions for setting arena allocators?
Because they're in a different compilation unit...
Memory allocation APIs are a fundamental part of the C API that many extension writers have to understand and deal with. I'm opposed to gratuitous complication when the use cases are not compelling.
That's a documentation problem. C extension authors shouldn't be touching these, and most people embedding CPython shouldn't be touching them either. They're the C level equivalent of metaclasses: if you're not sure you need them, you don't need them. Cheers, Nick. -- Nick Coghlan | ncoghlan@gmail.com | Brisbane, Australia
On Sun, 16 Jun 2013 00:12:02 +1000 Nick Coghlan <ncoghlan@gmail.com> wrote:
On 15 June 2013 22:41, Antoine Pitrou <solipsis@pitrou.net> wrote:
On Sat, 15 Jun 2013 22:22:33 +1000 Nick Coghlan <ncoghlan@gmail.com> wrote:
For custom allocators, it's useful to be able to *ensure* you can bypass CPython's small object allocator, rather than having to rely on it being bypassed for allocations above a certain size.
Which custom allocators?
Those used by companies like Dropbox to speed up frequent allocations (look up their PyCon 2011 keynote). If we don't provide suitable APIs that we can still hook in debug mode, they'll bypass our infrastructure completely and we'll miss significant memory accesses.
I don't understand the concern. People can ignore the Python allocators, and then use their own debugging infrastructure. This is what happens every time you want to use your own infrastructure instead of a 3rd party-provided one.
And I'm not even sure what "faux simplicity" you are talking about. What is the supposed complexity that this API is supposed to address?
The fact that there is more to the world than x86/x86_64 and the very simplistic C memory model.
Then I think it needs a PEP to properly address it and explain it to everyone. Moreover, I think you are conflating two issues: the ability to add memory allocation hooks (for tracing/debugging purposes), and the adaptation to "non-traditional" memory models (whatever that means). Those concerns don't necessarily come together. Regards Antoine.
2013/6/15 Antoine Pitrou <solipsis@pitrou.net>:
Moreover, I think you are conflating two issues: the ability to add memory allocation hooks (for tracing/debugging purposes), and the adaptation to "non-traditional" memory models (whatever that means). Those concerns don't necessarily come together.
In my implementation, both use cases use the same API: PyMem_SetAllocators(), except that hooks also need PyMem_GetAllocators(). Victor
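To make the difference concrete: a replacement allocator only needs a setter, while a hook has to fetch the currently installed functions first so that it can delegate to them. The sketch below shows that get-then-set pattern in standalone C. All names (demo_allocators, demo_get_allocators(), demo_set_allocators(), demo_hook, demo_install_hook()) are hypothetical and the table omits realloc for brevity; it is not the PyMem_GetAllocators()/PyMem_SetAllocators() implementation.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

/* Hypothetical allocator table (realloc omitted to keep the sketch short). */
typedef struct {
    void *ctx;
    void *(*malloc_fn)(void *ctx, size_t size);
    void (*free_fn)(void *ctx, void *ptr);
} demo_allocators;

static void *libc_malloc(void *ctx, size_t size) { (void)ctx; return malloc(size); }
static void libc_free(void *ctx, void *ptr) { (void)ctx; free(ptr); }

static demo_allocators current = { NULL, libc_malloc, libc_free };

void demo_get_allocators(demo_allocators *out) { *out = current; }
void demo_set_allocators(const demo_allocators *in) { current = *in; }

void *demo_malloc(size_t size) { return current.malloc_fn(current.ctx, size); }
void demo_free(void *ptr) { current.free_fn(current.ctx, ptr); }

/* Hook state: the allocators that were installed before the hook, plus
   simple counters. */
typedef struct {
    demo_allocators inner;
    size_t n_malloc;
    size_t n_free;
} demo_hook;

static void *hook_malloc(void *ctx, size_t size) {
    demo_hook *h = ctx;
    void *p = h->inner.malloc_fn(h->inner.ctx, size);   /* delegate */
    if (p != NULL) h->n_malloc++;
    return p;
}

static void hook_free(void *ctx, void *ptr) {
    demo_hook *h = ctx;
    if (ptr != NULL) h->n_free++;
    h->inner.free_fn(h->inner.ctx, ptr);                /* delegate */
}

/* "*h" must stay alive for as long as the hook is installed. */
void demo_install_hook(demo_hook *h) {
    demo_allocators wrapper = { h, hook_malloc, hook_free };
    demo_get_allocators(&h->inner);   /* Get: remember what to delegate to */
    h->n_malloc = 0;
    h->n_free = 0;
    demo_set_allocators(&wrapper);    /* Set: route calls through the hook */
}
```

Because the hook wraps whatever was installed before it, it works the same way on top of the default allocator or on top of an embedder's custom one.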
On 16 Jun 2013 10:54, "Victor Stinner" <victor.stinner@gmail.com> wrote:
2013/6/15 Antoine Pitrou <solipsis@pitrou.net>:
Moreover, I think you are conflating two issues: the ability to add memory allocation hooks (for tracing/debugging purposes), and the adaptation to "non-traditional" memory models (whatever that means). Those concerns don't necessarily come together.
In my implementation, both use cases use the same API: PyMem_SetAllocators(), except that hooks also need PyMem_GetAllocators().
Right - they're different use cases that share a technical solution, so it makes sense to consider them together. Cheers, Nick.
On 15.06.2013 14:22, Nick Coghlan wrote:
However, it's still desirable to be able to monitor those direct allocations in debug mode, thus it makes sense to have a GIL protected direct allocation API as well. You could try to hide the existence of the latter behaviour and treat it as a private API, but why? For custom allocators, it's useful to be able to *ensure* you can bypass CPython's small object allocator, rather than having to rely on it being bypassed for allocations above a certain size.
There is even more to it. We like to keep track of memory allocations in libraries that are wrapped by Python's extension modules, e.g. expat, openssl etc. Almost every library has a hook to set a custom memory allocator, either globally (CRYPTO_set_mem_functions) or for each object (XML_ParserCreate_MM's XML_Memory_Handling_Suite).

Python releases the GIL around IO or CPU critical sections of the library. But these sections may call the memory management functions. If these memory functions use the GIL, then some of the speedups gained by releasing the GIL in the first place are lost. It might even be possible to walk into deadlock situations.

For that reason it makes sense to have a set of low level memory management functions that don't rely on the GIL for locking. These functions can still impose their own locking if they need to modify a global state (e.g. allocation statistics). In normal release mode and on most platforms the raw memory allocators should be thin wrappers around malloc(), realloc() and free() -- or perhaps just macros.

Eventually I would like to ban direct usage of malloc() from Python's core and patch all memory management through our API.

Christian
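A minimal sketch of what "low level functions that don't rely on the GIL but still keep global statistics" could look like, assuming C11 atomics are available. The names demo_raw_malloc(), demo_raw_free() and demo_raw_live_blocks() are hypothetical; the point is only that the counter is updated with an atomic operation, so the functions can be called from a section of library code that runs with the GIL released.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stddef.h>
#include <stdlib.h>

/* Number of blocks currently allocated through this API.
   Static storage, so it starts at zero. */
static atomic_size_t live_blocks;

/* Thin wrapper around malloc(); safe to call without any global lock. */
void *demo_raw_malloc(size_t size) {
    void *p = malloc(size);
    if (p != NULL) {
        atomic_fetch_add_explicit(&live_blocks, 1, memory_order_relaxed);
    }
    return p;
}

/* Thin wrapper around free(); the statistic is updated atomically. */
void demo_raw_free(void *p) {
    if (p == NULL) return;
    atomic_fetch_sub_explicit(&live_blocks, 1, memory_order_relaxed);
    free(p);
}

size_t demo_raw_live_blocks(void) {
    return atomic_load_explicit(&live_blocks, memory_order_relaxed);
}
```

Functions of this shape are the kind of thing that could be handed to a library's allocator hook, subject to that library's own callback signature.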
2013/6/15 Christian Heimes <christian@python.org>:
On 15.06.2013 14:22, Nick Coghlan wrote:
However, it's still desirable to be able to monitor those direct allocations in debug mode, thus it makes sense to have a GIL protected direct allocation API as well. You could try to hide the existence of the latter behaviour and treat it as a private API, but why? For custom allocators, it's useful to be able to *ensure* you can bypass CPython's small object allocator, rather than having to rely on it being bypassed for allocations above a certain size.
There is even more to it. We like to keep track of memory allocations in libraries that are wrapped by Python's extension modules, e.g. expat, openssl etc. Almost every library has a hook to set a custom memory allocator, either globally (CRYPTO_set_mem_functions) or for each object (XML_ParserCreate_MM's XML_Memory_Handling_Suite).
I just created issue http://bugs.python.org/issue18227: "Use Python memory allocators in external libraries like zlib or OpenSSL". Is it possible to detect if Python is used as a standalone application (the classic "python" program) or if Python is embedded? If it is possible, we can modify the "global" memory allocators of a library. Otherwise, it is more tricky. Should Python set its "own" memory allocators? Maybe only if PyMem_SetRawAllocators() was called?
Eventually I would like to ban direct usage of malloc() from Python's core and patch all memory management through our API.
I already created issue http://bugs.python.org/issue18203 for this part. Victor
On Sun, 16 Jun 2013 01:48:06 +0200 Victor Stinner <victor.stinner@gmail.com> wrote:
2013/6/15 Christian Heimes <christian@python.org>:
On 15.06.2013 14:22, Nick Coghlan wrote:
However, it's still desirable to be able to monitor those direct allocations in debug mode, thus it makes sense to have a GIL protected direct allocation API as well. You could try to hide the existence of the latter behaviour and treat it as a private API, but why? For custom allocators, it's useful to be able to *ensure* you can bypass CPython's small object allocator, rather than having to rely on it being bypassed for allocations above a certain size.
There is even more to it. We like to keep track of memory allocations in libraries that are wrapped by Python's extension modules, e.g. expat, openssl etc. Almost every library has a hook to set a custom memory allocator, either globally (CRYPTO_set_mem_functions) or for each object (XML_ParserCreate_MM's XML_Memory_Handling_Suite).
I just created issue http://bugs.python.org/issue18227: "Use Python memory allocators in external libraries like zlib or OpenSSL".
Is it possible to detect if Python is used as a standalone application (the classic "python" program) or if Python is embedded? If it is possible, we can modify the "global" memory allocators of a library.
The question is why you want to do so, not how/whether to do it. Regards Antoine.
On 16 June 2013 19:50, Antoine Pitrou <solipsis@pitrou.net> wrote:
On Sun, 16 Jun 2013 01:48:06 +0200 Victor Stinner <victor.stinner@gmail.com> wrote:
2013/6/15 Christian Heimes <christian@python.org>:
On 15.06.2013 14:22, Nick Coghlan wrote:
However, it's still desirable to be able to monitor those direct allocations in debug mode, thus it makes sense to have a GIL protected direct allocation API as well. You could try to hide the existence of the latter behaviour and treat it as a private API, but why? For custom allocators, it's useful to be able to *ensure* you can bypass CPython's small object allocator, rather than having to rely on it being bypassed for allocations above a certain size.
There is even more to it. We like to keep track of memory allocations in libraries that are wrapped by Python's extension modules, e.g. expat, openssl etc. Almost every library has a hook to set a custom memory allocator, either globally (CRYPTO_set_mem_functions) or for each object (XML_ParserCreate_MM's XML_Memory_Handling_Suite).
I just created issue http://bugs.python.org/issue18227: "Use Python memory allocators in external libraries like zlib or OpenSSL".
Is it possible to detect if Python is used as a standalone application (the classic "python" program) or if Python is embedded? If it is possible, we can modify the "global" memory allocators of a library.
The question is why you want to do so, not how/whether to do it.
I don't think we should be doing that ourselves - it's up to system integrators/embedders to configure those libraries if they want to, we shouldn't be doing it on their behalf. Cheers, Nick. -- Nick Coghlan | ncoghlan@gmail.com | Brisbane, Australia
2013/6/16 Antoine Pitrou <solipsis@pitrou.net>:
On Sun, 16 Jun 2013 01:48:06 +0200 Victor Stinner <victor.stinner@gmail.com> wrote:
I just created issue http://bugs.python.org/issue18227: "Use Python memory allocators in external libraries like zlib or OpenSSL".
Is it possible to detect if Python is used as a standalone application (the classic "python" program) or if Python is embedded? If it is possible, we can modify the "global" memory allocators of a library.
The question is why you want to do so, not how/whether to do it.
I want to be able to track the memory usage of all Python memory, even in external libraries, and/or use a custom memory allocator, even in external libraries. Victor
2013/6/15 Nick Coghlan <ncoghlan@gmail.com>:
The only reason for the small object allocator to exist is because operating system allocators generally aren't optimised for frequent allocation and deallocation of small objects. You can gain a *lot* of speed from handling those inside the application. As the allocations grow in size, though, the application level allocator just becomes useless overhead, so it's better to delegate those operations directly to the OS.
Why not use PyObject_Malloc() for all allocations? PyObject_Malloc() falls back to malloc() if the size is larger than a threshold (512 bytes in Python 3.4). Are PyObject_Realloc() and PyObject_Free() more expensive than realloc() and free() (when the memory was allocated by malloc)?
The only question mark in my mind is over the GIL-free raw allocation APIs. I think it makes sense to at least conditionally define those as macros so an embedding application can redirect *just* the allocations made by the CPython runtime (rather than having to redefine malloc/realloc/free when building Python), but I don't believe the case has been adequately made for making the raw APIs configurable at runtime. Dropping that aspect would at least eliminate the PyMem_(Get|Set)RawAllocators() APIs.
PyMem_SetRawAllocators() is required for the two use cases: using a custom memory allocator (embedded device and Python embedded in an application) and setting up a hook for debug purposes. Without PyMem_SetRawAllocators(), allocations made by PyMem_RawMalloc() would not go to the same place as the rest of the "Python memory", nor be seen by debug tools. It becomes worse with large allocations kept for a long time. Victor
2013/6/15 Antoine Pitrou <solipsis@pitrou.net>:
On Sat, 15 Jun 2013 03:54:50 +0200 Victor Stinner <victor.stinner@gmail.com> wrote:
The addition of PyMem_RawMalloc() is motivated by the issue #18203 (Replace calls to malloc() with PyMem_Malloc()). The goal is to be able to setup a custom allocator for *all* allocation made by Python, so malloc() should not be called directly. PyMem_RawMalloc() is required in places where the GIL is not held (ex: in os.getcwd() on Windows).
We already had this discussion on IRC and this argument isn't very convincing to me. If os.getcwd() doesn't hold the GIL while allocating memory, then you should fix it to hold the GIL while allocating memory.
The GIL is released for best performance; holding the GIL would have an impact on performance. PyMem_RawMalloc() is needed when PyMem_Malloc() cannot be used because the GIL was released. For example, for the issue #18227 (reuse the custom allocator in external libraries), PyMem_Malloc() is usually not appropriate. PyMem_RawMalloc() should also be used instead of PyMem_Malloc() in the Python startup sequence, because PyMem_Malloc() requires the GIL whereas the GIL does not exist yet. PyMem_RawMalloc() also provides more accurate memory usage figures if it can be replaced or hooked (with PyMem_SetRawAllocators()). The issue #18203 explains why I would like to replace direct calls to malloc() with PyMem_Malloc() or PyMem_RawMalloc().
I don't like the idea of adding a third layer of allocation APIs. The dichotomy between PyObject_Malloc and PyMem_Malloc is already a bit gratuitous (i.e. not motivated by any actual real-world concern, as far as I can tell).
In Python 3.3, PyMem_Malloc() cannot be used instead of malloc() where the GIL is not held. Instead of adding PyMem_RawMalloc(), an alternative is to remove the "the GIL must be held" restriction from PyMem_Malloc() by changing PyMem_Malloc() to make it always call malloc() (instead of PyObject_Malloc() in debug mode). With such a change, a debug hook cannot rely on the GIL anymore: it cannot inspect Python objects, get a frame or traceback, etc. To still get an accurate debug report, PyMem_Malloc() should be replaced with PyObject_Malloc(). I don't yet understand the effect of such a change on backward compatibility. Might it break applications?
As for the debug functions you added: PyMem_GetRawAllocators(), PyMem_SetRawAllocators(), PyMem_GetAllocators(), PyMem_SetAllocators(), PyMem_SetupDebugHooks(), _PyObject_GetArenaAllocators(), _PyObject_SetArenaAllocators(). Well, do we need all *7* of them? Can't you try to make that 2 or 3?
Get/SetAllocators of PyMem, PyMem_Raw and PyObject can be grouped into 2 functions (get and set) with an argument to select the API. It is what I proposed initially. I changed this when I had to choose a name for the argument ("api", "domain", something else?) because there were only two choices. With 3 families of functions (PyMem, PyMem_Raw and PyObject), it becomes interesting again to have generic functions. The arena case is different: pymalloc only uses two functions to allocate arenas: void* alloc(size_t) and void release(void*, size_t). The release function has a size argument, which is unusual, but it is required to implement it using munmap(). VirtualFree() on Windows also requires the size. An application can choose to replace PyObject_Malloc() with its own allocator, but in my experience, it has an important impact on performance (Python is slower). To benefit from pymalloc with a custom memory allocator, _PyObject_SetArenaAllocators() can be used. I kept _PyObject_SetArenaAllocators() private because I don't like its API: it is not homogeneous with the other SetAllocators functions. I'm not sure that it would be used, so I prefer to keep it private until it is tested by some projects. "Private" functions can be used by applications, it's just that Python doesn't give any backward compatibility warranty. Am I right? Victor
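The size argument of the release function is easier to see in code. Below is a minimal POSIX-only sketch of an arena pair with the two signatures described above; toy_arena_alloc and toy_arena_release are hypothetical names, and this is not the pymalloc implementation.

```c
#define _DEFAULT_SOURCE  /* assumption: needed by some libcs to expose MAP_ANONYMOUS */
#include <assert.h>
#include <stddef.h>
#include <sys/mman.h>

#ifndef MAP_ANONYMOUS
#define MAP_ANONYMOUS MAP_ANON  /* older BSD-style spelling */
#endif

/* Arena allocation: void *alloc(size_t). Returns NULL on failure. */
void *toy_arena_alloc(size_t size)
{
    void *p = mmap(NULL, size, PROT_READ | PROT_WRITE,
                   MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
    return p == MAP_FAILED ? NULL : p;
}

/* Arena release: void release(void *, size_t).
   munmap() takes the length of the mapping, so the caller has to
   pass the same size it used for the allocation. */
void toy_arena_release(void *ptr, size_t size)
{
    int rc = munmap(ptr, size);
    assert(rc == 0);
    (void)rc;  /* silence the unused-variable warning under NDEBUG */
}
```

A free(void*)-shaped hook would force the allocator to remember each mapping's length itself, whereas the caller of an arena allocator already knows the arena size.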
On Sun, 16 Jun 2013 02:18:32 +0200 Victor Stinner <victor.stinner@gmail.com> wrote:
2013/6/15 Antoine Pitrou <solipsis@pitrou.net>:
On Sat, 15 Jun 2013 03:54:50 +0200 Victor Stinner <victor.stinner@gmail.com> wrote:
The addition of PyMem_RawMalloc() is motivated by the issue #18203 (Replace calls to malloc() with PyMem_Malloc()). The goal is to be able to setup a custom allocator for *all* allocation made by Python, so malloc() should not be called directly. PyMem_RawMalloc() is required in places where the GIL is not held (ex: in os.getcwd() on Windows).
We already had this discussion on IRC and this argument isn't very convincing to me. If os.getcwd() doesn't hold the GIL while allocating memory, then you should fix it to hold the GIL while allocating memory.
The GIL is released for best performance; holding the GIL would have an impact on performance.
Well, do you have benchmark numbers? Do you have a workload where getcwd() is performance-critical to the point that a single GIL-protected allocation may slow down your program?
"Private" functions can be used by applications, it's just that Python doesn't give any backward compatibility warranty. Am I right?
Anyone "can" use anything obviously, but when it's private, it can be changed or removed in any release. If the only goal for these functions is to be used by applications, though, it's quite a bad idea to make them private. Regards Antoine.
On 15 June 2013 03:54, "Victor Stinner" <victor.stinner@gmail.com> wrote:
Ok, I reverted my commit.
I will work on a PEP to explain all these new functions and their use cases.
I created the PEP 445 to reserve the number. It is not ready for a review but already contains some explanation of the new API. http://www.python.org/dev/peps/pep-0445/ Victor
On 15.06.2013 14:57, Victor Stinner wrote:
On 15 June 2013 03:54, "Victor Stinner" <victor.stinner@gmail.com> wrote:
Ok, I reverted my commit.
I will work on a PEP to explain all these new functions and their use cases.
I created the PEP 445 to reserve the number. It is not ready for a review but already contains some explanation of the new API.
+1. How about you compare your approach with some other libraries, too? Other projects have dealt with similar issues in the past. I have assembled a list of APIs for you: OpenSSL has CRYPTO_set_mem_functions() to set memory management functions globally: http://git.openssl.org/gitweb/?p=openssl.git;a=blob;f=crypto/mem.c;h=f7984fa... expat has a per-instance memory handler: http://hg.python.org/cpython/file/cc27d50bd91a/Modules/expat/xmlparse.c#l717 libtiff has three global hooks, _TIFFmalloc(), _TIFFrealloc() and _TIFFfree(), that are used instead of malloc() in its core: http://trac.imagemagick.org/browser/tiff/trunk/libtiff/tif_unix.c#L258 libxml2 has an extensive API for memory management and debugging: http://xmlsoft.org/html/libxml-xmlmemory.html Christian
participants (5)
- Antoine Pitrou
- Christian Heimes
- Gregory P. Smith
- Nick Coghlan
- Victor Stinner