On Oct 19, 2004, at 12:14, Tim Peters wrote:
True. That's one major problem for some apps. Another major problem for some apps is due to unbounded internal free lists outside of obmalloc. Another is that the platform OS+libc may not shrink VM at times even when memory is returned to the system free().
There is absolutely nothing I can do about that, however. On platforms that matter to me (Mac OS X, Linux) some number of large malloc() allocations are done via mmap(), and can be immediately released when free() is called. Hence, large blocks are reclaimable. I have no knowledge about the implementation of malloc() on Windows. Anyone care to enlighten me?
Another approach is to not free the memory, but instead to inform the operating system that the pages are unused (on Unix, madvise(2) with MADV_DONTNEED or MADV_FREE). When this happens, the operating system *may* discard the pages, but the address range remains valid: if it is touched again in the future, the OS will allocate a new page. This would require some dramatic changes to Python's internals.
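To make the madvise(2) idea concrete, here is a minimal sketch, not a proposal for obmalloc itself. It assumes a Unix platform and Python 3.8 or later, where the standard mmap module exposes madvise(). The name release_pages_demo and the 16-page region are hypothetical, chosen only for illustration. The private anonymous mapping stands in for an arena.

```python
import mmap

PAGE = mmap.PAGESIZE
LENGTH = 16 * PAGE  # hypothetical "arena" size, for illustration only


def release_pages_demo():
    # Private anonymous mapping standing in for an arena (Unix only).
    region = mmap.mmap(-1, LENGTH, flags=mmap.MAP_PRIVATE | mmap.MAP_ANONYMOUS)
    region[:LENGTH] = b"\x01" * LENGTH  # touch every page so it is resident

    advised = False
    # madvise() and MADV_DONTNEED are not available on every platform.
    if hasattr(region, "madvise") and hasattr(mmap, "MADV_DONTNEED"):
        # Tell the OS the pages are unused; it *may* discard them.
        region.madvise(mmap.MADV_DONTNEED, 0, LENGTH)
        advised = True

    # The address range is still valid: touching it again just works.
    region[0:1] = b"\x02"
    first = region[0]
    size = len(region)
    region.close()
    return advised, first, size
```

What the discarded pages contain when they are read again is platform-specific, so the sketch only relies on the range staying mapped and writable.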
if the memory usage has been relatively constant, and has been well below the amount of memory allocated.
That's a possible implementation strategy. I think you'll find it helpful to distinguish goals from implementations.
You are correct: This is an implementation detail. However, it is a relatively important one, as I do not want to change Python's aggressive memory recycling behaviour.
Maybe you just mean that you collapse adjacent free pools into a free pool of a larger size class, when possible? If so, that's a possible step on the way toward identifying unused arenas, but I wouldn't call it an instance of decreasing memory fragmentation.
I am not moving around Python objects, I'm just dealing with free pools and arenas in obmalloc.c at the moment. There are two separate things I am doing:
1. Scan through the free pool list, and count the number of free pools in each arena. If an arena is completely unused, I free it. If there is even one pool in use, the arena cannot be freed.
2. Sort the free pool list so that "nearly full" arenas are used before "nearly empty" arenas. Right now, when a pool is freed, it is pushed onto the list. When one is needed, it is popped off. This leads to a LIFO (most recently freed, first reused) allocation of memory. What I am doing is removing all the free pools from the list, and putting them back on so that arenas that have more free pools are used later, while arenas with fewer free pools are used first.
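The two steps above can be sketched as a small simulation. This is a toy model, not the obmalloc.c code: pools are plain integers, POOLS_PER_ARENA is a made-up small number, and arena_of() and cleanup() are hypothetical names. Index 0 of the list is the pool that would be handed out next.

```python
from collections import Counter

POOLS_PER_ARENA = 4  # hypothetical; kept tiny so the example is easy to follow


def arena_of(pool):
    # In this toy model, pool numbers are laid out arena by arena.
    return pool // POOLS_PER_ARENA


def cleanup(free_pools):
    # Step 1: count the free pools in each arena.
    counts = Counter(arena_of(p) for p in free_pools)
    # An arena whose pools are all free can be released.
    freed_arenas = sorted(a for a, n in counts.items() if n == POOLS_PER_ARENA)
    # Pools belonging to released arenas leave the free list.
    remaining = [p for p in free_pools if counts[arena_of(p)] < POOLS_PER_ARENA]
    # Step 2: reorder so arenas with fewer free pools ("nearly full")
    # are used first. The sort is stable within an arena.
    remaining.sort(key=lambda p: counts[arena_of(p)])
    return freed_arenas, remaining


freed, ordered = cleanup([8, 0, 9, 4, 1, 10, 2, 3])
print(freed)    # [0]
print(ordered)  # [4, 8, 9, 10]
```

Arena 0 has all four pools free, so it is released. Arena 1 has only one free pool, so that pool moves to the front and the arena fills up before arena 2 is touched.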
In my crude tests, the second detail increases the number of completely free arenas. However, I suspect that differentiating between free arenas and used arenas, as is already done for pools, would be a good idea.
In apps with steady states, between steady-state transitions it's not a good idea to "artificially" collapse free pools into free pools of larger size, because the app is going to want to reuse pools of the specific sizes it frees, and obmalloc optimizes for that case.
Absolutely: I am not touching that. I'm working from the assumption that pymalloc has been well tested and well tuned and is appropriate for Python workloads. I'm just trying to make it free memory occasionally.
If the real point of this (whatever it is <wink>) is to identify free arenas, I expect that could be done a lot easier by keeping a count of allocated pools in each arena; e.g., maybe at the start of the arena, or by augmenting the vector of arena base addresses.
You are correct, and this is something I would like to play with. This is, of course, a tradeoff between overhead on each allocation and deallocation, and one big occasional overhead caused by the "cleanup" process. I'm going to try and take a look at this tonight, if I get some real work done this afternoon.
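A rough sketch of the per-arena counter idea, again as a toy model rather than obmalloc code. ArenaTracker and its method names are hypothetical. The point is that each pool allocation and deallocation pays a small constant cost, and in exchange a free arena is detected immediately with no scan.

```python
class ArenaTracker:
    """Toy model: one counter of allocated pools per arena."""

    def __init__(self, num_arenas):
        self.used = [0] * num_arenas

    def pool_allocated(self, arena):
        # Constant-time bookkeeping on every pool allocation.
        self.used[arena] += 1

    def pool_freed(self, arena):
        # Constant-time bookkeeping on every pool deallocation.
        # Returns True when the arena has just become completely unused.
        self.used[arena] -= 1
        return self.used[arena] == 0

    def free_arenas(self):
        # Arenas that could be handed back to the system.
        return [a for a, n in enumerate(self.used) if n == 0]
```

With the occasional cleanup approach, the same information is rebuilt from the free pool list on each pass instead of being maintained incrementally.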
But in some versions of reality, that isn't true. The best available explanation is in new_arena()'s long internal comment block: because of historical confusions about what Python's memory API *is*, it's possible that extension modules outside the core are incorrectly calling the obmalloc free() when they should be calling the system free(), and doing so without holding the GIL.
Let me just make sure I am clear on this: Some extensions use native threads, is that why this is a problem? Because as far as I am aware, the Python interpreter itself is not threaded. So how does the cyclical garbage collector work? Doesn't it require that there is no execution going on?
Now all such insane uses have been officially deprecated, so you could be bold and just assume obmalloc is always entered by a thread holding the GIL now.
I would rather not break this property of obmalloc. However, this leads to a big problem: I'm not sure it is possible to have an occasional cleanup task be lockless and co-operate nicely with other threads, since by definition it needs to go and mess with all the arenas. One of the reasons that obmalloc *doesn't* have this problem is that it never releases memory.
It's only a waste if it ultimately fails <wink>.
It is also a waste if the core Python developers decide it is a bad idea, and don't want to accept patches! :)
Thanks for your feedback,
-- Evan Jones: http://evanjones.ca/ "Computers are useless. They can only give answers" - Pablo Picasso