[Python-Dev] Changing pymalloc behaviour for long running
ejones at uwaterloo.ca
Tue Oct 19 19:25:28 CEST 2004
On Oct 19, 2004, at 12:14, Tim Peters wrote:
> True. That's one major problem for some apps. Another major problem
> for some apps is due to unbounded internal free lists outside of
> obmalloc. Another is that the platform OS+libc may not shrink VM at
> times even when memory is returned to the system free().
There is absolutely nothing I can do about that, however. On platforms
that matter to me (Mac OS X, Linux) some number of large malloc()
allocations are done via mmap(), and can be immediately released when
free() is called. Hence, large blocks are reclaimable. I have no
knowledge about the implementation of malloc() on Windows. Anyone care
to enlighten me?
Another approach is to not free the memory, but instead to inform the
operating system that the pages are unused (on Unix, madvise(2) with
MADV_DONTNEED or MADV_FREE). When this happens, the operating system
*may* discard the pages, but the address range remains valid: If it is
touched again in the future, the OS will allocate a new page. This
would require some dramatic changes to Python's internals.
>> if the memory usage has been relatively constant, and has been well
>> below the amount of memory allocated.
> That's a possible implementation strategy. I think you'll find it
> helpful to distinguish goals from implementations.
You are correct: This is an implementation detail. However, it is a
relatively important one, as I do not want to change Python's
aggressive memory recycling behaviour.
> Maybe you just mean that you collapse adjacent free pools into a free
> pool of a larger size class, when possible? If so, that's a possible
> step on the way toward identifying unused arenas, but I wouldn't call
> it an instance of decreasing memory fragmentation.
I am not moving around Python objects, I'm just dealing with free pools
and arenas in obmalloc.c at the moment. There are two separate things I
am doing:
1. Scan through the free pool list, and count the number of free pools
in each arena. If an arena is completely unused, I free it. If there is
even one pool in use, the arena cannot be freed.
2. Sorting the free pool list so that "nearly full" arenas are used
before "nearly empty" arenas. Right now, when a pool is free, it is
pushed on the list. When one is needed, it is popped off. This leads to
LIFO reuse of memory (the most recently freed pool is handed out
first). What I am doing is removing all the free
pools from the list, and putting them back on so that areas that have
more free pools are used later, while arenas with fewer free pools are
used first.
In my crude tests, the second detail increases the number of completely
free arenas. However, I suspect that differentiating between free
arenas and used arenas, as is already done for pools, would be a good
idea.
> In apps with steady states, between steady-state transitions it's not
> a good idea to "artificially" collapse free pools into free pools of
> larger size, because the app is going to want to reuse pools of the
> specific sizes it frees, and obmalloc optimizes for that case.
Absolutely: I am not touching that. I'm working from the assumption
that pymalloc has been well tested and well tuned and is appropriate
for Python workloads. I'm just trying to make it release memory back to
the system when it can.
> If the real point of this (whatever it is <wink>) is to identify free
> arenas, I expect that could be done a lot easier by keeping a count of
> allocated pools in each arena; e.g., maybe at the start of the arena,
> or by augmenting the vector of arena base addresses.
You are correct, and this is something I would like to play with. This
is, of course, a tradeoff between overhead on each allocation and
deallocation, and one large occasional overhead caused by the "cleanup"
process. I'm going to try to take a look at this tonight, if I get
some real work done this afternoon.
> But in some versions of reality, that isn't true. The best available
> explanation is in new_arena()'s long internal comment block: because
> of historical confusions about what Python's memory API *is*, it's
> possible that extension modules outside the core are incorrectly
> calling the obmalloc free() when they should be calling the system
> free(), and doing so without holding the GIL.
Let me just make sure I am clear on this: Some extensions use native
threads, is that why this is a problem? Because as far as I am aware,
the Python interpreter itself is not threaded. So how does the cyclic
garbage collector work? Doesn't it require that no other Python code is
executing while it runs?
> Now all such insane uses have been officially deprecated, so you could
> be bold and just assume obmalloc is always entered by a thread holding
> the GIL now.
I would rather not break this property of obmalloc. However, this leads
to a big problem: I'm not sure it is possible to have an occasional
cleanup task be lockless and co-operate nicely with other threads,
since by definition it needs to go and mess with all the arenas. One of
the reasons that obmalloc *doesn't* have this problem is that it
never releases memory.
> It's only a waste if it ultimately fails <wink>.
It is also a waste if the core Python developers decide it is a bad
idea, and don't want to accept patches! :)
Thanks for your feedback,
Evan Jones: http://evanjones.ca/
"Computers are useless. They can only give answers" - Pablo Picasso