[Python-Dev] Changing pymalloc behaviour for long running processes

Evan Jones ejones at uwaterloo.ca
Tue Oct 19 23:47:21 CEST 2004


First, let me thank you for this very detailed reply. It really helped 
me understand a lot more about what is going on inside the Python 
interpreter.

On Oct 19, 2004, at 16:53, Tim Peters wrote:
> It's stack-like:  it reuses the pool most recently emptied, because
> the expectation is that the most recently emptied pool is the most
> likely of all empty pools to be highest in the memory hierarchy.  I
> really don't know what LRU (or MRU) might mean in this context (it's
> not like we're evicting something from a cache).

Err... right: MRU. It uses the most recently used free block. In that 
sense it is a cache: a cache of free memory pages.
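
To make the stack-like reuse described above concrete, here is a minimal 
sketch in Python. It only models the ordering (the most recently emptied 
pool is handed out first); the class and method names are hypothetical, 
and the real allocator is C code in obmalloc with its own data structures.

```python
class FreePoolStack:
    """Toy model of stack-like (LIFO) reuse of emptied pools."""

    def __init__(self):
        self._free = []  # the end of the list is the top of the stack

    def release(self, pool_id):
        # A pool that just became empty goes on top of the stack.
        self._free.append(pool_id)

    def acquire(self):
        # Reuse the most recently emptied pool, or None if none is free.
        return self._free.pop() if self._free else None


pools = FreePoolStack()
for pool_id in ("A", "B", "C"):   # pools empty out in the order A, B, C
    pools.release(pool_id)

# The two most recently emptied pools come back first: C, then B.
reused = [pools.acquire(), pools.acquire()]
```

Pool "A", emptied longest ago, is the last one to be reused.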

> Harder than it looked, eh <wink>?

Actually, much harder. I spent about 6 hours figuring out what was 
going on. At this point, I think I have enough of a handle on the 
situation that I might as well go about trying to improve it.

> Or it may be small overhead, if all it's trying to do is free() empty
> arenas.  Indeed, if arenas "grow states" too, *arena* transitions
> should be so rare that perhaps they could afford to do extra
> processing right then to decide whether to free() an arena that just
> transitioned to its notion of an empty state.

That is true. However, I don't think freeing arenas immediately is the 
best plan, as we don't really want to do that if the application is 
cyclical in its memory consumption (i.e., it creates a ton of objects, 
then releases them, then does it again). I still think that some sort 
of periodic collection is best, as it will help Python adjust to 
applications with a wide variety of memory profiles.
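
One possible shape for such a periodic collection, as a rough Python 
sketch: an arena is returned to the system only if it was already empty 
at the previous sweep. This policy, the `ArenaTracker` class, and all of 
its names are assumptions made for illustration; nothing here is an 
existing obmalloc design.

```python
class ArenaTracker:
    """Toy model: free empty arenas only during a periodic sweep."""

    def __init__(self):
        self.in_use = {}        # arena id -> number of live blocks
        self.idle_sweeps = {}   # arena id -> sweeps seen while empty
        self.freed = []         # arenas handed back to the system

    def allocate(self, arena):
        self.in_use[arena] = self.in_use.get(arena, 0) + 1
        self.idle_sweeps.pop(arena, None)   # the arena is busy again

    def release(self, arena):
        self.in_use[arena] -= 1
        if self.in_use[arena] == 0:
            self.idle_sweeps[arena] = 0     # empty, but keep it for now

    def sweep(self):
        # Free only arenas that were already empty at the previous sweep.
        for arena, idle in list(self.idle_sweeps.items()):
            if idle >= 1:
                del self.idle_sweeps[arena]
                del self.in_use[arena]
                self.freed.append(arena)
            else:
                self.idle_sweeps[arena] = idle + 1


tracker = ArenaTracker()

# Cyclical workload: the arena empties and refills between sweeps,
# so it is never empty for two sweeps in a row.
for _ in range(3):
    tracker.allocate("arena-0")
    tracker.release("arena-0")
    tracker.sweep()
freed_during_cycles = list(tracker.freed)

# The workload stops; the next sweep finds the arena still empty.
tracker.sweep()
```

In this model the arena survives the cyclical phase untouched and is 
freed only after the application goes quiet, which is the behaviour 
that freeing immediately would not give.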

> If we changed PyMem_{Free, FREE, Del, DEL} to map to the system
> free(), all would be golden (except for broken old code mixing
> PyObject_ with PyMem_ calls).  If any such broken code still exists,
> that remapping would lead to dramatic failures, easy to reproduce; and
> old code broken in the other, infinitely more subtle way (calling
> PyMem_{Free, FREE, Del, DEL} when not holding the GIL) would continue
> to work fine.

Hmm... This seems like a logical approach to me. It certainly gives me 
a lot more freedom in reworking the memory allocator. Are there any 
objections to this idea?

> Any number of threads can be running
> Python code in a single process, although the GIL serializes their
> execution *while* they're executing Python code.  When a thread ends
> up in C code, it's up to the C code to decide whether to release the
> GIL and so allow other threads to run at the same time.  If it does,
> that thread must reacquire the GIL before making another Python C API
> call (with very few exceptions, related to Python C API thread
> initialization and teardown functions).

Ah, now I understand! Creating a Python thread actually creates a 
native thread, then; it's just that because of the GIL, they run 
sequentially when executing Python code. This is an interesting 
approach! For some reason I was under the impression that the Python 
interpreter used user-level threads to implement Python threads.
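
The native-thread point can be observed from Python itself. The sketch 
below assumes a much newer interpreter than the one under discussion: 
`threading.get_native_id()` was added in Python 3.8 and is only 
available on platforms that expose a native thread id.

```python
import threading

N_THREADS = 4
barrier = threading.Barrier(N_THREADS)
native_ids = []
ids_lock = threading.Lock()


def worker():
    # Wait until every worker is alive, so the OS cannot hand a
    # finished thread's id to a thread that starts later.
    barrier.wait()
    with ids_lock:
        native_ids.append(threading.get_native_id())


threads = [threading.Thread(target=worker) for _ in range(N_THREADS)]
for t in threads:
    t.start()
for t in threads:
    t.join()

# The main thread has its own OS-level id as well.
main_id = threading.get_native_id()
```

Each Python thread reports a distinct id assigned by the operating 
system, different from the main thread's.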

> obmalloc doesn't have *that* problem, though -- nothing obmalloc does
> can cause Python code to get executed, so obmalloc can assume that the
> thread calling into it holds the GIL for as long as obmalloc wants.
> Except, again, for the crazy PyMem_{Free, FREE, Del, DEL} exception.

Terrific. This makes life much, much easier.

> I would -- it's backward compatibility hacks for insane code, which
> may not even exist anymore, and you'll find that it puts severe
> constraints on what you can do.

Again, does anyone object to this point of view before I begin working 
from this assumption? It means that I can assume that only one thread 
will call code in obmalloc at a time. I can do the same thing that the 
current obmalloc implementation does: add the macros for the locks, but 
have them resolve to nothing.

Thanks for the tutorial in the Python interpreter internals,

Evan Jones

--
Evan Jones: http://evanjones.ca/
"Computers are useless. They can only give answers" - Pablo Picasso
