First, let me thank you for this very detailed reply. It really helped me understand a lot more about what is going on inside the Python interpreter.
On Oct 19, 2004, at 16:53, Tim Peters wrote:
It's stack-like: it reuses the pool most recently emptied, because the expectation is that the most recently emptied pool is the most likely of all empty pools to be highest in the memory hierarchy. I really don't know what LRU (or MRU) might mean in this context (it's not like we're evicting something from a cache).
Err... Right: MRU. It uses the most recently used free block. This is totally a cache: It's a cache of free memory pages.
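To make the stack-like reuse concrete, here is a minimal sketch in Python. It is only a toy model, not obmalloc's actual data structure: the `PoolCache` class and the pool labels are hypothetical names invented for illustration.

```python
class PoolCache:
    """Toy model of a stack-like (LIFO) collection of empty pools.

    Hypothetical illustration only; obmalloc itself is written in C and
    links pools together differently.
    """

    def __init__(self):
        # The most recently emptied pool sits at the end of the list.
        self._empty = []

    def release(self, pool_id):
        """Record that a pool has just become empty."""
        self._empty.append(pool_id)

    def acquire(self):
        """Reuse the most recently emptied pool, or return None if none is free."""
        return self._empty.pop() if self._empty else None


cache = PoolCache()
for pool_id in ("A", "B", "C"):   # pools become empty in the order A, B, C
    cache.release(pool_id)

reused = [cache.acquire() for _ in range(3)]
print(reused)  # ['C', 'B', 'A']
```

The pool emptied last ("C") is handed out first, on the expectation that its memory is the most likely to still be in a fast level of the memory hierarchy.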
Harder than it looked, eh <wink>?
Actually, much. I spent about 6 hours figuring out what was going on. At this point, I think I have enough of a handle on the situation that I might as well go about trying to improve it.
Or it may be small overhead, if all it's trying to do is free() empty arenas. Indeed, if arenas "grow states" too, *arena* transitions should be so rare that perhaps they could afford to do extra processing right then to decide whether to free() an arena that just transitioned to its notion of an empty state.
That is true. However, I don't think freeing arenas immediately is the best plan, as we don't really want to do that if the application is cyclical in its memory consumption (i.e., it creates a ton of objects, then releases them, then does it again). I still think that some sort of periodic collection is best, as it will help Python adjust to applications with a wide variety of memory profiles.
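One way to picture the periodic approach is the sketch below. It is an assumption about how such a policy could look, not a description of any existing implementation: `ArenaTracker`, `idle_threshold`, and the arena labels are all hypothetical. An arena is only released once it has stayed empty across several collection passes, so an application that refills its arenas quickly never pays for a free/allocate cycle.

```python
class ArenaTracker:
    """Toy policy: free an arena only after it stays empty for several passes.

    All names here are hypothetical; this models the idea, not obmalloc.
    """

    def __init__(self, idle_threshold=2):
        self.idle_threshold = idle_threshold
        # Maps arena id -> number of collection passes it has been empty for.
        self._idle_passes = {}

    def mark_empty(self, arena_id):
        """The arena just became empty; start counting idle passes."""
        self._idle_passes.setdefault(arena_id, 0)

    def mark_in_use(self, arena_id):
        """The arena was reused; it is no longer a candidate for freeing."""
        self._idle_passes.pop(arena_id, None)

    def collect(self):
        """Run one periodic pass and return the arenas that should be freed."""
        freed = []
        for arena_id in list(self._idle_passes):
            self._idle_passes[arena_id] += 1
            if self._idle_passes[arena_id] >= self.idle_threshold:
                del self._idle_passes[arena_id]
                freed.append(arena_id)
        return freed


tracker = ArenaTracker(idle_threshold=2)
tracker.mark_empty("arena-1")
tracker.mark_empty("arena-2")

first_pass = tracker.collect()    # both arenas have been idle for one pass only
tracker.mark_in_use("arena-1")    # a cyclical application refills arena-1
second_pass = tracker.collect()   # arena-2 has now been idle for two passes
```

Here `first_pass` is empty and `second_pass` contains only `"arena-2"`: the arena that was reused between passes is kept, which is the behavior a cyclical workload would want.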
If we changed PyMem_{Free, FREE, Del, DEL} to map to the system free(), all would be golden (except for broken old code mixing PyObject_ with PyMem_ calls). If any such broken code still exists, that remapping would lead to dramatic failures, easy to reproduce; and old code broken in the other, infinitely more subtle way (calling PyMem_{Free, FREE, Del, DEL} when not holding the GIL) would continue to work fine.
Hmm... This seems like a logical approach to me. It certainly gives me a lot more freedom in reworking the memory allocator. Are there any objections to this idea?
Any number of threads can be running Python code in a single process, although the GIL serializes their execution *while* they're executing Python code. When a thread ends up in C code, it's up to the C code to decide whether to release the GIL and so allow other threads to run at the same time. If it does, that thread must reacquire the GIL before making another Python C API call (with very few exceptions, related to Python C API thread initialization and teardown functions).
Ah, now I understand! Creating a Python thread actually creates a native thread; it's just that, because of the GIL, the threads run sequentially while executing Python code. This is an interesting approach! For some reason I was under the impression that the Python interpreter used user-level threads to implement Python threads.
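A small check of the "one Python thread, one native thread" point, assuming a modern CPython (3.8 or later) on a platform where `threading.get_native_id()` is available. The helper name `collect_native_ids` is made up for this sketch. A barrier keeps all workers alive at the same time so the operating system cannot recycle a thread ID between them.

```python
import threading


def collect_native_ids(n_threads=3):
    """Start n_threads Python threads and return their OS-level thread IDs."""
    barrier = threading.Barrier(n_threads)
    ids = []
    ids_lock = threading.Lock()

    def worker():
        barrier.wait()                 # wait until every worker is running
        with ids_lock:
            ids.append(threading.get_native_id())
        barrier.wait()                 # stay alive until every ID is recorded

    threads = [threading.Thread(target=worker) for _ in range(n_threads)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return ids


native_ids = collect_native_ids()
print(len(set(native_ids)))  # number of distinct native threads observed
```

Each worker reports a different kernel-assigned ID, and none of them matches the main thread's ID. The GIL only decides which of these native threads may execute Python bytecode at a given moment.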
obmalloc doesn't have *that* problem, though -- nothing obmalloc does can cause Python code to get executed, so obmalloc can assume that the thread calling into it holds the GIL for as long as obmalloc wants. Except, again, for the crazy PyMem_{Free, FREE, Del, DEL} exception.
Terrific. This makes life much, much easier.
I would -- it's backward compatibility hacks for insane code, which may not even exist anymore, and you'll find that it puts severe constraints on what you can do.
Again, does anyone object to this point of view before I begin working from this assumption? This means that I can assume that only one thread will call code in obmalloc at a time. I can do the same thing that the current obmalloc implementation does: Add the macros for the locks, but have them resolve to nothing.
Thanks for the tutorial in the Python interpreter internals,
Evan Jones
--
Evan Jones: http://evanjones.ca/
"Computers are useless. They can only give answers" - Pablo Picasso