Evan Jones writes:
> My knowledge about garbage collection is weak, but I have read a little
> bit of Hans Boehm's work on garbage collection. [...] The biggest
> disadvantage mentioned is that simple pointer assignments end up
> becoming "increment ref count" operations as well...
Hans Boehm certainly has some excellent points. I believe a little
searching through the Python dev archives will reveal that attempts
have been made in the past to use his GC tools with CPython, and that
the results have been disappointing. That may be because other parts
of CPython are optimized for reference counting, or it may be just
because this stuff is so bloody difficult!
However, remember that changing away from reference counting is a change
to the semantics of CPython. Right now, people can (and often do) assume
that objects which don't participate in a reference loop are collected
as soon as they go out of scope. They write code that depends on
this... idioms like:
>>> text_of_file = open(file_name, 'r').read()
Perhaps such idioms aren't a good practice (they'd fail in Jython or
in IronPython), but they ARE common. So we shouldn't stop using
reference counting unless we can demonstrate that the alternative is
clearly better. Of course, we'd also need to devise a way for extensions
to cooperate (which is a problem Jython, at least, doesn't face).
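For comparison, code that can't rely on prompt refcount-based
finalization has to close the file explicitly, along the lines of this
sketch (variable names invented for illustration):

    f = open(file_name, 'r')
    try:
        text_of_file = f.read()
    finally:
        f.close()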
So it's NOT an obvious call, and so far numerous attempts to adopt
other GC strategies have failed. I wouldn't be so quick to dismiss
reference counting.
> My only argument for making Python capable of leveraging multiple
> processor environments is that multithreading seems to be where the big
> performance increases will be in the next few years. I am currently
> using Python for some relatively large simulations, so performance is
> important to me.
CPython CAN leverage such environments, and it IS used that way.
However, this requires using multiple Python processes and inter-process
communication of some sort (there are lots of choices, take your pick).
It's a technique which is more trouble for the programmer, but in my
experience usually has less likelihood of containing subtle parallel
processing bugs. Sure, it'd be great if Python threads could make use
of separate CPUs, but if the cost of that were that Python dictionaries
performed as poorly as a Java Hashtable or synchronized HashMap, then it
wouldn't be worth the cost. There's a reason why Java moved away from
Hashtable (the threadsafe data structure) to HashMap (not threadsafe).
Perhaps the REAL solution is just a really good IPC library that makes
it easier to write programs that launch "threads" as separate processes
and communicate with them. No change to the internals, just a new
library to encourage people to use the technique that already works.
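Just to make the idea concrete, here is a rough sketch (Unix-only,
helper name invented) of the "threads as separate processes" pattern
using nothing but the standard library; a real library would add error
handling, a nicer API, and portable transports:

    import os, pickle

    def run_in_child(func, *args):
        # Run func(*args) in a forked child and ship the pickled result
        # back to the parent over a pipe.
        read_fd, write_fd = os.pipe()
        pid = os.fork()
        if pid == 0:                      # child: do the work, then exit
            os.close(read_fd)
            result = func(*args)
            writer = os.fdopen(write_fd, 'wb')
            pickle.dump(result, writer)
            writer.close()
            os._exit(0)
        os.close(write_fd)                # parent: wait for the answer
        reader = os.fdopen(read_fd, 'rb')
        result = pickle.load(reader)
        reader.close()
        os.waitpid(pid, 0)
        return result

    print(run_in_child(sum, range(1000000)))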
-- Michael Chermside
> > In theory, I don't see how you could improve on METH_O and METH_NOARGS.
> > The only saving is the time for the flag test (a predictable branch).
> > Offsetting that savings is the additional time for checking min/max args
> > and for constructing a C call with the appropriate number of args. I
> > suspect there is no savings here and that the timings will get worse.
> I think I tested a method I changed from METH_O to METH_ARGS and could
> not measure a difference.
Something is probably wrong with the measurements. The new call does much more work than METH_O or METH_NOARGS. Those two common and essential cases cannot be faster and are likely slower on at least some compilers and some machines. If some timing shows differently, then it is likely a mirage (falling into an unsustainable local minimum).
The patch introduces range checks, an extra C function call, nine variable initializations, and two additional unpredictable branches (the case statements). The only benefit (in terms of timing) is possibly saving a tuple allocation/deallocation. That benefit only kicks in for METH_VARARGS and even then only when the tuple free list is empty.
I recommend not changing ANY of the METH_O and METH_NOARGS calls. These are already close to optimal.
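For what it's worth, here is roughly how I'd expect such a claim to be
checked from pure Python (ord() is a one-argument builtin and
dict.copy() takes no arguments); differences of a few nanoseconds per
call are easily swamped by measurement noise:

    import timeit

    # per-call cost of a one-argument builtin
    print(timeit.Timer("ord('a')").timeit(number=1000000))

    # per-call cost of a no-argument method
    print(timeit.Timer("d.copy()", "d = {}").timeit(number=1000000))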
> A benefit would be to consolidate METH_O,
> METH_NOARGS, and METH_VARARGS into a single case. This should
> make code simpler all around (IMO).
Will backwards compatibility allow those cases to be eliminated? It would be a bummer if most existing extensions could not compile with Py2.5. Also, METH_VARARGS will likely have to hang around unless a way can be found to handle more than nine arguments.
This patch appears to be taking on a life of its own and is being applied more broadly than is necessary or wise. The patch is extensive and introduces a new C API that cannot be taken back later, so we ought to be careful with it.
For the time being, try not to touch the existing METH_O and METH_NOARGS methods. Focus on situations that do stand a chance of being improved (such as methods with a signature like "O|O").
That being said, I really like the concept. I just worry that many of the stated benefits won't materialize:
* having to keep the old versions for backwards compatibility,
* being slower than METH_O and METH_NOARGS,
* not handling more than nine arguments,
* separating function signature info from the function itself,
* the time to initialize all the argument variables to NULL,
* somewhat unattractive case statement code for building the C function call.
The Python 2.4 Lib/bsddb/__init__.py contains this:

    # for backwards compatibility with python versions older than 2.3, the
    # iterator interface is dynamically defined and added using a mixin
    # class. old python can't tokenize it due to the yield keyword.
    if sys.version >= '2.3':
        exec """
    import UserDict
    from weakref import ref
    ...
    """
Because the imports are inside an exec, modulefinder (e.g. when using bsddb
with a py2exe built application) does not realise that the imports are
required. (The requirement can be manually specified, of course, if you
know that you need to do so).
I believe that changing the above code to:

    if sys.version >= '2.3':
        import UserDict
        from weakref import ref
        exec """
    ...
    """

would still have the intended effect and would let modulefinder do its work.
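To illustrate the problem (the file names here are just for the
example): modulefinder finds imports by scanning compiled bytecode, and
an import buried in a string passed to exec produces no bytecode for it
to see.

    import modulefinder

    # visible.py contains:   from weakref import ref
    # hidden.py contains:    exec "from weakref import ref"

    finder = modulefinder.ModuleFinder()
    finder.run_script('visible.py')
    print('weakref' in finder.modules)   # True

    finder = modulefinder.ModuleFinder()
    finder.run_script('hidden.py')
    print('weakref' in finder.modules)   # False - the import is invisible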
The main question (to steal Thomas's words) is whether the library modules
should be written to help the freeze tools - if the answer is 'yes', then
I'll submit the above as a patch for 2.5.
> The main question (to steal Thomas's words) is whether the
> library modules should be written to help the freeze tools
> - if the answer is 'yes', then I'll submit the above as a
> patch for 2.5.
[Martin v. Löwis]
> The answer to this question certainly is "yes, if possible". In this
> specific case, I wonder whether the backwards compatibility is still
> required in the first place. According to PEP 291, Greg Smith and
> Barry Warsaw decide on this, so I think they would need to comment
> first before any patch can be integrated.
Thanks! In that case, I've gone ahead and submitted a patch:
[ 1112812 ] Patch for Lib/bsddb/__init__.py to work with modulefinder
I realise that neither of the people that need to look at this are part of
the '5 for 1' deal, so I need to wait for one of them to have time to look
at it (plenty of time left before 2.5 anyway) but I'll do 5 reviews for the
karma anyway, today or tomorrow.
This message is a follow-up to a thread I started on python-dev back in
October, archived here:
Basically, the problem I am trying to solve is that the Python memory
allocator never frees memory back to the operating system. I have
attached a patch against obmalloc.c for discussion. The patch still has
some rough edges and possibly some bugs, so I don't think it should be
merged as is. However, I would appreciate any feedback on the chances
for getting this implementation into the core. The rest of this message
lists some disadvantages to this implementation, a description of the
important changes, a benchmark, and my future plans if this change gets
accepted.
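To see the underlying problem from Python (a Linux-only sketch; the
rss_kb helper is made up for this example and just reads
/proc/self/status), note that the interpreter's resident size stays
high even after every object has been released; with the patch, most of
that memory should go back to the operating system:

    def rss_kb():
        # resident set size of this process, in kB (Linux-specific)
        for line in open('/proc/self/status'):
            if line.startswith('VmRSS:'):
                return int(line.split()[1])

    print(rss_kb())                       # baseline
    data = [{'i': i} for i in range(1000000)]
    print(rss_kb())                       # after allocating a million small objects
    del data
    print(rss_kb())                       # stays high with the stock allocator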
The patch works for any version of Python that uses obmalloc.c (which
includes Python 2.3 and 2.4), but I did my testing with Python 2.5 from
CVS under Linux and Mac OS X. This version of the allocator will
actually free memory. It has two disadvantages:
First, there is slightly more overhead with programs that allocate a
lot of memory, release it, then reallocate it. The original allocator
simply holds on to all the memory, allowing it to be efficiently
reused. This allocator will call free(), so it also must call malloc()
again when the memory is needed. I have a "worst case" benchmark which
shows that this cost isn't too significant, but it could be a problem
for some workloads. If it is, I have an idea for how to work around it.
Second, the previous allocator went out of its way to permit a module
to call PyObject_Free while another thread is executing
PyObject_Malloc. Apparently, this was a backwards compatibility hack
for old Python modules which erroneously call these functions without
holding the GIL. These modules will have to be fixed if this
implementation is accepted into the core.
Summary of the changes:
- Add an "arena_object" structure for tracking pages that belong to
each 256kB arena.
- Change the "arenas" array from an array of pointers to an array of
- When freeing a page (a pool), it is placed on a free pool list for
the arena it belongs to, instead of a global free pool list.
- When freeing a page, if the arena is completely unused, the arena is
freed and its memory returned to the operating system.
- When allocating a page, it is taken from the arena that is the most
full. This gives arenas that are almost completely unused a chance to
become completely empty, so they can be freed (see the toy sketch below).
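Here is a toy, pure-Python model of that last allocation policy (all
names invented); the point is just that new pools are always carved out
of the busiest usable arena, so lightly used arenas can drain and
eventually be returned to the system:

    class Arena:
        def __init__(self, pools=64):
            self.free_pools = pools       # pools not currently in use

    def pick_arena(arenas):
        # Choose the usable arena with the fewest free pools, i.e. the
        # one that is already the most full.
        best = None
        for arena in arenas:
            if arena.free_pools == 0:
                continue                  # completely full, cannot supply a pool
            if best is None or arena.free_pools < best.free_pools:
                best = arena
        return best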
The only benchmark I have performed at the moment is the worst case for
this allocator: A program that allocates 1 000 000 Python objects which
occupy nearly 200MB, frees them, reallocates them, then quits. I ran
the program four times, and discarded the initial time. The objects are
instances of a trivial class, something like:

    class Obj:
        def __init__( self ):
            self.dumb = "hello"
And here are the average execution times for this program:

Python 2.5:
real time: 16.304
user time: 16.016
Python 2.5 + patch:
real time: 16.062
user time: 15.593
As expected, the patched version spends nearly twice as much system
time as the original version. This is because it calls free() and
malloc() twice as many times. However, this difference is offset by the
fact that the user space execution time is actually *less* than that of
the original version. How is this possible? The likely cause is that the
original version defined the arenas pointer to be "volatile" in order
to work when Free and Malloc were called simultaneously. Since this
version breaks that, the pointer no longer needs to be volatile, which
allows the value to be stored in a register instead of being read from
memory on each operation.
Here are some graphs of the memory allocator behaviour while running
this benchmark.

Future work:
- More detailed benchmarking.
- The "specialized" allocators for the basic types, such as ints, also
need to free memory back to the system.
- Potentially the allocator should keep some amount of free memory
around to improve the performance of programs that cyclically allocate
and free large amounts of memory. This amount should be "self-tuned" to
the application.
Thank you for your feedback,
Due to the issue of thread safety in the Python memory allocator, I
have been wondering about thread safety in the rest of the Python
interpreter. I understand that the interpreter is not thread safe, but
I'm not sure that I have seen a discussion of the all areas where this
is an issue. Here are the areas I know of:
1. The memory allocator.
2. Reference counts.
3. The cyclic garbage collector.
4. Current interpreter state is pointed to by a single shared pointer.
5. Many modules may not be thread safe (?).
Ignoring the issue of #5 for the moment, are there any other areas
where this is a problem? I'm curious about how much work it would be to
allow concurrent execution of Python code.
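(As a reminder of why this matters, here is a small timing sketch: with
the GIL, two CPU-bound threads take roughly as long as doing the work
twice sequentially, even on a multi-CPU machine.)

    import threading, time

    def spin(n=5000000):
        while n:
            n -= 1

    start = time.time()
    spin(); spin()
    print(time.time() - start)            # sequential: two runs back to back

    start = time.time()
    threads = [threading.Thread(target=spin) for _ in range(2)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    print(time.time() - start)            # threaded: about the same, or worse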
Note: One of the reasons I am asking is that my memory allocator patch
changes the current allocator from "sort of" thread safe to
obviously unsafe. One way to eliminate this issue is to make the
allocator completely thread safe, but that would require some fairly
significant changes to avoid a major performance penalty. However, if
it was one of the components that permitted the interpreter to go
multi-threaded, then it would be worth it.