Nice and short summary this time. Plan to send this off Wednesday or Thursday
so get corrections in before then.
You can still `register <http://www.python.org/pycon/2005/register.html>`__ for
`PyCon`_. The `schedule of talks`_ is now online. Jim Hugunin is lined up to
be the keynote speaker on the first day, with Guido delivering the keynote on
Thursday. Once again PyCon looks like it is going to be great.
On a different note, as I am sure you are all aware, I am still about a month
behind on summaries. School this quarter has turned out to be hectic for me. I
think it is a lack of motivation thanks to having finished my 14 doctoral
applications just a little over a week ago (and no, that number is not a typo).
For the first time in my life I am going to come up with a very regimented
study schedule that will hopefully let me fit in weekly Python time so that I
can catch up on summaries.
And this summary is not short just because I wanted to get it finished: 2.4
was released just before the period this summary covers, so most of the
activity was bug fixes for problems discovered after the release.
.. _PyCon: http://www.pycon.org/
.. _schedule of talks: http://www.python.org/pycon/2005/schedule.html
I introduced a `proto-PEP <>`__ to the list on how one can go about changing
CPython's bytecode. It will need
rewriting once the AST branch is merged into HEAD on CVS. Plus I need to get a
PEP number assigned to me. =)
- `proto-pep: How to change Python's bytecode <>`__
Handling versioning within a package
------------------------------------
The suggestion of extending import syntax to support explicit version
importation came up. The idea was to have something along the lines of
``import foo version 2, 4`` so that a package can contain several versions of
itself and provide an easy way to specify which version is desired.
The idea didn't fly, though. The main objection was that import-as support is
all you really need: ``import foo_2_4 as foo``. And if you have a ton of
references to a specific package and don't want to burden yourself with
explicit imports, you can always have a single place, before code starts
executing, that does ``import foo_2_4; sys.modules["foo"] = foo_2_4``. That
can even be automated by creating a foo.py file that does the above for you.
You can also look at how wxPython handles it.
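As a sketch of that ``sys.modules`` trick (the package name ``foo_2_4`` is
hypothetical; a stand-in module is created here so the snippet is
self-contained):

```python
import sys
import types

# Stand-in for an installed versioned package (hypothetical name).
foo_2_4 = types.ModuleType("foo_2_4")
foo_2_4.VERSION = (2, 4)
sys.modules["foo_2_4"] = foo_2_4

# The trick: before any other code runs, register the versioned
# package under the generic name...
sys.modules["foo"] = sys.modules["foo_2_4"]

# ...so that every later "import foo" silently gets that version.
import foo
print(foo.VERSION)  # (2, 4)
```

Putting those first three executable lines in a small ``foo.py`` wrapper is
the automated variant mentioned above.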
- `Re: [Pythonmac-SIG] The versioning question... <>`__
- Problems compiling Python 2.3.3 on Solaris 10 with gcc 3.4.1
- 2.4 news reaches interesting places
see `last summary`_ for coverage of this thread
- RE: [Python-checkins] python/dist/src/Modules posixmodule.c, 2.300.8.10,
- mmap feature or bug?
- Re: [Python-checkins] python/dist/src/Python marshal.c, 1.79, 1.80
- Latex problem when trying to build documentation
- Patches: 1 for the price of 10.
- Python for Series 60 released
- Website documentation - link to descriptor information
- Build extensions for windows python 2.4 what are the compiler rules?
- Re: [Python-checkins] python/dist/src setup.py, 1.208, 1.209
- Zipfile needs?
You can fake 32-bit unsigned int overflow with ``x = x & 0xFFFFFFFFL`` and
signed ints with the additional ``if x & 0x80000000L: x -= 0x100000000L``.
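A runnable sketch of those masks (written for modern Python, where the ``L``
suffix is gone but the arithmetic is identical):

```python
def to_uint32(x):
    # fake 32-bit unsigned overflow: keep only the low 32 bits
    return x & 0xFFFFFFFF

def to_int32(x):
    x &= 0xFFFFFFFF
    # reinterpret the top bit as a sign bit for signed 32-bit ints
    if x & 0x80000000:
        x -= 0x100000000
    return x

print(to_uint32(2**32 + 5))  # 5
print(to_int32(0xFFFFFFFF))  # -1
```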
- Re: [Python-checkins] python/dist/src/Mac/OSX fixapplepython23.py, 1.1, 1.2
Evan Jones writes:
> My knowledge about garbage collection is weak, but I have read a little
> bit of Hans Boehm's work on garbage collection. [...] The biggest
> disadvantage mentioned is that simple pointer assignments end up
> becoming "increment ref count" operations as well...
Hans Boehm certainly has some excellent points. I believe a little
searching through the Python dev archives will reveal that attempts
have been made in the past to use his GC tools with CPython, and that
the results have been disappointing. That may be because other parts
of CPython are optimized for reference counting, or it may be just
because this stuff is so bloody difficult!
However, remember that changing away from reference counting is a change
to the semantics of CPython. Right now, people can (and often do) assume
that objects which don't participate in a reference loop are collected
as soon as they go out of scope. They write code that depends on
this... idioms like:
>>> text_of_file = open(file_name, 'r').read()
Perhaps such idioms aren't a good practice (they'd fail in Jython or
in IronPython), but they ARE common. So we shouldn't stop using
reference counting unless we can demonstrate that the alternative is
clearly better. Of course, we'd also need to devise a way for extensions
to cooperate (which is a problem Jython, at least, doesn't face).
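The prompt reclamation the idiom depends on can be observed directly; this
small sketch uses a weak reference to watch the object die the moment its
refcount hits zero (CPython-specific behaviour, as noted above):

```python
import os
import tempfile
import weakref

# make a throwaway file for the demonstration
fd, file_name = tempfile.mkstemp()
os.write(fd, b"hello")
os.close(fd)

f = open(file_name, "r")
watcher = weakref.ref(f)   # lets us observe when the object is gone
text_of_file = f.read()
del f                      # refcount drops to zero...
print(watcher() is None)   # ...so CPython reclaims (and closes) it now
print(text_of_file)
os.remove(file_name)
```

On a tracing collector (Jython, IronPython) the first print could be False,
because the dead file object lingers until the next collection.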
So it's NOT an obvious call, and so far numerous attempts to review
other GC strategies have failed. I wouldn't be so quick to dismiss
reference counting.
> My only argument for making Python capable of leveraging multiple
> processor environments is that multithreading seems to be where the big
> performance increases will be in the next few years. I am currently
> using Python for some relatively large simulations, so performance is
> important to me.
CPython CAN leverage such environments, and it IS used that way.
However, this requires using multiple Python processes and inter-process
communication of some sort (there are lots of choices, take your pick).
It's a technique which is more trouble for the programmer, but in my
experience usually has less likelihood of containing subtle parallel
processing bugs. Sure, it'd be great if Python threads could make use
of separate CPUs, but if the cost of that were that Python dictionaries
performed as poorly as a Java Hashtable or synchronized HashMap, then it
wouldn't be worth it. There's a reason why Java moved away from
Hashtable (the threadsafe data structure) to HashMap (not threadsafe).
Perhaps the REAL solution is just a really good IPC library that makes
it easier to write programs that launch "threads" as separate processes
and communicate with them. No change to the internals, just a new
library to encourage people to use the technique that already works.
-- Michael Chermside
> > In theory, I don't see how you could improve on METH_O and METH_NOARGS.
> > The only saving is the time for the flag test (a predictable branch).
> > Offsetting that savings is the additional time for checking min/max args
> > and for constructing a C call with the appropriate number of args. I
> > suspect there is no savings here and that the timings will get worse.
> I think I tested a method I changed from METH_O to METH_ARGS and could
> not measure a difference.
Something is probably wrong with the measurements. The new call does much
more work than METH_O or METH_NOARGS. Those two common and essential cases
cannot be faster and are likely slower on at least some compilers and some
machines. If some timing shows differently, then it is likely a mirage
(falling into an unsustainable local minimum).

The patch introduces range checks, an extra C function call, nine variable
initializations, and two additional unpredictable branches (the case
statements). The only benefit (in terms of timing) is possibly saving a
tuple allocation/deallocation. That benefit only kicks in for METH_VARARGS
and even then only when the tuple free list is empty.

I recommend not changing ANY of the METH_O and METH_NOARGS calls. These are
already close to optimal.
> A benefit would be to consolidate METH_O,
> METH_NOARGS, and METH_VARARGS into a single case. This should
> make code simpler all around (IMO).
Will backwards compatibility allow those cases to be eliminated? It would be a bummer if most existing extensions could not compile with Py2.5. Also, METH_VARARGS will likely have to hang around unless a way can be found to handle more than nine arguments.
This patch appears to be taking on a life of its own and is being applied more broadly than is necessary or wise. The patch is extensive and introduces a new C API that cannot be taken back later, so we ought to be careful with it.
For the time being, try not to touch the existing METH_O and METH_NOARGS methods. Focus on situations that do stand a chance of being improved (such as methods with a signature like "O|O").
That being said, I really like the concept. I just worry that many of the stated benefits won't materialize:
* having to keep the old versions for backwards compatibility,
* being slower than METH_O and METH_NOARGS,
* not handling more than nine arguments,
* separating function signature info from the function itself,
* the time to initialize all the argument variables to NULL,
* somewhat unattractive case stmt code for building the C function call.
The Python 2.4 Lib/bsddb/__init__.py contains this::

    # for backwards compatibility with python versions older than 2.3, the
    # iterator interface is dynamically defined and added using a mixin
    # class.  old python can't tokenize it due to the yield keyword.
    if sys.version >= '2.3':
        exec """
    from weakref import ref
    ...
    """
Because the imports are inside an exec, modulefinder (e.g. when using bsddb
with a py2exe built application) does not realise that the imports are
required. (The requirement can be manually specified, of course, if you
know that you need to do so).
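The behaviour is easy to reproduce: ``modulefinder`` walks a script's
bytecode looking for import opcodes, and a string handed to ``exec`` is never
compiled during that walk (the imports in the generated script below are just
for illustration):

```python
import os
import tempfile
from modulefinder import ModuleFinder

# a script whose first import is hidden inside an exec'd string
src = 'exec("import json")\nimport os\n'
with tempfile.NamedTemporaryFile("w", suffix=".py", delete=False) as f:
    f.write(src)
    path = f.name

finder = ModuleFinder()
finder.run_script(path)
print("json" in finder.modules)  # False: the exec'd import is missed
print("os" in finder.modules)    # True: the plain import is found
os.remove(path)
```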
I believe that changing the above code to::

    if sys.version >= '2.3':
        from weakref import ref
        ...

would still have the intended effect and would let modulefinder do its work.
The main question (to steal Thomas's words) is whether the library modules
should be written to help the freeze tools - if the answer is 'yes', then
I'll submit the above as a patch for 2.5.
> The main question (to steal Thomas's words) is whether the
> library modules should be written to help the freeze tools
> - if the answer is 'yes', then I'll submit the above as a
> patch for 2.5.
[Martin v. Löwis]
> The answer to this question certainly is "yes, if possible". In this
> specific case, I wonder whether the backwards compatibility is still
> required in the first place. According to PEP 291, Greg Smith and
> Barry Warsaw decide on this, so I think they would need to comment
> first before any patch can be integrated.
Thanks! I've gone ahead and submitted a patch, in that case:
[ 1112812 ] Patch for Lib/bsddb/__init__.py to work with modulefinder
I realise that neither of the people that need to look at this are part of
the '5 for 1' deal, so I need to wait for one of them to have time to look
at it (plenty of time left before 2.5 anyway) but I'll do 5 reviews for the
karma anyway, today or tomorrow.
This message is a follow up to a thread I started on python-dev back in
October, archived here:
Basically, the problem I am trying to solve is that the Python memory
allocator never frees memory back to the operating system. I have
attached a patch against obmalloc.c for discussion. The patch still has
some rough edges and possibly some bugs, so I don't think it should be
merged as is. However, I would appreciate any feedback on the chances
for getting this implementation into the core. The rest of this message
lists some disadvantages to this implementation, a description of the
important changes, a benchmark, and my future plans if this change gets
accepted.
The patch works for any version of Python that uses obmalloc.c (which
includes Python 2.3 and 2.4), but I did my testing with Python 2.5 from
CVS under Linux and Mac OS X. This version of the allocator will
actually free memory. It has two disadvantages:
First, there is slightly more overhead with programs that allocate a
lot of memory, release it, then reallocate it. The original allocator
simply holds on to all the memory, allowing it to be efficiently
reused. This allocator will call free(), so it also must call malloc()
again when the memory is needed. I have a "worst case" benchmark which
shows that this cost isn't too significant, but it could be a problem
for some workloads. If it is, I have an idea for how to work around it.
Second, the previous allocator went out of its way to permit a module
to call PyObject_Free while another thread is executing
PyObject_Malloc. Apparently, this was a backwards compatibility hack
for old Python modules which erroneously call these functions without
holding the GIL. These modules will have to be fixed if this
implementation is accepted into the core.
Summary of the changes:
- Add an "arena_object" structure for tracking pages that belong to
each 256kB arena.
- Change the "arenas" array from an array of pointers to an array of
these structures.
- When freeing a page (a pool), it is placed on a free pool list for
the arena it belongs to, instead of a global free pool list.
- When freeing a page, if the arena is completely unused, the arena is
freed, returning its memory to the operating system.
- When allocating a page, it is taken from the arena that is the most
full. This gives arenas that are almost completely unused a chance to
empty out entirely so they can be freed.
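A toy model of the policy in those last two bullets (pure Python and purely
illustrative; the real code is C inside obmalloc.c, and the names here are
made up):

```python
class Arena:
    """Tracks which pools of one arena are free (cf. arena_object)."""
    def __init__(self, npools=64):
        self.free_pools = list(range(npools))  # per-arena free pool list
        self.total = npools

    @property
    def used(self):
        return self.total - len(self.free_pools)

arenas = [Arena() for _ in range(3)]

def alloc_pool():
    # take a pool from the *most full* arena that still has room,
    # so lightly used arenas get a chance to drain completely
    arena = max((a for a in arenas if a.free_pools), key=lambda a: a.used)
    return arena, arena.free_pools.pop()

def free_pool(arena, pool):
    arena.free_pools.append(pool)  # back to its own arena, not a global list
    if arena.used == 0:            # arena completely unused:
        arenas.remove(arena)       # model returning it to the OS

a1, p1 = alloc_pool()
a2, p2 = alloc_pool()
print(a1 is a2)     # True: both pools come from the same (fullest) arena
free_pool(a1, p1)
free_pool(a1, p2)
print(len(arenas))  # 2: the emptied arena was released
```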
The only benchmark I have performed at the moment is the worst case for
this allocator: A program that allocates 1 000 000 Python objects which
occupy nearly 200MB, frees them, reallocates them, then quits. I ran
the program four times, and discarded the initial time. Here is the
constructor of the objects being allocated::

    def __init__( self ):
        self.dumb = "hello"
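A runnable reconstruction of the described worst case (the class and function
names are guesses; only the ``__init__`` body survives in the message):

```python
import gc
import time

class Obj:  # name is a guess; the original only shows __init__
    def __init__(self):
        self.dumb = "hello"

def run(n=1_000_000):
    # allocate, free, then reallocate, as in the worst case described
    start = time.time()
    objs = [Obj() for _ in range(n)]
    del objs
    gc.collect()  # make sure everything is actually freed
    objs = [Obj() for _ in range(n)]
    del objs
    return time.time() - start
```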
And here are the average execution times for this program::

    Python 2.5:
        real time: 16.304
        user time: 16.016

    Python 2.5 + patch:
        real time: 16.062
        user time: 15.593
As expected, the patched version spends nearly twice as much system
time as the original version. This is because it calls free() and
malloc() twice as many times. However, this difference is offset by the
fact that the user space execution time is actually *less* than the
original version. How is this possible? The likely cause is because the
original version defined the arenas pointer to be "volatile" in order
to work when Free and Malloc were called simultaneously. Since this
version breaks that, the pointer no longer needs to be volatile, which
allows the value to be stored in a register instead of being read from
memory on each operation.
Here are some graphs of the memory allocator behaviour when running this
benchmark.

My future plans:
- More detailed benchmarking.
- The "specialized" allocators for the basic types, such as ints, also
need to free memory back to the system.
- Potentially the allocator should keep some amount of free memory
around to improve the performance of programs that cyclically allocate
and free large amounts of memory. This amount should be "self-tuned" to
the application.
Thank you for your feedback,