I've just seen the Introducing Python video, found in
This is a very interesting video, at least after you stop laughing. :-))
Jokes apart, it's indeed interesting to know how your mailing list
partners/programming partners/benevolent dictators/friends/whatever
look like when they're not ASCII characters. I'd advise it to anyone
who is part of that community and isn't able to attend meetings
and similar events in person.
Btw, Tim, your <wink>s will have a special meaning to me from
now on. ;-)
My nightly run of pybench went up from the usual 7590ms per
run to around 8200ms between Monday night and today. Can anyone
explain this?
new-style classes keep track of their subclasses through weak references, so
subclasses in general remain collectible.
consider this code (inspired by a recent comp.lang.python post):
import sys, gc
class MyMetaclass(type):
    def __init__(cls, name, bases, dict):
        super(MyMetaclass, cls).__init__(name, bases, dict)
        print 'initialized', cls.__name__
    if 'meta__del__' in sys.argv:
        def __del__(cls):
            print 'deleted', cls.__name__
class MyClass(object):
    if 'meta' in sys.argv:
        __metaclass__ = MyMetaclass
class Sub(MyClass):
    pass
del Sub
gc.collect() # force involved weak-refs to be cleared
print "MyClass subclasses", MyClass.__subclasses__()
print "garbage", gc.garbage
MyClass subclasses []
C:\exp\py-subclasses-gc>\transit\Py23\python test.py meta
MyClass subclasses []
Sub is likewise collectible, and collected, both with plain type as the
metaclass and with a MyMetaclass that lacks __del__, but:
C:\exp\py-subclasses-gc>\transit\Py23\python test.py meta meta__del__
MyClass subclasses [<class '__main__.Sub'>]
garbage [<class '__main__.Sub'>]
if MyMetaclass grows a __del__ method, Sub is no longer collectible ...
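For reference, the weak-ref bookkeeping itself is easy to observe on its own; a minimal sketch (Python 3 syntax, no metaclass __del__ involved):

```python
import gc

class Base:
    pass

class Sub(Base):
    pass

# Base only holds weak references to its subclasses...
subs_before = [c.__name__ for c in Base.__subclasses__()]

del Sub
gc.collect()  # ...so once the last strong reference is gone, Sub is collected
subs_after = [c.__name__ for c in Base.__subclasses__()]
```

With the weak refs cleared, Base.__subclasses__() comes back empty, which is what the first two runs above show.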
I've done a static analysis of the bytecodes from compiling the Python
Python 2.2 (#28, Dec 21 2001, 12:21:22) [MSC 32 bit (Intel)] on win32
Some stats about JUMP_IF_FALSE opcodes
Of the 2768 JUMP_IF_FALSE opcodes encountered, 2429 have a POP_TOP on
I'd like to propose that JUMP_IF_FALSE consume the top-of-stack.
Some stats about constants
50% of constant accesses are to a fixed set of 5 constants
rank  freq  cum%  const
   1  1277  18.7  None
   2   929  32.3  1
   3   741  43.1  0
   4   254  46.8  ''
   5   228  50.1  2
I'd like to propose the following opcodes be added
Some stats about the number of constants and locals used in functions
97% of functions use 16 or fewer constants
83% of functions use 8 or fewer constants
98% of functions use 16 or fewer locals
85% of functions use 8 or fewer locals
I'd like to propose the following opcodes be added (I suggest n=15)
Some stats about instruction traces
Please see the following links for detailed stats
The second file contains stats on instruction traces incorporating the
The score column, in both files, is computed by multiplying the
frequency by the length of the trace
I'd like to propose the following opcodes, which should reduce the number of
bytecode instructions used by 20%:
RETURN_FAST == LOAD_FAST, RETURN_VALUE
RETURN_CONST == LOAD_CONST, RETURN_VALUE
LOAD_FAST+1 == LOAD_FAST, LOAD_FAST
STORE_FAST+1 == STORE_FAST, STORE_FAST
POP_TOP+1 == POP_TOP, POP_TOP
POP_TOP+2 == POP_TOP, POP_TOP, POP_TOP
BRANCH_IF == COMPARE_OP, JUMP_IF_FALSE, POP_TOP
LOAD_FAST+1 and STORE_FAST+1 could be implemented as a 1 byte
instruction code followed by two nibbles encoding the local index
numbers. See above for a discussion of local variable index numbers.
BRANCH_IF could be implemented as a set of opcodes, one for each of the
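A rough sketch of how such pair-fusion could be done as a peephole pass over (opname, operand) pairs; the tuple encoding and fusion table here are illustrative, not CPython's actual bytecode format:

```python
# Fusion table: adjacent instruction pairs and the superinstruction that
# replaces them (names follow the proposal above).
FUSIONS = {
    ("LOAD_FAST", "RETURN_VALUE"): "RETURN_FAST",
    ("LOAD_CONST", "RETURN_VALUE"): "RETURN_CONST",
    ("LOAD_FAST", "LOAD_FAST"): "LOAD_FAST+1",
    ("POP_TOP", "POP_TOP"): "POP_TOP+1",
}

def fuse(code):
    """One left-to-right pass; a fused entry keeps both operands."""
    out, i = [], 0
    while i < len(code):
        if i + 1 < len(code) and (code[i][0], code[i + 1][0]) in FUSIONS:
            name = FUSIONS[(code[i][0], code[i + 1][0])]
            out.append((name, code[i][1], code[i + 1][1]))
            i += 2
        else:
            out.append(code[i])
            i += 1
    return out
```

In the real encoding, LOAD_FAST+1's two operands would be packed into the two nibbles described above rather than carried as separate tuple slots.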
I've decided to answer Guido's call for someone to take over
maintenance of the SRE code since it has started to fall into
disrepair. First a short introduction and then on with a question
that begs for some discussion on this list.
My name is Gary Herron. I've been using Python whenever possible for
about 8 years (and for most of the last year and a half I've been able
to choose Python almost exclusively -- lucky me). I've mostly lurked
around the python and python-dev lists, only occasionally offering
help or comments. Volunteering to maintain the SRE code seems like a
good opportunity to jump in and do something useful.
Now on with the questions at hand:
The first glance at the regular expression bug list and the _sre.c
code results in the observation that several of the bugs are related
to running over the recursion limit. The problem comes from using a
pattern containing ".*?" in a situation where it is expected to match
many thousands of characters. Each character matched by ".*?" causes
one level of recursion, quickly overflowing the recursion limit.
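Concretely, the failing shape looks like this sketch (sizes illustrative; on the recursive engine, a cap near 10000 levels made subjects like this fail):

```python
import re

# A non-greedy ".*?" expected to span tens of thousands of characters.
# On the 2003-era matcher, each character ".*?" consumed cost one level
# of C recursion; an iterative matcher handles this in constant stack.
text = "<" + "x" * 50000 + ">"
m = re.match(r"<.*?>", text)
```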
Question 1: Should we even consider these as bugs?
After all, the recursion limit is in place to prevent badly used re's
from crashing Python with a stack overflow. We could claim that the kinds
of patterns which cause heavy recursion are misuses of regular
expressions which are bound to fail when used on long strings. If
we take this route, something should be added to the documentation
which explains when excessive recursion is likely to bite.
Question 2: If we want to solve the problem (instead of just dodging
it) how should we proceed?
* Increasing the limit beyond the current 10000 is not really an
option for two reasons:
1. This doesn't solve the problem. One can always match on a
string purposely chosen to be long enough to overflow any limit.
2. A recent patch (browse "cvs log _sre.c" to find a reference)
actually lowered the limit from 10000 to 7500 for certain
64-bit machines which apparently suffered a stack overflow
before hitting 10000 recursion levels.
* An attempt to replace the hard-coded upper limit with a programmed
check of the stack space (see Misc/HISTORY for a reference to
PyOS_CheckStack) was added and then withdrawn for version 2.0.
Does anybody know the history of this? This would not really solve
the problem (especially on the 64 bit machines which could not even
hit 10000 levels of recursion), but it would push the recursion
limit to its highest possible value rather than some arbitrary one.
* Removing the recursion by the standard method of storing state in a
program managed stack and looping rather than recursing would push
the storage problem from the stack into the (probably much larger)
heap. I haven't looked at the code enough to judge if this is
feasible, but if it is, some limit would still remain. It would,
however, depend on available memory rather than stack space. And
still, the documentation should warn that certain naive patterns on
LONG strings could fail after wasting much time chewing through all
available memory.
* I notice that, unlike pattern ".*?", matching to pattern ".*" does
not recurse for each character matched. With only a few minutes of
looking at the code, I can't begin to guess if it is feasible to
make the former work like the latter without recursing.
Any comments? Remember that all the points under question 2 are worth
considering only if we decide we really ought to support things like
patterns using ".*?" to match many thousands of characters.
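To make the third bullet above concrete, here is the standard recursion-to-explicit-stack rewrite on a toy matcher supporting only literals, ".", and ".*?" (purely illustrative; SRE's real matcher state is much richer than a pair of indices):

```python
def match(pattern, text):
    """Prefix-match `pattern` against `text` using an explicit stack of
    (pattern_index, text_index) states instead of C-level recursion."""
    stack = [(0, 0)]
    while stack:
        pi, ti = stack.pop()
        if pi == len(pattern):
            return True  # whole pattern consumed: prefix match succeeds
        if pattern[pi:pi + 3] == ".*?":
            if ti < len(text):
                stack.append((pi, ti + 1))  # consume one char (tried later)
            stack.append((pi + 3, ti))      # non-greedy: try the rest first
        elif ti < len(text) and pattern[pi] in (".", text[ti]):
            stack.append((pi + 1, ti + 1))
        # otherwise this state is a dead end; pop the next alternative
    return False
```

The backtracking states that would have lived on the C stack now live on the heap, so the practical limit becomes available memory, as noted above.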
In advance of asking Guido to review and pronounce on PEP 305 and its
related code, I'd like to ask you to take a few minutes to review what we've
produced. There is the PEP, of course:
but there is also source code, a large number of test cases and a libref
section available in the CVS sandbox. Cliff Wells is working on a csvutils
module which will contain adaptations of the "sniffing" routines from his
Just do a "cvs up -dP ." in your nondist/sandbox directory to get the latest
version of everything. Feel free to review and/or comment on any or all of
it, but please, please post your comments to the csv@mail.mojam.com mailing
list. You can review our rather active correspondence at
or if you're really excited about CSV files, you can subscribe at
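For anyone who hasn't looked yet, the core API being proposed is small; a minimal sketch (shown in current Python syntax with an in-memory file):

```python
import csv
import io

buf = io.StringIO()
writer = csv.writer(buf)
writer.writerow(["name", "count"])
writer.writerow(["spam, eggs", 3])  # embedded comma is quoted automatically

buf.seek(0)
rows = list(csv.reader(buf))  # values come back as strings
```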
Jeremy Hylton wrote:
> If you are benchmarking various opcode effects, I'd recommend trying to
> revive the simple cycle counter instrumentation I did for Python 2.2. The
> idea is to use the Pentium cycle counter to measure the number of cycles
> spent on each trip through the mainloop.
For Linux >= 2.4 and an x86 CPU, oprofile will tell you (stochastically) how
many CPU cycles are spent on each x86 instruction.
> > > Speaking entirely from a point of ignorance, why are the source line #s
> > > not shown for frames that are implemented in modules loaded from
> > > zipimport?
> > Because the code printing the tracebacks doesn't know how to look
> > inside a zip file.
> Maybe, if the source file can't be found, it could
> decompile the bytecode?
Too clever by far. The peculiar way in which the comments disappear,
the fact that the code would be wrong if I used a (so-far-non-existent)
peephole optimizer to optimize my .pyc files... I'd rather show NO
line (so long as we still give file and line number) than try to
guess in an overly clever manner.
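(For what it's worth, the Python-level traceback module fetches source text through linecache, which quietly returns an empty string when the file can't be opened; a sketch:)

```python
import linecache

# traceback printing asks linecache for each source line; when the named
# file can't be read as a plain file -- e.g. it lives inside a zip archive,
# as far as the 2003-era machinery was concerned -- linecache returns ""
# and the source line is simply omitted from the traceback.
line = linecache.getline("/no/such/dir/module.py", 1)
```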
-- Michael Chermside