[Python-Dev] Re: 2.4 news reaches interesting places

Raymond Hettinger python at rcn.com
Fri Dec 10 23:42:29 CET 2004

> The Wiki entry seems to reinforce the impression that bugged Guido to
> begin with.  It provides a bunch of "but ..." explanations about why
> Python's speed isn't that important.  Python is slow, but "speed of
> development is far more important."

I felt the same way when reading it.  Also, it seemed to embody the
political outlook that optimization is sinful.  The document could be
much more positive, fact based, and informative.  Also, the wording
seems somewhat outdated.

A draft for a new entry is included below.  Review and feedback are


Techniques for Improving Performance and Scalability

Here are some coding guidelines for applications that demand peek
performance (in terms of memory utilization, speed, or scalability).

Use the best algorithms and fastest tools

.	Membership testing with sets and dictionaries is much faster,
O(1), than searching sequences, O(n).  When testing "a in b", b should
be a set or dictionary instead of a list or tuple.

.	String concatenation is best done with ''.join(seq) which is an
O(n) process.  In contrast, using the '+' or '+=' operators can result
in an O(n**2) process because new strings may be built for each
intermediate step.  Py2.4 mitigates this issue somewhat; however,
''.join(seq) remains the best practice.

.	Many tools come in both list form and iterator form (map and
itertools.imap, list comprehension and generator expressions, dict.items
and dict.iteritems).  In general, the iterator forms are more memory
friendly and more scalable.  They are preferred whenever a real list is
not required.

.	Many core building blocks are coded in optimized C.
Applications that take advantage of them can make substantial
performance gains.  The building blocks include all of the builtin
datatypes (lists, tuples, sets, and dictionaries) and extension modules
like array, itertools, and collections.deque.

.	Likewise, the builtin functions run faster than hand-built
equivalents.  For example, map(operator.add, v1, v2) is faster than
map(lambda x,y: x+y, v1, v2).

.	List comprehensions run a bit faster than equivalent for-loops.

.	Lists perform well as fixed length arrays and as stacks.  For
queue applications using pop(0) or insert(0,v)), collections.deque()
offers superior O(1) performance because it avoids rebuilding a full
list for each insertion or deletion, an O(n) process.

.	Custom sort ordering is best performed with Py2.4's key= option
or with the traditional decorate-sort-undecorate technique.  Both
approaches call the key function just once per element.  In contrast,
sort's cmp= option is called many times per element during a sort.  For
example, sort(key=str.lower) is faster than sort(cmp=lambda a,b:
cmp(a.lower(), b.lower())).

Take advantage of interpreter optimizations

.	In functions, local variables are accessed more quickly than
global variables, builtins, and attribute lookups.  So, it is sometimes
worth localizing variable access in inner-loops.  For example, the code
for random.shuffle() localizes access with the line, random=self.random.
That saves the shuffling loop from having to repeatedly lookup
self.random.  Outside of loops, the gain is minimal and rarely worth it.

.	The previous recommendation is a generalization of the rule to
factor constant expressions out of loops.  Also, constant folding should
be done manually.  Inside loops, write "x=3" instead of "x=1+2".

.	Function call overhead is large compared to other instructions.
Accordingly, it is sometimes worth in-lining code in the middle of a few
time-critical loops.

.	Starting with Py2.3, the interpreter optimizes "while 1" to just
a single jump.  In contrast "while True" takes several more steps.
While the latter is preferred for clarity, time critical code should use
the first form.

.	Multiple assignment is slower than individual assignment.  For
example "x,y=a,b" is slower than "x=a; y=b".  However, multiple
assignment is faster for variable swaps.  For example, "x,y=y,x" is
faster than "t=x; x=y; y=t".

.	Chained comparisons are faster than using the "and" operator.
Write "x < y < z" instead of "x < y and y < z".

.	A few fast approaches should be considered hacks and reserved
for only the most demanding applications.  For example, "x and True or
False" is faster than "bool(x)".

Take advantage of diagnostic tools

.	The hotshot and profile modules help identify performance

.	The timeit module offers immediate performance comparisons
between alternative approaches to writing individual lines of code.

Performance can dictate overall strategy

.	SAX is typically faster and more memory efficient than DOM
approaches to XML.
.	Threading can improve the response time in applications that
would otherwise waste time waiting for I/O.
.	Select can help minimize the overhead for polling multiple

More information about the Python-Dev mailing list