Why no documentation of basic data structure performance?

David Eppstein eppstein at ics.uci.edu
Tue Feb 12 16:30:08 EST 2002


Does anyone else notice a serious lack of timing information in the Python 
documentation?

For example, the Python tutorial goes through an example of using a list as 
a stack (5.1.1), with append() and pop(), and as a queue (5.1.2) with 
append() and pop(0).  It does not mention the significant differences in 
timing between these two pieces of code -- the queue operations take time 
linear in the length of the list while the stack operations take constant 
amortized time (theory), and it is easy to come up with cases where this 
makes a significant difference in actual time (practice).

We had a discussion a couple weeks ago on this board about how to implement 
queues efficiently, but I'm more concerned here with the state of the 
documentation: why doesn't it give some indication that 5.1.1 is fast and 
that 5.1.2 is slow?  If a programmer needs to add and remove things from a 
list of items, and it doesn't matter for correctness whether he uses stack 
or queue order, how is he supposed to know that the stack is a better 
choice?

Even when I knew about this issue and wanted to know whether append() was 
really constant-amortized-time, I couldn't find any mention in any of the 
Python documention of the performance of these basic list operations, 
instead I had to grope the source, find listobject.c, and see the comment 
near the start of that file:
         * This over-allocates proportional to the list size, making room
         * for additional growth.  The over-allocation is mild, but is
         * enough to give linear-time amortized behavior over a long
         * sequence of appends()

If it was worth putting in the source, why isn't it worth putting in the 
documentation?  I have the impression that one of the biggest negative 
points for Python wrt other languages is the perception that it's slow, but 
that a lot of that perception may come from programmers not having a good 
model of how much time each Python operation takes -- if they don't know 
which operations to avoid, they can't design their code to avoid them.

(Not that Python is alone in this lack -- can anyone tell me whether the 
Java Vector class uses a constant-amortized-time resizing strategy when not 
given a capacityIncrement initialization parameter?)
-- 
David Eppstein       UC Irvine Dept. of Information & Computer Science
eppstein at ics.uci.edu http://www.ics.uci.edu/~eppstein/



More information about the Python-list mailing list