Why no documentation of basic data structure performance?
David Eppstein
eppstein at ics.uci.edu
Tue Feb 12 16:30:08 EST 2002
Does anyone else notice a serious lack of timing information in the Python
documentation?
For example, the Python tutorial goes through an example of using a list as
a stack (5.1.1), with append() and pop(), and as a queue (5.1.2) with
append() and pop(0). It does not mention the significant differences in
timing between these two pieces of code -- the queue operations take time
linear in the length of the list while the stack operations take constant
amortized time (theory), and it is easy to come up with cases where this
makes a significant difference in actual time (practice).
We had a discussion a couple weeks ago on this board about how to implement
queues efficiently, but I'm more concerned here with the state of the
documentation: why doesn't it give some indication that 5.1.1 is fast and
that 5.1.2 is slow? If a programmer needs to add and remove things from a
list of items, and it doesn't matter for correctness whether he uses stack
or queue order, how is he supposed to know that the stack is a better
choice?
Even when I knew about this issue and wanted to know whether append() was
really constant-amortized-time, I couldn't find any mention in any of the
Python documention of the performance of these basic list operations,
instead I had to grope the source, find listobject.c, and see the comment
near the start of that file:
* This over-allocates proportional to the list size, making room
* for additional growth. The over-allocation is mild, but is
* enough to give linear-time amortized behavior over a long
* sequence of appends()
If it was worth putting in the source, why isn't it worth putting in the
documentation? I have the impression that one of the biggest negative
points for Python wrt other languages is the perception that it's slow, but
that a lot of that perception may come from programmers not having a good
model of how much time each Python operation takes -- if they don't know
which operations to avoid, they can't design their code to avoid them.
(Not that Python is alone in this lack -- can anyone tell me whether the
Java Vector class uses a constant-amortized-time resizing strategy when not
given a capacityIncrement initialization parameter?)
--
David Eppstein UC Irvine Dept. of Information & Computer Science
eppstein at ics.uci.edu http://www.ics.uci.edu/~eppstein/
More information about the Python-list
mailing list