
I think I have identified at least one performance bottleneck in Mailman. Hopefully it's /the/ only bottleneck :) I think the culprit is in HyperDatabase.py, namely the DumbBTree class. This code is the interface between Mailman and Pipermail, and as such I'm really quite unfamiliar with it, but using the trick I outlined in a previous message, I found that most of the times I printed the stack trace, I was inside HyperDatabase.clearIndex().
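The stack-printing trick from the previous message isn't reproduced here. As a rough illustration of the same general idea (not necessarily that trick), this present-day Python 3 sketch runs a function while a helper thread periodically snapshots the calling thread's stack and counts which functions appear in it. Whatever shows up in most snapshots is where the time goes. `sample_hotspots`, `slow_part` and `workload` are hypothetical names, and `sys._current_frames()` is a CPython-specific, underscore-prefixed API.

```python
import collections
import sys
import threading
import time
import traceback


def sample_hotspots(target, interval=0.005):
    """Call target() in the current thread while a helper thread snapshots
    this thread's stack every `interval` seconds.

    Returns (samples, counts), where counts[name] is the number of snapshots
    in which a function called `name` appeared anywhere on the stack.
    """
    watched = threading.get_ident()
    counts = collections.Counter()
    samples = 0
    done = threading.Event()

    def sampler():
        nonlocal samples
        while not done.is_set():
            # CPython-specific: maps thread id -> that thread's current frame.
            frame = sys._current_frames().get(watched)
            if frame is not None:
                stack = traceback.extract_stack(frame)
                # Count each function at most once per snapshot.
                counts.update({entry.name for entry in stack})
                samples += 1
            done.wait(interval)  # sleep, but wake early once target() returns

    helper = threading.Thread(target=sampler, daemon=True)
    helper.start()
    try:
        target()
    finally:
        done.set()
        helper.join()
    return samples, counts


# Hypothetical demo workload: almost all of its time is spent in slow_part().
def slow_part():
    deadline = time.time() + 0.3
    while time.time() < deadline:
        pass


def workload():
    sum(range(1000))
    slow_part()


if __name__ == "__main__":
    samples, counts = sample_hotspots(workload)
    print(samples, counts.most_common(3))
```

Because each snapshot records the whole stack, a caller such as `workload` is counted whenever anything beneath it is running, which matches the "I kept finding myself in clearIndex()" style of diagnosis.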
I think the approach of emptying the index one key at a time, with key=dumbtree.next(); del dumbtree[key], is extremely inefficient. Take a look at DumbBTree.__delitem__() to see why.
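DumbBTree.__delitem__() itself isn't quoted in this message, so here is a hypothetical model of why draining an index one key at a time can get so slow. It assumes (an assumption, not Mailman's code) that the index keeps a dict plus a cached sorted key list, and that every deletion invalidates that cache so the next first() must re-sort everything that's left. `SortedDictIndex`, `drain_one_at_a_time` and the `sort_work` counter are hypothetical names.

```python
class SortedDictIndex:
    """Hypothetical DumbBTree-like index: a dict plus a cached sorted key list.

    Not Mailman's class. It models one assumed cost: any deletion throws away
    the cached order, so the next first() has to re-sort the remaining keys.
    """

    def __init__(self):
        self.dict = {}
        self.sorted = []      # cached sorted keys; empty means "needs sorting"
        self.sort_work = 0    # total number of keys ever passed to sorted()

    def __setitem__(self, key, value):
        self.dict[key] = value
        self.sorted = []

    def __delitem__(self, key):
        del self.dict[key]
        self.sorted = []      # assumption: deleting invalidates the cache

    def first(self):
        if not self.dict:
            raise KeyError("index is empty")
        if not self.sorted:
            self.sort_work += len(self.dict)
            self.sorted = sorted(self.dict)
        key = self.sorted[0]
        return key, self.dict[key]


def drain_one_at_a_time(index):
    """One-key-at-a-time emptying: take the first key, delete it, repeat."""
    while True:
        try:
            key, _ = index.first()
        except KeyError:
            break
        del index[key]


if __name__ == "__main__":
    idx = SortedDictIndex()
    for i in range(100):
        idx[i] = "msgid-%d" % i
    drain_one_at_a_time(idx)
    print(len(idx.dict), idx.sort_work)  # prints: 0 5050
```

Under that assumption, emptying n entries feeds n + (n-1) + ... + 1 = n(n+1)/2 keys to sorted(), so the work grows at least quadratically with the size of the archive, while a bulk reset is a single assignment.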
So here's an experimental patch that adds a clear() method to the DumbBTree class. clearIndex() uses clear() when it's available and otherwise falls back to the old approach, which I assume follows some standard API for bsddb btrees (which Mailman doesn't currently use).
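The "use clear() if the index has one, otherwise fall back" logic is plain duck typing. A minimal sketch of that dispatch, using hypothetical `PlainIndex` and `ClearableIndex` stand-ins rather than Mailman's classes, with a `deletes` counter (also hypothetical) recording how many per-key deletions happened:

```python
class PlainIndex:
    """Hypothetical index exposing only first() and __delitem__()."""

    def __init__(self, items=()):
        self.dict = dict(items)
        self.deletes = 0   # number of per-key deletions performed

    def first(self):
        if not self.dict:
            raise KeyError("index is empty")
        key = min(self.dict)
        return key, self.dict[key]

    def __delitem__(self, key):
        self.deletes += 1
        del self.dict[key]


class ClearableIndex(PlainIndex):
    """Hypothetical index that also offers a bulk clear()."""

    def clear(self):
        # One assignment replaces the whole mapping; no per-key deletes.
        self.dict = {}


def clear_index(index):
    """Empty `index`, preferring a bulk clear() when the object has one."""
    if hasattr(index, "clear"):
        index.clear()
        return
    # Fallback: the old one-key-at-a-time loop.
    while True:
        try:
            key, _ = index.first()
        except KeyError:
            break
        del index[key]


if __name__ == "__main__":
    fast = ClearableIndex({1: "a", 2: "b", 3: "c"})
    slow = PlainIndex({1: "a", 2: "b", 3: "c"})
    clear_index(fast)
    clear_index(slow)
    print(fast.deletes, slow.deletes)  # prints: 0 3
```

Checking with hasattr() means index objects that lack clear() keep going through the old loop unchanged, so only the classes that gain clear() take the fast path.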
As near as I can tell, this doesn't break anything: archive threads still get created properly, and while I haven't tested it live on python.org, it ought to speed up at least this part a lot. We'll see whether it fixes the problem some of us have been seeing.
I'm going to try to test this some more before I check it in. I may install it on python.org to see what happens. I'd love some feedback. Does it solve the performance problems? Does anything break because of this patch? Do we need to investigate further?
-Barry
-------------------- snip snip --------------------
Index: HyperDatabase.py
RCS file: /projects/cvsroot/mailman/Mailman/Archiver/HyperDatabase.py,v
retrieving revision 1.3
diff -c -r1.3 HyperDatabase.py
*** HyperDatabase.py	1998/11/04 23:49:03	1.3
--- HyperDatabase.py	1999/06/30 22:12:24
*** 88,95 ****
          else:
              self.current_index = self.current_index + 1
  
!
!
  
      def first(self):
          if not self.sorted:
--- 88,97 ----
          else:
              self.current_index = self.current_index + 1
  
!     def clear(self):
!         # bulk clearing much faster than deleting each item, esp. with the
!         # implementation of __delitem__() above :(
!         self.dict = {}
  
      def first(self):
          if not self.sorted:
*** 296,302 ****
      def newArchive(self, archive): pass
      def clearIndex(self, archive, index):
          self.__openIndices(archive)
!         index=getattr(self, index+'Index')
          finished=0
          try:
              key, msgid=self.threadIndex.first()
--- 298,307 ----
      def newArchive(self, archive): pass
      def clearIndex(self, archive, index):
          self.__openIndices(archive)
! ##      index=getattr(self, index+'Index')
!         if hasattr(self.threadIndex, 'clear'):
!             self.threadIndex.clear()
!             return
          finished=0
          try:
              key, msgid=self.threadIndex.first()