[Mailman-Developers] Performance problems and MailMan

Barry A. Warsaw bwarsaw@cnri.reston.va.us (Barry A. Warsaw)
Wed, 30 Jun 1999 18:27:04 -0400 (EDT)


I think I have identified at least one performance bottleneck with
Mailman.  Hopefully, /the/ only bottleneck :)  I think the culprit is
in HyperDatabase.py, namely the DumbBTree class.  This code is the
interface b/w Mailman and Pipermail, and as such I am really quite
unfamiliar with it, but using the trick I outlined in a previous
message, I found that most of the time when I printed the stack trace,
execution was inside HyperDatabase.clearIndex().

I think clearing an index by repeating key=dumbtree.next(); del
dumbtree[key] for every key is extremely inefficient.  Take a look at
DumbBTree.__delitem__() to get the picture.
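To see why a key-by-key loop can get so slow, here is a minimal sketch of a toy tree.  The class `ToyDumbBTree` is hypothetical, not Mailman's code, and it *assumes* that deleting an item re-sorts the remaining keys.  The sketch deletes via repeated first() calls rather than walking with next(), which keeps cursor bookkeeping out of the way.  The `sort_calls` counter records how many full sorts each clearing strategy pays for.

```python
class ToyDumbBTree:
    """Hypothetical stand-in for DumbBTree: a dict plus a cached sorted key list.

    ASSUMPTION: every __delitem__() re-sorts the remaining keys. This only
    illustrates the cost pattern; it is not Mailman's actual implementation.
    """

    def __init__(self):
        self.dict = {}
        self.sorted = []
        self.dirty = False
        self.sort_calls = 0  # number of full sorts performed so far

    def __setitem__(self, key, value):
        self.dict[key] = value
        self.dirty = True  # defer sorting until a cursor method needs it

    def __delitem__(self, key):
        del self.dict[key]
        self._sort()  # assumed costly behavior: full re-sort on every delete

    def _sort(self):
        self.sort_calls += 1
        self.sorted = sorted(self.dict)
        self.dirty = False

    def first(self):
        """Return the smallest (key, value) pair; raise KeyError when empty."""
        if self.dirty:
            self._sort()
        if not self.sorted:
            raise KeyError("tree is empty")
        key = self.sorted[0]
        return key, self.dict[key]

    def clear(self):
        # Bulk reset: drop the dict and the cached key list, no sorting at all.
        self.dict = {}
        self.sorted = []
        self.dirty = False


def clear_one_by_one(tree):
    """Empty the tree one key at a time (the slow pattern)."""
    while True:
        try:
            key, _value = tree.first()
        except KeyError:
            return
        del tree[key]


n = 1000
slow, fast = ToyDumbBTree(), ToyDumbBTree()
for i in range(n):
    slow["msg%04d" % i] = i
    fast["msg%04d" % i] = i

clear_one_by_one(slow)
fast.clear()
print(slow.sort_calls, fast.sort_calls)  # 1001 0
```

Each of those sorts touches every key still remaining, so the loop does work at least quadratic in n, while clear() does a constant amount.  In the sketch, clear() resets the cached key list along with the dict.  Otherwise first() would consult a stale key list.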

So here's an experimental patch that adds a clear() method to the
DumbBTree class.  clearIndex() uses it if it's available and otherwise
falls back to the old approach, which I assume is some API standard
for bsddb btrees (which Mailman doesn't currently use).
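The patch's dispatch can be sketched on its own.  The names `clear_index`, `CursorOnlyTree` and `BulkTree` are hypothetical and chosen for illustration; they are not Mailman's API.  The helper calls clear() when the object has one.  Otherwise it falls back to deleting keys one at a time, *assuming* a first() that returns a (key, value) pair and raises KeyError once nothing is left.

```python
def clear_index(tree):
    """Empty `tree`, preferring a bulk clear() when the object provides one.

    Returns how many keys had to be deleted individually (0 on the fast path).
    ASSUMPTION: without clear(), the tree offers a cursor-style first() that
    returns the smallest remaining (key, value) pair and raises KeyError when
    the tree is empty.
    """
    if hasattr(tree, "clear"):
        tree.clear()  # fast path, like the hasattr() check in the patch
        return 0
    deleted = 0
    while True:  # fallback: the old key-by-key approach
        try:
            key, _value = tree.first()
        except KeyError:
            return deleted
        del tree[key]
        deleted += 1


class CursorOnlyTree:
    """Hypothetical tree with first() and __delitem__() but no clear()."""

    def __init__(self, items):
        self.data = dict(items)

    def first(self):
        if not self.data:
            raise KeyError("tree is empty")
        key = min(self.data)
        return key, self.data[key]

    def __delitem__(self, key):
        del self.data[key]


class BulkTree(CursorOnlyTree):
    """Same tree plus a bulk clear(), analogous to the method the patch adds."""

    def clear(self):
        self.data = {}


old = CursorOnlyTree({"b": 2, "a": 1, "c": 3})
new = BulkTree({"b": 2, "a": 1, "c": 3})
print(clear_index(old), old.data)  # 3 {}
print(clear_index(new), new.data)  # 0 {}
```

The helper probes with hasattr() rather than checking the concrete class.  That keeps it working with any index object that lacks clear(), which is the point of keeping the old approach as a fallback.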

Near as I can tell, this doesn't break anything: archive threads still
get created properly.  I haven't tested it live on python.org yet, but
it ought to speed up at least this part a lot.  We'll see if this
fixes the problem some of us have seen.

I'm going to try to test this some more before I check it in.  I may
install it on python.org to see what happens.  I'd love some
feedback.  Does it solve the performance problems?  Does anything
break because of this patch?  Do we need to investigate further?

-Barry

-------------------- snip snip --------------------
Index: HyperDatabase.py
===================================================================
RCS file: /projects/cvsroot/mailman/Mailman/Archiver/HyperDatabase.py,v
retrieving revision 1.3
diff -c -r1.3 HyperDatabase.py
*** HyperDatabase.py	1998/11/04 23:49:03	1.3
--- HyperDatabase.py	1999/06/30 22:12:24
***************
*** 88,95 ****
  	else:
  	    self.current_index = self.current_index + 1
  
! 	
! 
  
      def first(self):
          if not self.sorted:
--- 88,97 ----
  	else:
  	    self.current_index = self.current_index + 1
  
!     def clear(self):
!         # bulk clearing much faster than deleting each item, esp. with the
!         # implementation of __delitem__() above :(
!         self.dict = {}
  
      def first(self):
          if not self.sorted:
***************
*** 296,302 ****
      def newArchive(self, archive): pass
      def clearIndex(self, archive, index):
  	self.__openIndices(archive)
! 	index=getattr(self, index+'Index')
  	finished=0
  	try:
  	    key, msgid=self.threadIndex.first()	    		
--- 298,307 ----
      def newArchive(self, archive): pass
      def clearIndex(self, archive, index):
  	self.__openIndices(archive)
! ##	index=getattr(self, index+'Index')
!         if hasattr(self.threadIndex, 'clear'):
!             self.threadIndex.clear()
!             return
  	finished=0
  	try:
  	    key, msgid=self.threadIndex.first()