Pipermail, large archives and performance
I am wondering what the road map is for mailman.
Meaning, what new features are slated for the future, and what enhancements to current features are planned?
I posted a feature enhancement request to SF:
http://sourceforge.net/bugs/?func=detailbug&bug_id=128127&group_id=103
And pipermail is really causing problems. I have an mbox file that is 1.92Gb in size and it takes almost 14 hours to process the qfiles.
I believe this is because qrunner(?) has a difficult time processing such large files to generate the archives. Am I way off base here?
Browsing the archives is another unpleasant task. I think pipermail(?) does not work well on files of this size either.
Are there any plans to enhance or replace the components causing these problems?
I am willing to put my coding skill to work to fix these issues if there is some road map to follow.
Thanks.
-- Bob Tanner <tanner@real-time.com> | Phone : (952)943-8700 http://www.mn-linux.org | Fax : (952)943-8500 Key fingerprint = 6C E9 51 4F D5 3E 4C 66 62 A9 10 E5 35 85 39 D9
"BT" == Bob Tanner <tanner@real-time.com> writes:
BT> I am wondering what that road map is for mailman.
I've been swamped with other work, specifically Python 2.1a1 and "paying gig" assignments. :) OTOH, I have a big chunk of rough Mailman 2.1 code that I'm working on, specifically finishing up the i18n stuff and a complete reworking of the qrunner subsystem along the lines of stuff discussed last year. It needs more testing before I can start checking things in.
I'd like to have something released by the Python conference in March, though that will realistically just be a beta of 2.1. I'm on the hook for giving a talk about I18N at the conference, so expect a big push to get all that working. :)
I had hoped to rewrite the persistent storage subsystem using zodb, but that's not going to happen until I can finish up some zodb enhancements (part of that aforementioned "paying gig").
My goal is still to add at least the features described on the Mailman 2.1 wiki page:
http://www.zope.org/Members/bwarsaw/MailmanDesignNotes/MailmanTwoDotOne
Feel free to add your own wish list to this page as a comment (which you should be able to do without logging into zope.org -- let me know if there are any problems with that).
BT> And pipermail is really causing problems. I have an mbox file
BT> that is 1.92Gb in size and it takes almost 14 hours to process
BT> the qfiles.
This will be improved in 2.1 because there will be a separate qrunner for the archiver. It won't be in the critical path for getting mail through the system. It won't help Pipermail itself -- for that I'm still looking for someone to "own" that subsystem. I want Mailman to include a bundled Python archiver to make it easy to "download-and-go", but I don't have time to concentrate on improving Pipermail itself. Jeremy's done a great job for 2.0 but it really could use a rewrite.
BT> I am willing to put my coding skill to work to fix these
BT> issues if there is some road map to follow.
If you want to start looking at improving Pipermail and generalizing the interface b/w the archiver and Mailman, that would be way cool.
-Barry
Is there a way to make qrunner more verbose?
I am trying to figure out why qrunner is not processing my qfile directory.
-- Bob Tanner <tanner@real-time.com> | Phone : (952)943-8700 http://www.mn-linux.org | Fax : (952)943-8500 Key fingerprint = 6C E9 51 4F D5 3E 4C 66 62 A9 10 E5 35 85 39 D9
"BT" == Bob Tanner <tanner@real-time.com> writes:
BT> Is there a way to make qrunner more verbose?
You can always add syslog() calls to the script to trace its progress (i.e. crappy but reliable ol' printf :)
BT> I am trying to figure out why qrunner is not processing my
BT> qfile directory.
What version of Mailman are you running and how many files are in the qfile directory? You may need to upgrade to 2.0.1.
-Barry
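The tracing suggestion above (adding syslog() calls so you can see how far the script gets) might look roughly like the sketch below. It is written for a current Python and uses the standard library `syslog` module rather than Mailman's own logging code. The function name `trace_queue`, the message wording, and the idea that the queue is simply a directory of files are all assumptions made for illustration, not Mailman's actual qrunner internals.

```python
import os
import syslog


def trace_queue(qdir, log=syslog.syslog):
    """Log one line per queue entry and return the number of entries seen.

    `qdir` is any directory of queued files. This layout is an assumption
    for illustration, not Mailman's real qfiles format.
    """
    names = sorted(os.listdir(qdir))
    log("qrunner trace: %d entries in %s" % (len(names), qdir))
    for name in names:
        path = os.path.join(qdir, name)
        size = os.path.getsize(path)
        log("qrunner trace: about to process %s (%d bytes)" % (name, size))
        # ... the real per-message work would happen here ...
        log("qrunner trace: finished %s" % name)
    return len(names)
```

If the last line in the log is an "about to process" entry with no matching "finished" entry, that file is the one the run stopped on. Passing a different `log` callable (for example `print`) sends the trace somewhere other than syslog.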
Quoting Barry A. Warsaw (barry@digicool.com):
> "BT" == Bob Tanner <tanner@real-time.com> writes:
> BT> Is there a way to make qrunner more verbose?
> You can always add syslog() calls to the script to trace its progress (i.e. crappy but reliable ol' printf :)
> BT> I am trying to figure out why qrunner is not processing my
> BT> qfile directory.
> What version of Mailman are you running and how many files are in the qfile directory? You may need to upgrade to 2.0.1.
Running 2.0
5,000 and climbing.
-- Bob Tanner <tanner@real-time.com> | Phone : (952)943-8700 http://www.mn-linux.org | Fax : (952)943-8500 Key fingerprint = 6C E9 51 4F D5 3E 4C 66 62 A9 10 E5 35 85 39 D9
"BT" == Bob Tanner <tanner@real-time.com> writes:
BT> Running 2.0
BT> 5,000 and climbing.
Oh you definitely want to apply the 2.0 -> 2.0.1 patch. It fixes the bug you're experiencing.
You can get it from Sourceforge. See the NEWS file entry for details.
-Barry
On Thu, Jan 25, 2001 at 02:20:06PM -0600, Bob Tanner wrote:
> I am wondering what that road map is for mailman.
> Meaning what are the new features slated for the future. What enhancements to current features are planned.
> I posted a feature enhancement to SF:
> http://sourceforge.net/bugs/?func=detailbug&bug_id=128127&group_id=103
> And pipermail is really causing problems. I have an mbox file that is 1.92Gb in size and it takes almost 14 hours to process the qfiles.
> I believe this is because qrunner(?) has a difficult time processing such large files to generate the archives. Am I way off base here?
You are absolutely correct. I did some testing by sending 1000 messages to a list: the time until I received the last message went from 4 minutes to more than one hour once I had HTML archiving enabled.
For sourceforge, I felt we couldn't afford the slowdown and turned off HTML archiving. I've left the text archiving enabled as it shouldn't be the part that's the slowest.
Ideally, it'd be nice if we could update the HTML archive offline, once an hour or so, but apparently that's not possible (I didn't get any answer to the email I sent asking about it).
Marc
Microsoft is to operating systems & security .... .... what McDonalds is to gourmet cooking
Home page: http://marc.merlins.org/ | Finger marc_f@merlins.org for PGP key
Quoting Marc MERLIN (marc_news@valinux.com):
> For sourceforge, I felt we couldn't afford the slowdown and turned off HTML archiving. I've left the text archiving enabled as it shouldn't be the part that's the slowest.
I did not know you could do just text archiving. In 2.0, I don't see the option to archive just text.
-- Bob Tanner <tanner@real-time.com> | Phone : (952)943-8700 http://www.mn-linux.org | Fax : (952)943-8500 Key fingerprint = 6C E9 51 4F D5 3E 4C 66 62 A9 10 E5 35 85 39 D9
On Fri, Jan 26, 2001 at 12:49:16AM -0600, Bob Tanner wrote:
> Quoting Marc MERLIN (marc_news@valinux.com):
> > For sourceforge, I felt we couldn't afford the slowdown and turned off HTML archiving. I've left the text archiving enabled as it shouldn't be the part that's the slowest.
> I did not know you could do just text archiving. In 2.0, I don't see the option to archive just text.
~mailman/Mailman/mm_cfg.py:

# ARCHIVE_TO_MBOX
#-1 - do not do any archiving
# 0 - do not archive to mbox, use builtin mailman html archiving only
# 1 - archive to mbox to use an external archiving mechanism only
# 2 - archive to both mbox and builtin mailman html archiving -
#     use this to make both external archiving mechanism work and
#     mailman's builtin html archiving. the flat mail file can be
#     useful for searching, external archivers, etc.
#
ARCHIVE_TO_MBOX = 1
Marc
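As a reading aid for the ARCHIVE_TO_MBOX comment quoted above, the four values can be thought of as a pair of switches: whether messages are appended to the flat mbox file, and whether the built-in HTML archive is generated. The sketch below only restates that comment as a lookup table. The function name `archive_targets` is hypothetical and is not part of Mailman.

```python
def archive_targets(archive_to_mbox):
    """Return (write_mbox, build_html) for an ARCHIVE_TO_MBOX value.

    The mapping is taken from the configuration comment quoted above;
    the function itself is an illustrative helper, not Mailman code.
    """
    table = {
        -1: (False, False),  # no archiving at all
        0: (False, True),    # built-in HTML archive only
        1: (True, False),    # mbox only, for an external archiver
        2: (True, True),     # both mbox and built-in HTML archive
    }
    if archive_to_mbox not in table:
        raise ValueError("unexpected ARCHIVE_TO_MBOX value: %r" % (archive_to_mbox,))
    return table[archive_to_mbox]
```

With the setting shown in this message, `archive_targets(1)` gives `(True, False)`: the mbox is still written, but the slow HTML generation step is skipped.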
"MM" == Marc MERLIN <marc_news@valinux.com> writes:
MM> Ideally, it'd be nice if we could update the HTML archive off
MM> line, once an hour or so, but apparently that's not possible
MM> (I didn't get any answer back to my enquiring Email)
2.1 should be much better here. First, the archiver will run out of a separate queue, and thus in a separate process. Second, you will probably be able to tune how frequently each separate qrunner process will run so you could make the archiver run once an hour if you want (this may be crude at first).
The one tricky area that I haven't thought through yet is list locking. I'm pretty sure that the archiver can do most of its work without locking the list, but I have to double check on that. If the archiver holds the list lock for a long time, it'll block any processing of new messages to the list, although it won't block the uploading of processed messages to the smtpd or nntpd -- those qrunners never acquire the list lock.
experimental-ly y'rs, -Barry
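The idea of giving each qrunner its own run frequency can be illustrated with a small sketch. It is not the Mailman 2.1 design, which had not been checked in at the time of this thread and is described above as using separate processes. The class `Runner`, the function `run_due`, and the interval values are all hypothetical, and everything runs in one process here purely to show the per-queue interval logic.

```python
class Runner:
    """One queue runner with its own polling interval (hypothetical class)."""

    def __init__(self, name, interval, work):
        self.name = name
        self.interval = interval  # seconds between passes over the queue
        self.work = work          # callable that does one pass
        self.next_run = 0.0       # due immediately on the first check

    def tick(self, now):
        """Run one pass if the interval has elapsed; return True if it ran."""
        if now < self.next_run:
            return False
        self.work()
        self.next_run = now + self.interval
        return True


def run_due(runners, now):
    """Give every runner a chance to run at time `now`; return names that ran."""
    return [r.name for r in runners if r.tick(now)]
```

For example, a delivery runner created with `Runner("delivery", 60, deliver)` and an archiver created with `Runner("archiver", 3600, archive)` would both run on the first call to `run_due`, after which the archiver stays idle for an hour while delivery keeps running every minute. The list-locking concern raised above is not modelled here.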
participants (3)
- barry@digicool.com
- Bob Tanner
- Marc MERLIN