[ mailman-Bugs-835332 ] Stops bloat in pipermail article databases

SourceForge.net noreply at sourceforge.net
Mon Nov 3 17:04:01 EST 2003


Bugs item #835332, was opened at 2003-11-03 22:04
Message generated for change (Tracker Item Submitted) made by Item Submitter
You can respond by visiting: 
https://sourceforge.net/tracker/?func=detail&atid=100103&aid=835332&group_id=103

Category: Pipermail
Group: 2.1 (stable)
Status: Open
Resolution: None
Priority: 5
Submitted By: Richard Barrett (ppsys)
Assigned to: Nobody/Anonymous (nobody)
Summary: Stops bloat in pipermail article databases

Initial Comment:
The standard pipermail archiving code saves the body text, 
in HTML format, of every article in the -article database of 
each archived list. This bloats these databases. Because 
they are pickled data structures that are loaded into memory 
in their entirety whenever archiving operations for a list are 
handled, the bloat can substantially degrade archiver 
performance and, in the limit, for lists carrying heavy traffic 
and/or receiving large text postings, bring archiving to a 
grinding halt.
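
A rough illustration of the effect described above, as a minimal 
Python sketch. The record layout (a dict with subject, author and 
html_body keys) and the make_db and pickled_size helpers are 
hypothetical and are not pipermail's actual article format; the 
point is only that a pickled structure holding every body grows 
with traffic and posting size, and must be unpickled whole.

```python
import pickle


def make_db(n_articles, body_chars, keep_body):
    """Build a hypothetical article database keyed by message id."""
    db = {}
    for i in range(n_articles):
        record = {"subject": "Post %d" % i, "author": "user@example.com"}
        if keep_body:
            # Assumed field name; stands in for the stored HTML body text.
            record["html_body"] = "<p>" + "x" * body_chars + "</p>"
        db["msg-%d" % i] = record
    return db


def pickled_size(db):
    """Return the number of bytes the whole database occupies when pickled."""
    return len(pickle.dumps(db, protocol=2))


with_body = pickled_size(make_db(1000, 4000, keep_body=True))
without_body = pickled_size(make_db(1000, 4000, keep_body=False))
print("with bodies:    %d bytes" % with_body)
print("without bodies: %d bytes" % without_body)
```

Since the whole pickle is read on every archiving operation, the 
difference between the two figures is paid each time, not once.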

This patch changes HyperArch.py and pipermail.py so that 
the data stored in the pipermail database file 
$archives/private/<listname>/database/<period>-article 
does not include the body text, in HTML format, of each 
article. This reduces the size of the -article database for 
each list. The benefits are most pronounced with high 
traffic lists and those to which large text postings are made. 

The patch also adds a script, $prefix/bin/rb-arch, which 
removes any body text, in HTML format, from existing 
-article databases. With the patch applied, this junk HTML 
is no longer stored when new articles are added to the 
databases, but junk HTML already present is not deleted 
unless this script is run. The alternative is to run 
$prefix/bin/arch for a list.
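
The clean-up step can be pictured with the following minimal 
Python sketch. It is not the rb-arch script from the patch: the 
file layout (a single pickled dict of records), the html_body 
key and the strip_bodies helper are all assumptions made for 
illustration.

```python
import os
import pickle
import tempfile


def strip_bodies(path, key="html_body"):
    """Remove `key` from every record in a pickled dict of records.

    Returns the number of records that actually carried the key.
    """
    with open(path, "rb") as fp:
        db = pickle.load(fp)

    removed = 0
    for record in db.values():
        if record.pop(key, None) is not None:
            removed += 1

    # Write the slimmed database to a temporary file in the same
    # directory, then rename it over the original, so that a crash
    # part-way through does not leave a truncated database behind.
    dirname = os.path.dirname(os.path.abspath(path))
    fd, tmp = tempfile.mkstemp(dir=dirname)
    with os.fdopen(fd, "wb") as fp:
        pickle.dump(db, fp, protocol=2)
    os.replace(tmp, path)
    return removed
```

A real clean-up would also need to hold whatever lock the archiver 
uses while the file is rewritten; that is omitted here.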

Apply the patch from within the Mailman build directory 
using the command: 

    patch -p1 < path-to-patch-file

You use this patch at your own risk. I would appreciate 
feedback on whether it works for you, and on any problems 
you encounter with it.

----------------------------------------------------------------------




More information about the Mailman-coders mailing list