[BangPypers] Mailman archives analysis

Jeff Rush jeff at taupro.com
Thu Jul 17 04:32:18 CEST 2008


Anand Balachandran Pillai wrote:
> Hi Pypers,
> 
> Is there any open source tool for analyzing Mailman archives?
> I want to analyze our Mailman archives and find out the following
> information.
> 
> - Total number of messages
> - Total number of threads (conversations)
> - Total number of unique posters
> - Maximum size of a thread
> - Top 5 posters
> - Top 5 threads (in terms of size)
> 
> Are you aware of any tool (preferably in Python) which does this? The
> tool should be client-side, taking the URL of the Mailman archives
> page as its only input.
> 
> If there is nothing like this, perhaps I could think of writing one. It
> would be useful, I guess...

I'm not aware of any such tool, but it would be quite useful.  If you
produce a library for obtaining the data, I would hook it into rrdtool
(a round-robin database) and produce graphs of traffic on various mailing
lists.  This would help identify growth rates, when to split a list, dying
lists, etc., which can help others manage their lists better.  Having a
"top 5 posters" and "top 5 threads" display would be useful on the front
page of many usergroup websites to encourage others to join in.
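
As a rough illustration of the statistics requested above, here is a minimal
sketch in Python.  It assumes the messages have already been fetched and
parsed into email.message.Message objects (for example with the standard
library's mailbox.mbox on a downloaded archive file).  The names
normalize_subject and summarize are hypothetical, and threads are only
approximated by grouping on the normalized subject line rather than by
following In-Reply-To headers.  Archives that obfuscate addresses (such as
"user at host") may need extra handling before posters are counted.

```python
import re
from collections import Counter
from email import message_from_string
from email.utils import parseaddr

# Matches leading "Re:" / "Fw:" / "Fwd:" markers and "[ListName]" tags.
_PREFIX = re.compile(r'^\s*((re|fwd?)\s*:\s*|\[[^\]]*\]\s*)+', re.IGNORECASE)


def normalize_subject(subject):
    """Reduce a subject line to a key shared by all replies to it."""
    return _PREFIX.sub('', subject or '').strip().lower()


def summarize(messages, top_n=5):
    """Return the summary statistics for an iterable of Message objects.

    Assumption: a "thread" is the set of messages sharing a normalized
    subject, and a "poster" is identified by a lower-cased From address.
    """
    posters = Counter()
    threads = Counter()
    total = 0
    for msg in messages:
        total += 1
        _, addr = parseaddr(msg.get('From', ''))
        posters[addr.lower()] += 1
        threads[normalize_subject(msg.get('Subject', ''))] += 1
    return {
        'total_messages': total,
        'total_threads': len(threads),
        'unique_posters': len(posters),
        'max_thread_size': max(threads.values(), default=0),
        'top_posters': posters.most_common(top_n),
        'top_threads': threads.most_common(top_n),
    }


if __name__ == '__main__':
    # Tiny made-up sample; real input would come from the list archives.
    sample = [
        "From: a@example.com\nSubject: [List] Hello\n\nbody",
        "From: b@example.com\nSubject: Re: [List] Hello\n\nbody",
    ]
    print(summarize(message_from_string(raw) for raw in sample))
```

Grouping by subject is the simplest choice and will merge unrelated threads
that happen to share a subject; a more careful version could link messages
through their Message-ID and In-Reply-To headers instead.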

I agree that it should be client-side, since not all archive sites would
update Mailman just to use it.  It should also cache data and not re-fetch
"finished" list archives (i.e. those for prior months) that it has already
analyzed.  It should not, of course, keep a complete copy of the archive,
just a summary per interval of time, such as a month.  Keep the data in
SQLite or shelve, so that database needs stay lightweight and the tool is
easier to integrate with anyone's choice of web engine.
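
The caching idea above could be sketched roughly as follows, using the
standard library's sqlite3 module.  Everything here is an assumption made
for illustration: the table layout, the 'YYYY-MM' month keys, and the
function names open_cache, is_finished and month_summary are hypothetical,
and analyze stands in for whatever code actually fetches and summarizes one
month of a list's archive.  Only summaries are stored, never the messages
themselves, and only months that are already over are cached.

```python
import json
import sqlite3
from datetime import date


def open_cache(path=':memory:'):
    """Open (or create) the summary cache; month is stored as 'YYYY-MM'."""
    conn = sqlite3.connect(path)
    conn.execute(
        'CREATE TABLE IF NOT EXISTS monthly_summary ('
        ' list_name TEXT NOT NULL,'
        ' month TEXT NOT NULL,'
        ' summary TEXT NOT NULL,'
        ' PRIMARY KEY (list_name, month))'
    )
    return conn


def is_finished(month, today=None):
    """A month is "finished" once the current month is later than it."""
    today = today or date.today()
    return month < today.strftime('%Y-%m')


def month_summary(conn, list_name, month, analyze, today=None):
    """Return the summary for one month, re-analyzing only when needed."""
    row = conn.execute(
        'SELECT summary FROM monthly_summary WHERE list_name = ? AND month = ?',
        (list_name, month),
    ).fetchone()
    if row is not None:
        return json.loads(row[0])

    summary = analyze(list_name, month)  # the expensive fetch-and-parse step
    if is_finished(month, today):
        # The current month is still growing, so it is never cached.
        with conn:
            conn.execute(
                'INSERT OR REPLACE INTO monthly_summary VALUES (?, ?, ?)',
                (list_name, month, json.dumps(summary)),
            )
    return summary
```

Storing each summary as a JSON string keeps the schema independent of which
statistics are collected; separate columns would make the data easier to
query directly, at the cost of schema changes when new statistics are added.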

"Mailwatcher" is born?

-Jeff


