[BangPypers] Mailman archives analysis

Dorai Thodla dorai at thodla.com
Thu Jul 17 15:02:29 CEST 2008


Just came across this.

MarkMail is a free service for searching mailing list archives, with huge
advantages over traditional search engines
(see http://markmail.org/docs/faq.xqy#markmailperks).
It is powered by MarkLogic Server: Each email is stored internally as an XML
document, and accessed using XQuery. All searches, faceted navigation,
analytic calculations, and HTML page renderings are performed by a small
MarkLogic Server cluster running against millions of messages.
http://markmail.org/

Dorai


On Thu, Jul 17, 2008 at 8:02 AM, Jeff Rush <jeff at taupro.com> wrote:

> Anand Balachandran Pillai wrote:
>
>> Hi Pypers,
>>
>> Is there any open-source tool for analyzing Mailman archives?
>> I want to analyze our Mailman archives and find out the following
>> information:
>>
>> - Total number of messages
>> - Total number of threads (conversations)
>> - Total number of unique posters
>> - Maximum size of a thread
>> - Top 5 posters
>> - Top 5 threads (in terms of size)
>>
>> Are you aware of any tool (preferably in Python) that does this? The
>> tool should be client-side, taking the URL of the Mailman archives
>> page as its only input.
>>
>> If there is nothing like this, perhaps I could think about writing one.
>> It would be useful, I guess...
>>
>
> I'm not aware of any such tool, but it would be quite useful.  If you
> produce a library for obtaining the data, I would then hook it into
> rrdtool (the round-robin database tool) and produce graphs of traffic on
> various mailing lists.  This would help identify growth rates, when to
> split a list, dying lists, etc., which can help others manage their lists
> better.  Having "top 5 posters" and "top 5 threads" on the front page of
> many user group websites would encourage others to join in.
>
> I agree that it should be client-side, since not all archive sites
> would update Mailman just to use it.  It should also cache data and not
> re-fetch "finished" (i.e. prior months') list archives it has already
> analyzed.  It should not, of course, keep a complete copy of the archive,
> just a summary per time interval, such as a month.  Keep the data in
> SQLite or shelve, to keep database needs lightweight for easier
> integration with anyone's choice of web engine.
>
> "Mailwatcher" is born?
>
> -Jeff
>
> _______________________________________________
> BangPypers mailing list
> BangPypers at python.org
> http://mail.python.org/mailman/listinfo/bangpypers
>
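A minimal Python sketch of the six statistics Anand lists, assuming one month's archive has already been downloaded and decompressed as mbox-style text (pipermail offers a gzipped text version of each month). As a simplification, threads are approximated by normalized Subject lines rather than In-Reply-To/References headers, and pipermail's "user at host" address obfuscation is undone with a regex. The names split_mbox, thread_key, poster_key and archive_stats are hypothetical, not part of Mailman.

```python
import re
from collections import Counter
from email.parser import Parser
from email.utils import parseaddr

# Strips leading "Re:", "Fw:"/"Fwd:" and "[ListName]" tags from a subject.
SUBJECT_PREFIX = re.compile(r"^\s*((re|fwd?)\s*:\s*|\[[^\]]*\]\s*)+", re.IGNORECASE)


def split_mbox(text):
    """Split mbox-style text into raw messages (envelope "From " lines dropped).

    Assumes body lines beginning with "From " were escaped as ">From ".
    """
    messages, current = [], None
    for line in text.splitlines(keepends=True):
        if line.startswith("From "):  # envelope line starts a new message
            if current is not None:
                messages.append("".join(current))
            current = []
        elif current is not None:
            current.append(line)
    if current is not None:
        messages.append("".join(current))
    return messages


def thread_key(subject):
    """Approximate a thread by its subject with reply/list prefixes removed."""
    return SUBJECT_PREFIX.sub("", subject or "").strip().lower()


def poster_key(from_header):
    """Normalize a From header; pipermail writes addresses as 'user at host'."""
    sender = re.sub(r"\s+at\s+", "@", from_header or "")
    _, addr = parseaddr(sender)
    return (addr or sender).strip().lower()


def archive_stats(mbox_text, top_n=5):
    """Compute message, thread and poster statistics for one archive."""
    posters, threads = Counter(), Counter()
    total = 0
    for raw in split_mbox(mbox_text):
        msg = Parser().parsestr(raw)
        total += 1
        posters[poster_key(msg.get("From"))] += 1
        threads[thread_key(msg.get("Subject"))] += 1
    return {
        "messages": total,
        "threads": len(threads),
        "unique_posters": len(posters),
        "max_thread_size": max(threads.values(), default=0),
        "top_posters": posters.most_common(top_n),
        "top_threads": threads.most_common(top_n),
    }
```

"top_posters" and "top_threads" are lists of (key, count) pairs, largest first. Subject-based threading will merge unrelated threads that happen to share a subject, so a header-based threader would be more accurate.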

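Jeff's caching idea (keep only per-month summaries, never re-fetch finished months) could look roughly like this. It is a sketch assuming SQLite (he mentions shelve as an alternative), months written as zero-padded "YYYY-MM" strings, and a hypothetical fetch(list_name, month) callable that downloads and analyzes one month and returns a dict with "messages", "threads" and "unique_posters" keys. The table and function names are illustrative only.

```python
import sqlite3
from datetime import date


def open_cache(path=":memory:"):
    """Open (or create) a small SQLite cache of per-month list summaries."""
    db = sqlite3.connect(path)
    db.execute(
        """CREATE TABLE IF NOT EXISTS month_summary (
               list_name TEXT NOT NULL,
               month     TEXT NOT NULL,
               messages  INTEGER,
               threads   INTEGER,
               posters   INTEGER,
               PRIMARY KEY (list_name, month))"""
    )
    return db


def is_finished(month, today):
    # Zero-padded 'YYYY-MM' strings sort chronologically, so a plain string
    # comparison tells whether a month is already over.
    return month < today.strftime("%Y-%m")


def month_summary(db, list_name, month, fetch, today=None):
    """Return (messages, threads, posters) for one month.

    Finished months are served from the cache after the first fetch; the
    current month is always re-fetched and never cached.
    """
    today = today or date.today()
    finished = is_finished(month, today)
    if finished:
        row = db.execute(
            "SELECT messages, threads, posters FROM month_summary "
            "WHERE list_name = ? AND month = ?",
            (list_name, month),
        ).fetchone()
        if row is not None:
            return row
    stats = fetch(list_name, month)  # hypothetical download-and-analyze step
    row = (stats["messages"], stats["threads"], stats["unique_posters"])
    if finished:
        with db:  # the connection context manager commits on success
            db.execute(
                "INSERT OR REPLACE INTO month_summary VALUES (?, ?, ?, ?, ?)",
                (list_name, month) + row,
            )
    return row
```

Storing only summary rows keeps the database small, and a graphing front end such as rrdtool, or a web page, only needs to read this one table.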


-- 
Dorai Thodla (http://www.thodla.com)
Thinking about Technology Innovation and Learning
My DailyLog (http://dorai.tumblr.com/) - Stuff worth remembering
US: 650-206-2688, India: 98408 89258