Re: [Mailman-Developers] [GSoC 2012] Metrics
George,
let me throw in some thoughts just to annoy you ;)
Like with most statistical data I mostly see the figures being used to give statements on quantity - top poster, number of threads etc. Do you think it would be possible to also make some statements on quality?
Let me give an example: Mailing lists are often places where people go to ask for advice. Someone asking usually starts a thread and continually keeps replying. That easily makes a person top poster and might make the same person a thread starter, but the number of posts and threads started gives no indication of that person's knowledge (concerning the mailing list's topic).
OTOH someone who has been on the list for ages, who replies more often than starting threads and who ends threads often after she has replied might very well be a very knowledgeable person, because she gives the one answer that solves the problem.
Do you think it would be possible to deduce such quality-oriented statements?
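To make the idea a bit more concrete, here is a rough sketch of the kind of signals I mean. It assumes each thread is already available as a list of author names in posting order; the function name `quality_signals` and the three counters are made up for illustration, and none of this is part of the proposal quoted below. It only counts; whether the counts really say anything about knowledge is a separate question.

```python
from collections import Counter


def quality_signals(threads):
    """threads: list of threads, each a list of author names in posting order."""
    started, answers, closed = Counter(), Counter(), Counter()
    for posts in threads:
        if not posts:
            continue
        starter = posts[0]
        started[starter] += 1
        # Only posts in somebody else's thread count as answers, so an asker
        # who keeps replying in their own thread does not inflate the figure.
        for author in posts[1:]:
            if author != starter:
                answers[author] += 1
        # The last poster "closes" the thread when it is not their own thread.
        if posts[-1] != starter:
            closed[posts[-1]] += 1
    authors = set(started) | set(answers)
    return {
        a: {"started": started[a], "answers": answers[a], "closed": closed[a]}
        for a in authors
    }


signals = quality_signals([
    ["asker", "helper", "asker", "helper"],
    ["asker", "helper"],
    ["helper", "asker", "helper"],
])
```

Someone with many answers and closed threads but few started threads would match the "knowledgeable person" pattern described above.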
p@rick
P.S. Do you also plan to deliver a tool that analyzes a mailing list archive in order to gather your statistical data? Having the statistical data might be a good reason for people to upgrade their MMx installation to MM3.
- George Chatzisofroniou <sophron@latthi.com>:
The following document is the lowest level of my design concept. You may also read it in my blog 1. Of course, comments are very welcome.
Models
In order to store statistical data, the app will use some Django models:
Author
This model represents an author of the mailing list. It mostly keeps track of the number of postings and number of threads started. It has the following fields:
- authorid – IntegerField
- authormail – CharField
- totalmails – IntegerField
- totalthreads – IntegerField
- firstmsgdate – DateTimeField
- lastmsgdate – DateTimeField
MailingList
This model counts the total number of postings and threads started.
- totalmails – IntegerField
- totalthreads – IntegerField
Month
This model associates the author and the mailing list with each month.
- author – ForeignKey
- month – CharField
- postscount – IntegerField
- threadscount – IntegerField
- mailinglist – BooleanField (if this is true, the row corresponds to the whole mailing list)
Year
This model is similar to Month. It has a year field instead of a month field.
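As an illustration only, the four models could be laid out as below. The sketch uses plain Python dataclasses instead of Django models so that it runs without a Django project; the field names are taken from the lists above, while the default values, the "YYYY-MM" month format and the use of `None` for list-wide rows are assumptions.

```python
from dataclasses import dataclass
from datetime import datetime
from typing import Optional


@dataclass
class Author:
    authorid: int                              # IntegerField
    authormail: str                            # CharField
    totalmails: int = 0                        # IntegerField
    totalthreads: int = 0                      # IntegerField
    firstmsgdate: Optional[datetime] = None    # DateTimeField
    lastmsgdate: Optional[datetime] = None     # DateTimeField


@dataclass
class MailingList:
    totalmails: int = 0
    totalthreads: int = 0


@dataclass
class Month:
    month: str                                 # CharField, e.g. "2012-05" (assumed format)
    author: Optional[Author] = None            # ForeignKey; None for list-wide rows (assumption)
    postscount: int = 0
    threadscount: int = 0
    mailinglist: bool = False                  # True: row covers the whole mailing list


@dataclass
class Year:
    year: str                                  # same as Month, with a year instead of a month
    author: Optional[Author] = None
    postscount: int = 0
    threadscount: int = 0
    mailinglist: bool = False
```

In the real app each class would subclass `django.db.models.Model` and use the field types named above.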
Views
To display the metrics, the Django template system will be used. To output the charts I will create some custom tags. The following three views will be used:
General page – At the top, there will be general metrics about total authors, total mails and total threads, followed by three charts (AJAX based) that represent number of posts per author, number of threads per author and the mailing list’s yearly usage. Further down there will be a number of charts (one for each year of the list’s existence) that output monthly usage. At the end, there will be tabular data representing the authors of the mailing list along with their number of posts, number of threads started and the date of their last post. The user will be able to order the tabular data (alphabetically, or ascending/descending by number of posts, number of threads, or date of last message) by clicking on the table’s headings (Mail, Mails Sent, Threads Started, Last Message).
Author page – Each user will have his own page with his own metrics. At the top, there will be the email of the author, number of posts, number of threads started and the dates of first and last message. Below there will be monthly usage charts for each year the user is subscribed to the mailing list.
Django Admin page – A ‘Generate’ button will be added to the Django admin page.
Settings
The Django app should handle the following configuration parameters:
- Host – Message store data host
- Port – Message store data port
- Masking – A multi-state variable (None, abbreviated, full) for masking email addresses in the results (we don’t want the addresses to get spammed)
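A minimal sketch of how the three masking states might behave is given below. The design only names the states, so the exact output for "abbreviated" and "full" is an assumption, as is the function name `mask_email`.

```python
def mask_email(address: str, mode: str = "none") -> str:
    """Mask an email address for display. Modes: 'none', 'abbreviated', 'full'."""
    if mode == "none":
        return address
    local, sep, domain = address.partition("@")
    if not sep:
        # Not a plain local@domain address; leave it untouched.
        return address
    if mode == "abbreviated":
        # Keep the first character of the local part and the domain.
        return local[:1] + "...@" + domain
    if mode == "full":
        # Hide both the local part and the domain.
        return "***@***"
    raise ValueError(f"unknown masking mode: {mode!r}")
```

A template tag could call this with the configured mode before an address is rendered.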
Interface to the Mailman core
Metrics class – When a new post is sent, the Metrics class will receive it through the IArchiver interface. The Posts field of the Mailing List model (as well as the related rows on the Month and Year models) will increase by one. If the author’s email is not in the database, it will query the Mailman core database with the email, grab the author’s id and a new Author row will be created. Otherwise, if the author is already in the database, the author’s Posts field and the two related rows (Month and Year) will increase by one.
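The counting step described above could look roughly like this. It is a sketch under assumptions: in-memory counters stand in for the Django rows, the method name `record_post` is hypothetical (it is not the IArchiver API), and the lookup of the author's id in the Mailman core database is left out.

```python
from collections import Counter
from email import message_from_string
from email.utils import parseaddr, parsedate_to_datetime


class Metrics:
    """In-memory stand-in for the MailingList, Author, Month and Year rows."""

    def __init__(self):
        self.list_posts = 0
        self.author_posts = Counter()   # email -> posts
        self.month_posts = Counter()    # (email or None, "YYYY-MM") -> posts
        self.year_posts = Counter()     # (email or None, "YYYY") -> posts

    def record_post(self, msg):
        sender = parseaddr(msg["From"])[1].lower()
        date = parsedate_to_datetime(msg["Date"])
        month, year = date.strftime("%Y-%m"), date.strftime("%Y")
        # List-wide counters; the key None plays the role of mailinglist=True.
        self.list_posts += 1
        self.month_posts[(None, month)] += 1
        self.year_posts[(None, year)] += 1
        # Per-author counters; a Counter creates the entry on first use,
        # which mirrors creating a new Author row for an unknown email.
        self.author_posts[sender] += 1
        self.month_posts[(sender, month)] += 1
        self.year_posts[(sender, year)] += 1


# Usage with a hand-written message (the address is a placeholder).
raw = "From: Alice <alice@example.org>\nDate: Fri, 18 May 2012 00:09:00 +0200\n\nhello\n"
metrics = Metrics()
metrics.record_post(message_from_string(raw))
```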
Generate class – When the ‘Generate’ button on the Admin page is pressed:
- The Django models will be initialized (the metrics will go back to zero). A progress bar will inform the administrator that the operation is being processed.
- All the messages of the archive will be parsed by performing a direct Python call to the IArchiver. Another instance of the IArchiver will grab any mails sent while the parsing is going on.
- The metrics will be generated from scratch.
- The administrator will be informed with a success message when the process is over.
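The steps above can be sketched as a single rebuild function. This is an illustration under assumptions: `generate` and its `on_progress` callback are hypothetical names, only per-author post counts are rebuilt, and mails that arrive while the rebuild is running are not handled here.

```python
from collections import Counter
from email import message_from_string
from email.utils import parseaddr


def generate(messages, on_progress=None):
    """Rebuild per-author post counts from scratch from an iterable of messages."""
    messages = list(messages)
    posts = Counter()   # starting from an empty counter is the "back to zero" step
    for done, msg in enumerate(messages, start=1):
        posts[parseaddr(msg["From"])[1].lower()] += 1
        if on_progress is not None:
            # This is where a progress bar would be updated.
            on_progress(done, len(messages))
    return posts


# Usage with three hand-written messages (addresses are placeholders).
archive = [
    message_from_string("From: a@example.org\n\nfirst"),
    message_from_string("From: B <b@example.org>\n\nsecond"),
    message_from_string("From: A <A@example.org>\n\nthird"),
]
counts = generate(archive)
```

For an existing archive in mbox format, the standard library's `mailbox.mbox` yields messages when iterated and could be passed in as `messages`.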
-- George Chatzisofroniou sophron.latthi.com
Mailman-Developers mailing list
Mailman-Developers@python.org
http://mail.python.org/mailman/listinfo/mailman-developers
Mailman FAQ: http://wiki.list.org/x/AgA3
Searchable Archives: http://www.mail-archive.com/mailman-developers%40python.org/
Unsubscribe: http://mail.python.org/mailman/options/mailman-developers/p%40state-of-mind....
Security Policy: http://wiki.list.org/x/QIA9
-- state of mind
Franziskanerstraße 15, 81669 München
Phone +49 89 3090 4664
Fax +49 89 3090 4666
Amtsgericht München Partnerschaftsregister PR 563
Hello Patrick,
On Fri, May 18, 2012 at 12:09 AM, Patrick Ben Koetter <p@state-of-mind.de> wrote:
Like with most statistical data I mostly see the figures being used to give statements on quantity - top poster, number of threads etc. Do you think it would be possible to also make some statements on quality?
The metrics will primarily capture the activity of a mailing list and its users.
It is possible to emphasize the quality of the posts, but I think this is a different app (though one that interacts with the Metrics one). If the users were able to rank the posts through the archiver, it wouldn't be hard to indicate a user's contribution to the community. This looks like the first idea on this page 1.
P.S. Do you also plan to deliver a tool that analyzes a mailing list archive in order to gather your statistical data? Having the statistical data might be a good reason for people to upgrade their MMx installation to MM3.
Yes, there will be a special Generate button for use with an existing archive or after a system crash.
-- George Chatzisofroniou sophron.latthi.com
- Patrick Ben Koetter <p@state-of-mind.de>:
As a follow-up: I just stumbled across http://www.mentby.com/patrick-ben-koetter/, which is nice because it also gives an overview of all (here: some) mailing lists an identity posts to.
The second pie chart seems to try to say something about quality. It splits posts into 'relevant' and 'passive', which are not exactly opposites, but well …
Actually I'd say they still need to work on their rating: <http://www.mentby.com/barry-warsaw/> ;)
p@rick
-- state of mind, Digitale Kommunikation
Franziskanerstraße 15, 81669 München
Phone +49 89 3090 4664
Fax +49 89 3090 4666
Amtsgericht München Partnerschaftsregister PR 563