Re: [Mailman-Developers] [GSoC 2012] Metrics

George Chatzisofroniou writes:
Author
This model represents an author of the mailing list. It mostly keeps track of the number of postings and number of threads started. It has the following fields:
- authorid – IntegerField
AFAIK every Django object has an internal ID. Why do authors need a separate, human-unfriendly "authorid"?
- authormail – CharField
Authors are people. They typically have names<wink/> and often multiple email addresses. There may also be other information (organization, etc) that is available from the headers.
- totalmails – IntegerField
- totalthreads – IntegerField
- firstmsgdate – DateTimeField
- lastmsgdate – DateTimeField
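A minimal sketch of how the proposed Author fields might be kept up to date as postings arrive. It is a plain-Python stand-in, not a Django model; the class name `AuthorStats` and the method `record_post` are hypothetical and do not come from the proposal.

```python
from dataclasses import dataclass
from datetime import datetime
from typing import Optional


@dataclass
class AuthorStats:
    """Plain-Python stand-in for the proposed Author model (hypothetical)."""
    authorid: int
    authormail: str
    totalmails: int = 0
    totalthreads: int = 0
    firstmsgdate: Optional[datetime] = None
    lastmsgdate: Optional[datetime] = None

    def record_post(self, date: datetime, starts_thread: bool) -> None:
        """Update the counters and the first/last posting dates for one message."""
        self.totalmails += 1
        if starts_thread:
            self.totalthreads += 1
        # Messages may arrive out of order, so compare instead of overwriting.
        if self.firstmsgdate is None or date < self.firstmsgdate:
            self.firstmsgdate = date
        if self.lastmsgdate is None or date > self.lastmsgdate:
            self.lastmsgdate = date
```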
MailingList
This model counts the total number of postings and threads started.
- totalmails – IntegerField
- totalthreads – IntegerField
longestthread?
Month
Year
Views
To display the metrics the Django template system will be used. To output the charts I will create some custom tags. The following three views will be used:
- General page – On top, there will be general metrics about total authors, total mails and total threads and below three charts (AJAX based)
"AJAX based" doesn't belong in the spec; it's an implementation detail.
that represent number of posts per author, number of threads per author and mailing list’s yearly usage. Even below there will be a number of charts (equal to the number of years of list’s existence) that output monthly usage.
Why multiple charts? If you can afford a 640x480 chart area, with 4 pixel wide bars you can have 160 months > 13 years in one chart. I personally wouldn't hesitate to go to pixel width bars, which gives you > 53 years. I don't think people will be looking at charts for precision, but rather to get an overview.
At the end, there will be tabular data representing the authors
- Author page – Each user will have his own page with his own metrics.
- Django Admin page – A ‘Generate’ button will be added to the Django admin page.
Settings
The Django app should handle the following configuration parameters:
- Host – Message store data host
- Port – Message store data port
- Masking – A multi-state variable (None, abbreviated, full) for masking email addresses in the results (we don’t want the emails to be spammed)
(1) If at all possible, this should be inherited from the list configuration (DRY). It's not useful if the addresses are available from the archives or by subscribing to the list. (Actually, a really sophisticated spammer might want to attack by spoofing frequent posters on the assumption they're more trusted and more read, but that seems second-order to me.) (2) It would be preferable if authors could supply nicknames, full names, or avatars for this purpose.
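A minimal sketch of the three-state masking setting discussed above. The proposal does not say what "abbreviated" should look like, so the output formats here are assumptions, and the names `Masking` and `mask_address` are hypothetical.

```python
from enum import Enum


class Masking(Enum):
    """The three states named in the proposal."""
    NONE = "none"
    ABBREVIATED = "abbreviated"
    FULL = "full"


def mask_address(address: str, mode: Masking) -> str:
    """Return the address as it would be shown in the results."""
    if mode is Masking.NONE:
        return address
    local, sep, domain = address.partition("@")
    if mode is Masking.ABBREVIATED and sep:
        # Assumed format: keep the first character of the local part only.
        return f"{local[:1]}...@{domain}"
    # FULL masking, or an address without "@": reveal nothing.
    return "(address hidden)"
```

If authors could supply nicknames or full names, as suggested in (2), the last branch could return that display name instead of a fixed placeholder.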
Interface to the Mailman core
Metrics class
Generate class
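On the single-chart suggestion above: the bar-width arithmetic can be checked with a few lines. The 640-pixel chart width and the 4-pixel and 1-pixel bars are the figures from the comment; the function name `months_per_chart` is hypothetical.

```python
def months_per_chart(chart_width_px: int, bar_width_px: int) -> int:
    """Number of monthly bars that fit side by side in the chart area."""
    return chart_width_px // bar_width_px


for bar_width in (4, 1):
    months = months_per_chart(640, bar_width)
    print(f"{bar_width}px bars: {months} months, about {months / 12:.1f} years")
```

This ignores margins, axis labels and spacing between bars, so the real capacity would be somewhat lower.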

On May 19, 2012, at 6:58 AM, Stephen J. Turnbull wrote:
George Chatzisofroniou writes:
Author
This model represents an author of the mailing list. It mostly keeps track of the number of postings and number of threads started. It has the following fields:
- authorid – IntegerField
AFAIK every Django object has an internal ID. Why do authors need a separate, human-unfriendly "authorid"?
Since George will not "own" the information about the author, this is his "foreign key" link into that data.
- authormail – CharField
Authors are people. They typically have names<wink/> and often multiple email addresses. There may also be other information (organization, etc) that is available from the headers.
I think that we should remove ALL of the author information from the MM core and create a separate service to collect and manage it. The mail handling core can subscribe to this service for the little necessary information that it requires about the "persons".
In the real world, the relationship between the organization and the people subscribed to a mailing list often is not centered on the mailing list. They are customers, employees, participants, or such. From the POV of the mailing list, other than authentication of posting/subscription status, those details are not important. There is no reason why the mailing list handler should be the authority/repository for some, but not all, of the information about these persons.
- totalmails – IntegerField
- totalthreads – IntegerField
- firstmsgdate – DateTimeField
- lastmsgdate – DateTimeField
MailingList
This model counts the total number of postings and threads started.
- totalmails – IntegerField
- totalthreads – IntegerField
longestthread?
Month
Year
Views
To display the metrics the Django template system will be used. To output the charts I will create some custom tags. The following three views will be used:
- General page – On top, there will be general metrics about total authors, total mails and total threads and below three charts (AJAX based)
"AJAX based" doesn't belong in the spec; it's an implementation detail.
Agreed. George is still working on understanding the abstraction distinction. He still wants to expose things that should remain "under the hood". He should use more black paint.
that represent number of posts per author, number of threads per author and mailing list’s yearly usage. Even below there will be a number of charts (equal to the number of years of list’s existence) that output monthly usage.
Why multiple charts? If you can afford a 640x480 chart area, with 4 pixel wide bars you can have 160 months > 13 years in one chart. I personally wouldn't hesitate to go to pixel width bars, which gives you > 53 years. I don't think people will be looking at charts for precision, but rather to get an overview.
I would hope that he creates a "chart widget" (Django custom template tag) that will allow the site designer to choose the level of detail and duration covered by a particular instance.
At the end, there will be tabular data representing the authors
- Author page – Each user will have his own page with his own metrics.
- Django Admin page – A ‘Generate’ button will be added to the Django admin page.
Settings
The Django app should handle the following configuration parameters:
- Host – Message store data host
- Port – Message store data port
- Masking – A multi-state variable (None, abbreviated, full) for masking email addresses in the results (we don’t want the emails to be spammed)
(1) If at all possible, this should be inherited from the list configuration (DRY). It's not useful if the addresses are available from the archives or by subscribing to the list. (Actually, a really sophisticated spammer might want to attack by spoofing frequent posters on the assumption they're more trusted and more read, but that seems second-order to me.) (2) It would be preferable if authors could supply nicknames, full names, or avatars for this purpose.
Agreed. This needs to be a part of the "person" data store.
Interface to the Mailman core
Metrics class
Generate class

On May 19, 2012, at 08:18 AM, Richard Wackerbarth wrote:
I think that we should remove ALL of the author information from the MM core and create a separate service to collect and manage it. The mail handling core can subscribe to this service for the little necessary information that it requires about the "persons".
This should be possible with today's Mailman 3, though it might not be obvious (and certainly isn't tested ;). To do this, you'd implement the IUserManager interface with whatever external-service-consulting implementation you'd like to come up with. Then you'd associate that component implementation with the utility via the zope.configuration file.
Once you've done this, Mailman will always get its users and addresses from your separate service. Come to think of it though, you probably also need to re-implement the various IRoster implementations as well.
It might take some fiddling and experimentation, but the architecture intends to make this kind of thing possible.
Cheers, -Barry

Hi everyone,
Here's my new report 1 just before the midterm evaluation.
I am just before the midterm evaluation and I’m having fun coding and learning.
The project has reached the first version and achieves most of the targets set. Specifically:
The metrics store is ready, and it is designed to make it easy to add new metrics. Every metric is either an event counter (e.g. posts sent) or a measured level (e.g. total number of subscribers), and each database entry represents one interval at a given granularity. Special methods that retrieve counts for a specified interval, or tally events together with their effects (like a message sent or a new subscription), are also completed.
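A minimal in-memory sketch of the idea described above: event counters and measured levels, each stored per interval at a fixed granularity. It is not the project's actual code; `MetricStore`, `bucket_start`, the method names and the granularity strings are all hypothetical, and a real store would use database tables rather than dictionaries.

```python
from collections import defaultdict
from datetime import datetime


def bucket_start(when: datetime, granularity: str) -> datetime:
    """Truncate a timestamp to the start of its interval."""
    if granularity == "daily":
        return when.replace(hour=0, minute=0, second=0, microsecond=0)
    if granularity == "monthly":
        return when.replace(day=1, hour=0, minute=0, second=0, microsecond=0)
    if granularity == "yearly":
        return when.replace(month=1, day=1, hour=0, minute=0, second=0, microsecond=0)
    raise ValueError(f"unknown granularity: {granularity}")


class MetricStore:
    def __init__(self, granularity: str = "daily"):
        self.granularity = granularity
        self.counters = defaultdict(int)  # (metric, interval start) -> number of events
        self.levels = {}                  # (metric, interval start) -> last level seen

    def tally(self, metric: str, when: datetime, amount: int = 1) -> None:
        """Record an event (for example a post) in the interval containing `when`."""
        self.counters[(metric, bucket_start(when, self.granularity))] += amount

    def set_level(self, metric: str, when: datetime, value: int) -> None:
        """Record a measured level (for example the subscriber count)."""
        self.levels[(metric, bucket_start(when, self.granularity))] = value

    def count(self, metric: str, start: datetime, end: datetime) -> int:
        """Sum the events in intervals whose start lies in [start, end)."""
        return sum(n for (name, bucket), n in self.counters.items()
                   if name == metric and start <= bucket < end)
```

Adding a new metric in this sketch needs no schema change: it is just a new metric name passed to `tally` or `set_level`.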
Most of the graph tags are completed. To make it easy for the designer to output the metrics he wishes, I have created a small query language and its parser. For example, if the designer wishes to output the monthly number of posts for the last two years, he can extract these counts as follows: {% EXTRACT AS graph1 %} posts MONTHLY FOR 2 YEARS {% ENDEXTRACT %} The full syntax of the tag’s language is defined by a BNF description (you may find an earlier version of this language in a previous post 2).
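A minimal sketch of parsing the one query form shown above. The real grammar is the BNF description referred to in the report and is not reproduced here; the pattern below is an assumption that covers only queries shaped like "posts MONTHLY FOR 2 YEARS", and the names `Query` and `parse_query` are hypothetical.

```python
import re
from typing import NamedTuple

# Assumed shape: <metric> <granularity> FOR <amount> <unit>[S]
QUERY_RE = re.compile(
    r"^\s*(?P<metric>\w+)\s+(?P<granularity>DAILY|MONTHLY|YEARLY)"
    r"\s+FOR\s+(?P<amount>\d+)\s+(?P<unit>DAY|MONTH|YEAR)S?\s*$"
)


class Query(NamedTuple):
    metric: str
    granularity: str
    amount: int
    unit: str


def parse_query(text: str) -> Query:
    """Parse the body of an EXTRACT tag into its parts, or raise ValueError."""
    match = QUERY_RE.match(text)
    if match is None:
        raise ValueError(f"cannot parse query: {text!r}")
    return Query(match["metric"], match["granularity"],
                 int(match["amount"]), match["unit"])
```

A regular expression is enough for a single fixed form; a grammar with optional clauses would more likely need a tokenizer and a recursive-descent parser.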
Usually, the designer will add a template specification for the graph tag and the returned values will be placed in this context, but it is also possible to customize the way this context is presented or to create his own template (for example, he can easily change the date format or publish the metrics in a table), since the ‘EXTRACT’ tag actually returns a complex data structure that can be used in any manner.
However, some templates are already provided for the designer, the most significant being the one that outputs the graphs. I used the jqplot library 3 (minimal, fast-rendering graphs) to render them. So, in the previous example, to build the graph from the extracted data, the designer simply has to add the following to his template: {% include "MM3/line_graph.js" with dataset=graph1 title='List Activity' %} after loading the appropriate plotting libraries (ready-made templates can be included to do that).
For both the metrics store and the graph tags, I have already created a number of tests, but I still need to add more.
There are some things that have not been done yet, and I’ll work on them in the following days. I need to complete the implementation of the language (there are some rules that my parser does not handle yet). I also want to create a template for bar graphs, design a coalescor that will keep the number of entries in the database at a reasonable level, and finish my generator that will generate the metrics from scratch (a very first version is done). Last, I need to use IArchiver to make a real connection with the MM3 core (I currently use a simulator for testing purposes).
-- George Chatzisofroniou sophron.latthi.com

Hello,
Here's my new report. I've finally published my code and created some samples for all of you. Youhou! :) Any feedback is very welcome.
I’m happy to report that the first version of the software is now public. I have set up the app with some graph samples here 1, and hosted the code in Launchpad 2.
Above each graph, there is the code snippet that is being used to query the database. There are many syntax rules (most of them are already working) that allow a rich vocabulary for the expression of queries. You may see the BNF language description in the section ‘How to use them’ in the documentation 3.
The JS library that renders the graphs is jqplot. It’s not difficult (even for someone with little Django experience) to use another plotting library. The data structure returned by the extraction tag is very flexible and can be used in any manner. See the section ‘Configuring the output’ in the documentation 3.
My next step will be to implement the coalescor, which (optionally) merges database entries in order to limit the growth of the number of entries stored in the database.
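A minimal sketch of what such a coalescor might do, assuming entries are daily event counts and that anything older than a cutoff date can be merged into one entry per month. The function name `coalesce`, the key layout and the cutoff policy are hypothetical; the report does not describe the actual merging rules.

```python
from collections import defaultdict
from datetime import date


def coalesce(daily_counts: dict, cutoff: date) -> dict:
    """Merge daily entries older than `cutoff` into one entry per month.

    Input keys are dates; output keys are ("daily" | "monthly", date) pairs,
    where a monthly entry is keyed by the first day of its month.
    """
    merged = defaultdict(int)
    for day, count in daily_counts.items():
        if day < cutoff:
            # Old data: keep only the monthly total.
            merged[("monthly", day.replace(day=1))] += count
        else:
            # Recent data: keep full daily resolution.
            merged[("daily", day)] += count
    return dict(merged)
```

Summing is only valid for event counters. Measured levels such as the subscriber count would need a different rule, for example keeping the last value in each month.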
Thanks for reading and testing. I would highly appreciate any comments.
-- George Chatzisofroniou sophron.latthi.com
participants (4)
- Barry Warsaw
- George Chatzisofroniou
- Richard Wackerbarth
- Stephen J. Turnbull