George Chatzisofroniou writes:
> Author
>
> This model represents an author of the mailing list. It mostly keeps
> track of the number of postings and number of threads started. It has
> the following fields:
>
> - authorid – IntegerField
AFAIK every Django object has an internal ID. Why do authors need a
separate, human-unfriendly "authorid"?
> - authormail – CharField
Authors are people. They typically have names<wink/> and often
multiple email addresses. There may also be other information
(organization, etc) that is available from the headers.
> - totalmails – IntegerField
> - totalthreads – IntegerField
> - firstmsgdate – DateTimeField
> - lastmsgdate – DateTimeField
>
> MailingList
>
> This model counts the total number of postings and threads started.
>
> - totalmails – IntegerField
> - totalthreads – IntegerField
longestthread?
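[Editor's sketch] The review comments above (no separate "authorid", names and several addresses per author, a possible "longestthread" figure) could be illustrated roughly as follows. This uses plain dataclasses rather than Django models so that it runs standalone; every class and field name here is hypothetical and not part of George's proposal.

```python
from dataclasses import dataclass, field
from datetime import datetime
from typing import Optional, Set


@dataclass
class Author:
    """One person. No 'authorid': the ORM would supply a primary key."""
    name: str
    emails: Set[str] = field(default_factory=set)   # several addresses per person
    organization: Optional[str] = None               # e.g. taken from a header
    total_mails: int = 0
    total_threads: int = 0
    first_msg_date: Optional[datetime] = None
    last_msg_date: Optional[datetime] = None

    def record_post(self, sender: str, when: datetime, starts_thread: bool) -> None:
        """Update the counters for one message sent from any known address."""
        self.emails.add(sender.lower())
        self.total_mails += 1
        if starts_thread:
            self.total_threads += 1
        if self.first_msg_date is None or when < self.first_msg_date:
            self.first_msg_date = when
        if self.last_msg_date is None or when > self.last_msg_date:
            self.last_msg_date = when


@dataclass
class ListStats:
    """Per-list totals, plus the suggested longest-thread figure."""
    total_mails: int = 0
    total_threads: int = 0
    longest_thread: int = 0   # message count of the longest thread seen so far

    def record_thread_length(self, length: int) -> None:
        self.longest_thread = max(self.longest_thread, length)
```

In a real Django app the set of addresses would more likely be a related table; the set is only a stand-in here.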
>
> Month
>
> Year
>
> Views
>
> To display the metrics the Django template system will be used. To
> output the charts i will create some custom tags. The three following
> views will be used:
>
> - General page – On top, there will be general metrics about total
> authors, total mails and total threads and below three charts (AJAX
> based)
"AJAX based" doesn't belong in the spec; it's an implementation detail.
> that represent number of posts per author, number of threads
> per author and mailing list’s yearly usage. Even below there will be a
> number of charts (equal to the number of years of list’s existence)
> that output monthly usage.
Why multiple charts? If you can afford a 640x480 chart area, with 4
pixel wide bars you can have 160 months > 13 years in one chart. I
personally wouldn't hesitate to go to one-pixel-wide bars, which gives
you > 53 years. I don't think people will be looking at charts for
precision, but rather to get an overview.
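[Editor's sketch] The arithmetic behind those figures, assuming one bar per month and no gaps between bars (the function name is made up for illustration):

```python
def months_per_chart(chart_width_px: int, bar_width_px: int) -> int:
    """Number of monthly bars that fit side by side, with no gaps between bars."""
    return chart_width_px // bar_width_px


for bar_width in (4, 1):
    months = months_per_chart(640, bar_width)
    print(f"{bar_width}px bars: {months} months, about {months / 12:.1f} years")

# 4px bars: 160 months, about 13.3 years
# 1px bars: 640 months, about 53.3 years
```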
> At the end, there will be tabular data representing the authors
>
> - Author page – Each user will have his own page with his own metrics.
>
> Django Admin page – A ‘Generate’ button will be added to the Django admin page.
> Settings
>
> The Django app should handle the following configuration parameters:
>
> - Host – Message store data host
> - Port – Message store data port
> - Masking – A multi-state variable (None, abbreviated, full) for
> masking email addresses at the results (we don’t want the emails to be
> spammed)
(1) If at all possible, this should be inherited from the list
configuration (DRY). It's not useful if the addresses are
available from the archives or by subscribing to the list.
(Actually, a really sophisticated spammer might want to attack by
spoofing frequent posters on the assumption they're more trusted
and more read, but that seems second-order to me.)
(2) It would be preferable if authors could supply nicknames, full
names, or avatars for this purpose.
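[Editor's sketch] One way the three masking states could behave, with an author-supplied display name taking the place of the address as suggested in (2). The proposal does not define what "abbreviated" or "full" actually produce, so the output formats below are assumptions, as are all names in the code.

```python
from enum import Enum


class Masking(Enum):
    NONE = "none"
    ABBREVIATED = "abbreviated"
    FULL = "full"


HIDDEN = "(address hidden)"  # placeholder text, chosen for this sketch only


def mask_address(address: str, mode: Masking, display_name: str = "") -> str:
    """Return what a metrics page would show in place of an email address."""
    if mode is Masking.NONE:
        return address
    if mode is Masking.FULL:
        # Prefer a nickname or full name supplied by the author, if any.
        return display_name or HIDDEN
    # ABBREVIATED: keep the first character of the local part and the domain.
    local, sep, domain = address.partition("@")
    if not sep:
        return display_name or HIDDEN  # not an address; do not leak it
    return f"{local[:1]}...@{domain}"
```

If the mode were inherited from the list configuration as suggested in (1), only the value passed as `mode` would change.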
> Interface to the Mailman core
>
> - Metrics class
>
> - Generate class
George,
let me throw in some thoughts just to annoy you ;)
Like with most statistical data I mostly see the figures being used to give
statements on quantity - top poster, number of threads etc. Do you think it
would be possible to also make some statements on quality?
Let me give an example: Mailing lists are often places where people go to ask
for advice. Someone asking usually starts a thread and continually keeps
replying. That easily makes a person top poster and might make the same person
a thread starter, but number of posts and threads started gives no indication
of that person's knowledge (concerning the mailing list's topic).
OTOH someone who has been on the list for ages, who replies more often than
starting threads and who ends threads often after she has replied might very
well be a very knowledgeable person, because she gives the one answer that
solves the problem.
Do you think it would be possible to deduce such quality-oriented statements?
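[Editor's sketch] A minimal illustration of the indicators described above: posts, threads started, replies, and how often someone other than the thread starter has the last word. It assumes messages arrive in chronological order with a known thread id. The names are hypothetical, and having the last word is of course only a weak hint that a question was answered.

```python
from collections import Counter, defaultdict
from typing import Dict, Iterable, List, NamedTuple


class Message(NamedTuple):
    thread_id: str
    author: str


def quality_indicators(messages: Iterable[Message]) -> Dict[str, Dict[str, int]]:
    """Per-author counts that separate 'asks a lot' from 'answers a lot'."""
    threads: Dict[str, List[str]] = defaultdict(list)
    for msg in messages:                      # assumed chronological
        threads[msg.thread_id].append(msg.author)

    posts, started, closed = Counter(), Counter(), Counter()
    for authors in threads.values():
        posts.update(authors)
        started[authors[0]] += 1
        if authors[-1] != authors[0]:         # last word by someone else
            closed[authors[-1]] += 1

    return {
        author: {
            "posts": posts[author],
            "threads_started": started[author],
            "replies": posts[author] - started[author],
            "threads_closed": closed[author],
        }
        for author in posts
    }


archive = [
    Message("t1", "asker"), Message("t1", "expert"),
    Message("t1", "asker"), Message("t1", "expert"),
    Message("t2", "asker"), Message("t2", "expert"),
]
stats = quality_indicators(archive)
```

In this toy archive both people have three posts, but "asker" started two threads and closed none, while "expert" started none and closed two.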
p@rick
P.S.
Do you also plan to deliver a tool that analyzes a mailing list archive in
order to gather your statistical data? Having the statistical data might be a
good reason for people to upgrade their MMx installation to MM3.
* George Chatzisofroniou <sophron(a)latthi.com>:
> The following document is the lowest level of my design concept. You
> may also read it in my blog [1]. Of course, comments are very welcome.
>
> --
> Models
>
> In order to store statistical data, the app will use some Django models:
>
> Author
>
> This model represents an author of the mailing list. It mostly keeps
> track of the number of postings and number of threads started. It has
> the following fields:
>
> - authorid – IntegerField
> - authormail – CharField
> - totalmails – IntegerField
> - totalthreads – IntegerField
> - firstmsgdate – DateTimeField
> - lastmsgdate – DateTimeField
>
> MailingList
>
> This model counts the total number of postings and threads started.
>
> - totalmails – IntegerField
> - totalthreads – IntegerField
>
> Month
>
> This model associates the author and the mailing list with each month.
>
> - author – ForeignKey
> - month – CharField
> - postscount – IntegerField
> - threadscount – IntegerField
> - mailinglist – Boolean (if this is true it corresponds to the whole
> mailing list)
>
> Year
>
> This model is similar to month. It has a year field instead of a month field.
>
> Views
>
> To display the metrics the Django template system will be used. To
> output the charts i will create some custom tags. The three following
> views will be used:
>
> - General page – On top, there will be general metrics about total
> authors, total mails and total threads and below three charts (AJAX
> based) that represent number of posts per author, number of threads
> per author and mailing list’s yearly usage. Even below there will be a
> number of charts (equal to the number of years of list’s existence)
> that output monthly usage. At the end, there will be tabular data
> representing the authors of the mailing list along with their number
> of posts, number of threads started and the date of their last post.
> The user will be able to order the tabular data (alphabetically,
> ascending/descending on number of posts, number of threads, date of
> last message) by clicking on the table’s headings (Mail, Mails Sent,
> Threads Started, Last Message).
>
> - Author page – Each user will have his own page with his own metrics.
> On top, there will be the email of the author, number of posts, number
> of threads started and the dates of first and last message. Below
> there will be monthly usage charts for each year the user is
> subscribed to the mailing list.
> Django Admin page – A ‘Generate’ button will be added to the Django admin page.
> Settings
>
> The Django app should handle the following configuration parameters:
>
> - Host – Message store data host
> - Port – Message store data port
> - Masking – A multi-state variable (None, abbreviated, full) for
> masking email addresses at the results (we don’t want the emails to be
> spammed)
>
> Interface to the Mailman core
>
> - Metrics class – When a new post is sent, the Metrics class will
> receive it through the IArchiver interface. The Posts field of the
> Mailing List model (as well as the related rows on the Month and
> Year models) will increase by one. If the author’s email is not in the
> database, it will query the mailman core database with the email, grab
> the author’s id and a new Author row will be created. Otherwise if the
> author is already in the database, the Posts field and the two foreign
> fields (Month and Year) will increase by one
>
> - Generate class – When the ‘Generate’ button on the Admin page is pressed:
> * The Django models will be initialized (the metrics will go back to
> zero). A progress bar will inform the administrator that the operation
> is being processed.
> * All the messages of the archive will be parsed by performing a
> direct Python call to the IArchiver. Another instance to the IArchiver
> will grab any mails sent while the parsing is going on.
> * The metrics will be generated from scratch.
> * The administrator will be informed with a success message when the
> process is over.
>
> [1]: http://sophron.latthi.com/gsoc-mailman/
>
> --
> George Chatzisofroniou
> sophron.latthi.com
--
state of mind ()
http://www.state-of-mind.de
Franziskanerstraße 15 Telefon +49 89 3090 4664
81669 München Telefax +49 89 3090 4666
Amtsgericht München Partnerschaftsregister PR 563
Meeow miaou*
We spoke on IRC about the archiver the other day and I said that I
should present my thoughts about it here. So here they are (beware,
this might be long).
First I think we should think about the structure/architecture of
things. We have a number of components which need to be archive-aware.
Without being exhaustive, I'm thinking about:
- the archiver itself, which presents the archive (i.e. mails and
threads)
- the NNTP bits, which should be able to return emails and/or threads
- the stats module, which wants to give information to the user about
the health of the list itself (emails/month, last threads, biggest
threads...)
- archive retrieval (we probably want to give the user a way to
download the archives since the creation of the list/the last
year/month)
All of these components need to be aware of the archives. We agreed
that the core does not want to know about them.
So we have several solutions:
- each module becomes an "archiver" with respect to core, meaning each
module has its own way of storing the archives (and possibly its own
system to do so)
- we create an archive-core module which manages the archives and
provides an API to access, modify, and extend them.
Of course, we prefer the second solution :)
So we would have the following architecture:
mm-core (handles the lists themselves) --send emails to archivers-->
archive-core (store the emails and expose them through an API) -->
archivers/stats/NNTP
The questions are then:
- how do we store the emails ?
- how do we expose the API ?
- how do we make it easy to extend ? (i.e. the stats module wants to
read the db, but probably also to store information in it)
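[Editor's sketch] To make the three questions concrete, here is one possible shape for such an archive-core API: an abstract store that consumers (archiver, stats, NNTP) program against, with a generic annotation call so that a module like stats can write its own data back. Nothing here is an existing Mailman or HyperKitty interface. All names are invented for illustration, and the in-memory backend merely stands in for a real database.

```python
from abc import ABC, abstractmethod
from email.message import EmailMessage
from typing import Any, Dict, List, Optional, Tuple


class ArchiveStore(ABC):
    """What archive-core would expose to archivers, stats, NNTP, etc."""

    @abstractmethod
    def add(self, list_name: str, message: EmailMessage) -> str:
        """Store a message and return its Message-ID."""

    @abstractmethod
    def get(self, list_name: str, message_id: str) -> Optional[EmailMessage]:
        """Return one message, or None if it is unknown."""

    @abstractmethod
    def messages(self, list_name: str) -> List[EmailMessage]:
        """Return all stored messages of a list."""

    @abstractmethod
    def annotate(self, list_name: str, message_id: str, key: str, value: Any) -> None:
        """Let an extension (e.g. stats) attach its own data to a message."""

    @abstractmethod
    def annotations(self, list_name: str, message_id: str) -> Dict[str, Any]:
        """Return everything extensions have attached to a message."""


class InMemoryArchiveStore(ArchiveStore):
    """Toy backend; a real one would sit on MongoDB or an RDBMS."""

    def __init__(self) -> None:
        self._messages: Dict[str, Dict[str, EmailMessage]] = {}
        self._notes: Dict[Tuple[str, str], Dict[str, Any]] = {}

    def add(self, list_name: str, message: EmailMessage) -> str:
        if message["Message-ID"] is None:
            raise ValueError("message has no Message-ID header")
        message_id = str(message["Message-ID"])
        self._messages.setdefault(list_name, {})[message_id] = message
        return message_id

    def get(self, list_name: str, message_id: str) -> Optional[EmailMessage]:
        return self._messages.get(list_name, {}).get(message_id)

    def messages(self, list_name: str) -> List[EmailMessage]:
        return list(self._messages.get(list_name, {}).values())

    def annotate(self, list_name: str, message_id: str, key: str, value: Any) -> None:
        self._notes.setdefault((list_name, message_id), {})[key] = value

    def annotations(self, list_name: str, message_id: str) -> Dict[str, Any]:
        return dict(self._notes.get((list_name, message_id), {}))
```

Namespacing annotation keys per module (for example "stats.thread_depth") would be one way to keep extensions from stepping on each other.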
Having played with MongoDB (HK relies on it atm), I quite like the
possibilities it gives us. We can easily store the emails in it and
query them, and since it is a NoSQL database system, extending it is
also easy.
On the other hand, having archive-core rely on the same system
as the core itself would be nicer from a sysadmin pov. I have not tried
to upload archives to an RDBMS and test its speed, but for MongoDB the
results of the tests are presented at [1].
The challenge will be speed and designing an API which allows each
component to do its work.
I think it would be nice if we could reach some kind of agreement before
the GSoC starts (even if we change our mind later on) to be sure that if
we get students, their work doesn't overlap too much.
The second point I want to present is with respect to the archiver
itself.
At the moment we have HyperKitty (HK). The current version:
- exposes single emails
- exposes single threads
- presents the archives for one month or day
- allows searching the archives by sender, subject, content, or
subject and content
- presents a summary of the recent activity on the list (including the
evolution of the number of posts sent over the last month)
I think this is the basic functionality that we would like to see in
an archiver.
But HK aims at much more. Its ultimate goal is to provide a
"forum-like" interface to the mailing lists, with a number of
social-web-like options: "liking" or "disliking" a post or a thread,
"+1"-ing someone, tagging mails, or assigning them categories.
These are all nice features but, imho, they go beyond what one would
want from a basic archiver.
So what I would like to propose is to split out of HK a sub-project
(MiniKitty?) which would provide this basic functionality.
We would keep HyperKitty as a more extensive archiver and try to bring
HK to its ultimate goal. This will need some more work and time, as we
will have to make HK speak with core for authentication, find a way to
send emails to core/the lists, and of course add all the other features
(tags, categories...)
Comments welcome :)
Thanks,
Pierre
[1]
http://blog.pingoured.fr/index.php?post/2012/03/16/Mailman-archives-and-mon…
* Hi everyone
George Chatzisofroniou writes:
> Hello Patrick,
>
> On Fri, May 18, 2012 at 12:09 AM, Patrick Ben Koetter
> <p(a)state-of-mind.de> wrote:
> > Like with most statistical data I mostly see the figures being
> > used to give statements on quantity - top poster, number of
> > threads etc. Do you think it would be possible to also make some
> > statements on quality?
+1
For example, I think this would be really useful for class discussion
lists and the like (on the theory that the best way to learn a subject
is to teach it to others).
> The metrics will primarily extract the activity of a mailing list
> and its users.
>
> It is possible to emphasize on the quality of the posts but i think
> this is a different app
I don't see why. I would think quality metrics would be usefully
presented via the same application as quantity metrics. It would be
interesting to correlate quality and quantity, for example.
Do you mean to say "this is out of scope of my project?" As much as
I'd like to see quality metrics provided, I'd have to agree with you
that it's out of scope of your project (maybe you could do one or more
quality metrics on a time-permitting basis at the end of the period).
Dear all,
It has been some time since the last info regarding HyperKitty, but the
project has made some progress.
I have implemented the database interface that I presented a month
ago [1].
So now there is a standalone project/library, KittyStore [2] which
provides an interface to the database. It defines an interface which can
then be implemented for whatever database system you would like. At the
moment it covers MongoDB and PostgreSQL.
Once I had implemented this interface, I tried to settle the question
of which database system we should primarily focus on.
I wrote a small comparison test of the two systems on an RHEL6 system
[3] (I still have to publish the results for F17).
The difference between the two database systems is not so large (1s for
a query that already takes 6 seconds), and the servers were not tuned
at all.
So I think the advantage of having only one back-end for mailman and its
archive is worth this time difference in the results (which might anyway
get even better).
Thus we will move forward with PostgreSQL as a back-end for HyperKitty.
The good news is that HyperKitty already works fine with PostgreSQL and
KittyStore (if you use the correct branch [4]).
However, we have to rebuild our test server, so we cannot show you how
the latest version works right now.
So everything is falling nicely into place for Aamir to start working
on his project and for HK to get further down the road :)
I think that's about all I wanted to say, fire away if you have
questions!
Best regards,
Pierre
[1]
http://mail.python.org/pipermail/mailman-developers/2012-April/022012.html
[2] https://github.com/pypingou/kittystore
[3]
http://blog.pingoured.fr/index.php?post/2012/05/20/PostgreSQL-vs-MongoDB
[4] http://bzr.fedorahosted.org/bzr/hyperkitty/rdbms/files
Aamir Khan writes:
> Hi everybody!
>
> First task I am going to do for my GSoC project is to have login
> authentication mechanism for HyperKitty users. I have had discussion with
> few mailman developers about it. I am planning to wrap up social_auth into
> mm_ui_auth django application. Both postorius and HyperKitty can use this
> app for authenticating users.
Please explain in somewhat more detail. Not all of us know what
"social_auth" is, and since Postorius and HyperKitty are independent
apps, presumably there will need to be some design and coordination
effort to get everybody on the same page.
In the spirit of "DRY" (don't repeat yourself), rather than make long
posts here, I suggest that you start a blog or a Wiki page on the
Mailman site where you can log (1) your design decisions and (2) any
agreements on APIs etc you make with other Mailman developers (or
third parties such as maintainers of libraries you use). Then you can
simply have a shortcut key and say
"I've designed an authentication mechanism for HyperKitty users, which
I expect to be extensible for use by Postorius and other Django apps
for Mailman. See my blog: http://blogs.example.com/~aamir/gsoc/."
George is doing this quite well IMO. Kudos to George!
N.B. George is generally copying his blog post to this list. That's
OK if you want to do it, but IMO not necessary. (But that's something
we will evolve over time, and I defer to Terri's opinion on this kind
of thing. This is all just a suggestion.)
Hi Aamir,
On 05/30/2012 01:45 AM, Aamir Khan wrote:
> Hi everybody!
>
> First task I am going to do for my GSoC project is to have login
> authentication mechanism for HyperKitty users. I have had discussion
> with few mailman developers about it. I am planning to wrap up
> social_auth into mm_ui_auth django application. Both postorius and
> HyperKitty can use this app for authenticating users.
As you know, Postorius already uses django-social-auth. Implementing it
was more or less a configuration issue (settings.py), plus we had to
build a login-template (which was pretty trivial).
We did look at some other existing solutions before choosing
django-social-auth. So far it seems like a good choice. (It should be
noted, though, that there was one report of a BrowserID logged-in user
getting logged out erroneously, but I haven't been able to reproduce
this.)
Since social-auth has been designed with DRY/re-usability in mind, the
important question (at least to me) is: What are the advantages of an
mm_ui_auth app wrapper compared to simply using it directly?
The main advantage I can think of is shared UI-resources (template,
css). But this reduces mm_ui_auth more or less to a "theme only" app
(OK, with a few lines of login/logout view code in it). This can make
absolute sense, but mainly if both Postorius/HK are OK with sharing the
same design for the login-/out pages. I, for one, would be fine with
that. (Well, let's see how it looks first. But still... ;-)
Cheers,
Florian
On May 30, 2012, at 11:06 AM, Robert Niederreiter wrote:
> actually, in postorius you need to be authenticated in order to
> subscribe to a list
>
> "anonymous" refers to the authentication in postorius in this case
>
> it would be nice to just have subscription form available to the public,
> sending a "subscription confirmation" to the entered mail address with a
> confirmation link, finally do the list subscription if confirmation link
> gets clicked.
>
> Robert
+2 :)
I think that it is important for this to be controlled by an administrative list policy setting.
When someone logs in using BrowserID, for example, and then subscribes to a new mailing list, the subscription can be marked as "email confirmed". But if they simply enter an email address on a form, the website needs to be able to trigger the issuance of a "request subscription confirmation" email.
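[Editor's sketch] The two paths described here (address already verified by the login provider vs. address merely typed into a form) could be sketched as below. This is not Postorius or Mailman code: the class, its methods, and the `verified_by_login` flag are hypothetical, and actually sending the confirmation mail is left out.

```python
import secrets
from typing import Dict, Optional, Set, Tuple

Subscription = Tuple[str, str]  # (list name, email address)


class PendingSubscriptions:
    """Tracks which subscriptions still await a click on a confirmation link."""

    def __init__(self) -> None:
        self._pending: Dict[str, Subscription] = {}
        self.confirmed: Set[Subscription] = set()

    def request(self, list_name: str, email: str,
                verified_by_login: bool = False) -> Optional[str]:
        """Return a token to mail out, or None if no confirmation is needed."""
        if verified_by_login:
            # e.g. the address came from a BrowserID login: already confirmed.
            self.confirmed.add((list_name, email))
            return None
        token = secrets.token_urlsafe(32)   # unguessable value for the link
        self._pending[token] = (list_name, email)
        return token

    def confirm(self, token: str) -> bool:
        """Complete the subscription if the token is known; each token works once."""
        entry = self._pending.pop(token, None)
        if entry is None:
            return False
        self.confirmed.add(entry)
        return True
```

A per-list policy setting, as suggested above, could decide whether `verified_by_login` is honoured at all or whether every request must go through the confirmation mail.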
Richard
On May 30, 2012, at 6:21 AM, Robert Niederreiter wrote:
> Public bug reported:
>
> Currently it is not possible to subscribe as anonymous user.
>
> ** Affects: postorius
> Importance: Undecided
> Status: New
> ** Tags: anonymous confirmation subscription ui
Hi everybody!
First task I am going to do for my GSoC project is to have login
authentication mechanism for HyperKitty users. I have had discussion with
few mailman developers about it. I am planning to wrap up social_auth into
mm_ui_auth django application. Both postorius and HyperKitty can use this
app for authenticating users.
Feel free to suggest if you have any ideas about how login should work in
HyperKitty.
--
Aamir Khan | 3rd Year | Computer Science & Engineering | IIT Roorkee