Re: [Mailman-Developers] Approach for Auto moderation system
One concern here is that "Thread" is a fragile term in email. Unless you are planning on some form of message body analysis to group messages together, you are going to need to rely on the In-Reply-To and References headers of the incoming email, which can have its difficulties. If you are going to thread by something else, like the subject, you may find people making minor changes in the subject to bypass the moderation.
Yes, I agree, but a system that curbs this problem would not be a very good solution either. For example, a thread discussing Mailman 2 might be named "Mailman 2" and a thread about Mailman 3 named "Mailman 3"; that could look like someone made a minor change to avoid moderation when that is not the case.
First, these headers are optional, and some mail agents may not generate them; more importantly, the subscriber can bypass this linking by creating a reply as a "new message", thus bypassing the auto moderation.
Since I wish to implement this system as a plug-in, it will be optional for list admins, and having an MTA that generates these headers can be kept as a requirement.
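For illustration, here is a minimal Python sketch, using only the standard library `email` and `re` modules, of what relying on the In-Reply-To and References headers looks like. The helper name `parent_ids` and the loose Message-ID regex are assumptions, not Mailman APIs. The sketch also shows the concern raised above: a reply composed as a "new message" carries neither header and cannot be told apart from a new thread.

```python
import re
from email import message_from_string
from email.message import Message

# Loose pattern for a Message-ID such as <local@domain>; an assumption that
# is adequate for a sketch, not a full RFC 5322 msg-id parser.
MSGID = re.compile(r"<[^<>\s]+>")


def parent_ids(msg: Message) -> list[str]:
    """Return ancestor Message-IDs named by the headers, nearest first.

    In-Reply-To names the direct parent; References lists ancestors oldest
    first, so it is reversed here. Both headers are optional, so the result
    can be empty even for a message that really is a reply.
    """
    in_reply_to = MSGID.findall(str(msg.get("In-Reply-To", "")))
    references = MSGID.findall(str(msg.get("References", "")))
    ordered = in_reply_to + list(reversed(references))
    # Drop duplicates while keeping the nearest-first order.
    return list(dict.fromkeys(ordered))


reply = message_from_string(
    "Message-ID: <b@example.org>\n"
    "In-Reply-To: <a@example.org>\n"
    "References: <root@example.org> <a@example.org>\n"
    "Subject: Re: Approach for Auto moderation system\n"
    "\n"
    "body\n"
)

# The same reply composed as a "new message": the MUA sent no linking headers.
fresh = message_from_string(
    "Message-ID: <c@example.org>\n"
    "Subject: Approach for Auto moderation system\n"
    "\n"
    "body\n"
)

print(parent_ids(reply))  # ['<a@example.org>', '<root@example.org>']
print(parent_ids(fresh))  # []
```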
Second, there is an unreliability in these headers as they will not necessarily reference the "start" of the thread, but may only list messages later in the thread, and to get your "Thread Name", you are going to have to keep a full history (for some period back) of messages and what thread you determined them to be in to figure out what thread this message is in.
What we can do is strip general prefixes like "Re: " and "Fwd: " from the thread name and then add it to the database. That way we don't need to store the whole history: information about the last message is enough, and we check that through Table1.
This means that any system that tries to limit the rate "in thread" must also have a similar (but perhaps different-valued) limit on total postings or on the creation of new threads.
We do not aim to limit threads or the number of posts on a thread; we are just slowing posting down to make room for other users. Reducing the number of posts on a thread might hurt the ongoing discussion, as some threads are very long and important and can't be shortened.
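A sketch of the subject-normalization idea, assuming the prefixes to strip are "Re:", "Re[n]:", "Fwd:"/"Fw:" and a "[list-name]" tag. `normalize_subject`, `LastPost` and the `table1` dict are hypothetical stand-ins for the proposed Table1, not existing Mailman code. It keeps only the last post per normalized subject, and it shows both sides of the argument: "Mailman 2" and "Mailman 3" stay distinct, but so does a one-character tweak of a subject.

```python
import re
from dataclasses import dataclass

# Prefixes treated as noise: "Re:", "Re[2]:", "Fwd:", "Fw:" and a list tag
# such as "[Mailman-Developers]". This exact set is an assumption.
PREFIX = re.compile(
    r"^\s*(?:(?:re|fwd?)\s*(?:\[\d+\])?\s*:\s*|\[[^\]]*\]\s*)",
    re.IGNORECASE,
)


def normalize_subject(subject: str) -> str:
    """Strip leading reply/forward prefixes repeatedly, then fold case and spaces."""
    previous = None
    while previous != subject:
        previous = subject
        subject = PREFIX.sub("", subject, count=1)
    return " ".join(subject.split()).lower()


@dataclass
class LastPost:
    sender: str
    timestamp: float


# Hypothetical stand-in for "Table1": one row per normalized subject that
# keeps only the most recent post, not the full history.
table1: dict[str, LastPost] = {}


def record(subject: str, sender: str, timestamp: float) -> str:
    key = normalize_subject(subject)
    table1[key] = LastPost(sender, timestamp)
    return key


print(normalize_subject("Re: [Mailman-Developers] Re: Approach for Auto moderation system"))
# approach for auto moderation system
print(normalize_subject("Mailman 2") == normalize_subject("Mailman 3"))  # False
# A one-character tweak also yields a different key, i.e. a "new thread".
print(normalize_subject("Approach for auto-moderation system"))
# approach for auto-moderation system
```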
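One way to read "slowing down" rather than limiting: hold a sender's next post in a thread until a minimum interval has passed since their previous one. This is a hypothetical sketch; `ThreadThrottle`, `min_interval` and the one-hour value are illustrative assumptions, and how a held message would actually be queued in Mailman is not addressed.

```python
from dataclasses import dataclass, field


@dataclass
class ThreadThrottle:
    """Hypothetical "slow down, don't reject" policy.

    A sender's post in a thread is held until min_interval seconds have passed
    since that sender's previous released post in the same thread. Posts are
    never rejected, only delayed; other senders in the thread are unaffected.
    """

    min_interval: float
    _last_release: dict[tuple[str, str], float] = field(default_factory=dict, init=False)

    def release_time(self, thread: str, sender: str, arrival: float) -> float:
        key = (thread, sender)
        previous = self._last_release.get(key)
        if previous is None:
            release = arrival
        else:
            release = max(arrival, previous + self.min_interval)
        self._last_release[key] = release
        return release


throttle = ThreadThrottle(min_interval=3600)  # one post per hour per thread (assumed)
print(throttle.release_time("mailman 3", "alice", 0))    # 0
print(throttle.release_time("mailman 3", "alice", 600))  # 3600: held about 50 minutes
print(throttle.release_time("mailman 3", "bob", 700))    # 700: bob is not delayed
```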
Regards Aanand
On Sunday 08 March 2015 09:50 AM, Aanand Shekhar Roy wrote:
One concern here is that "Thread" is a fragile term in email. Unless you are planning on some form of message body analysis to group messages together, you are going to need to rely on the In-Reply-To and References headers of the incoming email, which can have its difficulties. If you are going to thread by something else, like the subject, you may find people making minor changes in the subject to bypass the moderation.
Yes, I agree, but a system that curbs this problem would not be a very good solution either. For example, a thread discussing Mailman 2 might be named "Mailman 2" and a thread about Mailman 3 named "Mailman 3"; that could look like someone made a minor change to avoid moderation when that is not the case.
Yes, exactly! So what do you think would be a solution to the problem Richard mentioned? You cannot simply say that we are not going to use any such system because it has drawbacks.
We need a clear definition of what you mean by a "thread". What types of message would be part of a thread, and which would not? Is this kind of auto moderation system, based only on subject lines and headers, even reliable?
I would ask you to dig through the standards and the MUA implementations to see how threads are actually built, and then answer the questions above.
First, these headers are optional, and some mail agents may not generate them; more importantly, the subscriber can bypass this linking by creating a reply as a "new message", thus bypassing the auto moderation.
Since I wish to implement this system as a plug-in, it will be optional for list admins, and having an MTA that generates these headers can be kept as a requirement.
It isn't a good idea to require a particular MTA just to use a plugin. Even this very email of yours was not listed under the same thread in my MUA (Thunderbird), for some reason I don't know.
Second, there is an unreliability in these headers as they will not necessarily reference the "start" of the thread, but may only list messages later in the thread, and to get your "Thread Name", you are going to have to keep a full history (for some period back) of messages and what thread you determined them to be in to figure out what thread this message is in.
What we can do is strip general prefixes like "Re: " and "Fwd: " from the thread name and then add it to the database. That way we don't need to store the whole history: information about the last message is enough, and we check that through Table1.
How do you know where the thread started from the information in your table?
This means that any system that tries to limit the rate "in thread" must also have a similar (but perhaps different-valued) limit on total postings or on the creation of new threads.
We do not aim to limit threads or the number of posts on a thread; we are just slowing posting down to make room for other users. Reducing the number of posts on a thread might hurt the ongoing discussion, as some threads are very long and important and can't be shortened.
I don't understand the use case for "slowing down" the delivery of emails at all. Why would someone want their email sent to the list at a later time? What does "provide room to other users" mean in this context? How does one person sending a lot of mail stop others from doing the same?
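The question above is where full history becomes necessary. As a hypothetical sketch (the `ThreadIndex` class and its in-memory dict are assumptions, not Mailman code), a thread root can be recovered by mapping every seen Message-ID to its root and looking up any known ancestor listed in In-Reply-To or References. If the relevant history has been discarded, a reply with trimmed References looks like a new thread.

```python
import re

MSGID = re.compile(r"<[^<>\s]+>")  # loose Message-ID pattern (assumption)


class ThreadIndex:
    """Hypothetical index from every seen Message-ID to its thread root.

    References may list only recent ancestors, so the root is found by
    looking up any known ancestor in the stored history, nearest first.
    """

    def __init__(self) -> None:
        self._root: dict[str, str] = {}

    def add(self, message_id: str, in_reply_to: str = "", references: str = "") -> str:
        ancestors = MSGID.findall(in_reply_to) + list(reversed(MSGID.findall(references)))
        root = message_id  # default: treat as a new thread
        for parent in ancestors:
            if parent in self._root:
                root = self._root[parent]
                break
        self._root[message_id] = root
        return root


index = ThreadIndex()
index.add("<root@example.org>")
index.add("<a@example.org>", "<root@example.org>", "<root@example.org>")
# References trimmed to the latest ancestor only: the root is still recovered
# because <a@example.org> is in the history.
print(index.add("<b@example.org>", "<a@example.org>", "<a@example.org>"))  # <root@example.org>

# Without that history (e.g. it was expired), the same message looks like a new thread.
print(ThreadIndex().add("<b@example.org>", "<a@example.org>", "<a@example.org>"))  # <b@example.org>
```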
-- thanks, Abhilash
On 3/7/15 11:20 PM, Aanand Shekhar Roy wrote:
One concern here is that "Thread" is a fragile term in email. Unless you are planning on some form of message body analysis to group messages together, you are going to need to rely on the In-Reply-To and References headers of the incoming email, which can have its difficulties. If you are going to thread by something else, like the subject, you may find people making minor changes in the subject to bypass the moderation.
Yes, I agree, but a system that curbs this problem would not be a very good solution either. For example, a thread discussing Mailman 2 might be named "Mailman 2" and a thread about Mailman 3 named "Mailman 3"; that could look like someone made a minor change to avoid moderation when that is not the case.
First, these headers are optional, and some mail agents may not generate them; more importantly, the subscriber can bypass this linking by creating a reply as a "new message", thus bypassing the auto moderation.
Since I wish to implement this system as a plug-in, it will be optional for list admins, and having an MTA that generates these headers can be kept as a requirement.
It isn't the MTA that deals with threading but the MUA, and that can't be controlled by the list admin (as it is the USER agent). MUAs don't announce that they don't support these headers, or that they were used in a way that bypassed generating them.
My comment isn't that you can't create a system as you have defined it, but that there is a fundamental flaw in your design that will mean it is very easy for people to bypass it, making the add-on basically worthless. It will limit the posting of the people who follow the rules, but not those who figure out the holes and bypass the limit. The people who tend to follow the rules are rarely the problem.
Second, there is an unreliability in these headers as they will not necessarily reference the "start" of the thread, but may only list messages later in the thread, and to get your "Thread Name", you are going to have to keep a full history (for some period back) of messages and what thread you determined them to be in to figure out what thread this message is in.
What we can do is strip general prefixes like "Re: " and "Fwd: " from the thread name and then add it to the database. That way we don't need to store the whole history: information about the last message is enough, and we check that through Table1.
OK, this makes it clear that you are not working on "threads" as defined by the reference headers, but on something similar that uses only the Subject header, which is MUCH more prone to being "gamed" by minor tweaks that might not even be noticed without a close look.
(Perhaps you don't understand the real concept of threading because your email client doesn't support it. I see my replies attached to your messages, slightly indented, while your reply to me became a "new thread" because your client didn't indicate what it was a reply to. You can see the same effect in the list's threaded archives.)
This is why I say that having only "per-thread" limits isn't workable, as threads are too fragile. If you need to have a per-thread limit you need to also either limit total-posts or new-thread posts.
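A minimal sketch of pairing a per-thread limit with a total-posts limit, using a sliding window per sender. `DualRateLimit`, the one-day window and the budgets of 2 and 3 posts are assumptions for illustration. The thread key could come from subject normalization or header threading; the point is that a tweaked subject escapes the per-thread budget but not the total one.

```python
from collections import defaultdict, deque


class DualRateLimit:
    """Hypothetical sliding-window limiter with two budgets per sender.

    per_thread caps posts by one sender in one thread; total caps that
    sender's posts across all threads, so starting a "new thread" (e.g. by
    tweaking the subject) does not escape the limit.
    """

    def __init__(self, window: float, per_thread: int, total: int) -> None:
        self.window = window
        self.per_thread = per_thread
        self.total = total
        self._thread_posts: defaultdict = defaultdict(deque)  # (sender, thread) -> times
        self._all_posts: defaultdict = defaultdict(deque)     # sender -> times

    def _prune(self, times: deque, now: float) -> None:
        # Forget posts that have slid out of the window.
        while times and times[0] <= now - self.window:
            times.popleft()

    def allow(self, sender: str, thread: str, now: float) -> bool:
        in_thread = self._thread_posts[(sender, thread)]
        overall = self._all_posts[sender]
        self._prune(in_thread, now)
        self._prune(overall, now)
        if len(in_thread) >= self.per_thread or len(overall) >= self.total:
            return False
        in_thread.append(now)
        overall.append(now)
        return True


DAY = 86400
limiter = DualRateLimit(window=DAY, per_thread=2, total=3)  # assumed values
print(limiter.allow("alice", "mailman 3", 0))     # True
print(limiter.allow("alice", "mailman 3", 60))    # True
print(limiter.allow("alice", "mailman 3", 120))   # False: per-thread budget used
print(limiter.allow("alice", "mailman 3!", 180))  # True: subject tweak = "new thread"
print(limiter.allow("alice", "mailman 4", 240))   # False: total budget used
```

Here `allow` returning False could equally mean "hold until the window frees up" rather than "reject", which fits reading this as a rate limit on posting.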
This means that any system that tries to limit the rate "in thread" must also have a similar (but perhaps different-valued) limit on total postings or on the creation of new threads.
We do not aim to limit threads or the number of posts on a thread; we are just slowing posting down to make room for other users. Reducing the number of posts on a thread might hurt the ongoing discussion, as some threads are very long and important and can't be shortened.
Regards Aanand
Yes, I understand that the limit isn't a limit on absolute number of posts, but a "rate-limit" on posting.
-- Richard Damon
Richard Damon writes:
This is why I say that having only "per-thread" limits isn't workable, as threads are too fragile.
Systers-style "dynamic lists" may fix this problem.
On most discussion lists subscription is open, so you'd end up with people bypassing any restriction by creating new addresses.
I think in general this kind of throttling would need to have the support of the subscribers, or it wouldn't be workable.
On 3/8/15 5:03 PM, Stephen J. Turnbull wrote:
Richard Damon writes:
This is why I say that having only "per-thread" limits isn't workable, as threads are too fragile.
Systers-style "dynamic lists" may fix this problem.
On most discussion lists subscription is open, so you'd end up with people bypassing any restriction by creating new addresses.
I think in general this kind of throttling would need to have the support of the subscribers, or it wouldn't be workable.
Yes, "sub lists" could probably handle it, as could Mailman's 'topics' (as long as unclassified was considered a topic).
Yes, sock-puppets can be a problem, but most of the people in this sort of discussion want the name recognition, so it isn't quite as much of a problem.
And lastly, YES, general support from the list for this sort of action would be needed. On one moderate-volume list I run, something like this would be very welcome: most of the subscribers would love it, and even many of the people who would get throttled would accept it, as it operates in a "fair" manner (it affects everyone the same). The one complaint I could see is from the few people with small minority views, who might think that each side, rather than each poster, should get a similar number of posts.
-- Richard Damon
participants (4)
- Aanand Shekhar Roy
- Abhilash Raj
- Richard Damon
- Stephen J. Turnbull