Mailman 3 Bounce processing should be done to only USER entities? - Mailman-Developers

Bounce processing should be done to only USER entities?

Aaryan Bhagat

14 Jun 2019

9:25 a.m.

This is with reference to a discussion in a thread [1] here:

I need two requirements :

I need an object of a user which has an attribute which tells us what are all the emails of the following user.
I need an object of an email having an attribute which tells us what mailing-lists that email has been subscribed to.

For the first one, we already have a User model, each object of which can be mapped to more than one Address object. For the second one, we technically have the Address class, but it does not record which lists the email is subscribed to. What we do know is that an object of the Member class is created when we subscribe. Pointers on the above would be helpful.

I can do the following:

Create a many-to-one relationship between Member and Address.
Variables for processing bounces, including CURRENT_BOUNCE_SCORE and CROSSED_OR_NOT, will be in the Member object.
MODIFIED_THRESHOLD variables will be in the Address model.

(Please refer to my proposal [2] under the "Approach and Detailed Explanation" section if the above seems foreign to you.)

Is the above approach plausible, or am I missing something here?

Aaryan Bhagat

14 Jun

9:28 a.m.

The title of this thread is inappropriate; I accidentally copy-pasted something else at the end. Pardon the mistake.

I actually mean "Creation of bounce variables under which models?"

Mark Sapiro

2:45 p.m.

On 6/14/19 2:25 AM, Aaryan Bhagat wrote:

...

This is with reference to a discussion in a thread [1] here:

I need two requirements :

I need an object of a user which has an attribute which tells us what are all the emails of the following user.

User objects have an addresses attribute which is a list of all the Address objects associated with that user.
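As a rough illustration of the relationships being discussed, here is a minimal sketch using plain dataclasses. These are not Mailman's actual model classes: apart from the `addresses` attribute mentioned above, every name (`lists_for`, `list_id`, the example addresses) is made up for the sketch.

```python
from dataclasses import dataclass, field


@dataclass
class Address:
    email: str


@dataclass
class User:
    # Mirrors the idea of User.addresses: all addresses linked to the user.
    addresses: list = field(default_factory=list)


@dataclass
class Member:
    # One membership record per (address, list) pair.
    address: Address
    list_id: str


def lists_for(address, members):
    """Return the list ids that the given address is subscribed to."""
    return [m.list_id for m in members if m.address is address]


addr = Address("anne@example.com")
user = User(addresses=[addr])
members = [Member(addr, "dev.example.com"), Member(addr, "users.example.com")]
subscribed = lists_for(addr, members)
```

The point of the sketch is only that the per-list information naturally hangs off the membership record, not off the address itself.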

...

I need an object of an email having an attribute which tells us what mailing-lists that email has been subscribed to.

Why? Your proposal indicates you understand that a bounce is associated with a particular address and list and should not affect that address on other lists. So why are you interested in all the lists associated with that address?

-- Mark Sapiro mark@msapiro.net
San Francisco Bay Area, California
The highway is for gamblers, better use your sense - B. Dylan

Aaryan Bhagat

5:22 p.m.

...

Why? Your proposal indicates you understand that a bounce is associated with a particular address and list and should not affect that address on other lists. So why are you interested in all the lists associated with that address?

You are right, but my proposal also indicates that I want to calculate separate bounce scores for a single email. In my proposal it is mentioned that if a single email is subscribed to n mailing lists, then I want to calculate n individual bounce scores, so I need to know which lists an email is subscribed to.

Abhilash Raj

6:03 p.m.

On Fri, Jun 14, 2019, at 1:29 PM, Aaryan Bhagat wrote:

...

...
Why? Your proposal indicates you understand that a bounce is associated with a particular address and list and should not affect that address on other lists. So why are you interested in all the lists associated with that address?

You are right, but my proposal also indicates that I want to calculate separate bounce scores of a single email. In my proposal it is mentioned that if a single email has been subscribed to n mailing-lists then I want to calculate individual n bounce scores so I need the information of an email being subscribed to what lists.

Mark would know more about this, but I wonder if there is a need to keep the bounce score separate for each MailingList and keep the association with a Member Object, as compared to an Address object?

Does the formula to count bounce score take into account a MailingList's property that could have resulted in a bounce on one list but not on other list?

...

Mailman-Developers mailing list -- mailman-developers@python.org To unsubscribe send an email to mailman-developers-leave@python.org https://mail.python.org/mailman3/lists/mailman-developers.python.org/ Mailman FAQ: https://wiki.list.org/x/AgA3

Security Policy: https://wiki.list.org/x/QIA9

-- thanks, Abhilash Raj (maxking)

Aaryan Bhagat

6:09 p.m.

...

Mark would know more about this, but I wonder if there is a need to keep the bounce score separate for each MailingList and keep the association with a Member Object, as compared to an Address object

Keeping it separate definitely has its own perks. This is the basis for my proposal, and I went through it extensively with Stephen in an earlier thread during the GSoC selection phase. I have mentioned the details in my proposal.

...

Does the formula to count bounce score take into account a MailingList's property that could have resulted in a bounce on one list but not on other list?

Well, I have taken them into account indirectly using the MODIFIED_THRESHOLD attribute (reasons explained with examples in the proposal). It would certainly help in deciding more efficiently which members should have their subscription disabled.

Abhilash Raj

6:28 p.m.

On Fri, Jun 14, 2019, at 2:24 PM, Aaryan Bhagat wrote:

...

...
Mark would know more about this, but I wonder if there is a need to keep the bounce score separate for each MailingList and keep the association with a Member Object, as compared to an Address object

Keeping it as separate has definitely its own perks, this is the basis for my proposal and I have gone through with this extensively in with Stephen in an earlier thread during GSoC selection phase, I have mentioned the details in my proposal regarding this.

Can you point me to the thread with the discussions?

...

...
Does the formula to count bounce score take into account a MailingList's property that could have resulted in a bounce on one list but not on other list?

Well, I have taken them indirectly into account using the MODIFIED_THRESHOLD attribute (reasons explained with examples in the proposal), it would certainly help in more efficient ruling out the members to disable the subscription.

-- thanks, Abhilash Raj (maxking)

Mark Sapiro

16 Jun

7:09 p.m.

On 6/14/19 11:09 AM, Aaryan Bhagat wrote:

...

...
Mark would know more about this, but I wonder if there is a need to keep the bounce score separate for each MailingList and keep the association with a Member Object, as compared to an Address object

Keeping it as separate has definitely its own perks, this is the basis for my proposal and I have gone through with this extensively in with Stephen in an earlier thread during GSoC selection phase, I have mentioned the details in my proposal regarding this.

...
Does the formula to count bounce score take into account a MailingList's property that could have resulted in a bounce on one list but not on other list?

Well, I have taken them indirectly into account using the MODIFIED_THRESHOLD attribute (reasons explained with examples in the proposal), it would certainly help in more efficient ruling out the members to disable the subscription.

Disclaimer:

I am NOT a GSOC mentor, nor do I wish to be.

I have not read the proposal https://docs.google.com/document/d/1Pv-EuIwrhKM-_inf2HMHVgVzRKjEUwRl-_flMams... in full detail.

However, I have been a Mailman developer for 15 years and am commenting from that perspective.

It seems to me that some things in your proposal are too complex.

Mailman 2.1 bounce processing was simple. There is no concept of a user. There are only email addresses subscribed to lists and there is no connection between an email address subscribed to one list and the same address subscribed to another list.

MM 2.1 lists each have their own bounce processing parameter settings and these exist in Mailman 3 as well. This is appropriate as things like bounce_info_stale_after and bounce_score_threshold depend on the frequency of list posts.

Even though Mailman 3 has a concept of a user and understands things like an address may be receiving mail from more than one list but this address belongs to the user who is subscribed to these lists, I think this connection is irrelevant for bounce processing. I think the only relevant thing is that an email address on a list is bouncing, and that should affect delivery and the ultimate disabling thereof from only that list.

As you note, bounces can occur for various reasons. E.g., this address doesn't exist, this address has a full mailbox, this email appears to be spam and so on. The issue here is you don't know which of those reasons is the reason for a specific bounce. Bounces are received and processed by flufl.bounce and the only information you get is whether flufl.bounce considers it to be a temporary or permanent failure (generally 4xx or 5xx SMTP status). It doesn't distinguish between a 5xx for non-existent address and a 5xx for unacceptable message content.
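To make the limitation concrete, here is a small sketch of the temporary/permanent split by SMTP reply code. It does not use flufl.bounce or reproduce its API; `classify_status` is a hypothetical helper, and the reply codes in the comments are only typical examples of what a server might send.

```python
def classify_status(code):
    """Map an SMTP reply code to 'temporary', 'permanent', or None."""
    if 400 <= code <= 499:
        return "temporary"
    if 500 <= code <= 599:
        return "permanent"
    return None


# Two very different causes end up with the same classification.
unknown_user = classify_status(550)      # e.g. mailbox does not exist
content_rejected = classify_status(554)  # e.g. message refused as spam
mailbox_full = classify_status(452)      # e.g. insufficient storage
```

Once the bounce is reduced to one of two labels, a rejected post and a dead address look identical to the score-keeping code.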

The problem of what you refer to as “Bad mailing-list” or addresses bouncing because of the content of some posts is addressed in MM 2.1 by bounce probes. See https://mail.python.org/archives/list/mailman-developers@python.org/message/... for some description of that.

In any case, I think simply keeping track of bounces by list and address and taking action on that based on the list's settings is the appropriate thing to do.
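A minimal sketch of what tracking "by list and address" could look like, assuming an in-memory dictionary keyed by (list, address). The setting names `bounce_score_threshold` and `bounce_info_stale_after` come from the message above; `ListSettings`, `BounceTracker`, `register` and the `weight` parameter are hypothetical names for illustration, not Mailman code.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta


@dataclass
class ListSettings:
    bounce_score_threshold: float
    bounce_info_stale_after: timedelta


class BounceTracker:
    def __init__(self):
        # (list_id, email) -> (score, time of last bounce)
        self._info = {}

    def register(self, list_id, email, settings, when, weight=1.0):
        """Record one bounce; return True if delivery should be disabled."""
        score, last = self._info.get((list_id, email), (0.0, None))
        if last is not None and when - last > settings.bounce_info_stale_after:
            score = 0.0  # old bounce info is stale, start over
        score += weight
        self._info[(list_id, email)] = (score, when)
        return score >= settings.bounce_score_threshold


settings = ListSettings(2.0, timedelta(days=7))
tracker = BounceTracker()
t0 = datetime(2019, 6, 1)
first = tracker.register("list1", "anne@example.com", settings, t0)
other = tracker.register("list2", "anne@example.com", settings, t0)
second = tracker.register("list1", "anne@example.com", settings,
                          t0 + timedelta(days=1))
```

Here the second bounce on list1 reaches that list's threshold, while the single bounce on list2 is counted separately and has no effect on list1.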

If there are things I'm missing, perhaps you can give me specific pointers to places where they have been described or discussed that I may have missed.

-- Mark Sapiro mark@msapiro.net
San Francisco Bay Area, California
The highway is for gamblers, better use your sense - B. Dylan

Aaryan Bhagat

11:25 p.m.

...

However, I have been a Mailman developer for 15 years and am commenting from that perspective.

I fully understand and respect that, and I am in no position to question it.

...

Even though Mailman 3 has a concept of a user and understands things like an address may be receiving mail from more than one list but this address belongs to the user who is subscribed to these lists, I think this connection is irrelevant for bounce processing. I think the only relevant thing is that an email address on a list is bouncing, and that should affect delivery and the ultimate disabling thereof from only that list.

Yes, I agree, but I wanted to make it more dynamic. For example, if an email is subscribed to 6 mailing lists and its bounce threshold has been crossed on 5 of them but not on the last one, because that list is not very active, there is a high chance that the email is problematic. In that case we could at least lower the bounce threshold of that mailing list for that email specifically (done by MODIFIED_THRESHOLD). This would update the rosters of the mailing lists more dynamically and more efficiently.
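The proposal's actual formula is not reproduced in this thread, so the following is only a guess at the general shape of the idea. The rule used here (scale the threshold when the address has crossed it on more than half of its other lists), the function name and the `factor` parameter are all invented for illustration.

```python
def modified_threshold(base_threshold, crossed_elsewhere, other_lists, factor=0.5):
    """Return a per-address threshold for one list.

    Hypothetical rule: if the address has crossed the threshold on more
    than half of its other lists, scale this list's threshold by `factor`.
    """
    if other_lists > 0 and crossed_elsewhere / other_lists > 0.5:
        return base_threshold * factor
    return base_threshold


# The example from the message: 6 subscriptions, threshold crossed on 5.
lowered = modified_threshold(5.0, crossed_elsewhere=5, other_lists=5)
unchanged = modified_threshold(5.0, crossed_elsewhere=1, other_lists=5)
```

Even in this toy form, the threshold for one list now depends on the state of every other list the address is on, which is the extra coupling the rest of the thread weighs against the benefit.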

...

As you note, bounces can occur for various reasons. E.g., this address doesn't exist, this address has a full mailbox, this email appears to be spam and so on. The issue here is you don't know which of those reasons is the reason for a specific bounce. Bounces are received and processed by flufl.bounce and the only information you get is whether flufl.bounce considers it to be a temporary or permanent failure (generally 4xx or 5xx SMTP status). It doesn't distinguish between a 5xx for non-existent address and a 5xx for unacceptable message content.

Yes, that is the current situation.

...

In any case, I think simply keeping track of bounces by list and address and taking action on that based on the lists settings is the appropriate thing to do.

I wanted to make this method more robust, so that users are not unsubscribed when their email is working fine. But if you say this, you say it from experience, and I acknowledge that. I also mentioned my approach and implementation several times during the GSoC selection phase.

So, if the old Mailman 2 approach should be adopted, I will follow that; if my modification according to the proposal looks fine, I will continue with it. I have no problem either way and will do whatever the community thinks is best.

Mark Sapiro

17 Jun

2:16 a.m.

On 6/16/19 4:25 PM, Aaryan Bhagat wrote:

...

I wanted to make this method more robust so that users should not be subscribed even if their email is working fine, but if you say this, you say it by experience and I acknowledge that. I also mentioned my approach and implementation several times during GSoC selection phase.

And I didn't follow everything you posted, so I'm coming in late here. It would have been better if I had been more involved at the time, but I wasn't.

...

So, If the old approach as Mailman2 should be adopted I should follow that, if my modification according to the proposal looks fine, then I should continue it, I have no problem and will do whatever the community thinks is the best for the community.

I think your approach is probably valid, but it adds complexity to the process. Complexity is not necessarily bad, but unnecessary complexity is bad because it makes things more fragile and bug prone and more difficult to maintain.

So the bottom line question is whether this additional complexity is worth it. I am not convinced that it is. I am not saying absolutely that it isn't. I can't say that for sure without understanding the benefits it provides, but at this point, I'm skeptical.

-- Mark Sapiro mark@msapiro.net
San Francisco Bay Area, California
The highway is for gamblers, better use your sense - B. Dylan

Aaryan Bhagat

4:13 a.m.

...

I think your approach is probably valid, but it adds complexity to the process. Complexity is not necessarily bad, but unnecessary complexity is bad because it makes things more fragile and bug prone and more difficult to maintain.

So the bottom line question is whether this additional complexity is worth it. I am not convinced that it is. I am not saying absolutely that it isn't. I can't say that for sure without understanding the benefits it provides, but at this point, I'm skeptical.

Yes, I understand, and maybe I overlooked this part due to inexperience in large-scale development. But can the community speed things up and give a final decision on the type of implementation? I am sorry for being so direct, but my evaluations are close and I want to make it through my GSoC.

Abhilash Raj

18 Jun

12:39 p.m.

On Sun, Jun 16, 2019, at 9:20 PM, Aaryan Bhagat wrote:

...

...
I think your approach is probably valid, but it adds complexity to the process. Complexity is not necessarily bad, but unnecessary complexity is bad because it makes things more fragile and bug prone and more difficult to maintain.

So the bottom line question is whether this additional complexity is worth it. I am not convinced that it is. I am not saying absolutely that it isn't. I can't say that for sure without understanding the benefits it provides, but at this point, I'm skeptical.

Yes, I understand and maybe I too ignored this part due to inexperience in large scale development. But can community speed things up and give an absolute final on the type of implementation, I am sorry for being so straight but my evaluations are close and I want to make through my GSoC.

Looking at the suggestion from Mark, how about you continue working on the simplified implementation for now, and then, once we have completed that and still have time left in your GSoC period, we can talk about the enhanced proposal?

...

-- thanks, Abhilash Raj (maxking)

Aaryan Bhagat

6:06 p.m.

...

Looking at the suggestion from Mark, how about you continue working on the simplified implementation for now., and then once we have completed that and we still have time left in your GSoC period, we can talk about the enhanced proposal?

Ok, I understand.

...

I think the only relevant thing is that an email address on a list is bouncing, and that should affect delivery and the ultimate disabling thereof from only that list.

So, according to Mark's idea, we should create the list-level attributes in the mailing list model and attributes like the bounce score in the member model, since it will just evaluate the bounce score for that mailing list and disable the email on that list only. I will leave out the MODIFIED_THRESHOLD part for now.

Mark Sapiro

8:03 p.m.

On 6/18/19 11:06 AM, Aaryan Bhagat wrote:

...

So according to Mark's idea, we should create attributes in the mailing list model and the attributes like bounce score in the member model as it will just evaluate the bounce score for that mailing list and disable the email from that list only. I will leave the MODIFIED_THRESHOLD part as of now.

As far as I can tell, all the relevant list attributes are already defined in mailman/model/mailinglist.py. As noted there, they should probably be added to mailman/interfaces/mailinglist.py.

I'm not sure about creating the attributes like bounce score in the member model. Granted this has the advantage of there already being separate, per list member records, but my concern is that it's an address that is bouncing, not a member so it may be more appropriate to keep bounce info with the address rather than the member.

-- Mark Sapiro mark@msapiro.net
San Francisco Bay Area, California
The highway is for gamblers, better use your sense - B. Dylan

Aaryan Bhagat

19 Jun

5:56 a.m.

...

As far as I can tell, all the relevant list attributes are already defined in mailman/model/mailinglist.py. As noted there, they should probably be added to mailman/interfaces/mailinglist.py.

Yes, and I have done that (WIP actually) in the latest PR.

...

I'm not sure about creating the attributes like bounce score in the member model. Granted this has the advantage of there already being separate, per list member records, but my concern is that it's an address that is bouncing, not a member so it may be more appropriate to keep bounce info with the address rather than the member.

OK, so here is my understanding, using the example of an address subscribed to 2 mailing lists.

Create a bounce_score attribute in the address model.
Bounces generated from both lists will add to the bounce_score attribute.
If bounce_score >= bounce_score_threshold of list1, disable membership of list1.
If bounce_score >= bounce_score_threshold of list2, disable membership of list2.

Is the above inference correct? Or am I missing something in context here?

Richard Damon

10:34 a.m.

New subject: [SPAM?] Re: Bounce processing should be done to only USER entities?

On 6/19/19 1:56 AM, Aaryan Bhagat wrote:

...

...
As far as I can tell, all the relevant list attributes are already defined in mailman/model/mailinglist.py. As noted there, they should probably be added to mailman/interfaces/mailinglist.py.

Yes, and I have done that (WIP actually) in the latest pr.

...
I'm not sure about creating the attributes like bounce score in the member model. Granted this has the advantage of there already being separate, per list member records, but my concern is that it's an address that is bouncing, not a member so it may be more appropriate to keep bounce info with the address rather than the member.

Ok, so from what I get I will explain by taking an example address subscribed to 2 mailing list.

Create a bounce_score attribute in the address model.

Bounces generate from both the lists will add up the bounce_score attribute.

If bounce_score >= bounce_score_threshold of list1 disable membership of list 1.

If bounce_score>=bounce_score_threshold of list2 disable membership of list2.

Is the above inference correct? Or, am I missing something inc context here?

The example I gave earlier would give that rule a problem. To describe it again:

List1: Gets 1 message a month. Reset time of 45 days without a bounce. Trigger threshold = 2 bounces.

List2: Gets many messages a day. Reset time of 1 day without a bounce. Trigger threshold = 4 bounces, to handle occasional bounces due to spam false alarms.

First question: the lists have different bounce reset periods, so which one is used to reset the bounce score?

A user subscribes to both lists. Because of list2's occasional spam false alarms, which required the elevated threshold (or even just wanting to give a few days to clear a mailbox-full error), the user gets unsubscribed from list1 too rapidly. If list1 raises its threshold to handle that, a subscriber to just list1 will stay subscribed to the list too long after bouncing.

-- Richard Damon
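The objection can be worked through with the thresholds from the example above. This is only an arithmetic sketch: the two bounces and the variable names are assumptions chosen to match the scenario, and reset periods are ignored.

```python
thresholds = {"list1": 2, "list2": 4}

# Two bounces, both caused by list2 traffic (e.g. spam false alarms).
bounces = ["list2", "list2"]

# Shared score on the address: every bounce counts against every list.
shared_score = len(bounces)
disabled_shared = sorted(
    name for name, limit in thresholds.items() if shared_score >= limit)

# Per-list scores: a bounce only counts against the list that caused it.
per_list = {name: bounces.count(name) for name in thresholds}
disabled_per_list = sorted(
    name for name, limit in thresholds.items() if per_list[name] >= limit)
```

With a shared score the address is disabled on list1 even though no list1 message ever bounced; with per-list scores nothing is disabled yet.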

Aaryan Bhagat

10:52 a.m.

New subject: [SPAM?] Re: Bounce processing should be done to only USER entities?

Yes, you are definitely right on this, but my mentor (Abhilash) asked me to focus on the simpler implementation first, and Mark also pointed out that added complexity brings more bugs into the code. So I am going for the simpler implementation for now. The reason is that I am not just contributing to this organization; this is part of my GSoC, so I have to deliver concrete deliverables and not let the discussion take up all of the time before evaluations.

Again, as Abhilash said:
how about you continue working on the simplified implementation for now, and then, once we have completed that and still have time left in your GSoC period, we can talk about the enhanced proposal?

Maybe we can change the design later on.

Mark Sapiro

2:09 p.m.

On 6/18/19 10:56 PM, Aaryan Bhagat wrote:

...

Ok, so from what I get I will explain by taking an example address subscribed to 2 mailing list.

Create a bounce_score attribute in the address model.

Bounces generate from both the lists will add up the bounce_score attribute.

If bounce_score >= bounce_score_threshold of list1 disable membership of list 1.

If bounce_score>=bounce_score_threshold of list2 disable membership of list2.

Is the above inference correct?

I would do this differently.

The bounce_info attribute of the address is a possibly empty list of tuples. Each tuple contains things like the list_id, the current score and last_bounce_received. In MM 2.1 the bounce_info for a list also contained the remaining number of notices and the time of the last notice sent to control the sending of notices and eventual removal of an address with delivery disabled by bounce.

This way, you keep track of bounces by list and ultimately may disable delivery to the address from only the one list on which the score reaches threshold.
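A minimal sketch of the "possibly empty list of tuples" idea, assuming each tuple holds just the three fields named above. `BounceEntry`, `record_bounce` and the `weight` parameter are hypothetical names; the notice-related fields from MM 2.1 are left out to keep it short.

```python
from collections import namedtuple
from datetime import datetime

BounceEntry = namedtuple("BounceEntry", "list_id score last_bounce_received")


def record_bounce(bounce_info, list_id, when, weight=1.0):
    """Return a new bounce_info list with the entry for list_id updated."""
    updated, found = [], False
    for entry in bounce_info:
        if entry.list_id == list_id:
            entry = entry._replace(score=entry.score + weight,
                                   last_bounce_received=when)
            found = True
        updated.append(entry)
    if not found:
        updated.append(BounceEntry(list_id, weight, when))
    return updated


info = []  # a new address has no bounce info
info = record_bounce(info, "list1.example.com", datetime(2019, 6, 19))
info = record_bounce(info, "list1.example.com", datetime(2019, 6, 20))
info = record_bounce(info, "list2.example.com", datetime(2019, 6, 20))
```

Each list gets its own entry on the address, so comparing a score against a threshold only ever involves the entry for the list whose message bounced.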

-- Mark Sapiro mark@msapiro.net
San Francisco Bay Area, California
The highway is for gamblers, better use your sense - B. Dylan

Aaryan Bhagat

2:42 p.m.

OK, understood. This is the PR currently under discussion; I will make the necessary changes.

Thanks, Cheers!

Abhilash Raj

4:11 p.m.

On Wed, Jun 19, 2019, at 7:09 AM, Mark Sapiro wrote:

...

On 6/18/19 10:56 PM, Aaryan Bhagat wrote:

...
Ok, so from what I get I will explain by taking an example address subscribed to 2 mailing list.

Create a bounce_score attribute in the address model.

Bounces generate from both the lists will add up the bounce_score attribute.

If bounce_score >= bounce_score_threshold of list1 disable membership of list 1.

If bounce_score>=bounce_score_threshold of list2 disable membership of list2.

Is the above inference correct?

I would do this differently.

The bounce_info attribute of the address is a possibly empty list of tuples. Each tuple contains things like the list_id, the current score and last_bounce_received. In MM 2.1 the bounce_info for a list also contained the remaining number of notices and the time of the last notice sent to control the sending of notices and eventual removal of an address with delivery disabled by bounce.

Storing a list of tuples in the database might be inefficient for reads. I don't think there is an appropriate column type one could use for this purpose unless we convert them to and from strings.

I suggested going with Member because it comes close to what we require, although memberships can include Users along with addresses which may not be great.

It looks like there is now a need for a BounceInfo model, which could be used to store the information instead of a list of tuples. It could have the following attributes:

Address (relationship -> Address)
MailingList (relationship -> MailingList)
last_bounce (datetime)
last_notice (datetime)
remaining_notices (int)
total_notices_sent (int)

I am not sure if the bounce score of an address is calculated per list or is global, regardless of the MailingList, in Mailman 2.

If it is global, then the score could be stored in Address table, otherwise BounceInfo table.
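As a rough sketch of the separate-table idea, the following uses the standard-library `sqlite3` module rather than Mailman's SQLAlchemy models. The table layout follows the attribute list above, plus a per-list `score` column on the assumption that scores turn out to be per list; the table name, column names and `add_bounce` helper are illustrative only.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("""
    CREATE TABLE bounce_info (
        address TEXT NOT NULL,
        list_id TEXT NOT NULL,
        score REAL NOT NULL DEFAULT 0,
        last_bounce TEXT,
        last_notice TEXT,
        remaining_notices INTEGER NOT NULL DEFAULT 0,
        total_notices_sent INTEGER NOT NULL DEFAULT 0,
        PRIMARY KEY (address, list_id)
    )
""")


def add_bounce(address, list_id, when, weight=1.0):
    """Insert or update the row for one (address, list) pair."""
    cur = conn.execute(
        "UPDATE bounce_info SET score = score + ?, last_bounce = ? "
        "WHERE address = ? AND list_id = ?",
        (weight, when, address, list_id))
    if cur.rowcount == 0:
        conn.execute(
            "INSERT INTO bounce_info (address, list_id, score, last_bounce) "
            "VALUES (?, ?, ?, ?)",
            (address, list_id, weight, when))


add_bounce("anne@example.com", "list1", "2019-06-19T14:00:00")
add_bounce("anne@example.com", "list1", "2019-06-20T09:00:00")
add_bounce("anne@example.com", "list2", "2019-06-20T09:00:00")

# The database can filter by score directly, without loading every row.
over = conn.execute(
    "SELECT address, list_id FROM bounce_info WHERE score >= ?", (2,)
).fetchall()
```

With one row per (address, list) pair, questions such as "which addresses are over the threshold" become ordinary indexed queries.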

...

This way, you keep track of bounces by list and ultimately may disable delivery to the address from only the one list on which the score reaches threshold.

-- Mark Sapiro mark@msapiro.net
San Francisco Bay Area, California
The highway is for gamblers, better use your sense - B. Dylan

-- thanks, Abhilash Raj (maxking)

Mark Sapiro

4:28 p.m.

On 6/19/19 9:11 AM, Abhilash Raj wrote:

...

Storing list of tuples in database might be inefficient for reads. I don't think there is an accurate column type one could use for this purpose unless we convert them to string back and forth.

If we use PickleType, SQLAlchemy does the (un)pickling for us.

...

I suggested going with Member because it comes close to what we require, although memberships can include Users along with addresses which may not be great.

It looks like now there is a need for a BounceInfo model then which could be used to store the information instead of a list of tuples. It could keep the following attributes:

Address (relationship -> Address)

MailingList (relationship -> MailingList)

last_bounce (datetime)

last_notice (datetime)

remaining_notices (int)

total_notices_sent (int)

I am not sure if the bounce score of an address is calculated per-address or is global regardless of a MailingList in Mailman 2.

If it is global, then the score could be stored in Address table, otherwise BounceInfo table.

My idea is scores should be local, i.e. per list. In MM 2.1 there is no such thing as global bounce information/score. Everything is per list and there is no connection between a member of one list and a member of another even if the addresses are the same.

-- Mark Sapiro mark@msapiro.net
San Francisco Bay Area, California
The highway is for gamblers, better use your sense - B. Dylan

Abhilash Raj

4:35 p.m.

On Wed, Jun 19, 2019, at 9:28 AM, Mark Sapiro wrote:

...

On 6/19/19 9:11 AM, Abhilash Raj wrote:

...
Storing list of tuples in database might be inefficient for reads. I don't think there is an accurate column type one could use for this purpose unless we convert them to string back and forth.

If we use PickleType, SQLAlchemy does the (un)pickling for us.

I will read more on PickleType, but I feel like it might not be a great idea to store binary blobs in the database. Querying them might get too expensive, as SQLAlchemy might have to load all rows into memory, de-serialize them and then do the comparison. But that's my guess, I'll read up.
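The concern can be illustrated with a small standard-library sketch. It uses `pickle` and `sqlite3` directly as a stand-in for a pickled column and makes no claim about how SQLAlchemy behaves internally; the table, the sample data and the threshold value are made up for the example.

```python
import pickle
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE address (email TEXT PRIMARY KEY, bounce_info BLOB)")

rows = {
    "anne@example.com": [("list1", 3.0), ("list2", 0.5)],
    "bart@example.com": [("list1", 1.0)],
}
for email, info in rows.items():
    conn.execute("INSERT INTO address VALUES (?, ?)", (email, pickle.dumps(info)))

# SQL cannot look inside the blob, so every row is fetched and unpickled
# in Python before the score comparison can happen.
over_threshold = []
query = "SELECT email, bounce_info FROM address ORDER BY email"
for email, blob in conn.execute(query):
    for list_id, score in pickle.loads(blob):
        if list_id == "list1" and score >= 2.0:
            over_threshold.append(email)
```

The filter on list and score runs in application code over the whole table, which is the read cost being weighed here against the convenience of automatic (un)pickling.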

...

...
I suggested going with Member because it comes close to what we require, although memberships can include Users along with addresses which may not be great.

It looks like now there is a need for a BounceInfo model then which could be used to store the information instead of a list of tuples. It could keep the following attributes:

Address (relationship -> Address)

MailingList (relationship -> MailingList)

last_bounce (datetime)

last_notice (datetime)

remaining_notices (int)

total_notices_sent (int)

I am not sure if the bounce score of an address is calculated per-address or is global regardless of a MailingList in Mailman 2.

If it is global, then the score could be stored in Address table, otherwise BounceInfo table.

My idea is scores should be local, i.e. per list. In MM 2.1 there is no such thing as global bounce information/score. Everything is per list and there is no connection between a member of one list and a member of another even if the addresses are the same.

That makes sense, thanks for the explanation!

-- thanks, Abhilash Raj (maxking)

Jim Ziobro

20 Jun

5:27 a.m.

It sounds like this is a proposal to somehow use information from one list to affect the behavior of another list. If the two lists are operating in different security/administrative domains then it means information is leaking from one domain into another. I can see some interesting behavior possible by sharing information. What is the goal?

For example joe@af.mil is subscribed to two lists: moscow-soccer-scores@kremlin.ru monthly-nuclear-launchcode@whitehouse.gov

At some point joe's postmaster forbids non-work-related emails, so moscow-soccer-scores gets bounced. In an ideal case, what should happen to joe's subscription to monthly-nuclear-launchcode?

Ciao,

//Z\\

Richard Damon

11:25 a.m.

On 6/20/19 1:27 AM, Jim Ziobro wrote:

...

It sounds like this is a proposal to somehow use information from one list to affect the behavior of another list. If the two lists are operating in different security/administrative domains then it means information is leaking from one domain into another. I can see some interesting behavior possible by sharing information. What is the goal?

For example joe@af.mil is subscribed to two lists: moscow-soccer-scores@kremlin.ru monthly-nuclear-launchcode@whitehouse.gov

At some point the joe's postmaster forbids non-work-related emails so moscow-soccer-scores gets bounced. In an ideal case what should happen to joe's subscription to monthly-nuclear-launchcode?

Ciao,

I don't think anyone presumes that the subscription/bounce information will be transferred between different instances of Mailman. Rather, this is an attempt to make better use of information from one mailing list about deliverability in another list run by the same instance of Mailman.

Your question does bring up an interesting point (maybe for different domains than you used) about how much information SHOULD be exchanged between lists that just happen to share the same host, perhaps a host that is run as a commercial service for many customers who operate lists.

One very useful thing is to be able to look at the bounces to see what the problem is, and if Mailman is going to disable/unsubscribe someone from a list I am running due to a bounce from another mailing list, I would like to be able to see that bounce. But I very well would not want someone else, running a completely different list that just happens to be on the same host, to see bounces from my subscribers.

This says that the 'global' sharing of bounce information across a server needs to be purely optional, or Mailman would not be suitable for shared servers. Even having the same domain name isn't good enough, as I could easily want to run a mailing list hosting service, similar to things like ConstantContact, where different customers shouldn't have access to other customers' information.

This does bring up an interesting question on the structure of Mailman 3 itself. It seems to imply that a subscriber to multiple mailing lists gets leaked the fact that two mailing lists, even though they may have nothing naturally in common, are hosted on the same installation of Mailman, something the list managers might not even be aware of. Even Mailman 2 could leak this information if you look at the mail headers, or carefully at the domains the lists' interfaces fall back to, but this becomes much more in your face: you go to subscribe to a list and find you already have a 'user account' on that machine.

-- Richard Damon

Abhilash Raj

3:58 p.m.

On Thu, Jun 20, 2019, at 4:26 AM, Richard Damon wrote:

...

On 6/20/19 1:27 AM, Jim Ziobro wrote:

...
It sounds like this is a proposal to somehow use information from one list to affect the behavior of another list. If the two lists are operating in different security/administrative domains then it means information is leaking from one domain into another. I can see some interesting behavior possible by sharing information. What is the goal?

For example, joe@af.mil is subscribed to two lists: moscow-soccer-scores@kremlin.ru and monthly-nuclear-launchcode@whitehouse.gov.

At some point joe's postmaster forbids non-work-related emails, so moscow-soccer-scores gets bounced. In an ideal case, what should happen to joe's subscription to monthly-nuclear-launchcode?

Ciao,

I don't think anyone presumes that the subscription/bounce information will be transferred between different instances of Mailman; rather, this is an attempt to make better use of information from one mailing list about deliverability in another list run by the same instance of Mailman.

Your question does bring up an interesting point (maybe for different domains than you used) about how much information SHOULD be exchanged between lists that just happen to share the same host, perhaps a host run as a commercial service for many customers who operate lists.

One very useful thing is to be able to look at the bounces to see what the problem is. If Mailman is going to disable/unsubscribe someone from a list I am running due to a bounce from another mailing list, I would like to be able to see that bounce; but I would certainly not want someone running a completely different list that just happens to be on the same host to see bounces from my subscribers.

Unless they are on the same Mailman instance, there is really no way to actually communicate this information. We are still debating whether bounces from one list should affect other lists, and you seem to have a good point that they shouldn't.

...

This says that the 'global' sharing of bounce information across a server needs to be purely optional, or Mailman would not be suitable for shared servers. Even having the same domain name isn't good enough, as I could easily want to run a mailing list hosting service, similar to things like ConstantContact, where different customers shouldn't have access to other customers' information.

This does bring up an interesting question about the structure of Mailman 3 itself. It seems to imply that a subscriber to multiple mailing lists learns that two mailing lists, even though they may have nothing naturally in common, are hosted on the same installation of Mailman, something the list managers might not even be aware of. Even Mailman 2 could leak this information if you look at the mail headers, or carefully at the domains the lists' interfaces fall back to, but this becomes much more in your face: you go to subscribe to a list and find you already have a 'user account' on that machine.

Doing a reverse lookup of the web_host attribute in a domain should make that search easier. And yes, Mailman IMO wasn't designed to serve two entities who would want complete secrecy from one another but would still opt to use a shared Mailman server.

-- thanks, Abhilash Raj (maxking)

Aaryan Bhagat

7:05 p.m.

This is a really thoughtful insight. While thinking about implementing this feature of obtaining bounce data from other mailing lists, I did not realize that shared servers also exist, where obtaining that data would be a hassle. So you can only take data from the same installation, which itself leaks the information that a group of mailing lists lives on the same installation. I understand your point. Thanks

Richard Damon

17 Jun 17 Jun

2:53 a.m.

New subject: [SPAM?] Re: Bounce processing should be done to only USER entities?

On 6/16/19 7:25 PM, Aaryan Bhagat wrote:

...

...
However, I have been a Mailman developer for 15 years and am commenting from that perspective.

I fully understand and respect that, and I am in no position to question it.

...
Even though Mailman 3 has a concept of a user and understands things like an address may be receiving mail from more than one list but this address belongs to the user who is subscribed to these lists, I think this connection is irrelevant for bounce processing. I think the only relevant thing is that an email address on a list is bouncing, and that should affect delivery and the ultimate disabling thereof from only that list.

Yes, I agree, but I wanted to make it more dynamic. For example, if an email address is subscribed to 6 mailing lists and its bounce threshold has been crossed on 5 of them, but not on the last one because that list is not very active, there is a high chance that the address is problematic. In that case, at least lower the bounce_threshold of that mailing list for that address specifically (done by MODIFIED_THRESHOLD). This would update the member rosters of mailing lists more dynamically and more efficiently.

...
As you note, bounces can occur for various reasons. E.g., this address doesn't exist, this address has a full mailbox, this email appears to be spam and so on. The issue here is you don't know which of those reasons is the reason for a specific bounce. Bounces are received and processed by flufl.bounce and the only information you get is whether flufl.bounce considers it to be a temporary or permanent failure (generally 4xx or 5xx SMTP status). It doesn't distinguish between a 5xx for non-existent address and a 5xx for unacceptable message content.

Yes, that is the current situation now.

...
In any case, I think simply keeping track of bounces by list and address and taking action on that based on the list's settings is the appropriate thing to do.

I wanted to make this method more robust so that users should not be subscribed even if their email is working fine, but if you say this, you say it from experience and I acknowledge that. I also mentioned my approach and implementation several times during the GSoC selection phase.

So, if the old approach from Mailman 2 should be adopted, I will follow that; if my modification according to the proposal looks fine, then I will continue with it. I have no problem either way and will do whatever the community thinks is best.
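To make the cross-list idea concrete, here is one way such an adjustment could look. The proposal's actual formula does not appear in this thread, so the function name `modified_threshold`, the `majority` cut-off, and the halving rule are all assumptions made purely for illustration.

```python
def modified_threshold(base_threshold: int, crossed: int, other_lists: int,
                       majority: float = 0.8) -> int:
    """Return a per-address bounce threshold for one list.

    `crossed` is how many of the address's `other_lists` subscriptions have
    already crossed their own bounce threshold. If that share reaches
    `majority`, the threshold is halved (never below 1); otherwise it is
    returned unchanged. The rule itself is hypothetical.
    """
    if other_lists == 0:
        return base_threshold
    if crossed / other_lists >= majority:
        return max(1, base_threshold // 2)
    return base_threshold


# An address on 6 lists: 5 have crossed their threshold, the quiet 6th has not.
lowered = modified_threshold(base_threshold=5, crossed=5, other_lists=5)
# Same address, but only 1 of the 5 other lists has crossed its threshold.
unchanged = modified_threshold(base_threshold=5, crossed=1, other_lists=5)
print(lowered, unchanged)
```

Note that any rule of this kind inherits the uncertainty Mark describes: the other lists' bounces may be content rejections rather than a dead address.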

I think it is a very tough problem, and needs a lot of thought. It is a great goal to say that if we can detect that an email address has stopped working via one list, then another list, which might be less active, could benefit by inheriting that information. The big problem is that, because bounce detection is really an inexact measurement, detecting truly problematic email addresses is tough.

Take a couple of lists I run as an example:

List 1 is a monthly newsletter. An email is sent early in the month, every month, and there is no other traffic. If we want to allow a subscriber to bounce once (because it may have been just a transitory issue) and disable only on the second bounce, then at least for email from this list you need a threshold of 2 bounces (and likely don't want more) and something like 45 days between bounces before the count resets.

List 2 is much more active, with many messages a day, and some ISPs will occasionally bounce a message with a spam false alarm. If we used the same settings as above, many people would get disabled because of false-alarm spam rejections. You can change the reset parameter to reset if you get just one day without a bounce, as that means messages did get through, and you likely want the threshold higher, something like 4-7 bounces, to give people a chance to clear issues, or just not get hit when you get a couple of false alarms in a row.

Unless Mailman could somehow keep track of successfully sent messages, and be able to use that in the rules, it is hard to see how to deal with both lists with a common counter. And keeping track of success can be hard, as some systems will accept the message, and later return the rejection message (this does have back-scatter issues, but it is done), and the delay can be significant (I've seen rejects days later for some system issues).
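The two list profiles above can be made concrete with a small sketch. This is not Mailman's implementation: the class name `BounceCounter`, its fields, and the exact numbers are assumptions chosen to mirror the example. Each bounce adds one to the score, and the score is cleared when the gap since the previous bounce exceeds a list-specific window.

```python
from dataclasses import dataclass
from datetime import date, timedelta
from typing import Optional


@dataclass
class BounceCounter:
    """Bounce score for one address on one list (hypothetical names)."""
    threshold: int            # bounces needed to disable delivery
    reset_after: timedelta    # bounce-free gap that clears the score
    score: int = 0
    last_bounce: Optional[date] = None

    def record(self, day: date) -> bool:
        """Register a bounce on `day`; return True if delivery should be disabled."""
        if self.last_bounce is not None and day - self.last_bounce > self.reset_after:
            self.score = 0    # earlier bounces are considered stale
        self.score += 1
        self.last_bounce = day
        return self.score >= self.threshold


# Monthly newsletter: two bounces, with a 45-day window so consecutive issues count.
newsletter = BounceCounter(threshold=2, reset_after=timedelta(days=45))
# Busy list: higher threshold, and a single bounce-free day clears the score.
busy = BounceCounter(threshold=5, reset_after=timedelta(days=1))

bounces = [date(2019, 5, 2), date(2019, 6, 3)]   # two bounces a month apart
newsletter_disabled = [newsletter.record(d) for d in bounces]
busy_disabled = [busy.record(d) for d in bounces]
print(newsletter_disabled, busy_disabled)
```

With these settings the same pair of bounces disables the newsletter subscription on the second bounce but leaves the busy list's subscription enabled, which is why a single shared counter would have trouble serving both lists.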

-- Richard Damon

Aaryan Bhagat

4:21 a.m.

New subject: [SPAM?] Re: Bounce processing should be done to only USER entities?

...

List 1 is a monthly newsletter. An email is sent early in the month, every month, and there is no other traffic. If we want to allow a subscriber to bounce once (because it may have been just a transitory issue) and disable only on the second bounce, then at least for email from this list you need a threshold of 2 bounces (and likely don't want more) and something like 45 days between bounces before the count resets.

List 2 is much more active, with many messages a day, and some ISPs will occasionally bounce a message with a spam false alarm. If we used the same settings as above, many people would get disabled because of false-alarm spam rejections. You can change the reset parameter to reset if you get just one day without a bounce, as that means messages did get through, and you likely want the threshold higher, something like 4-7 bounces, to give people a chance to clear issues, or just not get hit when you get a couple of false alarms in a row.

We can tackle this by keeping different attributes for each mailing list object (which is in my proposal). The problem arises when an email address is subscribed to both lists. Here we can keep a separate bounce score for each (mailing list, email) pair and disable only the subscription to list 2.

...

Unless Mailman could somehow keep track of successfully sent messages, and be able to use that in the rules, it is hard to see how to deal with both lists with a common counter. And keeping track of success can be hard, as some systems will accept the message, and later return the rejection message (this does have back-scatter issues, but it is done), and the delay can be significant (I've seen rejects days later for some system issues).

Not necessarily, I suppose; keeping track of bounces per (mailing list, email) pair will mostly do. But then again, what Mark points out regarding complexity and bugs is definitely a point to consider.
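Tracking bounces per (mailing list, email) pair could be sketched as follows. The list names, thresholds, and the `record_bounce` helper are hypothetical; in Mailman the natural home for such a score would be the Member object discussed earlier in the thread, which this sketch stands in for with a plain dictionary.

```python
from collections import defaultdict

# Hypothetical per-list thresholds; in Mailman these would be list settings.
thresholds = {"list1@example.com": 2, "list2@example.com": 5}

# One score per (mailing list, email) pair instead of one per email.
scores = defaultdict(int)
disabled = set()


def record_bounce(list_id: str, email: str) -> None:
    """Count a bounce for this pair and disable only this subscription."""
    key = (list_id, email)
    scores[key] += 1
    if scores[key] >= thresholds[list_id]:
        disabled.add(key)


# The same address bounces five times on list 2 and once on list 1.
for _ in range(5):
    record_bounce("list2@example.com", "joe@example.org")
record_bounce("list1@example.com", "joe@example.org")
print(sorted(disabled))
```

Only the list 2 subscription ends up disabled; the list 1 score stays below its own threshold and that subscription is untouched.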

Richard Damon

14 Jun 14 Jun

6:30 p.m.

On 6/14/19 2:03 PM, Abhilash Raj wrote:

...

...
...
On Fri, Jun 14, 2019, at 1:29 PM, Aaryan Bhagat wrote:

Why? Your proposal indicates you understand that a bounce is associated with a particular address and list and should not affect that address on other lists. So why are you interested in all the lists associated with that address?

You are right, but my proposal also indicates that I want to calculate separate bounce scores for a single email address. In my proposal it is mentioned that if a single email address is subscribed to n mailing lists, then I want to calculate n individual bounce scores, so I need to know which lists an address is subscribed to.

Mark would know more about this, but I wonder if there is a need to keep the bounce score separate for each MailingList and keep the association with a Member object, as compared to an Address object?

Does the formula to count bounce score take into account a MailingList's property that could have resulted in a bounce on one list but not on another list?

I could see definite needs for different types of list to use different scoring settings. One big case would be a list that doesn't send out mail every day: it may well need a longer period between bounces to allow it to accumulate enough of a score to disable the address, while a very busy list likely wants a very short period so very occasional bounce-backs (like false positive spam rejects) don't get a chance to accumulate.

-- Richard Damon

28 comments

5 participants

participants (5)

Aaryan Bhagat
Abhilash Raj
Jim Ziobro
Mark Sapiro
Richard Damon
