Implementing `bounce_you_are_disabled_warnings_interval` in `Send_Warnings` function.
I am developing the processing of bounce messages as my GSoC project.
Currently the bounce functions are being developed in this pr on GitLab.
There is one bool attribute called `bounce_you_are_disabled_warnings_interval` in the Mailing List model.
It means: the number of days between each disabled notification.
Implementing this has a problem:
- In the case of the `process_bounces` function, which processes the `BounceEvents`, it is easy: it just takes the events from the database one by one and processes them.
- In the case of the `Send_Warnings` function, if the same approach were followed, it would take the `Address` instances one by one, check the tuples in the `bounce_info` attribute (see the pr for this), and decide whether or not to send a warning mail. This can be a slow method.
- Suppose the `Addresses` list is very long, and an `Address` instance at the very bottom of the list, whose subscription has been disabled and which has already been sent some warning emails, is now due another mail because more than `bounce_you_are_disabled_warnings_interval` has passed.
- If the function is enumerating from the top, it has to reach this `Address` instance before it can take action, but the interval has already been crossed. This can cause slow performance, as the warning mail will be sent much later than it should have been.
- Also, if I implement two functions, `process_bounces` and `send_warnings`, what if both of them operate on the same `Address` instance at the same time?
- Basically, the implementation for sending warning mails and the implementation to increase `bounce_score` are separate things, and they can cause problems if they process the same instance.
Pointers on the above will be helpful. Am I missing something?
On 7/26/19 2:52 AM, Aaryan Bhagat wrote:
I am developing the processing of bounce messages as my GSoC project.
Currently the bounce functions are being developed in this pr on GitLab.
There is one bool attribute called `bounce_you_are_disabled_warnings_interval` in the Mailing List model.
It's actually defined as `Column(Interval)` in your MR, which is what it should be.
It means: the number of days between each disabled notification.
Implementing this has a problem:
- In the case of the `process_bounces` function, which processes the `BounceEvents`, it is easy: it just takes the events from the database one by one and processes them.
- In the case of the `Send_Warnings` function, if the same approach were followed, it would take the `Address` instances one by one, check the tuples in the `bounce_info` attribute (see the pr for this), and decide whether or not to send a warning mail. This can be a slow method.
- Suppose the `Addresses` list is very long, and an `Address` instance at the very bottom of the list, whose subscription has been disabled and which has already been sent some warning emails, is now due another mail because more than `bounce_you_are_disabled_warnings_interval` has passed.
- If the function is enumerating from the top, it has to reach this `Address` instance before it can take action, but the interval has already been crossed. This can cause slow performance, as the warning mail will be sent much later than it should have been.
- Also, if I implement two functions, `process_bounces` and `send_warnings`, what if both of them operate on the same `Address` instance at the same time?
- Basically, the implementation for sending warning mails and the implementation to increase `bounce_score` are separate things, and they can cause problems if they process the same instance.
Pointers on the above will be helpful. Am I missing something?
You may consider doing this as it's done in MM 2.1. There, once a list's delivery is disabled by bounce for an address and the first notice sent, Mailman's processing has nothing further to do with it. There is a daily cron which is responsible for sending notices and ultimately removing disabled users.
A partially migrated version of this is at <https://gitlab.com/mailman/mailman/blob/master/port_me/disabled.py>
You might consider implementing this as a 'mailman' subcommand to be run daily by cron to do this task.
--
Mark Sapiro <mark@msapiro.net>        The highway is for gamblers,
San Francisco Bay Area, California    better use your sense - B. Dylan
Aaryan Bhagat writes:
In the case of the `Send_Warnings` function, if the same approach were followed, it would take the `Address` instances one by one, check the tuples in the `bounce_info` attribute (see the pr for this), and decide whether or not to send a warning mail. This can be a slow method.
Suppose the `Addresses` list is very long, and an `Address` instance at the very bottom of the list, whose subscription has been disabled and which has already been sent some warning emails, is now due another mail because more than `bounce_you_are_disabled_warnings_interval` has passed.
If the function is enumerating from the top, it has to reach this `Address` instance before it can take action, but the interval has already been crossed. This can cause slow performance, as the warning mail will be sent much later than it should have been.
I assume that what you mean here is that this process is executed perhaps once per day and probably takes a few minutes at most in processing. Taking five minutes as the run time, we have 300s/86400s = 1/288 of a day, which is the probability that the address's timer expires during the run, so that the message is delayed until the process triggers again, hypothetically a full day later.
The odds that this happens are quite small unless you have a humongous list. Furthermore, the probability that the mail will go through this time if the account has been disabled due to bouncing is presumably way less than 1. So my first-order response is this is a non-problem.
Note that you have basically the same problem if the Address's timer expires a few seconds after the send_warnings process finishes.
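The back-of-the-envelope estimate above can be checked with a few lines of Python. This is only a sketch: the 300-second run time and the once-a-day schedule are the illustrative figures from this message, not measurements, and the calculation assumes timer expiry times are spread uniformly over the day.

```python
from fractions import Fraction

RUN_SECONDS = 300            # assumed duration of one send_warnings pass
DAY_SECONDS = 24 * 60 * 60   # the pass is assumed to run once per day

# Chance that a given address's timer expires while the pass is running,
# assuming expiry times are uniformly distributed over the day.
p_expire_during_run = Fraction(RUN_SECONDS, DAY_SECONDS)

print(p_expire_during_run)         # 1/288
print(float(p_expire_during_run))  # roughly 0.0035
```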
My second-order response is: if you still care, given that this is rarely going to happen, and even more rarely will it generate a message that reaches the user who reactivates the subscription, you can arrange to have an index on the Addresses (or a separate queue) which has the earliest timer first, and then process the Addresses in that order. You'll still have the same problem if the timer expires a few picoseconds after you start processing the particular address. :-)
Basically the way to deal with this is to adjust the `bounce_you_are_disabled_warnings_interval`.
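As a rough illustration of the "earliest timer first" idea, here is a minimal in-memory sketch using a heap. The `(next_warning_at, email)` pair layout and the `due_addresses` name are hypothetical; in Mailman this ordering would more likely come from a database index than from a Python heap.

```python
import heapq
from datetime import datetime, timedelta


def due_addresses(entries, now):
    """Yield emails whose next warning is due at or before `now`, earliest first.

    `entries` is an iterable of (next_warning_at, email) pairs
    (a hypothetical layout used only for this sketch).
    """
    heap = list(entries)
    heapq.heapify(heap)  # smallest next_warning_at ends up at heap[0]
    while heap and heap[0][0] <= now:
        _, email = heapq.heappop(heap)
        yield email


now = datetime(2019, 7, 26, 12, 0)
entries = [
    (now + timedelta(days=2), "later@example.com"),
    (now - timedelta(hours=5), "overdue@example.com"),
    (now - timedelta(days=1), "most-overdue@example.com"),
]
ready = list(due_addresses(entries, now))
print(ready)
```

The address that has been waiting longest is handled first, and addresses that are not yet due are never visited.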
If you do implement the queue, another possibility comes up, which is that you process the queue continuously, with a process (a runner, I guess) that sleeps until it's time to process the next send_warning. This would have the advantage that if there were another "DMARC event" that caused a large number of bounces, you wouldn't have a flood of warning messages at a particular time of day.
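The "sleep until the next warning is due" part of such a runner could be computed along these lines. This is a sketch only: `seconds_until_next` and the `max_sleep` cap are hypothetical names, and nothing here reflects how Mailman's actual runners are scheduled.

```python
from datetime import datetime, timedelta


def seconds_until_next(next_due_times, now, max_sleep=3600.0):
    """Return how long a hypothetical runner should sleep before waking.

    With nothing queued it sleeps for `max_sleep`; overdue entries give 0.
    """
    if not next_due_times:
        return max_sleep
    delay = (min(next_due_times) - now).total_seconds()
    return min(max(delay, 0.0), max_sleep)


now = datetime(2019, 7, 26, 12, 0)
print(seconds_until_next([now + timedelta(seconds=90)], now))  # 90.0
```

Because each warning goes out when its own timer expires, a burst of bounces produces warnings spread over the day rather than one batch at cron time.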
- Also, if I implement two functions, `process_bounces` and `send_warnings`, what if both of them operate on the same `Address` instance at the same time?
What about it? This is even more unlikely, and I assume that process_bounces will have a read-write lock that prevents reading (and writing) while it's writing, while send_warnings will have a read lock that prevents writing while it's reading. I think all of the databases we support have such locks.
Stephen writes:
The odds that this happens are quite small unless you have a humongous list. Furthermore, the probability that the mail will go through this time if the account has been disabled due to bouncing is presumably way less than 1. So my first-order response is this is a non-problem.
Note that you have basically the same problem if the Address's timer expires a few seconds after the send_warnings process finishes.
I understand your point.
Stephen writes:
My second-order response is if you still care given that this is rarely going to happen, and even more rarely will it generate a message that reaches the user who reactivates the subscription, you can arrange to have an index on the Addresses (or a separate queue) which has the earliest timer first, and then process the Addresses in that order. You'll still have the same problem if the timer expires a few picoseconds after you start processing the particular address. :-) Basically the way to deal with this is to adjust the bounce_you_are_disabled_warnings_interval.
This looks fine, but I do not know exactly how I would implement it or how long it would take. Creating a new runner is also very resource-consuming, so I think I will stick to the straightforward implementation for now. I did not really know what kind of organizations use Mailman (in particular, whether or not they have very large mailing lists), so I presumed this problem existed. I suggest we wait for feedback from customers about problems arising in this area and, depending on that, decide whether or not to modify this. If we choose to modify it, maybe we can integrate it into one of next year's GSoC projects.
Stephen writes:
What about it? This is even more unlikely, and I assume that process_bounces will have a read-write lock that prevents reading (and writing) while it's writing, while send_warnings will have a read lock that prevents writing while it's reading. I think all of the databases we support have such locks.
I do not understand this part. `send_warnings` has to increment the `bounce_you_are_disabled_warnings` of the `Address`, and `process_bounce` will have to set the `processed` attribute of the `BounceEvent` to True after completely processing that entry, and save it again. Both of them have to write after reading a single entry so I do understand the reason of the lock.
Aaryan Bhagat writes:
Stephen writes:
My second-order response is if you still care [...], you can arrange to have an index on the Addresses (or a separate queue) which has the earliest timer first, and then process the Addresses in that order.
This looks fine, but I do not know exactly how I would implement it or how long it would take. Creating a new runner is also very resource-consuming, so I think I will stick to the straightforward implementation for now.
That is what I would recommend, too. For future reference:
I did not really know what kind of organizations use Mailman (in particular, whether or not they have very large mailing lists), so I presumed this problem existed.
For Mailman 2, there are a number of organizations with lists with 100,000 members, and I think there were one or two claiming a million or more. There are also organizations with tens of thousands of lists, although I don't know what the average number of members is. I don't know of any 100,000+ scale lists using Mailman 3, but I know that there are organizations with large numbers of lists that are converting to Mailman 3.
For the particular issue regarding how long it takes to process disabled warnings for one large list, you can get a lower bound on the time by creating a few large lists (10,000, 20,000, 50,000, and 100,000 subscribers seem like a good selection) and see how long it takes to go through the whole list when *none* are disabled, and do the same with small lists (100, 200, 500, 1000) where *all* are disabled, forwarding the warnings to a local /dev/null address, presuming that as far as the simultaneity problem goes Mailman doesn't care how long the MTA takes to get the messages out, it only cares about how fast the MTA can absorb and queue them. I imagine an MTA with a full queue might be somewhat slower than one with an empty queue, but it shouldn't be very much.
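A timing harness for the "none disabled" case might start out like the sketch below. It only times a pass over an in-memory stand-in (a list of dicts with a hypothetical `disabled` flag); a real measurement would have to go through Mailman's database and hand messages to the MTA, so these numbers would be a floor at best. The list sizes are the ones suggested above.

```python
import time


def time_scan(scan, members):
    """Run `scan(members)` once and return (result, elapsed_seconds)."""
    start = time.perf_counter()
    result = scan(members)
    return result, time.perf_counter() - start


def count_disabled(members):
    """Stand-in for the real pass: visit every member, count disabled ones."""
    return sum(1 for member in members if member["disabled"])


for size in (10_000, 20_000, 50_000, 100_000):
    members = [
        {"email": f"user{i}@example.com", "disabled": False}
        for i in range(size)
    ]
    disabled, elapsed = time_scan(count_disabled, members)
    print(f"{size:>7} members: {disabled} disabled, {elapsed:.4f}s")
```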
I'm not suggesting that you implement this this summer, although if you have extra time for additional work, it's one idea.
I suggest we can actually see the feedback of customers regarding problems arising in this area
Unfortunately, this kind of feedback is likely to be very rare because neither the subscriber nor the admin is likely to notice that a "you are disabled" warning has been delayed by a day or even a couple of weeks. A very few will, but remember this situation is unlikely to manifest in the first place.
Stephen writes:
What about it? [...] I think all of the databases we support have [the needed] locks.
I do not understand this part. `send_warnings` has to increment the `bounce_you_are_disabled_warnings` of the `Address`, and `process_bounce` will have to set the `processed` attribute of the `BounceEvent` to True after completely processing that entry, and save it again. Both of them have to write after reading a single entry so I do understand the reason of the lock.
I don't understand why there's any problem here. In what scenario is the database corrupted relative to what is desired? What variables are set differently from desired, and how? What are the user consequences of the incorrect database?
Steve
Thanks for the reply, Stephen!
Stephen writes:
I don't understand why there's any problem here. In what scenario is the database corrupted relative to what is desired? What variables are set differently from desired, and how? What are the user consequences of the incorrect database?
What I mean is this: take the example of the `BounceEvents` column, which has all the bounce messages stored. After processing each message I have to set the `processed` attribute of that message to true, so that will require a modification of the database.
Also, in the case where I will be sending warnings, there is a warning_count and a warning_limit (not the exact names of the attributes) for each `Address` instance with respect to each Mailing List, so I have to increase the warning_count counter and save it again in the database.
This is the pr which contains my modifications of the models, to which you can relate the context I wrote above.
Sorry, this got hung up in my drafts folder. The basic coding probably is solved, I think, but I'm worried we're not communicating about what the design problems are. So here goes.
Aaryan Bhagat writes:
Thanks for the reply, Stephen!
Stephen writes:
I don't understand why there's any problem here. In what scenario is the database corrupted relative to what is desired? What variables are set differently from desired, and how? What are the user consequences of the incorrect database?
What I mean is this: take the example of the `BounceEvents` column, which has all the bounce messages stored. After processing each message I have to set the `processed` attribute of that message to true, so that will require a modification of the database.
Also, in the case where I will be sending warnings, there is a warning_count and a warning_limit (not the exact names of the attributes) for each `Address` instance with respect to each Mailing List, so I have to increase the warning_count counter and save it again in the database.
I understand the details of the algorithm, that there are different attributes of certain objects that need to be modified.
I don't understand why you believe there are conditions under which Mailman will do something undesirable, such as create a very long delay from the time "something" (such as mailing a disabled warning) *should* happen until it *does* happen.
Stephen writes:
I don't understand why you believe there are conditions under which Mailman will do something undesirable, such as create a very long delay from the time "something" (such as mailing a disabled warning) should happen until it does happen.
I am sorry but I do not understand your concern. I just mentioned the cases where we actually have to make changes to the database after each processing of `BounceEvents`. I do not expect Mailman to show any undesirable behaviour here.
Aaryan Bhagat writes:
I am sorry but I do not understand your concern. I just mentioned the cases where we actually have to make changes to the database after each processing of `BounceEvents`. I do not expect Mailman to show any undesirable behaviour here.
You wrote some posts that indicated you were specifically worried about the timing of the warning mails. When the conversation continued without much explanation of your concerns, I assumed that you were still thinking about the concurrency issues that arise in an asynchronous system such as email.
Stephen writes:
You wrote some posts that indicated you were specifically worried about the timing of the warning mails. When the conversation continued without much explanation of your concerns, I assumed that you were still thinking about the concurrency issues that arise in an asynchronous system such as email.
Well, since you pointed out in your replies that this would happen very rarely, I set the problem aside and started asking more about the implementation. I apologize for not mentioning this transition.
It has taken me a couple of days to think about the right implementation.
In my view, a new command should take all the `Address` instances and, for each instance, check which tuples of `bounce_info` have the `disabled` attribute set to true (more on these attributes in this pr). If it sees that the `disabled` attribute of a tuple is true, it will roughly:
- Send a warning
- Increase warning counter
- Check with the threshold
- Updates last_warning_sent timestamp value
The newly updated tuple (with the updated counter and timestamp) will be stored again. I will add it to a cron job. Is the above implementation plan correct? Am I missing
Another one that got hung up in my drafts folder. Same as the other, I think the basic coding probably is solved by now, but I'm worried we're not communicating about what the design problems are.
Aaryan Bhagat writes:
In my view, a new command should take all the `Address` instances and, for each instance, check which tuples of `bounce_info` have the `disabled` attribute set to true. If it sees that the `disabled` attribute of a tuple is true, it will roughly:
- Send a warning
Shouldn't you first check if the disabled_warning_interval has been exceeded, and if it has, send a warning? Or is that what you mean by "check with the threshold" below?
- Increase warning counter
Incrementing the warning counter should always happen if the mail is successfully sent.
- Check with the threshold
What "threshold"? The disabled_warning_interval?
- Updates last_warning_sent timestamp value
Updating this timestamp should always happen if the mail is successfully sent.
I don't know offhand how any of the above operations could fail, and most likely to fail is sending mail, which is handled by the virgin, outgoing, and retry queues, not by the logic you're working on here. Still, perhaps some of the operations should be conditional on success. Also, you need to think about what happens if the warning fails for some address. Probably you just treat that as equivalent to a bounce in most cases, but what happens if it's a "no such address"? Is that already handled?
Note that adding to the virgin queue probably can't fail (except in really bad circumstances such as out of memory), but simply adding the message to the retry queue may not be appropriate (I forget how often that queue gets flushed).
If that doesn't make sense given what you know about how the queues work, then don't worry about it -- you probably know more about the queues than I do at this point.
The newly updated tuple ( with the updated counter and timestamp ) will be stored again.
I will add it to a cron job.
Cron job is what Mark suggested so that seems fine.
Steve
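The ordering discussed in this exchange (check the interval first, then update the counter and the timestamp only when the mail was handed off successfully) might look roughly like the sketch below. `DisabledRecord`, `maybe_warn` and the `send` callable are hypothetical stand-ins, not names from the MR or from Mailman; in Mailman the hand-off would presumably mean adding the message to a queue.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta
from typing import Callable, Optional


@dataclass
class DisabledRecord:
    """Hypothetical stand-in for one per-list bounce entry of an address."""
    email: str
    warnings_sent: int = 0
    last_warning_sent: Optional[datetime] = None


def maybe_warn(record: DisabledRecord, now: datetime, interval: timedelta,
               send: Callable[[str], bool]) -> bool:
    """Send a warning if the interval has elapsed; update state only on success."""
    last = record.last_warning_sent
    if last is not None and now - last < interval:
        return False                  # interval not yet exceeded
    if not send(record.email):
        return False                  # hand-off failed, leave state untouched
    record.warnings_sent += 1
    record.last_warning_sent = now
    return True


record = DisabledRecord("member@example.com")
start = datetime(2019, 8, 1)
week = timedelta(days=7)
print(maybe_warn(record, start, week, send=lambda email: True))                      # True
print(maybe_warn(record, start + timedelta(days=1), week, send=lambda email: True))  # False
```

Keeping the state changes after the `send` call means a failed hand-off is simply retried on the next run.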
Stephen writes:
What "threshold"? The disabled_warning_interval?
Yes, exactly. I apologise for being informal.
Stephen writes:
Incrementing the warning counter should always happen if the mail is successfully sent.
Yes, I meant that only, I apologise for not explaining correctly.
Stephen writes:
I don't know offhand how any of the above operations could fail, and most likely to fail is sending mail, which is handled by the virgin, outgoing, and retry queues, not by the logic you're working on here. Still, perhaps some of the operations should be conditional on success.
I also realize that sending mail is the critical link in this whole flow, and that most of the operations will be conditional on it. I only showed a rough outline of the process. I am currently working on when each individual operation will be executed, depending (or not) on the success or failure of the previous one.
Also, you need to think about what happens if the warning fails for some address. Probably you just treat that as equivalent to a bounce in most cases, but what happens if it's a "no such address"? Is that already handled?
Yes, that is correct. As of now, if bounces are received for warning messages, they will be treated the same way as bounces for any other email.
Regarding the `no such address` problem you mentioned, it had occurred to me earlier, but I had not thought it through. Let me explain the problem clearly.
1. Suppose some email address goes wrong or is nullified for some reason.
2. Bounces will originate, and after some time delivery will be disabled and warnings will be sent.
3. The warning emails will also be recorded as bounces, and once the number of warnings reaches `bounce_you_are_disabled_warnings`, Mailman will send no more warnings.
4. But in the end no action was taken: the address is still in the roster and will again be picked up by the process responsible for sending warnings.
Is a specific action required for this type of behaviour? I mean, when the above (1-4) are satisfied, I can remove the address or somehow disable it totally from the roster, so that even the process responsible for sending warnings will not recognize it.
What other approach could be implemented for this type of behaviour? Some pointers are required on this.
Aaryan Bhagat writes:
Stephen writes:
Also, you need to think about what happens if the warning fails for some address. Probably you just treat that as equivalent to a bounce in most cases, but what happens if it's a "no such address"? Is that already handled?
Regarding the `no such address` problem you mentioned, it had occurred to me earlier, but I had not thought it through. Let me explain the problem clearly.
[snipped]
What other implementation for this type of behaviour can be implemented? Some pointers are required on this.
If we receive a bounce that indicates that the address does not exist, we should stop sending warnings there. Somebody has screwed up pretty badly, and it wasn't us.
In most cases, this deletion will be valid and intended by the user (or well-deserved for abusing the account). In those cases, the address should be unsubscribed. Questions:
Do we unsubscribe on the theory that it's almost always correct to do so, or do we keep it around and just disable without further warnings in case of resurrection and to associate mail "From" that address with the user? (This has issues in case the address is resurrected but assigned to a different person.)
If the user has other addresses, should Mailman warn them about this situation? (This question also applies to ordinary disabled warnings that bounce.)
If the user wants to migrate the subscription to a working address, can we make this simpler? (Not in scope of your GSoC, mentioned because I thought of it and want it in public.)
Should the address be deleted? It's possible that the user will use this address in From. Technically this is probably invalid (the user doesn't have the right to use that address at all if they can't receive mail there). But I don't see a good reason not to associate such mail with their Mailman user.
Should the user be allowed to post "From" that address? Probably not.
Steve
--
Associate Professor              Division of Policy and Planning Science
http://turnbull.sk.tsukuba.ac.jp/     Faculty of Systems and Information
Email: turnbull@sk.tsukuba.ac.jp                   University of Tsukuba
Tel: 029-853-5175                 Tennodai 1-1-1, Tsukuba 305-8573 JAPAN
Stephen writes:
If we receive a bounce that indicates that the address does not exist, we should stop sending warnings there. Somebody has screwed up pretty badly, and it wasn't us.
That is the main hurdle: bounce extraction cannot determine the exact cause, so we can never be sure of the reason.
Stephen writes:
In most cases, this deletion will be valid and intended by the user (or well-deserved for abusing the account). In those cases, the address should be unsubscribed.
Yes, this is the expected good behaviour, and the most suitable one in cases like these.
Stephen writes:
Do we unsubscribe on the theory that it's almost always correct to do so, or do we keep it around and just disable without further warnings in case of resurrection and to associate mail "From" that address with the user? (This has issues in case the address is resurrected but assigned to a different person.)
As mentioned above, since we cannot know the reason, we should not rush into actions like unsubscribing from the mlist or deleting the address. I think we should let the normal implementation run its course and then take action. I mean:
- First, `bounce_score_threshold` will be crossed.
- Then a probe will be sent; if this probe bounces, the `DeliveryStatus` of that `member` is set to disabled.
- Then warning emails will be sent until the count reaches `bounce_you_are_disabled_warnings`.
- When both of them are exhausted, we can unsubscribe the user.
- I can make a separate template telling the reason for this type of unsubscription.
- I think `Address` instances or `Member` instances should not be deleted, because when the person subscribes again with the same email the data will be restored. But, as you pointed out, if the email is different then nothing of that sort will happen and space will be wasted. I am not sure about this, so Mark, Abhilash and Stephen, please share your opinions. If nothing becomes clearer, the default is that I will not delete the data.
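The escalation steps listed above can be sketched as a single decision function. Everything here is illustrative: `Action`, `next_action` and its parameters are hypothetical names, `probe_bounced` uses `None` to mean "no probe sent yet", and treating a score equal to the threshold as "crossed" is an assumption.

```python
from enum import Enum


class Action(Enum):
    NOTHING = "nothing"
    SEND_PROBE = "send probe"
    DISABLE = "disable delivery"
    SEND_WARNING = "send warning"
    UNSUBSCRIBE = "unsubscribe"


def next_action(score, score_threshold, probe_bounced, disabled,
                warnings_sent, max_warnings):
    """Pick the next step for one member, following the escalation order above."""
    if not disabled:
        if score < score_threshold:
            return Action.NOTHING
        if probe_bounced is None:      # threshold reached, no probe sent yet
            return Action.SEND_PROBE
        return Action.DISABLE if probe_bounced else Action.NOTHING
    if warnings_sent < max_warnings:
        return Action.SEND_WARNING
    return Action.UNSUBSCRIBE          # probe and all warnings are exhausted


print(next_action(5, 5, None, False, 0, 3))  # Action.SEND_PROBE
print(next_action(5, 5, None, True, 3, 3))   # Action.UNSUBSCRIBE
```

Under this sketch the member's data is never deleted; the last step only unsubscribes.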
Stephen writes:
If the user has other addresses, should Mailman warn them about this situation?
This seems to overcomplicate things; keeping the `address` instances separate is simple and no problems will arise. It can be implemented, but I am not sure about its priority as of now. By default I will not implement this.
Stephen writes:
Should the user be allowed to post "From" that address? Probably not.
In the case where only `DeliveryStatus` is disabled, I think the user can post. But when the conditions I explained in the points above are met, we will unsubscribe that user, and then there won't be any problem.
Stephen writes:
Should the address be deleted? It's possible that the user will use this address in From. Technically this is probably invalid (the user doesn't have the right to use that address at all if they can't receive mail there). But I don't see a good reason not to associate such mail with their Mailman user.
As explained above, I am not sure whether or not the address should be deleted, but if the user is unsubscribed then there won't be any problem if the user uses the address in From, as the mailing list will not recognize it.
If something is not explained well enough, please point it out. Am I missing something here?
Sorry for the delay, family issues and $WORK stuff.
Aaryan Bhagat writes:
Am I missing something here?
No, I don't think so. *We* (Mailman) may be able to improve something here, but it's not in scope of your GSoC work as far as I can see. Your opinions (and if you choose, post-GSoC work) on these questions will be appreciated since it's in the same area you're getting a lot of experience in.
Steve
participants (3)
- Aaryan Bhagat
- Mark Sapiro
- Stephen J. Turnbull