Extracting some Mailman code
Hey folks, This is a request for a pointer to some code rather than an offering for Mailman, but I hope my cred with the Mailman developers will fetch me a bit of help. I can read code and find this out myself, but one of the Mailman devs can probably give me some pointers that will save me a lot of time, time being the only human resource which is truly limited!
I'm running Courier-MTA, an excellent MTA around which I've built FMP's small ESP services. Courier has the ability to do a simple email redirect using an alias address in a flat file, in a special directory, containing only the email address to which email should be redirected. Courier also has the ability to interpret lines in this file starting with "|" as programs to which the body of an email can be submitted via stdin and any required processing done therein. I'm the author of courier-to-mailman.py in the contrib collection in Mailman 2's current standard code which works this way.
I'm seeing increasing problems with DMARC rejection of emails sent through this simple redirection mechanism, for obvious reasons, and I'm thinking that I might borrow code from Mailman to re-write the From address just as Mailman does when handling a list with from_is_list set to "Munge From", and then pipe emails for selected ESP clients through this filter. Said filter must:
detect whether or not the sending domain publishes a DMARC "p=reject" or "p=quarantine" record
If so, parse out the From address in the email and rewrite it in the general form Mailman uses with "on behalf of ..." giving the original sender and specifying the mail server's DN in the sender address.
... after which the email will be sent on to the recipient's _real_ address. The management of the message body, the piping and such is handled quite well by Courier.
I assume that this would mitigate the DMARC issue for redirections through our mail server, just as it does for Mailman.
So, if someone could give me a few pointers to the relevant code in Mailman 2, and any suggestions which might save me some time, I can take it from there. I'm python-literate and have hacked our copy of Mailman here in the past - perhaps more than is wise since every time I upgrade I need to apply a number of patches to bring my mods along with the upgrade :)
Thanks for any help you can give me. If it's too much bother, tell me so and I'll put on my hacker's hat and go read code :)
Ciao,
--
Lindsay Haisley       | "Behold! Our way lies through a
FMP Computer Services |  dark wood whence in which
512-259-1190          |  weirdness may wallow!"
http://www.fmp.com    |            --Beauregard
On 10/12/2017 12:37 PM, Lindsay Haisley wrote:
So, if someone could give me a few pointers to the relevant code in Mailman 2, and any suggestions which might save me some time, I can take it from there. I'm python-literate and have hacked our copy of Mailman here in the past - perhaps more than is wise since every time I upgrade I need to apply a number of patches to bring my mods along with the upgrade :)
There are two pieces to this in Mailman (both 2.1 and 3.1). One piece is determining the DMARC policy of the From: domain. In MM 2, the code that does this is in Mailman/Utils.py beginning with the comments
# The next functions read data from
# https://publicsuffix.org/list/public_suffix_list.dat and implement the
# algorithm at https://publicsuffix.org/list/ to find the "Organizational
# Domain corresponding to a From: domain.
and extending through the end of the
def _DMARCProhibited(mlist, email, dmarc_domain, org=False):
function. There are recent changes for MM 2.1.25. See <http://bazaar.launchpad.net/~mailman-coders/mailman/2.1/revision/1724> or just look at <http://bazaar.launchpad.net/~mailman-coders/mailman/2.1/view/head:/Mailman/Utils.py>
This code is improved for MM 3. Most of the changes have to do with the organizational domain data from <https://publicsuffix.org/list/public_suffix_list.dat>. In MM 2.1, this is retrieved once when first needed (first post after a (re)start of Mailman) and kept in core until the next restart, which could be a long time. In MM 3 the data are cached, but the cache has a lifetime after which it is reloaded.
See <https://gitlab.com/mailman/mailman/blob/master/src/mailman/rules/dmarc.py> for the MM 3 code.
The second part is the actual From: header munging. In MM 3 that's done by <https://gitlab.com/mailman/mailman/blob/master/src/mailman/handlers/dmarc.py>. In MM 2.1 it's in multiple places, but the meat is in Mailman/Handlers/CookHeaders.py. There's more to it because the actual transformations aren't done to the message until after it's been queued for the digest and the archiver, but CookHeaders.py is where the work is done.
--
Mark Sapiro <mark@msapiro.net>        The highway is for gamblers,
San Francisco Bay Area, California    better use your sense - B. Dylan
On Thu, 2017-10-12 at 13:15 -0700, Mark Sapiro wrote:
On 10/12/2017 12:37 PM, Lindsay Haisley wrote:
So, if someone could give me a few pointers to the relevant code in Mailman 2, and any suggestions which might save me some time, I can take it from there. I'm python-literate and have hacked our copy of Mailman here in the past - perhaps more than is wise since every time I upgrade I need to apply a number of patches to bring my mods along with the upgrade :)
There are two pieces to this in Mailman (both 2.1 and 3.1). One piece is determining the DMARC policy of the From: domain. In MM 2, the code that does this is in Mailman/Utils.py beginning with the comments
# The next functions read data from
# https://publicsuffix.org/list/public_suffix_list.dat and implement the
# algorithm at https://publicsuffix.org/list/ to find the "Organizational
# Domain corresponding to a From: domain.
and extending through the end of the
def _DMARCProhibited(mlist, email, dmarc_domain, org=False):
I'm running MM 2.1.18-1 here and find only
# This takes an email address, and returns True if DMARC policy is p=reject
# or possibly quarantine.
def IsDMARCProhibited(mlist, email):
    ... etc
This looks pretty straightforward. I can dispense with code related to mlist since I need only a True|False determination of whether the sending domain publishes a DMARC "p=reject" or "p=quarantine" record. You folks are obviously up-to-speed on DMARC nuances and this code looks pretty thorough.
I assume the reference to publicsuffix.org comes with later versions of 2.x and in MM 3 since there's none in 2.1.18-1. This must be something new in the DMARC mitigation world and I'm not familiar with it.
Is there any reason to pull in a more recent MM 2 and use the DMARC detection code therein? Speed is important here since this is simply a turnaround on a single email, not dependent on any list variables. I'm reluctant to burden every redirection turnaround with an HTTP look-up.
Replacement of the From header is just a matter of reading the email headers into an array, making modifications if necessary and pushing the result, followed by the message body, out to Courier's sendmail clone. Basically:
if from_domain publishes bad DMARC:
    if Reply-To does not exist:
        copy From header to Reply-To
    Replace From with "On behalf of old_From" <postmaster@fmp.com>
Feed headers and body to Courier's sendmail clone
My take on it is that this should work OK.
--
Lindsay Haisley | "The first casualty when
FMP Computer Services | war comes is truth."
512-259-1190 |
http://www.fmp.com | -- Hiram W Johnson
On 10/15/2017 07:35 AM, Lindsay Haisley wrote:
I assume the reference to publicsuffix.org comes with later versions of 2.x and in MM 3 since there's none in 2.1.18-1. This must be something new in the DMARC mitigation world and I'm not familiar with it.
This is code that was added in 2.1.22 to deal with organizational domains. Every domain has a corresponding organizational domain which may or may not be the same as the original domain. In many cases it's simple. For example, the organizational domain for example.com is example.com and the organizational domain for any.subdomain.of.example.com is example.com.
The DMARC standard says check the policy of the domain, but if the domain doesn't publish a policy, check the policy of the corresponding organizational domain, so you actually need to check the organizational domain. It's even more complicated than that because the organizational domain can publish a p= policy which applies to it and any subdomains that don't publish their own policy, but it can also publish an sp= policy which applies only to subdomains that don't publish their own policy but not to itself.
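The precedence just described could be sketched roughly as follows. This is an illustration only, not Mailman's code: the function names `parse_dmarc_record` and `effective_policy` are made up for the example, the subdomain policy tag is assumed to be spelled `sp=` as in the DMARC specification, and the TXT record strings are assumed to have been fetched already from `_dmarc.<domain>` with whatever DNS library is at hand.

```python
def parse_dmarc_record(txt):
    """Parse a DMARC TXT record into a dict of lower-cased tag names.

    Returns None when the record does not identify itself as DMARC.
    """
    tags = {}
    for part in txt.split(';'):
        if '=' not in part:
            continue
        name, _, value = part.partition('=')
        tags[name.strip().lower()] = value.strip()
    if tags.get('v', '').upper() != 'DMARC1':
        return None
    return tags


def effective_policy(domain_txt, org_txt, is_subdomain):
    """Return the policy ('none', 'quarantine', 'reject') for a From: domain.

    domain_txt:   TXT record found at _dmarc.<From: domain>, or None.
    org_txt:      TXT record found at _dmarc.<organizational domain>, or None.
    is_subdomain: True when the From: domain is not the organizational domain.
    """
    own = parse_dmarc_record(domain_txt) if domain_txt else None
    if own is not None:
        # The domain's own record always wins.
        return own.get('p', 'none').lower()
    org = parse_dmarc_record(org_txt) if org_txt else None
    if org is None:
        return 'none'
    if is_subdomain:
        # sp= applies to subdomains only; fall back to p= when it is absent.
        return org.get('sp', org.get('p', 'none')).lower()
    return org.get('p', 'none').lower()
```

A redirector would then treat 'reject' and, depending on local taste, 'quarantine' as the cases that need the From: rewrite.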
The actual determination of the organizational domain for a given domain can be complex. For common tlds like .com, .org, .edu, .net and the like, the organizational domain is simply the next level, e.g. example.com, etc., but it can get much more complicated than that. For example, see the .jp section in the data at <https://publicsuffix.org/list/public_suffix_list.dat>.
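To make the rule matching concrete, here is a rough sketch of the publicsuffix.org algorithm. It is not the Mailman implementation; `organizational_domain` is a hypothetical name, and the rule set in the usage note is a tiny hand-picked subset, where a real filter would load the full list.

```python
def organizational_domain(domain, rules):
    """Return the organizational domain for `domain`.

    `rules` is a set of public suffix rules in the list's own syntax,
    e.g. {'com', 'co.jp', '*.kawasaki.jp', '!city.kawasaki.jp'}.
    """
    labels = domain.lower().rstrip('.').split('.')
    suffix_len = 1  # implicit default rule "*": the last label is a suffix
    for i in range(len(labels)):
        candidate = '.'.join(labels[i:])
        n = len(labels) - i
        if '!' + candidate in rules:
            # Exception rule: it beats any wildcard, and the public suffix
            # is the candidate with its leftmost label removed.
            suffix_len = n - 1
            break
        wildcard = '*.' + '.'.join(labels[i + 1:])
        if candidate in rules or (n > 1 and wildcard in rules):
            # Keep the longest matching rule.
            suffix_len = max(suffix_len, n)
    # Organizational domain = public suffix plus one more label.
    return '.'.join(labels[-(suffix_len + 1):])
```

With rules = {'com', 'jp', 'co.jp', '*.kawasaki.jp', '!city.kawasaki.jp'}, the function maps any.subdomain.of.example.com to example.com and mail.example.co.jp to example.co.jp, which is where the "two top levels" shortcut would go wrong.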
Is there any reason to pull in a more recent MM 2 and use the DMARC detection code therein? Speed is important here since this is simply a turnaround on a single email, not dependent on any list variables. I'm reluctant to burden every redirection turnaround with an HTTP look-up.
I think you need to deal with organizational domains. You may be able to get away with just assuming the organizational domain is the two top levels and ignoring all those cases where it isn't, but you should at least look at either <https://gitlab.com/mailman/mailman/blob/master/src/mailman/rules/dmarc.py> or <http://bazaar.launchpad.net/~mailman-coders/mailman/2.1/view/head:/Mailman/Utils.py> for ideas.
Replacement of the From header is just a matter of reading the email headers into an array, making modifications if necessary and pushing the result, followed by the message body, out to Courier's sendmail clone. Basically:
if from_domain publishes bad DMARC:
    if Reply-To does not exist:
        copy From header to Reply-To
    Replace From with "On behalf of old_From" <postmaster@fmp.com>
I have seen it said that email addresses in display names in From: headers are a sign of spaminess. Thus, in the above I suggest that old_From should just be the display name part of the original From: or be munged in some way (replace '@' with ' at ' or ?) so it doesn't look like an email address.
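A minimal sketch of that display-name munging, using only the standard library. The function name `munged_from` and the relay address in the usage note are illustrative; this is not how Mailman's CookHeaders.py does it.

```python
from email.utils import formataddr, parseaddr


def munged_from(original_from, relay_address):
    """Build a replacement From: value that hides the original address.

    The display name of the original From: is used when there is one;
    otherwise the address itself is used, with '@' spelled out so the
    result does not look like an email address.
    """
    name, addr = parseaddr(original_from)
    display = (name or addr).replace('@', ' at ')
    return formataddr(('On behalf of ' + display, relay_address))
```

For example, munged_from('John Doe <john.doe@example.org>', 'postmaster@fmp.com') yields a header whose only address is postmaster@fmp.com and whose display name is "On behalf of John Doe".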
Feed headers and body to Courier's sendmail clone
My take on it is that this should work OK.
--
Mark Sapiro <mark@msapiro.net>        The highway is for gamblers,
San Francisco Bay Area, California    better use your sense - B. Dylan
On Sun 15/Oct/2017 19:24:16 +0200 Mark Sapiro wrote:
Replacement of the From header is just a matter of reading the email headers into an array, making modifications if necessary and pushing the result, followed by the message body, out to Courier's sendmail clone. Basically:
if from_domain publishes bad DMARC:
    if Reply-To does not exist:
        copy From header to Reply-To
    Replace From with "On behalf of old_From" <postmaster@fmp.com>
I have seen it said that email addresses in display names in From: headers are a sign of spaminess. Thus, in the above I suggest that old_From should just be the display name part of the original From: or be munged in some way (replace '@' with ' at ' or ?) so it doesn't look like an email address.
I'm unclear how adding a Reply-To: is going to affect mailing list traffic, since that depends on users' clients. As an alternative, when the usual action is to reply to the mailing list, it seems smoother to me to just mangle the From: so that it looks like this:
John Doe <john.doe@example.org.REMOVE.THE.TRAILING.PARTS>
Although the addition of capitalized anti-spam diversions in the domain part of email addresses is still en vogue in several mailing lists, this method never gained traction. Why?
Ale
On 10/16/2017 01:17 AM, Alessandro Vesely wrote:
I'm unclear how adding a Reply-To:, depending on users' clients is going to affect mailing list traffic.
Had you read the whole thread from the beginning, you would know that Lindsay is not talking about mailing list mail here.
--
Mark Sapiro <mark@msapiro.net>        The highway is for gamblers,
San Francisco Bay Area, California    better use your sense - B. Dylan
On Mon, 2017-10-16 at 09:31 -0700, Mark Sapiro wrote:
On 10/16/2017 01:17 AM, Alessandro Vesely wrote:
I'm unclear how adding a Reply-To:, depending on users' clients is going to affect mailing list traffic.
Had you read the whole thread from the beginning, you would know that Lindsay is not talking about mailing list mail here.
Alessandro, I'm stealing code ;)
DMARC is a hassle for those of us who operate legitimate email redirection services, e.g. for customers with custom domain names who get at least some of their email someplace such as Gmail. My MTA (Courier) allows the introduction of an arbitrary program into the mail pipe, which in this case is something I'm putting together to do a header munge a la Mailman. Mark has been most helpful and the piece is almost done.
Python just WORKS :)
--
Lindsay Haisley | "The first casualty when
FMP Computer Services | war comes is truth."
512-259-1190 |
http://www.fmp.com | -- Hiram W Johnson
On Mon, 2017-10-16 at 09:31 -0700, Mark Sapiro wrote:
On 10/16/2017 01:17 AM, Alessandro Vesely wrote:
I'm unclear how adding a Reply-To:, depending on users' clients is going to affect mailing list traffic.
Had you read the whole thread from the beginning, you would know that Lindsay is not talking about mailing list mail here.
I've ported much of the MM 2 DMARC mitigation code to an email processor for the Courier MTA. The principles I've used could probably be applied to many MTAs. If anyone is interested, I'll be happy to post my code (dmarc_shield.py) to this list as an attachment, and if anyone has the time and inclination to critique it, or point out potential bugs or problems I'd be happy to receive them.
Many thanks to Mark for taking the time to go over the salient points in the Mailman code in email.
--
Lindsay Haisley | "The first casualty when
FMP Computer Services | war comes is truth."
512-259-1190 |
http://www.fmp.com | -- Hiram W Johnson
Alessandro Vesely writes:
I'm unclear how adding a Reply-To:, depending on users' clients is going to affect mailing list traffic. For an alternative, when the usual action is to reply to mailing list, it seems to me to be smoother to just mangle the From: so as to make it like so:
John Doe <john.doe@example.org.REMOVE.THE.TRAILING.PARTS>
Although the addition of capitalized anti-spam diversions in the domain part of email addresses is still en vogue in several mailing lists, this method never gained traction. Why?
I would guess the main reasons are that it's ugly and makes the convenient automatic field-filling features of MUAs inaccurate, especially those MUAs that don't actually present the address (which are very common these days).
Steve
--
Associate Professor              Division of Policy and Planning Science
http://turnbull/sk.tsukuba.ac.jp/     Faculty of Systems and Information
Email: turnbull@sk.tsukuba.ac.jp                   University of Tsukuba
Tel: 029-853-5175                 Tennodai 1-1-1, Tsukuba 305-8573 JAPAN
Sent from my iPhone
On Oct 16, 2017, at 10:39 PM, Stephen J. Turnbull <turnbull.stephen.fw@u.tsukuba.ac.jp> wrote:
I would guess the main reasons are that it's ugly and makes the convenient automatic field-filling features of MUAs inaccurate, especially those MUAs that don't actually present the address (which are very common these days).
What I did, at Mark's suggestion, was to convert the "@" symbol in the original From to the word "at" and add the whole shebang to the new From address comment.
I'm not sure about the usefulness of tacking the old From address onto an existing Reply-To address. Yes, it preserves choice, but it's a choice that few, if any, of FMP's customers/users would really understand or make use of. Given the simplistic design of most modern MUAs, and the general lack of knowledge about the inner workings of email clients (and the possibilities therein), most folks, including me, just hit the reply button and take it from there.
I gather you have completed your program already, but I had this in the works and it might be useful for people doing similar things.
Lindsay Haisley writes:
Is there any reason to pull in a more recent MM 2 and use the DMARC detection code therein? Speed is important here since this is simply a turnaround on a single email, not dependent on any list variables. I'm reluctant to burden every redirection turnaround with an HTTP look-up.
You may want to consider improving performance by caching DNS results by domain. This should be reasonable space as long as you do this check after spam elimination. If expiries are fixed you will need to have a reasonably short expiry on negative (p=none) results (fails nasty -- bounces from receivers), but could have a pretty long one on positive results. You could also get the actual TTL out of the DNS reply for more accurate expiry.
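The expiry scheme could look something like the sketch below. Everything here is an assumption for illustration: the class name `PolicyCache`, the default lifetimes, and the in-memory dict. A filter that Courier starts afresh for each message would need to persist the entries somewhere (a small on-disk database, say); only the expiry logic is shown.

```python
import time


class PolicyCache:
    """Cache per-domain DMARC verdicts with separate expiry times.

    A verdict is True when the domain publishes a restrictive policy
    (positive result) and False otherwise (negative result).
    """

    def __init__(self, positive_ttl=86400, negative_ttl=900, clock=time.time):
        self.positive_ttl = positive_ttl  # long: a stale hit costs little
        self.negative_ttl = negative_ttl  # short: a stale miss means bounces
        self.clock = clock
        self._entries = {}  # domain -> (verdict, expiry timestamp)

    def put(self, domain, restrictive, ttl=None):
        """Store a verdict; pass `ttl` to use the TTL from the DNS reply."""
        if ttl is None:
            ttl = self.positive_ttl if restrictive else self.negative_ttl
        self._entries[domain.lower()] = (restrictive, self.clock() + ttl)

    def get(self, domain):
        """Return the cached verdict, or None if unknown or expired."""
        key = domain.lower()
        entry = self._entries.get(key)
        if entry is None:
            return None
        verdict, expires = entry
        if self.clock() >= expires:
            del self._entries[key]
            return None
        return verdict
```

A caller would try get() first and fall back to a DNS query followed by put() only when it returns None.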
Of course the effectiveness of caching depends heavily on the actual pattern of mail received at the domains in question.
You don't need to burden each redirection with an HTTP lookup. There's only one publicsuffix list, which you can download occasionally. I would guess once a day would be more than enough and no burden at all; you could even do it asynchronously in a cron job.
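One way to arrange that, sketched under assumptions: the helper names `is_stale`, `refresh_suffix_list` and `load_rules` are invented for the example, the one-day lifetime is just the guess given above, and the refresh function is meant to be called from a cron job rather than from the per-message filter.

```python
import os
import time
import urllib.request

PSL_URL = 'https://publicsuffix.org/list/public_suffix_list.dat'


def is_stale(path, max_age=86400, now=None):
    """True when the local copy is missing or older than max_age seconds."""
    if now is None:
        now = time.time()
    try:
        return now - os.path.getmtime(path) > max_age
    except OSError:
        return True


def refresh_suffix_list(path, max_age=86400):
    """Download a fresh copy if needed; return True when a download ran."""
    if not is_stale(path, max_age):
        return False
    tmp = path + '.tmp'
    urllib.request.urlretrieve(PSL_URL, tmp)
    os.replace(tmp, path)  # swap in the new file in one step
    return True


def load_rules(path):
    """Read the local copy into a set of rules, skipping comments."""
    rules = set()
    with open(path, encoding='utf-8') as fp:
        for line in fp:
            line = line.strip()
            if line and not line.startswith('//'):
                # Only the text up to the first whitespace is the rule.
                rules.add(line.split()[0])
    return rules
```

The mail filter itself then only ever calls load_rules() on the local file and never touches the network.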
if from_domain publishes bad DMARC:
    if Reply-To does not exist:
        copy From header to Reply-To
    Replace From with "On behalf of old_From" <postmaster@fmp.com>
Feed headers and body to Courier's sendmail clone
My take on it is that this should work OK.
I think you should consider adding an else to the inner if:
else:
append From to Reply-To
as Mailman does. Otherwise the user has to copy/paste the address if they really want to reply to the author rather than the Reply-To for some reason, and it may not be present at all if you follow Mark's advice to not copy it to the display name in From.
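Both branches together (copy From when there is no Reply-To, append it otherwise) might be written along these lines with the standard email package. The name `add_author_to_reply_to` is hypothetical and this is a sketch, not Mailman's code; it has to run before the From: header is replaced.

```python
from email import message_from_string
from email.utils import formataddr, getaddresses


def add_author_to_reply_to(msg):
    """Make sure the original From: address appears in Reply-To:.

    With no Reply-To: present, the result is a copy of From:. With one
    present, the author is appended unless already listed.
    """
    author = getaddresses(msg.get_all('From', []))
    existing = getaddresses(msg.get_all('Reply-To', []))
    seen = {addr.lower() for _, addr in existing}
    merged = existing + [
        (name, addr) for name, addr in author
        if addr and addr.lower() not in seen
    ]
    del msg['Reply-To']  # no error if the header is absent
    if merged:
        msg['Reply-To'] = ', '.join(formataddr(pair) for pair in merged)
    return msg


if __name__ == '__main__':
    raw = ('From: John Doe <john@example.org>\n'
           'Reply-To: list@example.net\n'
           'Subject: hello\n\nbody\n')
    print(add_author_to_reply_to(message_from_string(raw))['Reply-To'])
```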
I'm with Mark on everything else. I don't think the probability you need to deal with organizational domains is that high, but the costs are potentially high (collateral damage = disabled or unsubscribed users).
On Oct 16, 2017, at 10:38 PM, Stephen J. Turnbull <turnbull.stephen.fw@u.tsukuba.ac.jp> wrote:
I gather you have completed your program already, but I had this in the works and it might be useful for people doing similar things.
Good points, Stephen. Thanks. Yes, I’ve got the basic code well honed to cooperate with Courier but have close to a week of hard work ahead of me on totally unrelated tasks, so I’m putting this project on the shelf for the moment. I’ll put proper caching on the short to-do list.
This puts me in mind of the old adage, the only way to get a program finished is to kill the programmer :)
On Tue, 2017-10-17 at 12:38 +0900, Stephen J. Turnbull wrote:
I gather you have completed your program already, but I had this in the works and it might be useful for people doing similar things.
If anyone is interested in my DMARC mitigation code for Courier-MTA it's online at <http://linode.fmp.com/contrib/dmarc_shield.py>. There are lots of usage notes and such in the code comments at the beginning of the program.
Stephen, as a first step to speeding this piece up, I've made the Organizational Domain database lookup local to the machine hosting the MTA so it's very fast and doesn't load the publicsuffix.org web server.
--
Lindsay Haisley | "The first casualty when
FMP Computer Services | war comes is truth."
512-259-1190 |
http://www.fmp.com | -- Hiram W Johnson
participants (5)
- Alessandro Vesely
- Lindsay Haisley
- Lindsay Haisley (linode)
- Mark Sapiro
- Stephen J. Turnbull