Aurelien Bompard writes:
I'd like to discuss what happens when an email is sent by both a member and a nonmember in Mailman 3. How is that possible? Very easily; here's my use case: I have my own domain, say example.com, and for convenience and portability I use Gmail as a server/storage/interface. My main address is alice@example.com and I redirect it to alice@gmail.com, while I set a default identity in Gmail to alice@example.com, which sets the proper From header. However, for spam detection and anti-spoofing reasons, Gmail adds a Sender header with alice@gmail.com. My outgoing emails thus have both a From and a Sender header, and in this case email clients only display the From header (except Outlook, but eh...)
Your outgoing emails also have an envelope sender, which might be different from both of the above.
Mailman now: when I subscribe to a list, I use my regular address, alice@example.com. But the message.senders property will contain both addresses because of the Sender header. The email goes through the MemberModeration rule, which finds my subscribed address and, by default, associates the "defer" action. The email then goes through the NonMemberModeration rule, which finds my Gmail address and sets the action to "hold" (it ignores my main address because it's a member already).
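To make the failure mode concrete, here is a minimal Python sketch of the two checks as described above. It is a simplified model, not Mailman's actual rule classes: the roster `MEMBERS` and the functions `member_moderation()` and `nonmember_moderation()` are hypothetical, and the actions are hard-coded to the defaults mentioned in the message.

```python
# Simplified model of the two moderation rules (not Mailman's real code).
MEMBERS = {"alice@example.com"}  # hypothetical list roster


def member_moderation(senders, msgdata):
    """Hit if any sender is a member; record the default member action."""
    for address in senders:
        if address in MEMBERS:
            msgdata["moderation_action"] = "defer"
            msgdata["moderation_sender"] = address
            return True
    return False


def nonmember_moderation(senders, msgdata):
    """Hit on the first sender that is not a member (members are skipped)."""
    for address in senders:
        if address not in MEMBERS:
            msgdata["moderation_action"] = "hold"
            msgdata["moderation_sender"] = address
            return True
    return False


# From: alice@example.com, Sender: alice@gmail.com
senders = ["alice@example.com", "alice@gmail.com"]
msgdata = {}
print(member_moderation(senders, msgdata))     # True
print(nonmember_moderation(senders, msgdata))  # True
print(msgdata["moderation_action"])            # hold
```

Both rules hit on the same message, and the non-member rule overwrites the metadata the member rule had recorded, so the post ends up held.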
What do you think about all that? Do you agree there's actually an issue there?
Yes.
Any idea how to solve it? For example, make the NonMember rule exit if a member is found amongst the senders (which would simply be equivalent to making it yield to the Member rule). Bad idea?
Offhand I'd say that having both a Member rule and a NonMember rule is a bad idea. There should be one conceptual test: can we identify a member as the originator of this post? Having Member and NonMember rules that can both "succeed" is not coherent.
I think that what should happen here is that the Member rule should try to identify the originator, and the NonMemberModeration (if in effect) should just check for a "member_identified" property. The member_identified property could be a Boolean or actually contain a list of members (list because it's not obvious what to do if each of From, Sender, and envelope sender corresponds to a different member; that would probably be a policy issue).
I don't really see how order dependence can be avoided without violating DRY all over the place.
Alternatively, NonMemberModeration might not be a rule, but rather a chain. Perhaps that's the most elegant solution, as order dependence between chains is necessary.
OK, I've opened a bug on Launchpad and attached my very basic implementation (plus a unit test). It's just three lines, and it does not implement Stephen's suggestion (which is probably better but involves some refactoring). Here is the ticket: https://bugs.launchpad.net/mailman/+bug/1291452 I've tested it on my setup, and it works as expected.
Aurélien
2014-03-12 1:43 GMT-03:00 Stephen J. Turnbull stephen@xemacs.org:
On Mar 12, 2014, at 01:43 PM, Stephen J. Turnbull wrote:
Offhand I'd say that having both a Member rule and a NonMember rule is a bad idea. There should be one conceptual test: can we identify a member as the originator of this post? Having Member and NonMember rules that can both "succeed" is not coherent.
I agree, in theory, but there are two reasons I split these. At first there was only a membership rule, and the non-member rule was added later. But more importantly, there *is* an implicit ordering, not through the rules (which there can't be), but through the chain in which these rules are links, specifically the default-posting-chain.
Rules either hit or miss, they are pure binary tests. By side-effect, both the member rule and non-member rule set two message metadata keys, 'moderation_action' and 'moderation_sender', and of course the two rules set them differently depending on the defaults and such.
It's the moderation chain that looks at these metadata keys and actually performs the moderation action *if* the rule hits. The moderation chain can be jumped to if the member moderation rule hits, which happens early in the chain, or if the non-member rule hits, which happens later in the chain. Think of it like this: if the membership rule hits (i.e. the sender is a member), then we can (but don't necessarily have to) bypass checks like administrivia or implicit destination. This might be the case if I were to configure my list to say "All messages sent by Steve should be accepted without question".
By default, we'll probably defer if the membership test hits, so that we normally do perform the other rule checks.
Toward the end of the chain, we do the non-member check. Let's say the message has passed all other rules and it would normally be okay to post, but it comes from a non-member. If the rule hits, we'll again jump to the moderation chain, taking whatever action is appropriate. Usually this will be the default non-member action, but we could potentially do something else, like say "Even though Alice isn't a member of this list, we've seen her posts before and they are on-topic, so let's accept them", or "Bob is a spammer, always discard his messages".
The default posting chain has one final link, which has the 'truth' rule (i.e. it always matches). Thus let's say that the poster is a member with a deferred posting action, and none of the other rules hit, so his message gets sent to the 'accept' chain where it is accepted for posting and gets further processed. The same ultimate result could happen if the poster is a non-member, but the default non-member rule is to accept posts from anybody.
I'm having a hard time right now seeing how we could continue to support these types of operations with a combined member and non-member rule.
I *think* the right solution may be to continue to keep the rules separate, but add an extra check to the nonmember-moderation rule, such that if any of the senders are members, then the rule cannot hit, i.e. the sender is definitely not a non-member. A quick look at Aurélien's patch seems about the right way to do it. I'll hold off on reviewing and merging it though, to get any additional feedback you might have.
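A minimal sketch of the stop-gap described here, again over a hypothetical in-memory roster rather than Mailman's real membership lookup. Aurélien's actual patch lives on the Launchpad ticket and is not reproduced; this only illustrates the guard: if any sender is a member, the non-member rule misses.

```python
# Hypothetical roster; stands in for a real membership lookup.
MEMBERS = {"alice@example.com"}


def nonmember_moderation(senders, msgdata):
    """Non-member rule with the stop-gap guard (illustrative sketch)."""
    # Guard: if any sender is a member, the poster is not a non-member.
    if any(address in MEMBERS for address in senders):
        return False
    if not senders:
        return False
    # Every sender is a non-member; record the (assumed) default action.
    msgdata["moderation_action"] = "hold"
    msgdata["moderation_sender"] = senders[0]
    return True


print(nonmember_moderation(["alice@example.com", "alice@gmail.com"], {}))  # False
print(nonmember_moderation(["bob@example.org"], {}))                        # True
```

Note that the guard repeats the membership lookup the member rule already performed, which is exactly the duplication Stephen objects to below.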
Cheers, -Barry
Barry Warsaw writes:
I'm having a hard time right now seeing how we could continue to support these types of operations with a combined member and non-member rule.
I expressed myself poorly. The parameters of the decision logic given the list of senders are different for the two rules so both rules are needed. But I really think that determining the sender should be done in one place by one set of principles, separated from the "to post or to moderate" logic. Maybe we could use all_senders, member_senders, and apparent_sender properties (where the last is Mailman's best guess at who's responsible for sending the mail for the purpose of moderation), or perhaps just apparent_sender and sender_is_member properties.
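One way to read Stephen's suggestion is to compute the sender properties once, in one place, and let both rules consult the result. The sketch below is purely illustrative: `SenderInfo`, `identify_senders()`, and the policy for choosing `apparent_sender` (first member sender, else first sender) are assumptions, not existing Mailman code.

```python
from dataclasses import dataclass
from typing import Optional, Tuple


@dataclass(frozen=True)
class SenderInfo:
    """Sender facts computed once per message (hypothetical structure)."""
    all_senders: Tuple[str, ...]
    member_senders: Tuple[str, ...]
    apparent_sender: Optional[str]

    @property
    def sender_is_member(self) -> bool:
        return bool(self.member_senders)


def identify_senders(senders, members):
    """Classify senders against a roster; the apparent_sender policy is assumed."""
    all_senders = tuple(senders)
    member_senders = tuple(a for a in all_senders if a in members)
    if member_senders:
        apparent = member_senders[0]   # prefer a member, if any
    elif all_senders:
        apparent = all_senders[0]      # else the highest-priority header
    else:
        apparent = None
    return SenderInfo(all_senders, member_senders, apparent)


# Both rules consult the same SenderInfo instead of re-deriving membership.
def member_rule_hits(info):
    return info.sender_is_member


def nonmember_rule_hits(info):
    return bool(info.all_senders) and not info.sender_is_member


info = identify_senders(["alice@example.com", "alice@gmail.com"],
                        {"alice@example.com"})
print(member_rule_hits(info), nonmember_rule_hits(info))  # True False
```

Because the two predicates are derived from the same object, they cannot both hit for one message.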
I *think* the right solution may be to continue to keep the rules separate, but add an extra check to the nonmember-moderation rule, such that if any of the senders are members, then the rule cannot hit.
This is horrible. You've already done that check in the Member rule.
It's OK as a stop-gap. In principle I don't really object to applying Aurélien's patch, because I think it should be fixed now. The meta-rules about how to compose rulesets need discussion before doing anything more invasive.
On Mar 13, 2014, at 05:06 PM, Stephen J. Turnbull wrote:
I expressed myself poorly. The parameters of the decision logic given the list of senders are different for the two rules so both rules are needed. But I really think that determining the sender should be done in one place by one set of principles, separated from the "to post or to moderate" logic. Maybe we could use all_senders, member_senders, and apparent_sender properties (where the last is Mailman's best guess at who's responsible for sending the mail for the purpose of moderation), or perhaps just apparent_sender and sender_is_member properties.
This logic requires both the email message and the mailing list object.
Mailman's Message class has both a `sender` and a `senders` property, but the latter is the interesting one for this topic:

The list will contain email addresses in the order determined by the configuration variable `sender_headers` in the `[mailman]` section. By default it uses this list of headers in order:
1. From:
2. envelope sender (i.e. From_, unixfrom, or RFC 2821 MAIL FROM)
3. Reply-To:
4. Sender:
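The ordering above can be approximated with the standard library's `email` package. This is a sketch, not Mailman's implementation: the `"from_"` token standing in for the envelope sender, the lowercasing, and the de-duplication are assumptions made for the example. It shows why Aurélien's message yields both of his addresses.

```python
from email import message_from_string
from email.utils import getaddresses

# Default order from the docstring above; "from_" is an assumed token
# standing in for the envelope sender (the Unix "From " line).
SENDER_HEADERS = ("from", "from_", "reply-to", "sender")


def collect_senders(msg, sender_headers=SENDER_HEADERS):
    """Return sender addresses in header-priority order (sketch)."""
    senders = []
    for header in sender_headers:
        if header == "from_":
            # A Unix-from line looks like "From alice@gmail.com Wed Mar 12 ...".
            parts = (msg.get_unixfrom() or "").split()
            addresses = parts[1:2]
        else:
            addresses = [addr for _name, addr in getaddresses(msg.get_all(header, []))]
        for address in addresses:
            address = address.lower()
            if address and address not in senders:  # drop duplicates
                senders.append(address)
    return senders


raw = (
    "From alice@gmail.com Wed Mar 12 01:43:00 2014\n"
    "From: Alice <alice@example.com>\n"
    "Sender: alice@gmail.com\n"
    "Subject: test\n"
    "\n"
    "body\n"
)
print(collect_senders(message_from_string(raw)))
# ['alice@example.com', 'alice@gmail.com']
```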
The problem is the destination mailing list. I don't think we can leave this to a rule because rules are by definition unordered and self-contained, so there's no guarantee -- and we shouldn't impose one -- that such a rule will get executed early enough.
We *could* do it in the LMTP runner, but right now, that doesn't query for the mailing list object, since it's not needed in order to dump the message into the appropriate queue. I don't think we can do it in the queue runners, and the message could end up in different queues, so the LMTP runner would be the only common preprocessing possibility (currently, at least).
I *think* the right solution may be to continue to keep the rules separate, but add an extra check to the nonmember-moderation rule, such that if any of the senders are members, then the rule cannot hit.
This is horrible. You've already done that check in the Member rule.
True, at least for the default processing chain. But these chains are configurable and extensible, so again, there's no guarantee that the member rule will even be run.
It's OK as a stop-gap. In principle I don't really object to applying Aurélien's patch, because I think it should be fixed now. The meta-rules about how to compose rulesets need discussion before doing anything more invasive.
We can definitely have that discussion, but I feel quite strongly that rules should be self-contained and unordered, with ordering imposed by the chain of links that rules are associated with.
I agree, though, that for now Aurélien's patch is what we should do.
-Barry
Barry Warsaw writes:
I feel quite strongly that rules should be self-contained and unordered, with ordering imposed by the chain of links that rules are associated with.
I don't understand what you're trying to say here. Are you saying that rules should not have a "rules_to_run_before_this_rule" field, but it's OK if a chain "rule_B, rule_A" is buggy because rule_A should be run before rule_B? Of course we should then fix the bug -- the point is that it is currently very easy for such bugs to occur, because rules may depend on metadata, and set/change metadata.
Perhaps rules should be allowed to add new metadata, but not change existing metadata?
I think I already mentioned that I have a difficulty with the concept of "*pure* Boolean with side effects", too. :-)
N.B. As far as *I* am concerned, you can take your time about responding to this. I need to go review the docs and code on all this anyway. But I suspect that if *I'm* confused about these concepts, *others* may be too, and they're pretty fundamental to customizing and extending Mailman 3.
On Mar 17, 2014, at 02:34 PM, Stephen J. Turnbull wrote:
I don't understand what you're trying to say here. Are you saying that rules should not have a "rules_to_run_before_this_rule" field, but it's OK if a chain "rule_B, rule_A" is buggy because rule_A should be run before rule_B? Of course we should then fix the bug -- the point is that it is currently very easy for such bugs to occur, because rules may depend on metadata, and set/change metadata.
I know you remember these diagrams: :)
http://pythonhosted.org//mailman/src/mailman/docs/8-miles-high.html#basic-me...
Perhaps rules should be allowed to add new metadata, but not change existing metadata?
I think I already mentioned that I have a difficulty with the concept of "*pure* Boolean with side effects", too. :-)
I was being sloppy in my previous response.
Rules either hit or miss. Their check() method is called and it must return True or False. They cannot depend on any other rule having been run before or after them, i.e. they are self-contained. The check() method gets a mailing list object, the message, and a metadata dictionary.
Rules must not have a side-effect on either the mailing list or the message. They may have a side-effect on the metadata dictionary, but as you say above, they may only add new metadata, not delete or change existing metadata. They also can't refer to metadata from other rules, because you don't know what order rules are run in.
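The rule contract described here (a `check(mlist, msg, msgdata)` method returning a Boolean, with metadata side effects limited to adding keys) could be enforced mechanically. The following sketch is hypothetical: `AddOnlyMetadata` and `SenderHeaderRule` are illustrative names, and Mailman does not necessarily guard its metadata dictionary this way.

```python
from collections.abc import MutableMapping
from email import message_from_string


class AddOnlyMetadata(MutableMapping):
    """Metadata mapping that lets rules add keys but never change or delete them."""

    def __init__(self, initial=None):
        self._data = dict(initial or {})

    def __getitem__(self, key):
        return self._data[key]

    def __setitem__(self, key, value):
        if key in self._data:
            raise ValueError(f"metadata key {key!r} already exists")
        self._data[key] = value

    def __delitem__(self, key):
        raise ValueError(f"metadata key {key!r} may not be deleted")

    def __iter__(self):
        return iter(self._data)

    def __len__(self):
        return len(self._data)


class SenderHeaderRule:
    """Hypothetical rule: hits if the message has a Sender: header."""

    name = "has-sender-header"

    def check(self, mlist, msg, msgdata):
        # Pure test on the message; the only side effect is adding a new key.
        sender = msg.get("sender")
        if sender is None:
            return False
        msgdata["sender_header"] = sender
        return True


msg = message_from_string("From: alice@example.com\nSender: alice@gmail.com\n\nhi\n")
msgdata = AddOnlyMetadata({"listid": "test.example.com"})
print(SenderHeaderRule().check(None, msg, msgdata))  # True
print(msgdata["sender_header"])                      # alice@gmail.com
```

Subclassing `MutableMapping` means `update()`, `pop()`, and `setdefault()` also route through the guarded methods. Under this contract the member and non-member rules from the earlier model could not both write `moderation_action`.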
After a message is accepted (and processed) by the LMTP server, it is dropped into the `incoming` queue. The incoming queue runner picks the message up and processes it through a chain. By default this is the `default-posting-chain`. This chain is composed of a list of rules and actions (in Mailman parlance, a 'chain of links'). Each link in the chain contains a rule, an action, and a target. If the rule hits, the action is taken. If the rule misses, the next link in the chain is processed. Actions can be all kinds of things (see the interface for details), but the action is often 'defer', which more or less acts like a rule miss in that the next link in the chain is processed. (The difference is that the chain processing infrastructure records which rules hit and which rules miss. In the default processing chain, there is an `any` rule which looks at the metadata recorded by the processing infrastructure, and *it* takes a `jump` action if any of the deferred rules have hit.)
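The link-processing loop described above can be modeled in a few lines. This is a simplified sketch under stated assumptions: rules are plain functions, only the `defer` and `jump` actions are modeled, and the `rule_hits`/`rule_misses` metadata keys and the toy `administrivia` check are illustrative rather than taken from Mailman's source.

```python
from dataclasses import dataclass
from typing import Callable, Optional

Rule = Callable[[dict, dict], bool]  # (msg, msgdata) -> hit?


@dataclass
class Link:
    name: str
    rule: Rule
    action: str                   # only "defer" and "jump" are modeled
    target: Optional[str] = None  # chain to jump to


def administrivia(msg, msgdata):
    """Toy check: the subject looks like a (un)subscribe request."""
    return msg.get("subject", "").strip().lower() in {"subscribe", "unsubscribe"}


def any_rule(msg, msgdata):
    """Hits if any earlier link's rule hit (as recorded by process())."""
    return bool(msgdata.get("rule_hits"))


def truth_rule(msg, msgdata):
    return True


def process(chain, msg, msgdata):
    """Run the links in order; return the chain jumped to, or None."""
    hits = msgdata.setdefault("rule_hits", [])
    misses = msgdata.setdefault("rule_misses", [])
    for link in chain:
        if not link.rule(msg, msgdata):
            misses.append(link.name)
            continue  # a miss: go on to the next link
        hits.append(link.name)
        if link.action == "jump":
            return link.target
        # "defer": recorded as a hit, but processing continues
    return None


CHAIN = [
    Link("administrivia", administrivia, "defer"),
    Link("any", any_rule, "jump", "hold"),
    Link("truth", truth_rule, "jump", "accept"),
]

print(process(CHAIN, {"subject": "unsubscribe"}, {}))  # hold
print(process(CHAIN, {"subject": "Hello"}, {}))        # accept
```

The ordering lives entirely in `CHAIN`; each rule function stays self-contained, which is the division of responsibility Barry argues for.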
(Please note that this logic is embodied in the built-in chain, which by default is the chain that the incoming queue runner processes. Other chains do not usually have links of rules since their functionality is much simpler. E.g. the discard chain just writes a log message, fires an event, and exits, thus throwing away the message.)
N.B. As far as *I* am concerned, you can take your time about responding to this. I need to go review the docs and code on all this anyway. But I suspect that if *I'm* confused about these concepts, *others* may be too, and they're pretty fundamental to customizing and extending Mailman 3.
Pictures help a lot here! :)
If you want to see where the metadata added by the member and nonmember moderation rules gets used, look at the moderation chain. If either of those two rules hit, then processing jumps to the moderation chain.
All the moderation chain does, though, is dispatch to some other chain (in Mailman parlance, it *jumps* to some other chain). Which chain it jumps to depends on the metadata written by one of those two rules, specifically the 'moderation_action'. This piece of metadata corresponds directly to the action that should be taken when a particular person posts to the mailing list. So if stephen@ posts and is a member and has an action of `discard`, then the moderation chain jumps to the discard chain described above.
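The dispatch step can be sketched as a lookup from `moderation_action` to a target chain name. The mapping below is an assumption for illustration: the chain names mirror those mentioned in this thread (`hold`, `discard`, `accept`) plus an assumed `reject`, and Mailman's real moderation chain may validate and dispatch differently.

```python
# Assumed mapping from a poster's moderation_action to the chain to jump to.
ACTION_TO_CHAIN = {
    "accept": "accept",
    "hold": "hold",
    "reject": "reject",
    "discard": "discard",
}


def moderation_chain(msgdata):
    """Return the name of the chain to jump to, based on rule metadata."""
    action = msgdata["moderation_action"]
    try:
        return ACTION_TO_CHAIN[action]
    except KeyError:
        raise ValueError(f"unexpected moderation_action: {action!r}") from None


# A member (hypothetical address) whose moderation action is discard.
print(moderation_chain({"moderation_action": "discard",
                        "moderation_sender": "stephen@example.com"}))  # discard
```

In this sketch a `defer` action is rejected, on the assumption that a deferred rule never jumps to the moderation chain in the first place.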
Let's make sure we have a whiteboard at Pycon. :)
Cheers, -Barry
Participants (3): Aurelien Bompard, Barry Warsaw, Stephen J. Turnbull