Re: [Mailman-Users] Regexp for blocking addresses
On 9/25/15 7:57 AM, Matthew Saltzman wrote:
That's still much more aggressive than what I was trying to say. I actually want to ban precisely all variants of the one address
joeblow@gmail.com
(and about a dozen other addresses) that include embedded periods anywhere and the suffix, but not other gmail addresses. grep finds them with
\.\?j\.\?o\.\?e\.\?b\.\?l\.\?o\.\?w\.\ ?+.*@gmail \.com
(I might want \.* instead of \.\?) but adding
^\.\?j\.\?o\.\?e\.\?b\.\?l\.\?o\.\?w\.\ ?+.*@gmail\.com
to the ban list doesn't seem to block them.
Because in a Python RE, \? is a literal '?', not the '0 or 1 repetitions' operator. For grep you need the \? to give ? its special meaning. A regexp for egrep or grep -E will be closer to what you want for Python. You want
^\.?j\.?o\.?e\.?b\.?l\.?o\.?w xxx \+.*@gmail\.com
where I'm unsure about the ' xxx ' part because I don't understand what '\ ?' is supposed to do?
Or instead of the above and to account for multiple '.' maybe
^\.*j\.*o\.*e\.*b\.*l\.*o\.*w\.*\+.*@gmail\.com
which says zero or more dots followed by j followed by zero or more dots followed by o, etc., followed by w followed by zero or more dots followed by + followed by anything followed by @gmail.com.
See <https://docs.python.org/2/library/re.html#regular-expression-syntax>.
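The difference can be checked outside Mailman with a minimal Python sketch. This is not Mailman's own matching code, the sample addresses are made up, and using \.? where the ' xxx ' placeholder appears is an assumption about what was intended.

```python
import re

# grep (basic RE) escaping: in Python, \? matches a literal '?' character.
grep_style = re.compile(r'^\.\?j\.\?o\.\?e\.\?b\.\?l\.\?o\.\?w\.\?\+.*@gmail\.com')

# Python / grep -E style: a bare ? means "zero or one of the preceding item".
# ASSUMPTION: \.? stands in for the ' xxx ' placeholder in the message above.
python_style = re.compile(r'^\.?j\.?o\.?e\.?b\.?l\.?o\.?w\.?\+.*@gmail\.com')

# Variant that also tolerates runs of several dots.
multi_dot = re.compile(r'^\.*j\.*o\.*e\.*b\.*l\.*o\.*w\.*\+.*@gmail\.com')


def check(addr):
    """Return (grep_style, python_style, multi_dot) match results as booleans."""
    return (bool(grep_style.match(addr)),
            bool(python_style.match(addr)),
            bool(multi_dot.match(addr)))


# Hypothetical addresses, not taken from any real list.
for addr in ('j.o.e.blow+123@gmail.com',
             'j..oeblow+1@gmail.com',
             'janedoe+1@gmail.com'):
    print(addr, check(addr))
```

The grep-style pattern only matches text that literally contains '.?' sequences, which is why it never blocks a real address.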
-- Mark Sapiro <mark@msapiro.net>, San Francisco Bay Area, California
The highway is for gamblers, better use your sense - B. Dylan
On Fri, 2015-09-25 at 08:23 -0700, Mark Sapiro wrote:
^\.?j\.?o\.?e\.?b\.?l\.?o\.?w xxx \+.*@gmail\.com
where I'm unsure about the ' xxx ' part because I don't understand what '\ ?' is supposed to do?
Just a formatting mistake.
Or instead of the above and to account for multiple '.' maybe
^\.*j\.*o\.*e\.*b\.*l\.*o\.*w\.*\+.*@gmail\.com
which says zero or more dots followed by j followed by zero or more dots followed by o, etc., followed by w followed by zero or more dots followed by + followed by anything followed by @gmail.com.
See <https://docs.python.org/2/library/re.html#regular-expression-syntax>.
I think that's what my problem is. Will give it a try.
Thanks very much.
-- Matthew Saltzman Clemson University Math Sciences mjs AT clemson DOT edu
On 9/25/15 7:57 AM, Matthew Saltzman wrote:
That's still much more aggressive than what I was trying to say. I actually want to ban precisely all variants of the one address
joeblow at gmail.com
(and about a dozen other addresses) that include embedded periods anywhere and the suffix, but not other gmail addresses.
Here's another idea.
Find the following in /path/to/mailman/Mailman/MailList.py
    def GetBannedPattern(self, email):
        """Returns matched entry in ban_list if email matches.
        Otherwise returns None.
        """
        return self.GetPattern(email, self.ban_list)
and change it to
bad_users = ['joeblow@gmailcom',
             'johndoe@gmailcom',
             'jackblack@gmailcom',
             ... (the rest of the addrs to ban)
            ]
    def GetBannedPattern(self, email):
        """Returns matched entry in ban_list if email matches.
        Otherwise returns None.
        """
        if re.sub('\.', '', re.sub('\+.*@', '@', email.lower())) in self.bad_users:
            return 'bad_user'
        return self.GetPattern(email, self.ban_list)
Note that the line
        if re.sub('\.', '', re.sub('\+.*@', '@', email.lower())) in self.bad_users:
should be indented 8 spaces as shown above, but still all one line. What this does is lower-case the email address, then replace a '+', if any, and everything after it up to the '@' with just the '@', and finally remove all the '.' characters. If the result is in the bad_users list, the address is banned for matching 'bad_user'. Note too that there is no '.' in the gmailcom part of the bad addresses, as we will have removed it before the test.
If you do this, you have to restart Mailman after modifying MailList.py.
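To try the normalization on its own before touching MailList.py, here is a minimal standalone sketch. The names normalize and is_bad_user are hypothetical helpers for illustration only, and the list holds just the three example addresses from above.

```python
import re

# Already-normalized forms: lower case, no '+suffix', no dots at all.
BAD_USERS = ['joeblow@gmailcom', 'johndoe@gmailcom', 'jackblack@gmailcom']


def normalize(email):
    """Lower-case, drop any '+...' suffix before the '@', then drop every dot."""
    without_suffix = re.sub(r'\+.*@', '@', email.lower())
    return re.sub(r'\.', '', without_suffix)


def is_bad_user(email):
    """True if the normalized address is one of the listed bad users."""
    return normalize(email) in BAD_USERS


# Hypothetical variants of the same underlying address.
for addr in ('J.o.e.Blow+12345@Gmail.com', 'joeblow@gmail.com', 'jane@gmail.com'):
    print(addr, normalize(addr), is_bad_user(addr))
```

Because the dots are removed from the whole address, the entries in the list have to be written without the '.' in the domain as well.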
-- Mark Sapiro <mark@msapiro.net>
On 09/25/2015 09:01 PM, Mark Sapiro wrote:
Here's another idea.
Find the following in /path/to/mailman/Mailman/MailList.py
    def GetBannedPattern(self, email):
        """Returns matched entry in ban_list if email matches.
        Otherwise returns None.
        """
        return self.GetPattern(email, self.ban_list)
and change it to
bad_users = ['joeblow@gmailcom', 'johndoe@gmailcom', 'jackblack@gmailcom', ... (the rest of the addrs to ban) ]
...
Ooops. In the above addition, the bad_users line needs to be indented 4 spaces to line up with the following ' def ...'.
-- Mark Sapiro <mark@msapiro.net>
On Fri, 2015-09-25 at 17:03 +0000, Matthew Saltzman wrote:
I think that's what my problem is. Will give it a try.
Thanks very much.
So I used the ban script from http://nigelb.me/2015-08-26-mailman-attacks.html to add regexps of the form
"^\.*j\.*o\.*e\.*b\.*l\.*o\.*w \ .*+.*@gmail \.com"
to the ban list. They show up in the ban_list window looking correct (without the quotes) but don't seem to be blocking the intended addresses. The same pattern without the \.*'s does block the addresses not containing embedded periods.
Not quite sure what I'm missing or where to go here. I'd prefer not to actually hack the code in MailList.py if I can avoid it.
-- Matthew Saltzman Clemson University Math Sciences mjs AT clemson DOT edu
On 9/25/2015 9:16 PM, Mark Sapiro wrote:
bad_users = ['joeblow@gmailcom', 'johndoe@gmailcom', 'jackblack@gmailcom', ... (the rest of the addrs to ban) ] ...
Ooops. In the above addition, the bad_users line needs to be indented 4 spaces to line up with the following ' def ...'.
Would it make sense to define the "bad users" in mm_cfg.py? That way it would be a bit easier to add/remove/change addresses when needed. Also, wouldn't updating Mailman replace the modified /path/to/mailman/Mailman/MailList.py?
Just a thought, Chris
On 09/28/2015 07:46 AM, Matthew Saltzman wrote:
So I used the ban script from http://nigelb.me/2015-08-26-mailman-attacks.html to add regexps of the form
"^\.*j\.*o\.*e\.*b\.*l\.*o\.*w \ .*+.*@gmail \.com"
to the ban list. They show up in the ban_list window looking correct (without the quotes) but don't seem to be blocking the intended addresses. The same pattern without the \.*'s does block the addresses not containing embedded periods.
Not quite sure what I'm missing or where to go here. I'd prefer not to actually hack the code in MailList.py if I can avoid it.
I'm not sure either because I don't know if any of that white space is actually there or not.
However, this part '.*+' is absolutely wrong. The + needs a preceding \ or the entire RE is invalid.
And you don't want to hack the code either because trying to keep a list of actual addresses is futile because there keep being more, the permutations with '.' notwithstanding. You want either the regexp I'm using or one of the 'safer' ones in my reply at <https://mail.python.org/pipermail/mailman-users/2015-September/079874.html>.
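The effect of the missing backslash can be checked with a small Python sketch. The joeblow pattern is only the illustrative one from this thread. One caveat: the Python 2 re module used by Mailman 2.1 rejects an unescaped '*+' as a multiple repeat, while Python 3.11 and later read '*+' as a possessive quantifier, so the sketch reports whether each pattern compiles instead of assuming an error.

```python
import re


def compiles(pattern):
    """Return True if the re module on this Python accepts the pattern."""
    try:
        re.compile(pattern)
        return True
    except re.error:
        return False


# Same pattern with and without the backslash before the '+'.
unescaped = r'^\.*j\.*o\.*e\.*b\.*l\.*o\.*w\.*+.*@gmail\.com'
escaped = r'^\.*j\.*o\.*e\.*b\.*l\.*o\.*w\.*\+.*@gmail\.com'

print('unescaped compiles:', compiles(unescaped))  # depends on Python version
print('escaped compiles:', compiles(escaped))

# The escaped form requires a literal '+' after the name.
print(bool(re.match(escaped, 'j.oe.blow+42@gmail.com')))
print(bool(re.match(escaped, 'joeblow@gmail.com')))
```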
-- Mark Sapiro <mark@msapiro.net>
On 09/28/2015 10:22 AM, Chris Nulk wrote:
Would it make sense to define the "bad users" in mm_cfg.py? That way it would be a bit easier to add/remove/change addresses when needed. Also, wouldn't updating Mailman replace the modified /path/to/mailman/Mailman/MailList.py?
Yes, making it an mm_cfg setting might be better and would certainly be done if I were making this a feature, but I'm not, so modifying MailList.py would be required in any case, and yes, these mods would be reversed by an update. That's why if you make local mods to the code, you MUST keep a local patch file or other record so you can figure out after an upgrade which of your local mods are still required and how to apply them.
-- Mark Sapiro <mark@msapiro.net>
On Mon, 2015-09-28 at 19:41 -0700, Mark Sapiro wrote:
On 09/28/2015 07:46 AM, Matthew Saltzman wrote:
So I used the ban script from http://nigelb.me/2015-08-26-mailman-attacks.html to add regexps of the form
"^\.*j\.*o\.*e\.*b\.*l\.*o\.*w \ .*+.*@gmail \.com"
to the ban list. They show up in the ban_list window looking correct (without the quotes) but don't seem to be blocking the intended addresses. The same pattern without the \.*'s does block the addresses not containing embedded periods.
Not quite sure what I'm missing or where to go here. I'd prefer not to actually hack the code in MailList.py if I can avoid it.
I'm not sure either because I don't know if any of that white space is actually there or not.
Not. I don't know where it's coming from, as it doesn't look like that when I compose.
However, this part '.*+' is absolutely wrong. The + needs a preceding \ or the entire RE is invalid.
Ah, that's probably it.
And you don't want to hack the code either because trying to keep a list of actual addresses is futile because there keep being more, the permutations with '.' notwithstanding. You want either the regexp I'm using or one of the 'safer' ones in my reply at <https://mail.python.org/pipermail/mailman-users/2015-September/079874.html>.
Got it, thanks. I will switch to that one. Now would like to figure out how to delete all my futile attempts from the ban list. Tried modifying the ban script to remove instead of append, but I'm still doing something wrong.
Anyway, thanks for your help.
-- Matthew Saltzman Clemson University Math Sciences mjs AT clemson DOT edu
On 09/29/2015 08:24 AM, Matthew Saltzman wrote:
Got it, thanks. I will switch to that one. Now would like to figure out how to delete all my futile attempts from the ban list. Tried modifying the ban script to remove instead of append, but I'm still doing something wrong.
If the various lists' ban_list attributes were empty to start with, use the following withlist script based on <https://www.msapiro.net/scripts/add_banned.py>
----------------------Cut-----------------------------
"""Set the ban_list for all lists to a single address or regexp.

Save as bin/set_banned.py

Run via

   bin/withlist -a -r set_banned -- <address_to_ban>

where <address_to_ban> is the actual email address or regexp to be set
as ban_list for all lists.
"""

def set_banned(mlist, address):
    if not mlist.Locked():
        mlist.Lock()
    mlist.ban_list = [address]
    mlist.Save()
    mlist.Unlock()
----------------------Cut-----------------------------
If you want to remove individual regexps/addresses from every list's ban_list, changing 'ban_list.append(address)' to 'ban_list.remove(address)' in whatever script you were using should work, but the value of 'address' has to be exactly what is in ban_list that you want to remove.
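A removal variant might look like the sketch below. It is not an existing script: the name remove_banned is hypothetical, it assumes the same mlist methods used in set_banned above, and by analogy it would be saved as bin/remove_banned.py and run via bin/withlist -a -r remove_banned -- <address_to_remove>.

```python
"""Remove one address or regexp from the ban_list of all lists.

Hypothetical companion to set_banned.py, shown only as a sketch.
"""


def remove_banned(mlist, address):
    # list.remove() raises ValueError for a missing item, so skip lists
    # whose ban_list does not contain exactly this string.
    if address not in mlist.ban_list:
        return False
    if not mlist.Locked():
        mlist.Lock()
    try:
        mlist.ban_list.remove(address)
        mlist.Save()
    finally:
        # Release the lock even if Save() fails.
        mlist.Unlock()
    return True
```

The membership test is an exact string comparison, so a pattern that differs by a single space or backslash from the stored entry is left in place.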
-- Mark Sapiro <mark@msapiro.net>
On Tue, 2015-09-29 at 09:55 -0700, Mark Sapiro wrote:
If you want to remove individual regexps/addresses from every list's ban_list, changing 'ban_list.append(address)' to 'ban_list.remove(address)' in whatever script you were using should work, but the value of 'address' has to be exactly what is in ban_list that you want to remove.
Got it. Thanks for all your help.
-- Matthew Saltzman Clemson University Math Sciences mjs AT clemson DOT edu
On Mon, 2015-09-28 at 19:41 -0700, Mark Sapiro wrote:
On 09/28/2015 07:46 AM, Matthew Saltzman wrote:
So I used the ban script from http://nigelb.me/2015-08-26-mailman-attacks.html to add regexps of the form
"^\.*j\.*o\.*e\.*b\.*l\.*o\.*w \ .*+.*@gmail \.com"
to the ban list. They show up in the ban_list window looking correct (without the quotes) but don't seem to be blocking the intended addresses. The same pattern without the \.*'s does block the addresses not containing embedded periods.
Not quite sure what I'm missing or where to go here. I'd prefer not to actually hack the code in MailList.py if I can avoid it.
I'm not sure either because I don't know if any of that white space is actually there or not.
However, this part '.*+' is absolutely wrong. The + needs a preceding \ or the entire RE is invalid.
And you don't want to hack the code either because trying to keep a list of actual addresses is futile because there keep being more, the permutations with '.' notwithstanding. You want either the regexp I'm using or one of the 'safer' ones in my reply at <https://mail.python.org/pipermail/mailman-users/2015-September/079874.html>.
The regexp in the linked message has left parens instead of left braces, but otherwise seems to be working so far. I added it with @gmail.com and @usc.edu to be even safer (as those are the only ones I've seen to this point), but will watch for other domains for a while.
Thanks!
-- Matthew Saltzman Clemson University Math Sciences mjs AT clemson DOT edu
On 09/29/2015 11:09 AM, Matthew Saltzman wrote:
On Mon, 2015-09-28 at 19:41 -0700, Mark Sapiro wrote:
You want either the regexp I'm using or one of the 'safer' ones in my reply at <https://mail.python.org/pipermail/mailman-users/2015-September/079874.html>.
The regexp in the linked message has left parens instead of left braces, but otherwise seems to be working so far. I added it with @gmail.com and @usc.edu to be even safer (as those are the only ones I've seen to this point), but will watch for other domains for a while.
My bad. A typo. They should be braces as in
^.*\+\d{3,}@
or
^.*\+\d{5,}@
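For reference, \d{3,} means three or more digits, so these patterns match any address whose local part ends in '+' followed by a run of digits. A minimal Python sketch of what each one catches, using made-up addresses:

```python
import re

# '+' followed by at least three digits, then '@'.
three_plus = re.compile(r'^.*\+\d{3,}@')
# The more conservative variant: at least five digits.
five_plus = re.compile(r'^.*\+\d{5,}@')

# Hypothetical addresses for illustration only.
samples = ['j.o.e.blow+12345@gmail.com',
           'joeblow+123@usc.edu',
           'joeblow+12@gmail.com',
           'user+list@example.com']

for addr in samples:
    print(addr, bool(three_plus.match(addr)), bool(five_plus.match(addr)))
```

An ordinary plus-address such as user+list@example.com is not caught, because the suffix must consist of digits only.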
-- Mark Sapiro <mark@msapiro.net>
Participants (3): Chris Nulk, Mark Sapiro, Matthew Saltzman