Question about content filtering in message body
Hi,
Newbie, so please bear with me.
I'm looking for a way to content-filter the text of the message body (e.g. blacklist bad words) for posts to a list. I've looked at Content Filtering, but that appears to be associated with message type, not message text? And Topics filters appear to focus only on the header and subject information? Can someone point me in the right direction? Thanks!
On 5/18/2023 8:44 AM, Scott High wrote:
I'm looking for a way to content-filter the text of the message body (e.g. blacklist bad words) for posts to a list. I've looked at Content Filtering, but that appears to be associated with message type, not message text? And Topics filters appear to focus only on the header and subject information?
Sounds like you want the equivalent of a spam filter, and that sort of filtering should be applied before the email even gets to mailman.
If you're using *nix, clam-av should do the job and isn't difficult to set up.
z!
On Thu, May 18, 2023 at 7:18 PM Carl Zwanzig <cpz@tuunq.com> wrote:
rspamd or spamassassin, not clamav!
-- Best regards, Odhiambo WASHINGTON, Nairobi,KE +254 7 3200 0004/+254 7 2274 3223 "Oh, the cruft.", egrep -v '^$|^.*#' ¯\_(ツ)_/¯ :-) [How to ask smart questions: http://www.catb.org/~esr/faqs/smart-questions.html]
On 5/18/23 08:44, Scott High wrote:
I'm looking for a way to content-filter the text of the message body (e.g. blacklist bad words) for posts to a list. I've looked at Content Filtering, but that appears to be associated with message type, not message text? And Topics filters appear to focus only on the header and subject information? Can someone point me in the right direction? Thanks!
If this is Mailman 2.1.x, see https://wiki.list.org/x/4030615
-- Mark Sapiro <mark@msapiro.net> The highway is for gamblers, San Francisco Bay Area, California better use your sense - B. Dylan
One popular approach for text filtering is to create a list of "bad words" or inappropriate terms and check if any of those words appear in the message body. Here's a high-level overview of how you can implement this:
1. Define a list of inappropriate words or phrases that you want to filter. This list can be as extensive as you need it to be, and you can include words related to profanity, offensive language, or any other content you wish to block.
2. Tokenize the message body: Split the text into individual words or tokens. The exact method for tokenization depends on the programming language or library you are using. Common options include using regular expressions or built-in string manipulation functions.
3. Compare the tokens with the list of inappropriate words: Iterate over the tokens in the message body and check if each token appears in your list of inappropriate words. You may need to consider case sensitivity and handle variations of words (e.g., plural forms, alternative spellings) depending on your requirements.
4. Take action based on the results: If you find any matches between the tokens and the inappropriate word list, you can decide how to handle those messages. Options include flagging the message for manual review, replacing inappropriate words with asterisks or other symbols, or rejecting the message altogether.
Please note that implementing a content filtering system involves some level of customization based on your specific needs and the programming language or platform you are working with. There are also third-party services and libraries available that provide more advanced text filtering capabilities, such as natural language processing (NLP) algorithms or machine learning models, which can help improve the accuracy of the filtering process.
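The steps above can be sketched in Python using only the standard library. This is a rough illustration, not something Mailman or SpamAssassin provides: the word list `BAD_WORDS`, the helper names `plain_text_body`, `find_bad_words`, and `decide`, and the "hold"/"accept" outcomes are all made up for the example, and it only looks at `text/plain` parts.

```python
import re
from email import message_from_string
from email.message import Message

# Hypothetical word list for illustration; a real list would be site-specific.
BAD_WORDS = {"lottery", "bitcoin", "viagra"}

# Tokens are runs of lowercase letters, digits, or apostrophes.
TOKEN_RE = re.compile(r"[a-z0-9']+")


def plain_text_body(msg: Message) -> str:
    """Concatenate the decoded text/plain parts of a message."""
    chunks = []
    for part in msg.walk():
        if part.get_content_type() == "text/plain":
            payload = part.get_payload(decode=True)  # bytes, or None
            if payload is not None:
                charset = part.get_content_charset() or "utf-8"
                chunks.append(payload.decode(charset, errors="replace"))
    return "\n".join(chunks)


def find_bad_words(text: str, bad_words=BAD_WORDS) -> set:
    """Return the listed words that occur as whole tokens (case-insensitive)."""
    tokens = TOKEN_RE.findall(text.lower())
    return bad_words.intersection(tokens)


def decide(msg: Message) -> str:
    """Return 'hold' if any listed word appears in the body, else 'accept'."""
    return "hold" if find_bad_words(plain_text_body(msg)) else "accept"


if __name__ == "__main__":
    raw = "From: a@example.com\nSubject: hi\n\nYou won the LOTTERY! Send Bitcoin now.\n"
    msg = message_from_string(raw)
    print(sorted(find_bad_words(plain_text_body(msg))))
    print(decide(msg))
```

Matching whole tokens rather than substrings avoids flagging innocent words that merely contain a listed word, at the cost of missing plurals and deliberate misspellings, which would need stemming or extra patterns.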
Many thanks to all who replied on this! Once y'all pointed me in the right direction, I was able to go back up the chain a bit and find what I needed in SpamAssassin to block the particular spammer in question based on his Bitcoin. Thanks again!
From: Ethan Lewis <iethanlewis7@gmail.com> Sent: Thursday, May 25, 2023 5:00:40 AM To: mailman-users@python.org Subject: [Mailman-Users] Re: Question about content filtering in message body
Mailman-Users mailing list -- mailman-users@python.org To unsubscribe send an email to mailman-users-leave@python.org https://mail.python.org/mailman3/lists/mailman-users.python.org/ Mailman FAQ: http://wiki.list.org/x/AgA3 Security Policy: http://wiki.list.org/x/QIA9 Searchable Archives: https://www.mail-archive.com/mailman-users@python.org/ https://mail.python.org/archives/list/mailman-users@python.org/ Member address: shigh@msrealtors.org
participants (5)
- Carl Zwanzig
- Ethan Lewis
- Mark Sapiro
- Odhiambo Washington
- Scott High