[spambayes-dev] trimming email addresses

Skip Montanaro skip at pobox.com
Tue Jul 29 13:04:33 EDT 2003


It occurred to me yesterday that it might be worthwhile trimming email
addresses which contain "+" signs.  Most MTAs understand this notation and
deliver such messages to the email address on the left of the "+".  For
example, Mailman 2.1 uses this.  The email address in the Sender field of my
spambayes mail is

    spambayes-bounces+skip=pobox.com at python.org

A message to that address would go to spambayes-bounces at python.org, where
the "skip=pobox.com" part is extracted and treated as a parameter by the
recipient (usually a program).
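
To make the notation concrete, here is a minimal sketch of how such an
address splits into a delivery address and a parameter.  The function name
split_plus_address is made up for illustration and is not part of Spambayes
or Mailman; the address is written with a real "@" where the archive shows
" at ".

```python
def split_plus_address(address):
    """Split 'base+param@domain' into (delivery_address, parameter).

    Hypothetical helper for illustration only.  Returns the address
    unchanged with a parameter of None when there is nothing to split.
    """
    # Split on the last "@" so only the local part is examined.
    local, at, domain = address.rpartition("@")
    if not at:
        return address, None
    # Everything after the first "+" in the local part is the parameter.
    base, plus, param = local.partition("+")
    if not plus or not base:
        return address, None
    return base + "@" + domain, param


delivery, param = split_plus_address(
    "spambayes-bounces+skip=pobox.com@python.org")
# delivery is "spambayes-bounces@python.org"; param is "skip=pobox.com"
```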

It seems to me that the Spambayes tokenizer should only consider the
"spambayes-bounces" part of the address.  I don't think this will improve
the tokenizer in the general case, but it does seem like the correct way to
handle such addresses.

The thing that made me consider this is that I have an auto-responder which
uses the same mechanism to allow people to remove themselves from a
database.  I get a cc of each of those messages.  I'm having a devil of a
time getting those messages to classify as ham.  They are generally empty or
contain at most a short phrase like "please remove me".  Adding "to" to the
address_headers doesn't help, because the left side of each email address is
unique (each one embeds a different parameter).  If we were trimming
parameters from email addresses, the same token would be generated for each
of these.
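
A rough sketch of the effect, not the actual tokenizer code: the helper
names, the example addresses, and the "to:addr:..." token format below are
all assumptions chosen for illustration.

```python
def trim_plus_parameter(address):
    """Drop a '+parameter' suffix from the local part of an address.

    Hypothetical helper; addresses without a '+' are returned unchanged.
    """
    local, at, domain = address.rpartition("@")
    if not at:
        return address
    base = local.split("+", 1)[0]
    if not base:
        return address
    return base + at + domain


def address_tokens(field, addresses):
    """Yield one token per address (token format is an assumption)."""
    for addr in addresses:
        yield "%s:addr:%s" % (field, trim_plus_parameter(addr).lower())


# Two made-up auto-responder recipients that differ only in the parameter.
recipients = [
    "remove+alice=example.com@example.org",
    "remove+bob=example.net@example.org",
]
tokens = set(address_tokens("to", recipients))
# Both recipients collapse to the single token "to:addr:remove@example.org".
```

Without the trimming step, each recipient would yield its own distinct
token, so the classifier would never see the same one twice.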

Any objection to me implementing this and checking it in?

Skip



