[spambayes-dev] spammy subject lines

Paul Sorenson sourceforge at metrak.com
Mon Oct 13 05:34:36 EDT 2003


----- Original Message -----
From: "Tim Peters" <tim.one at comcast.net>
To: "Paul Sorenson" <sourceforge at metrak.com>; "spambayes-dev"
<spambayes-dev at python.org>
Sent: Monday, October 13, 2003 3:47 AM
Subject: RE: [spambayes-dev] spammy subject lines


> [Paul Sorenson]
> > I added the following function:
> >     def leaveOnlyLetters(self, s):
> >         # Return s with any characters not in string.letters removed.
> >         import string
> >         return filter(lambda c: c in string.letters, s)
> >
> > And appended
> >             # Add words with non-letters removed.
> >             for w in x.split():
> >                 yield 'subject:' + self.leaveOnlyLetters(w)
> >
> > at the end of the section that handles subject lines.  It seems that a
> > subject like "stati.stics" generates a clue like "subject:.",
>
> Among others, yes.
>
> > presumably via punctuation_run_re.findall(x)
>
> That is the source of "subject:.".
>
> > and my "cleaned up" subject token doesn't appear to get through, i.e. I
> > don't think it is working as I expect.
>
> Why do you think that?  We can't see what you did, and you didn't spell
> out your evidence.

Well, I sent myself a couple of emails whose subject lines differed only in
punctuation, then checked the clues in the web interface.  The words stripped
of punctuation from the subject line didn't seem to appear in the list.
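To make concrete which clues a subject such as "stati.stics" could produce once the letters-only loop is appended, here is a minimal, self-contained Python 3 sketch. The two regexes are assumptions that only approximate the tokenizer's subject_word_re and punctuation_run_re (they are not copied from spambayes/tokenizer.py), the real tokenizer does further per-word processing that is skipped here, and subject_tokens / leave_only_letters are hypothetical names. The sketch covers token generation only, not how the web interface chooses which clues to display.

```python
import re
import string
from typing import Iterator

# Hypothetical stand-ins for the SpamBayes tokenizer's regexes; the real
# definitions may differ, and the real tokenizer also post-processes each
# subject word, which is skipped here.
SUBJECT_WORD_RE = re.compile(r"[\w$.%]+")
PUNCTUATION_RUN_RE = re.compile(r"\W+")
LETTERS = frozenset(string.ascii_letters)  # Python 3 stand-in for string.letters


def leave_only_letters(s: str) -> str:
    """Return s with every character outside LETTERS removed."""
    return "".join(c for c in s if c in LETTERS)


def subject_tokens(subject: str) -> Iterator[str]:
    """Yield 'subject:' clue tokens for a subject line (simplified sketch)."""
    for w in SUBJECT_WORD_RE.findall(subject):
        yield "subject:" + w
    # Each run of non-word characters becomes its own clue, e.g. "subject:.".
    for w in PUNCTUATION_RUN_RE.findall(subject):
        yield "subject:" + w
    # The addition from the thread: each whitespace-separated word with
    # non-letters removed.
    for w in subject.split():
        cleaned = leave_only_letters(w)
        if cleaned:  # assumption: skip words that contain no letters at all
            yield "subject:" + cleaned


print(list(subject_tokens("stati.stics")))
# ['subject:stati.stics', 'subject:.', 'subject:statistics']
```

The `if cleaned:` guard is a small departure from the quoted loop, which would emit a bare "subject:" token for a word made entirely of punctuation.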

> > The leaveOnlyLetters method seems to work ok.
>
> It should.  Here's a patch for a more-efficient way...

Only by a factor of 20 :-)

cheers
