[spambayes-dev] spammy subject lines

Skip Montanaro skip at pobox.com
Fri Oct 10 22:39:55 EDT 2003


    Paul> Looking at the tokenizer code for subject lines I was wondering if
    Paul> there was value in stripping punctuation then doing the usual word
    Paul> tokenisation.

Interesting idea.  I added that to the NEWTRICKS file.  Another idea is
mapping digits to letters in certain situations ("V1agra" -> "Viagra").
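
A rough, untested-on-real-mail sketch of both ideas together, in case it helps
as a starting point. Nothing here is taken from the existing tokenizer: the
function names and the digit-to-letter table are hypothetical, the code is
written for current Python 3, and "certain situations" is assumed to mean
"words that mix letters and digits", so that plain numbers are left alone.

```python
import string

# Hypothetical lookalike table (an assumption, not from SpamBayes).
DIGIT_TO_LETTER = str.maketrans(
    {"0": "o", "1": "i", "3": "e", "4": "a", "5": "s", "7": "t"}
)

# Translation table that deletes every ASCII punctuation character.
PUNCT_TABLE = str.maketrans("", "", string.punctuation)


def strip_punctuation(subject):
    """Remove ASCII punctuation, so "F*R*E*E" becomes "FREE"."""
    return subject.translate(PUNCT_TABLE)


def undo_digit_swaps(word):
    """Map digits to lookalike letters only in words mixing both."""
    has_alpha = any(c.isalpha() for c in word)
    has_digit = any(c.isdigit() for c in word)
    if has_alpha and has_digit:
        return word.translate(DIGIT_TO_LETTER)
    return word  # pure words and pure numbers pass through unchanged


def tokenize_subject(subject):
    """Yield whitespace-separated words after both normalisations."""
    for word in strip_punctuation(subject).split():
        yield undo_digit_swaps(word)


if __name__ == "__main__":
    print(list(tokenize_subject("V.1.a.g.r.a now!!!")))
```

Whether tokens like these actually improve classification is something only a
test run against real ham and spam can show.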

    Paul> I would be happy to have a crack at a patch if this hasn't been
    Paul> tried already, I just wanted to float the idea first given that I
    Paul> am unfamiliar with the existing codebase and unsure whether it
    Paul> might have already been tried.

Give it a go and let us know how it works.

Skip


