[spambayes-dev] Re: [Spambayes] how spambayes handles image-only spams
Meyer, Tony
T.A.Meyer at massey.ac.nz
Tue Sep 9 13:17:41 EDT 2003
> > Why do you need a customized parser? You'd probably reach your end
> > goal faster by reading and modifying tokenizer.py.
>
> Okay, I'm really green at this, although I occasionally am
> able to make some tiny changes to Perl scripts if I'm
> careful. I was thinking that the To: address is probably a
> really good clue to work with, so I'd like a couple of hints
> as to where in tokenizer.py I should be looking.
If you want to add tokens based on the headers of the message, add
something to tokenize_headers() in tokenizer.py. For tokens based on the
body, add to tokenize_body(). For HTML (etc.) stuff, look at the various
Stripper() classes. For To: addresses, look at the code that handles the
"tokenizer":"address_headers" option - line 1151.
=Tony Meyer
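P.S. To make the To: idea a bit more concrete, here is a rough, standalone
sketch of what header-based token generation can look like. It is not code
from tokenizer.py: the function name tokenize_to_header() and the
"to:name:" / "to:addr:" token strings are made up for illustration, and it
uses only the standard library email package, so treat it as a starting
point rather than the way SpamBayes actually does it.

```python
from email import message_from_string
from email.utils import getaddresses


def tokenize_to_header(msg):
    """Yield illustrative tokens for each address in the To: header.

    Hypothetical helper, not part of SpamBayes; the token format
    is an assumption chosen for this example.
    """
    # get_all() returns the fallback ([]) when there is no To: header.
    to_values = msg.get_all("to", [])
    # getaddresses() splits the header values into (real name, address) pairs.
    for name, addr in getaddresses(to_values):
        if name:
            yield "to:name:" + name.lower()
        if addr:
            # Emit the local part and the domain as separate clues.
            for part in addr.lower().split("@"):
                yield "to:addr:" + part


raw = (
    "To: Alice Example <alice@example.com>, bob@example.org\n"
    "Subject: hi\n"
    "\n"
    "body\n"
)
msg = message_from_string(raw)
for token in tokenize_to_header(msg):
    print(token)
```

Something shaped like this would be called from the header tokenizing
code, with each yielded string becoming one more clue for the classifier.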