[spambayes-dev] Re: [Spambayes] how spambayes handles image-only spams

Meyer, Tony T.A.Meyer at massey.ac.nz
Tue Sep 9 13:17:41 EDT 2003


> > Why do you need a customized parser?  You'd probably reach your end 
> > goal faster by reading and modifying tokenizer.py.
> 
> Okay, I'm really green at this, although I occasionally am 
> able to make some tiny changes to Perl scripts if I'm 
> careful. I was thinking that the To: address is probably a 
> really good clue to work with, so I'd like a couple of hints 
> as to where in tokenizer.py I should be looking.

If you want to add tokens based on the headers of the message, add
something to tokenize_headers() in tokenizer.py.  For tokens based on the
body, add to tokenize_body().  For HTML (and similar markup), look at the
various Stripper classes.  For To: addresses, look at the code handling the
"tokenizer":"address_headers" option, around line 1151.
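To show the general idea, here is a minimal standalone sketch of turning
address headers into tokens.  It is not the code in tokenizer.py.  The
function name address_header_tokens(), the default header list, and the
token format ("to:name:...", "to:addr:...", "to:domain:...") are all
hypothetical, chosen only for illustration.  It uses only the standard
library's email package.

```python
import email
from email.utils import getaddresses


def address_header_tokens(msg, headers=("to", "cc")):
    """Yield illustrative tokens for the addresses in the given headers.

    Hypothetical sketch: the token format is an assumption and is not
    necessarily what SpamBayes's tokenize_headers() emits.
    """
    for field in headers:
        # get_all() returns every occurrence of the header; [] if absent.
        values = msg.get_all(field, [])
        for name, addr in getaddresses(values):
            name = name.strip().lower()
            addr = addr.strip().lower()
            if name:
                yield "%s:name:%s" % (field, name)
            else:
                yield "%s:no real name" % field
            if addr:
                yield "%s:addr:%s" % (field, addr)
                # The bare domain generalizes across individual mailboxes.
                if "@" in addr:
                    yield "%s:domain:%s" % (field, addr.split("@", 1)[1])


msg = email.message_from_string(
    "To: Jane Doe <Jane@Example.com>\n"
    "Subject: test\n"
    "\n"
    "body\n"
)
print(list(address_header_tokens(msg)))
```

Emitting the full address, the display name, and the bare domain as separate
tokens lets the classifier learn from whichever of them turns out to be a
useful clue.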

=Tony Meyer
