[Spambayes] Help Im Lost!
Meyer, Tony
T.A.Meyer at massey.ac.nz
Sat Aug 16 20:59:28 EDT 2003
> I have a question in the tokenizer.py.
>
> If the email contains images (binary blocks):
>
> 1) How does the tokenizer handle it?
> 2) Does the tokenizer tokenize the binary block?
> 3) What will the tokenizer do in case of a Binary Block?
> 4) How can it determine that it is a Binary Block?
>
> I appreciate it very much!
The relevant bits in tokenizer.py are those that deal with "octet
parts". (If you search for "binary" in tokenizer.py, you'll find
the first such section.)
From the comments: "there's no point decoding binary blobs (like
images)". You should also read the comments at the start of the
tokenize_body() function. Basically, only the first few characters of
the octet stream are turned into tokens; the number of characters used
is controlled by an option.
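As a rough illustration of that idea, here is a minimal sketch using only
the standard library email package. It is not the actual tokenizer.py code:
the names octet_prefix_tokens, OCTET_PREFIX_SIZE and build_sample, the
"octet:" token format, and the hex encoding of the prefix are all
assumptions made for the example.

```python
from email.message import Message
from email.mime.application import MIMEApplication
from email.mime.multipart import MIMEMultipart
from email.mime.text import MIMEText
from typing import Iterator

# Hypothetical stand-in for the tokenizer option that sets the prefix size.
OCTET_PREFIX_SIZE = 5


def octet_prefix_tokens(msg: Message,
                        prefix_size: int = OCTET_PREFIX_SIZE) -> Iterator[str]:
    """Yield one token per application/octet-stream part of msg.

    A part is treated as binary purely on the basis of its declared
    Content-Type; only the first prefix_size bytes of the decoded
    payload are used, and the rest of the blob is ignored.
    """
    for part in msg.walk():
        if part.get_content_type() != "application/octet-stream":
            continue
        # decode=True undoes the Content-Transfer-Encoding (e.g. base64).
        payload = part.get_payload(decode=True)
        if payload is None:
            continue
        yield "octet:" + payload[:prefix_size].hex()


def build_sample() -> Message:
    """Build a small multipart message with one text and one binary part."""
    msg = MIMEMultipart()
    msg.attach(MIMEText("hello"))
    # MIMEApplication defaults to application/octet-stream, base64-encoded.
    msg.attach(MIMEApplication(b"\x89PNG\r\n\x1a\n rest of data"))
    return msg


if __name__ == "__main__":
    for token in octet_prefix_tokens(build_sample()):
        print(token)
```

The text part produces nothing here, and the binary part contributes a
single token built from its first five bytes, so a large attachment costs
no more to tokenize than a small one.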
Googling for "site:mail.python.org tokenize image spambayes" will bring
up some relevant messages about this from the archives, too.
=Tony Meyer