[Spambayes] It gets funnier all the time....

Skip Montanaro skip at pobox.com
Wed Feb 12 10:56:16 EST 2003

>>>>> "Tim" == Tim Stone <- Four Stones Expressions <tim at fourstonesExpressions.com>> writes:

    Tim> 2/12/2003 8:55:59 AM, Neil Schemenauer <nas at python.ca> wrote:
    >> Rob W.W. Hooft wrote:
    >>> But that means that if we want to be able to use the clues in
    >>> spambayes, we either have to make a token base64-encoding-missing or
    >>> we have to decode it to get the clues from the body.
    >> Generating a clue sounds best, assuming SB doesn't nail it already.

    Tim> I doubt that the tokenizer would generate any meaningful tokens
    Tim> from this message.  Generating a token would be the right way to do
    Tim> it, any ideas how?

Sure, generate a "no explicit content-transfer-encoding" token.  Alas, most
mail messages are written with

    Content-Type: text/plain; charset="us-ascii"

and don't contain a Content-Transfer-Encoding header, so all by itself that
token probably wouldn't be a very useful clue.  The tokenizer does have
access to the entire message though, so it could conceivably guess at the
encoding if no CTE header is given and the first line of the message body is
one long unbroken run of base64 characters or looks like the start of a
uuencode block.
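
Something along these lines might do it.  This is only a rough sketch, not
what the spambayes tokenizer actually does: the function name, the token
names ("cte:none", "cte-guess:base64", "cte-guess:uuencode") and the
40-character threshold are all made up for illustration, and it only looks
at non-multipart messages.

```python
import email
import re

# Assumption: a "long" first line made up only of base64 characters.
# The 40-character minimum is an arbitrary threshold chosen for this sketch.
BASE64_LINE = re.compile(r'^[A-Za-z0-9+/]{40,}={0,2}$')

# A uuencode block starts with "begin <octal mode> <filename>".
UU_BEGIN = re.compile(r'^begin [0-7]{3,4} \S')


def guess_encoding_tokens(raw_message):
    """Return hypothetical clue tokens for a message lacking a CTE header."""
    msg = email.message_from_string(raw_message)

    # Only guess when there is no explicit Content-Transfer-Encoding header
    # and the payload is a single text body.
    if msg.is_multipart() or msg.get('Content-Transfer-Encoding') is not None:
        return []

    tokens = ['cte:none']

    # Look at the first non-blank line of the body.
    body = msg.get_payload()
    first = next((ln.strip() for ln in body.splitlines() if ln.strip()), '')

    if UU_BEGIN.match(first):
        tokens.append('cte-guess:uuencode')
    elif BASE64_LINE.match(first):
        tokens.append('cte-guess:base64')
    return tokens
```

The "cte:none" token alone would show up in most ham too, so any value would
have to come from the guess tokens, and only training would tell whether
they carry any.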


