[Spambayes] It gets funnier all the time....

Wed Feb 12 11:00:13 EST 2003

2/12/2003 10:56:16 AM, Skip Montanaro <skip at pobox.com> wrote:

>>>>>> "Tim" == Tim Stone <- Four Stones Expressions <tim at fourstonesExpressions.com>> writes:
>
>    Tim> 2/12/2003 8:55:59 AM, Neil Schemenauer <nas at python.ca> wrote:
>    >> Rob W.W. Hooft wrote:
>    >>> But that means that if we want to be able to use the clues in
>    >>> spambayes, we either have to make a token base64-encoding-missing or
>    >>> we have to decode it to get the clues from the body.
>    >> 
>    >> Generating a clue sounds best, assuming SB doesn't nail it already.
>
>    Tim> I doubt that the tokenizer would generate any meaningful tokens
>    Tim> from this message.  Generating a token would be the right way to do
>    Tim> it, any ideas how?
>
>Sure, generate a "no explicit content-transfer-encoding" token.  Alas, most
>mail messages are written with
>
>    Content-Type: text/plain; charset="us-ascii"
>
>and don't contain a Content-Transfer-Encoding header,

Right.

> so all by itself it
>probably wouldn't be a very useful clue.  The tokenizer does have access to
>the entire message though, so it could conceivably guess at encodings if no
>CTE header was given and the first line of the message body was long
>(suggesting base-64) or looked like the start of a uuencode block.

I suppose we could have a 'first_line_max_length' option: if the first line
of the body is longer than that, we base-64 decode it and check the result
for 'printable' characters... seem reasonable?  - TimS
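
To make the idea concrete, here is a rough sketch of the check, not
anything that exists in the spambayes tokenizer today.  The function names,
the default threshold of 60 characters and the 90% printable cutoff are all
made up for illustration; only 'first_line_max_length' comes from the
proposal above.  The uuencode test just looks for a "begin <mode> <name>"
line, as Skip suggested.

```python
import base64
import binascii
import re
import string

# Byte values considered "printable" (ASCII letters, digits,
# punctuation and whitespace).
PRINTABLE = set(string.printable.encode("ascii"))

# A uuencode block starts with a line like "begin 644 file.txt".
UU_BEGIN = re.compile(r"^begin [0-7]{3,4} \S+")


def guess_base64_first_line(body, first_line_max_length=60,
                            min_printable=0.9):
    """Return the decoded first line if it looks like base64, else None.

    Hypothetical helper: both thresholds are illustrative guesses.
    """
    lines = body.strip().splitlines()
    if not lines:
        return None
    first = lines[0].strip()
    # Only a suspiciously long first line triggers the decode attempt.
    if len(first) <= first_line_max_length:
        return None
    # base64 decodes in groups of 4 characters; drop any partial group.
    usable = first[:len(first) - len(first) % 4]
    try:
        decoded = base64.b64decode(usable, validate=True)
    except (binascii.Error, ValueError):
        # Characters outside the base64 alphabet: not base64.
        return None
    if not decoded:
        return None
    # Accept the guess only if most decoded bytes are printable.
    printable = sum(1 for byte in decoded if byte in PRINTABLE)
    if printable / len(decoded) < min_printable:
        return None
    return decoded.decode("ascii", errors="replace")


def looks_like_uuencode(body):
    """Return True if the first body line looks like a uuencode header."""
    lines = body.strip().splitlines()
    return bool(lines) and UU_BEGIN.match(lines[0]) is not None
```

If the guess succeeds, the tokenizer could emit a synthetic token (the
"missing encoding" clue discussed above) and also tokenize the decoded
text.  Ordinary long lines of prose fail the decode because spaces and
punctuation are not in the base64 alphabet, so the false-positive risk
should be lowish, but that would need testing against real mail.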

>
>Skip
>
>

c'est moi - TimS
http://www.fourstonesExpressions.com
http://wecanstopspam.org