[Spambayes] It gets funnier all the time....
Tim Stone - Four Stones Expressions
tim at fourstonesExpressions.com
Wed Feb 12 11:00:13 EST 2003
2/12/2003 10:56:16 AM, Skip Montanaro <skip at pobox.com> wrote:
>>>>>> "Tim" == Tim Stone - Four Stones Expressions <tim at fourstonesExpressions.com> writes:
> Tim> 2/12/2003 8:55:59 AM, Neil Schemenauer <nas at python.ca> wrote:
> >> Rob W.W. Hooft wrote:
> >>> But that means that if we want to be able to use the clues in
> >>> spambayes, we either have to make a token base64-encoding-missing or
> >>> we have to decode it to get the clues from the body.
> >> Generating a clue sounds best, assuming SB doesn't nail it already.
> Tim> I doubt that the tokenizer would generate any meaningful tokens
> Tim> from this message. Generating a token would be the right way to do
> Tim> it, any ideas how?
>Sure, generate a "no explicit content-transfer-encoding" token. Alas, most
>mail messages are written with
> Content-Type: text/plain; charset="us-ascii"
>and don't contain a Content-Transfer-Encoding header, so all by itself it
>probably wouldn't be a very useful clue. The tokenizer does have access to
>the entire message though, so it could conceivably guess at encodings if no
>CTE header was given and the first line of the message body was long
>(suggesting base-64) or looked like the start of a uuencode block.
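The guess described above might be sketched roughly like this in Python. It is only an illustration: the function name, the token strings and the 60-character threshold are all made up here, and none of it is taken from the spambayes tokenizer.

```python
import email
import re

# A uuencode block starts with "begin <octal mode> <filename>".
UU_BEGIN = re.compile(r"^begin [0-7]{3,4} \S")
# A line made up only of base64 alphabet characters, with optional padding.
B64_LINE = re.compile(r"^[A-Za-z0-9+/]+={0,2}$")


def guess_cte_tokens(raw_message, long_line=60):
    """Return hypothetical tokens describing a guessed transfer encoding.

    Only guesses when the message has no Content-Transfer-Encoding header.
    """
    msg = email.message_from_string(raw_message)
    if msg["Content-Transfer-Encoding"] is not None:
        return []  # an explicit header exists, nothing to guess
    payload = msg.get_payload()
    if not isinstance(payload, str):
        return []  # multipart message: this sketch does not look inside parts
    lines = payload.lstrip().splitlines()
    first = lines[0].rstrip() if lines else ""
    if UU_BEGIN.match(first):
        return ["cte-guess:uuencode"]
    if len(first) >= long_line and B64_LINE.match(first):
        return ["cte-guess:base64"]
    return ["cte:none"]
```

Whether any of these tokens would carry useful evidence is an open question that only testing against real mail could answer.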
I suppose we could have a 'first_line_max_length' option: if there is no CTE
header and the first line of the body is longer than that, we try a base-64
decode of the first line, followed by a check for 'printable' characters in
the result... seem reasonable? - TimS
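A minimal sketch of that check in Python, as an illustration only. The option name 'first_line_max_length' is the one floated above; the function name, the default of 60 and the 0.9 "mostly printable" cutoff are assumptions picked for the example, not anything that exists in spambayes.

```python
import base64
import string

# Byte values that count as "printable" (letters, digits, punctuation, whitespace).
PRINTABLE = set(string.printable.encode("ascii"))


def looks_like_base64_text(body, first_line_max_length=60, min_printable=0.9):
    """Guess whether the first body line is base64-encoded text."""
    lines = body.lstrip().splitlines()
    first = lines[0].strip() if lines else ""
    if len(first) <= first_line_max_length:
        return False  # line is not long enough to trigger the check
    # base64 decodes in groups of 4 characters; drop any ragged tail.
    first = first[: len(first) - len(first) % 4]
    try:
        # validate=True rejects characters outside the base64 alphabet,
        # so an ordinary long sentence (spaces, commas) fails here.
        decoded = base64.b64decode(first, validate=True)
    except ValueError:  # binascii.Error is a subclass of ValueError
        return False
    if not decoded:
        return False
    printable = sum(byte in PRINTABLE for byte in decoded)
    return printable / len(decoded) >= min_printable
```

If the check passes, the tokenizer could either emit a single clue or decode the whole body and tokenize that; the sketch deliberately stops short of choosing between the two.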
c'est moi - TimS