[Spambayes] defaults vs. chi-square

Tim Peters tim.one@comcast.net
Mon, 14 Oct 2002 17:59:14 -0400


[Tim]
>> An odd thing is that you must have a lot of 'skip:z 70' (etc)
>> tokens in your ham too, else these spamprobs wouldn't be so small.
>>  Any idea where they come from?

[T. Alexander Popiel]
> I'm not sure offhand, either.  I'd have to work to track it down,
> though... and as mentioned earlier, today is a lazy day.  My best
> guess is a few base64 bits that didn't get decoded properly.
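For illustration, here is a minimal sketch of how an undecoded line of encoded data could turn into a token such as 'skip:z 70'. It assumes a tokenizer that replaces any overly long word with a summary token made of the word's first character and its length rounded down to a multiple of 10. The function name `tokenize_word` and the 12-character cutoff are assumptions made for this example, not a quote of the project's source.

```python
def tokenize_word(word, maxword=12):
    """Yield tokens for one whitespace-delimited word (hypothetical helper)."""
    n = len(word)
    if 3 <= n <= maxword:
        # Ordinary-length words are kept as-is.
        yield word
    elif n > maxword:
        # Long words are summarized: first character plus the length
        # rounded down to a multiple of 10.
        yield "skip:%c %d" % (word[0], n // 10 * 10)
    # Words shorter than 3 characters produce no token at all.


# A 76-character run with no whitespace, standing in for one line of
# encoded data that was never decoded or stripped.
encoded_line = "z" + "A" * 75
print(list(tokenize_word(encoded_line)))
```

Under these assumptions, every such line beginning with "z" collapses into the same token, which is why a handful of undecoded attachments can make that token look common.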

I cater to lazy:  you had a bunch of them in the very spams you were talking
about.  What does the source for those look like?  I *used* to get a bunch
of these before we started stripping uuencoded sections, but that shouldn't
be happening anymore -- unless the uuencode-finding regexp is missing a
pattern that's common in your data but not in mine.  Or unless the message
headers are damaged to such an extent that the email package barfs on them
(in which case we fall back to the raw body text).
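As a rough sketch of the first possibility, the snippet below strips uuencoded sections with a regexp and shows how a slightly unusual "begin" line slips past it. The pattern `UUENCODE_RE` and the helper `strip_uuencoded` are hypothetical, written for this example only; they are not the regexp the project actually uses.

```python
import re

# Hypothetical pattern: a "begin <mode> <filename>" line, any number of
# body lines, then an "end" line.  The filename may not contain spaces.
UUENCODE_RE = re.compile(
    r"^begin[ \t]+([0-7]{3,4})[ \t]+(\S+)[ \t]*\n"  # header: mode, filename
    r"(?:.*\n)*?"                                    # encoded body, non-greedy
    r"^end[ \t]*$",                                  # terminator line
    re.MULTILINE,
)


def strip_uuencoded(text):
    """Return (text with uuencoded sections removed, list of filenames)."""
    names = []

    def repl(match):
        names.append(match.group(2))
        return ""

    return UUENCODE_RE.sub(repl, text), names


# A section the pattern recognizes.
plain = "Hi there\nbegin 644 photo.jpg\nM_]C_X\n`\nend\nBye\n"
cleaned, names = strip_uuencoded(plain)

# A section it misses: the filename contains a space, so the encoded
# body stays in the text and gets tokenized like ordinary words.
odd = "Hi there\nbegin 644 my photo.jpg\nM_]C_X\n`\nend\nBye\n"
uncleaned, missed = strip_uuencoded(odd)
```

In the second case the text comes back unchanged, so any long encoded lines in it would reach the tokenizer intact.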

Whatever the cause, if it's a systematic problem in your data, it will be
for others too.  It may be unique to Perl programmers, though <wink>.