[Spambayes] defaults vs. chi-square

Tim Peters tim.one@comcast.net
Tue, 15 Oct 2002 00:58:28 -0400


[Guido]
> Split it up in lines first, and collect lines that match a simple
> regexp to recognize base64.  Then feed the collected stuff to
> base64.decodestring().  If there's non-white excess, deal with that
> separately.
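
A minimal sketch of that suggestion, not tested against real spam.  The
regexp, the function name salvage_base64 and the handling of a truncated
tail are all assumptions made for illustration; base64.b64decode() is used
as the current spelling of base64.decodestring().  Note that a short plain
word such as "Hello" also matches the pattern, so real code would need a
stricter test.

```python
import base64
import binascii
import re

# Assumed pattern: a line made up only of base64 alphabet characters,
# optionally ending in '=' padding.
BASE64_LINE = re.compile(r'^[A-Za-z0-9+/]+={0,2}$')

def salvage_base64(text):
    """Return (decoded_bytes, leftover_lines) for a possibly damaged body."""
    good, leftover = [], []
    for line in text.splitlines():
        stripped = line.strip()
        if not stripped:
            continue                  # pure whitespace is not "excess"
        if BASE64_LINE.match(stripped):
            good.append(stripped)
        else:
            leftover.append(line)     # non-white excess, handled separately
    joined = ''.join(good)
    # Drop a trailing partial 4-character group so a truncated body
    # still decodes.
    joined = joined[:len(joined) - len(joined) % 4]
    try:
        decoded = base64.b64decode(joined)
    except binascii.Error:
        decoded = b''
    return decoded, leftover
```

The caller gets back whatever decoded cleanly plus the lines that did not
look like base64, and can decide what to do with the latter.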

I'm trying to shame Barry into doing this, since he sucked me into this
project and then vanished <wink>.  More importantly, if he can be provoked
into giving it some real thought, he could do a better job faster than I
could.  For example, I've just got a Message object at this point, and I
don't know beans about whether it's plain, base64-encoded, qp-encoded, or
whatever.  The email pkg knows, though, and Barry knows how to get it to
tell him without even thinking about it.  Most base64 stuff isn't
damaged, but for the part that is, we need smarter recovery code in the
"except:" clause of the snippet I posted.  For a start, if it failed to
decode base64 stuff, it
would likely be better to ignore that part entirely than to run off
tokenizing it.  It would be much better still to decode it anyway.
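
One way the "ignore that part entirely" fallback might look, as a rough
sketch only.  The function name payloads_for_tokenizing is hypothetical and
this is not the snippet referred to above; it assumes the Message object
came from the email package, reads the Content-Transfer-Encoding header to
find out how each part is encoded, and skips any base64 part that fails to
decode.

```python
import base64
import email

def payloads_for_tokenizing(msg):
    """Yield decoded bytes for each leaf part, skipping bad base64 parts."""
    for part in msg.walk():
        if part.is_multipart():
            continue                      # containers have no payload text
        cte = str(part.get('Content-Transfer-Encoding', '7bit'))
        if cte.strip().lower() == 'base64':
            try:
                yield base64.b64decode(part.get_payload())
            except ValueError:            # binascii.Error is a ValueError
                continue                  # ignore the part, don't tokenize it
        else:
            # Let the email package undo quoted-printable and the like.
            yield part.get_payload(decode=True)
```

A salvage step that decodes whatever is still decodable could go where the
"continue" in the except clause is now.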