[Spambayes] UnicodeEncodeError raised for bogus Content-Type header

Wed Apr 6 02:05:30 CEST 2005

[Tony Meyer]
>> I've fixed this in CVS, thanks.  The fix is pretty straightforward 
>> (just catch the exception in tokenizer.py), so you could 
>> apply it to your local copy if you like.

[Anthony Baxter]
> If this is a bug in the current email parser (it should never 
> crash) can you please log a bug on SF including an offending 
> message as an attachment to the bug? (In the Python tracker, 
> that is...)

This did occur with Python 2.4.1, but I'm not sure that it is a bug.  The
message parses and flattens fine; the problem occurs when calling
msg.get_charsets(None) (actually in get_content_charset), which raises a
UnicodeEncodeError (because the charset couldn't be converted to us-ascii).
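
For anyone wanting to apply the same kind of workaround locally, here is a
minimal sketch of catching the exception on the caller's side.  The helper
name safe_get_charsets is hypothetical and this is not the actual change made
to tokenizer.py; it only assumes that get_charsets() can raise a UnicodeError
subclass for a bogus charset, as described above.  Later Python releases may
handle this case internally, in which case the except branch is never taken.

```python
import email


def safe_get_charsets(msg, failobj=None):
    """Return msg.get_charsets(failobj), or an empty list if it raises.

    Hypothetical helper for illustration; not the actual SpamBayes code.
    """
    try:
        return msg.get_charsets(failobj)
    except UnicodeError:
        # UnicodeEncodeError is a subclass of UnicodeError.  Treat a charset
        # that cannot be expressed in us-ascii as "no usable charsets".
        return []


# A well-formed message is passed through unchanged.
msg = email.message_from_string(
    "Content-Type: text/plain; charset=us-ascii\n\nbody\n")
charsets = safe_get_charsets(msg)
```

Returning an empty list is just one choice; a caller could equally return
[failobj] or generate a token noting that the header was bogus.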

Are exceptions not supposed to occur even when using these functions?  If
that's the case, then I'll happily submit a (Python) bug with an example
etc.  I can add a patch+test too if someone tells me what the right
behaviour is (add a defect and return the failobj?).
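
To make the "add a defect and return the failobj" idea concrete, here is a
rough sketch written as a wrapper around the existing method rather than as a
patch to the email package.  The InvalidCharsetDefect class and the
content_charset_or_defect function are made-up names for illustration; only
Message.defects, get_content_charset and email.errors.MessageDefect are taken
from the email package itself.

```python
import email
from email import errors


class InvalidCharsetDefect(errors.MessageDefect):
    """Hypothetical defect: the charset parameter is not valid us-ascii."""


def content_charset_or_defect(msg, failobj=None):
    """Like msg.get_content_charset(failobj), but never raises UnicodeError.

    On failure, a defect is recorded on the message and failobj is returned.
    """
    try:
        return msg.get_content_charset(failobj)
    except UnicodeError:
        # Record the problem on the message instead of propagating it.
        msg.defects.append(
            InvalidCharsetDefect("charset parameter is not us-ascii"))
        return failobj


# Normal case: the charset is returned lower-cased and no defect is added.
ok = email.message_from_string(
    "Content-Type: text/plain; charset=UTF-8\n\nhi\n")
ok_charset = content_charset_or_defect(ok)
```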

=Tony.Meyer