Reading EmailMessage from file
Richard at Damon-family.org
Sun Jul 15 21:45:44 EDT 2018
A raw email message should be treated as a ‘bag of bytes’. When processing it, the encoding of each section is determined by headers in the message (or by the defined defaults if none is specified). I suspect that means you should read the file in binary mode. I would hope that the module has the smarts to detect the encoding/character set declared in the message, but maybe you need to parse some of the headers yourself to supply that information.
One thing that you do need to watch out for is that there are messages in the wild where the declared format/encoding doesn’t match what the message actually uses.
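To make that concrete, here is a minimal sketch (not tested against SpamBayes) of decoding a body from raw bytes while tolerating a wrong charset declaration. The function name decode_body and the fallback list are my own assumptions for illustration; the calls themselves (email.message_from_bytes, get_body, get_payload(decode=True), get_content_charset) are from the standard email package.

```python
import email
from email import policy


def decode_body(raw, fallbacks=("utf-8", "latin-1")):
    """Return the text body of a raw message given as bytes.

    Hypothetical helper: tries the charset declared in the headers first,
    then the (assumed) fallbacks if the declaration turns out to be wrong.
    """
    msg = email.message_from_bytes(raw, policy=policy.default)
    # Prefer a text/plain part, then text/html; fall back to the message itself.
    part = msg.get_body(preferencelist=("plain", "html")) or msg
    # decode=True undoes the transfer encoding and returns bytes
    # (None for a multipart container, hence the "or b''").
    payload = part.get_payload(decode=True) or b""
    declared = part.get_content_charset()
    for charset in (declared, *fallbacks):
        if not charset:
            continue
        try:
            return payload.decode(charset)
        except (LookupError, UnicodeDecodeError):
            # Unknown charset name, or bytes that don't match the declaration.
            continue
    # latin-1 maps every byte to a code point, so this cannot fail.
    return payload.decode("latin-1")


# A message that claims utf-8 but actually carries a latin-1 byte (0xE9).
mislabeled = (
    b"From: sender@example.com\r\n"
    b"Subject: test\r\n"
    b"Content-Type: text/plain; charset=utf-8\r\n"
    b"Content-Transfer-Encoding: 8bit\r\n"
    b"\r\n"
    b"caf\xe9\r\n"
)
print(decode_body(mislabeled))
```

Falling back to latin-1 never raises, but it can produce the wrong characters when the real encoding is something else, so treat the result as a best guess rather than a detected encoding.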
On Jul 15, 2018, at 9:31 PM, Skip Montanaro <skip.montanaro at gmail.com> wrote:
>> What are you actually trying to do? You're talking like you're trying
>> to read an existing RFC822 email-with-headers from a file, but you're
>> showing code that creates a new email with body content set from
>> a file, which is a completely different thing.
> Yes, that's exactly what I'm trying to do. A bit more context... I'm
> trying to port SpamBayes from Python 2 to Python 3. The file I
> attached which failed to come through was exactly what you suggested,
> an email in a file. That is what the example from the 3.7 docs
> suggested I should be able to do. Had the message in the file been
> encoded as utf-8, that would have worked. I just tested it with
> another message which is utf-8-encoded.
> To Cameron's response suggesting opening the file with
> errors="replace", that's not likely to work here. The content in the
> message needs to be available to the SpamBayes checkers. Replacing
> unrecognized characters with "?" or other replacement characters is
> generally not the best course.
> Still, Cameron's reply gave me the clue I needed. There is a
> BytesParser in the email.parser module:
> >>> parser = email.parser.BytesParser()
> >>> parser.parse(open("/home/skip/Data/Ham/Set6/754", "rb"))
> <email.message.Message object at 0x7f0b684d0518>
> That file is utf-8-encoded. Here's the problematic iso-8859-1-encoded file:
> >>> parser.parse(open("/home/skip/tmp/79487694", "rb"))
> <email.message.Message object at 0x7f0b684d0550>
> So, problem solved. The example I originally referred to clearly
> requires the caller know the encoding of the input file. When you
> don't know the encoding, you need bytes. The BytesParser gave me that.
> Also, I must admit to having not completely read the examples page,
> where it described use of the BytesParser class a bit further down the
> page, but I stopped when the simple example failed for me.
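For anyone who finds this thread later, the BytesParser approach quoted above can be sketched as a small self-contained script. This is an illustration under a few assumptions, not Skip's actual code: the helper name load_message and the sample message are made up, and I pass policy=policy.default (which the quoted session did not) so that the parser returns an EmailMessage with get_content() available.

```python
import os
import tempfile
from email import policy
from email.parser import BytesParser


def load_message(path):
    """Parse an RFC 822 style message from a file opened in binary mode."""
    with open(path, "rb") as fp:
        # No encoding is passed to open(); the parser works on bytes and
        # uses the charset declared in the message headers when decoding.
        return BytesParser(policy=policy.default).parse(fp)


# Hypothetical sample: an iso-8859-1 body containing the byte 0xEF.
raw = (
    b"From: sender@example.com\n"
    b"Subject: demo\n"
    b"Content-Type: text/plain; charset=iso-8859-1\n"
    b"Content-Transfer-Encoding: 8bit\n"
    b"\n"
    b"na\xefve\n"
)

with tempfile.NamedTemporaryFile(suffix=".eml", delete=False) as tmp:
    tmp.write(raw)

try:
    msg = load_message(tmp.name)
    print(msg["Subject"])
    print(msg.get_content())
finally:
    os.unlink(tmp.name)  # remove the temporary sample file
```

The same file opened in text mode with encoding="utf-8" would raise UnicodeDecodeError on the 0xEF byte, which is the failure described earlier in the thread.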