Reading EmailMessage from file
skip.montanaro at gmail.com
Sun Jul 15 21:31:45 EDT 2018
> What are you actually trying to do? You're talking like you're trying
> to read an existing RFC822 email-with-headers from a file, but you're
> showing code that creates a new email with body content set from
> a file, which is a completely different thing.
Yes, reading an existing email from a file is exactly what I'm trying
to do. A bit more context... I'm
trying to port SpamBayes from Python 2 to Python 3. The file I
attached which failed to come through was exactly what you suggested,
an email in a file. That is what the example from the 3.7 docs
suggested I should be able to do. Had the message in the file been
encoded as utf-8, that would have worked. I just tested it with
another message which is utf-8-encoded.
To Cameron's response suggesting opening the file with
errors="replace", that's not likely to work here. The content in the
message needs to be available to the SpamBayes checkers. Replacing
unrecognized characters with "?" or other replacement characters is
generally not the best course.
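
A small illustration of why errors="replace" loses information (a sketch only; the sample word is made up and is not taken from the actual message): a byte that is valid iso-8859-1 but invalid utf-8 is turned into U+FFFD, so the original character can no longer be recovered from the decoded text.

```python
# Hypothetical sample text, encoded the way the problematic file was.
raw = "café".encode("iso-8859-1")  # b'caf\xe9'

# Decoding as utf-8 with errors="replace" swaps the invalid byte
# for the replacement character U+FFFD.
replaced = raw.decode("utf-8", errors="replace")

# Decoding with the correct charset keeps the original character.
recovered = raw.decode("iso-8859-1")

print(repr(replaced))
print(repr(recovered))
```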
Still, Cameron's reply gave me the clue I needed. There is a
BytesParser in the email.parser module:
>>> import email.parser
>>> parser = email.parser.BytesParser()
>>> parser.parse(open("/home/skip/Data/Ham/Set6/754", "rb"))
<email.message.Message object at 0x7f0b684d0518>
That file is utf-8-encoded. Here's the problematic iso-8859-1-encoded file:
>>> parser.parse(open("/home/skip/tmp/79487694", "rb"))
<email.message.Message object at 0x7f0b684d0550>
So, problem solved. The example I originally referred to clearly
requires the caller know the encoding of the input file. When you
don't know the encoding, you need bytes. The BytesParser gave me that.
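
For anyone who wants an EmailMessage rather than the legacy Message object shown above, here is a minimal sketch of the same approach. The helper name read_message is hypothetical. It assumes the file holds one complete message and passes policy=email.policy.default, since BytesParser falls back to the compat32 policy and returns email.message.Message otherwise.

```python
import email.policy
from email.parser import BytesParser


def read_message(path):
    """Parse one email message from a file without guessing its encoding.

    read_message is a hypothetical helper name used for illustration.
    """
    # Binary mode: no text decoding happens before the parser sees the data.
    with open(path, "rb") as fp:
        # policy=email.policy.default makes the parser build an
        # email.message.EmailMessage instead of the legacy Message class.
        return BytesParser(policy=email.policy.default).parse(fp)
```

With a message parsed this way, msg.get_content() on a text part decodes the body using the charset declared in the message's own Content-Type header, so the caller does not have to know the file's encoding in advance.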
Also, I must admit to having not completely read the examples page,
where it described use of the BytesParser class a bit further down the
page, but I stopped when the simple example failed for me.