Processing text data with different encodings
steve at pearwood.info
Tue Jun 28 06:29:28 EDT 2016
On Tue, 28 Jun 2016 06:35 pm, Michael Welle wrote:
> my original data is email. The mail header says it's utf-8, but you will
> find three or four different encodings in one email. I think at the
> sending side they just glue different text fragments from different
> sources together without thinking about the encoding.
Is this spam? In my experience, the only email that is that badly
constructed is spam. I can't imagine how it could be email from a person,
coming from a mail client like Thunderbird or Outlook.
“Cheer up,” they said, “things could be worse.” So I cheered up, and sure
enough, things got worse.
More information about the Python-list