Processing text data with different encodings

Steven D'Aprano steve at pearwood.info
Tue Jun 28 06:29:28 EDT 2016


On Tue, 28 Jun 2016 06:35 pm, Michael Welle wrote:

> my original data is email. The mail header says it's utf-8, but you will
> find three or four different encodings in one email. I think at the
> sending side they just glue different text fragments from different
> sources together without thinking about the encoding.

Is this spam? In my experience, the only email that is that badly
constructed is spam. I can't imagine how it could be email from a person,
coming from a mail client like Thunderbird or Outlook.
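
If the fragments really are in different encodings, here is a minimal
sketch of one way to cope. It assumes you have already split the body into
byte fragments. The function name decode_fragment and the candidate list
are illustrative assumptions, not anything taken from your data, and a
guess that decodes cleanly is not necessarily the right guess.

```python
def decode_fragment(raw, candidates=("utf-8", "cp1252")):
    """Return (text, encoding) using the first candidate codec that works.

    The candidate list is an assumption; order it from strictest to
    most permissive, since permissive codecs accept almost anything.
    """
    for encoding in candidates:
        try:
            return raw.decode(encoding), encoding
        except UnicodeDecodeError:
            continue
    # latin-1 maps every byte to a code point, so this never raises,
    # although the result may be mojibake.
    return raw.decode("latin-1"), "latin-1"
```

Trying utf-8 first is deliberate: it is strict enough that non-UTF-8 bytes
usually fail to decode, whereas the single-byte codecs succeed on nearly
any input.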

-- 
Steven
“Cheer up,” they said, “things could be worse.” So I cheered up, and sure
enough, things got worse.


