Handling some isolated iso-8859-1 characters
Daniel Mahoney
dan at catfolks.net
Tue Jun 3 14:38:09 EDT 2008
I'm working on an app that's processing Usenet messages. I'm making a
connection to my NNTP feed and grabbing the headers for the groups I'm
interested in, saving the info to disk, and doing some post-processing.
I'm finding a few bizarre characters and I'm not sure how to handle them
pythonically.
One of the lines I'm finding this problem with contains:
137050 Cleo and I have an anouncement! "Mlle. =?iso-8859-1?Q?Ana=EFs?="
<not at aol.com> Sun, 21 Nov 2004 16:21:50 -0500
<lmzdkqmqt2fj.54wmpv3zmvvx.dlg at 40tude.net> 4478 69 Xref:
sn-us rec.pets.cats.community:137050
The interesting part is the string that reads "=?iso-8859-1?Q?Ana=EFs?=".
Rendered in HTML, this string should look like "Anaïs".
What I'm doing now is a brute-force substitution from the version in the
file to the HTML version. That's ugly. What's a better way to translate
that string? Or is my problem that I'm grabbing the headers from the NNTP
server incorrectly?
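
For reference, the "=?charset?Q?...?=" form is an RFC 2047 encoded-word, the standard way to carry non-ASCII text in message headers, so the server is probably returning the headers correctly. Below is a minimal sketch of one way to decode such strings with the standard library's email.header.decode_header, assuming Python 3. The names ENCODED_WORD, decode_word and decode_encoded_words are made up for illustration, and the regular expression is a simplification that does not implement the full RFC 2047 rules (for example, it does not merge adjacent encoded-words, and an unknown charset name would raise LookupError).

```python
import re
from email.header import decode_header

# Simplified pattern for one RFC 2047 encoded-word: =?charset?encoding?text?=
# (hypothetical helper name; not a complete RFC 2047 parser)
ENCODED_WORD = re.compile(r"=\?[^?\s]+\?[bBqQ]\?[^?\s]*\?=")


def decode_word(match):
    """Decode a single encoded-word found by ENCODED_WORD into a str."""
    pieces = []
    for data, charset in decode_header(match.group(0)):
        if isinstance(data, bytes):
            # Fall back to ASCII when no charset is reported.
            data = data.decode(charset or "ascii", errors="replace")
        pieces.append(data)
    return "".join(pieces)


def decode_encoded_words(text):
    """Replace every encoded-word in text with its decoded form."""
    return ENCODED_WORD.sub(decode_word, text)


author = decode_encoded_words('"Mlle. =?iso-8859-1?Q?Ana=EFs?="')
print(author)

# If HTML output is the goal, non-ASCII characters can be turned into
# numeric character references instead of being substituted by hand.
html_author = author.encode("ascii", "xmlcharrefreplace").decode("ascii")
print(html_author)
```

Matching the encoded-words with a separate pattern and decoding them one at a time keeps the surrounding quotes and plain text untouched, which may be convenient when the header value is embedded in a longer overview line like the one above.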