Unicode chr(150) en dash

marexposed at googlemail.com
Fri Apr 18 05:28:56 EDT 2008


On Thu, 17 Apr 2008 20:57:21 -0700 (PDT)
hdante <hdante at gmail.com> wrote:

>  Don't use old 8-bit encodings. Use UTF-8.

Yes, I'll try. But it's a problem when I only want to read the content, not write or create it.
I suppose Microsoft's commercial success is to blame. They won't adhere to standards if that doesn't make sense for their business.

I'll change my approach: filter the contents with htmllib and map those troublesome characters myself.
Anyway, this has been a very instructive dive into Unicode for me; things are much clearer now.
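
For reference, a minimal sketch of the "map those characters myself" idea, written for current Python 3. It assumes the troublesome characters are Windows-1252 bytes (0x80-0x9F) that ended up as C1 control code points, which is how chr(150) turns out to be an en dash. The function name fix_cp1252_controls is made up for illustration, and the HTML filtering step is left out.

```python
# Hypothetical helper: reinterpret C1 control code points (U+0080-U+009F)
# as the Windows-1252 characters they most likely stand for.
def fix_cp1252_controls(text):
    fixed = []
    for ch in text:
        if 0x80 <= ord(ch) <= 0x9F:
            try:
                # Treat the code point as a single cp1252 byte and decode it.
                ch = bytes([ord(ch)]).decode("cp1252")
            except UnicodeDecodeError:
                # A few bytes in this range are undefined in Python's
                # cp1252 codec; leave those characters untouched.
                pass
        fixed.append(ch)
    return "".join(fixed)


# chr(150) is byte 0x96, which cp1252 defines as EN DASH (U+2013).
print(repr(fix_cp1252_controls("1990" + chr(150) + "2000")))
```

If the raw bytes are still at hand, decoding them directly is simpler: b"1990\x962000".decode("cp1252") gives the same string without any per-character mapping.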

Thanks to everyone for the great help.


