[XML-SIG] Re: HTML<->UTF-8 'codec'?
Thu, 7 Mar 2002 16:28:37 -0800
I've downloaded Bill Janssen's module to escape UTF-8 to HTML and
vice versa, but I'm a Python newbie and I really can't tell how to make it
work. I have some UTF-8 data with a bunch of curly quotes that I'd like
to turn into HTML entities, and this module seems perfect for it, but
it doesn't do what I expected. If anybody knows how to fix it, or if
there's another option besides writing my own re, I'd appreciate it.
I should mention that I have
in my sitecustomize.py
It seems like I should use the decode function: "Decode takes UTF-8 HTML
and converts all characters above the ASCII range to HTML character
entity references." But it appears that the opposite is true.
>>> print 'I’ve had'.decode("html-utf-8")
>>> print 'I’ve had'.decode("html-utf-8").encode("html-utf-8")
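For comparison, the conversion described in the quoted documentation (characters above the ASCII range become character references) can be sketched with the standard library alone. This is a minimal sketch, not the html-utf-8 module's implementation: it assumes a current Python 3 interpreter rather than the Python 2 used in this thread, it emits numeric references such as &#8217; rather than named entities, and the helper name to_char_refs is hypothetical.

```python
def to_char_refs(text):
    """Replace every non-ASCII character with a numeric character reference."""
    # 'xmlcharrefreplace' is a standard codec error handler: any character
    # that cannot be encoded as ASCII is written as &#NNNN; instead.
    return text.encode("ascii", "xmlcharrefreplace").decode("ascii")


# The UTF-8 bytes for: I’ve had (E2 80 99 is the right single quotation mark)
utf8_bytes = b"I\xe2\x80\x99ve had"

# Decode the bytes to Unicode text first, then escape the non-ASCII characters.
text = utf8_bytes.decode("utf-8")
print(to_char_refs(text))  # I&#8217;ve had
```

The important step is the explicit decode from UTF-8 bytes to text before escaping; the escaping itself works on characters, not bytes.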
OK... but here's the problem. Using a cut-and-paste from my Word-generated
UTF-8 file into IDLE I get:
>>> print 'I’ve had'.encode("html-utf-8")
Which makes a bunch of garbage in my browser of course.
At first I was thinking there was something wrong with my form of utf-8.
But Notepad and IE6 recognize it as UTF-8 and open and display it fine,
and re-saving from Notepad to UTF-8 format gives the same result.
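One plausible explanation for garbage like this, offered only as an assumption about what is going on here, is that the file really is valid UTF-8 but its bytes are being treated as single-byte characters somewhere along the way. The sketch below (Python 3, not the Python 2 of this thread) shows how the same three bytes of a curly apostrophe look when decoded as UTF-8 versus as Windows code page 1252.

```python
# The UTF-8 bytes for: I’ve had
raw = b"I\xe2\x80\x99ve had"

# Correct interpretation: three bytes become one character, U+2019.
as_utf8 = raw.decode("utf-8")

# Wrong interpretation: each byte becomes its own cp1252 character.
as_cp1252 = raw.decode("cp1252")

print(ascii(as_utf8))    # 'I\u2019ve had'
print(ascii(as_cp1252))  # 'I\xe2\u20ac\u2122ve had'
print(len(as_utf8), len(as_cp1252))  # 8 10
```

If any later step escapes the three mis-decoded characters individually, the browser shows three junk characters where the single quote mark should be.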
So I did research on this for a couple of hours and I made this test:
f.write(unicodedata.lookup('RIGHT DOUBLE QUOTATION MARK'))
a = f.read()
b = a.encode('html-utf-8')
print 'from file'
print 'no file'
print unicodedata.lookup('RIGHT DOUBLE QUOTATION =
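A self-contained version of this kind of round-trip test might look like the following. It is a sketch under stated assumptions, not a reconstruction of the test above: it targets Python 3, uses the standard xmlcharrefreplace error handler instead of the html-utf-8 codec, and the file name quote_test.txt and the helper to_char_refs are hypothetical.

```python
import os
import tempfile
import unicodedata


def to_char_refs(text):
    """Replace every non-ASCII character with a numeric character reference."""
    return text.encode("ascii", "xmlcharrefreplace").decode("ascii")


# U+201D, looked up by its Unicode name as in the test above.
quote = unicodedata.lookup("RIGHT DOUBLE QUOTATION MARK")

# Write the character to a scratch file as UTF-8 (hypothetical file name).
path = os.path.join(tempfile.mkdtemp(), "quote_test.txt")
with open(path, "w", encoding="utf-8") as f:
    f.write(quote)

# Read it back with the same encoding, so we get text rather than raw bytes.
with open(path, "r", encoding="utf-8") as f:
    from_file = f.read()

print("from file")
print(to_char_refs(from_file))  # &#8221;
print("no file")
print(to_char_refs(quote))      # &#8221;
```

Because both the write and the read name the encoding explicitly, the value read from the file and the value built in memory are the same text and escape identically.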
Any help is appreciated!