smart quotes
Peter Otten
__peter__ at web.de
Tue Aug 26 03:13:46 EDT 2008
Adrian Smith wrote:
> Can anyone tell me how to get rid of smart quotes in html using
> Python? I've tried variations on
> stuff = string.replace(stuff, "\“", "\""), but to no avail, presumably
> because they're not standard ASCII.
Convert the string to unicode. For that you have to know its encoding. I
assume UTF-8:
>>> s = "a “smart quote” example"
>>> u = s.decode("utf-8")
Now you can replace the quotes (I looked up the codes on Wikipedia):
>>> u.replace(u"\u201c", "").replace(u"\u201d", "")
u'a smart quote example'
Alternatively, if you have many characters to remove, translate() is more
efficient:
>>> u.translate(dict.fromkeys([0x201c, 0x201d, 0x2018, 0x2019]))
u'a smart quote example'
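The sessions above remove the quotes outright. If the goal is to swap them
for plain ASCII quotes, as the original question suggests, the mapping passed
to translate() can carry replacement values instead of None. A minimal sketch,
assuming Python 3 (where str is already unicode, so no decode step is needed
for text); the names SMART_QUOTES and straighten_quotes are made up for
illustration:

```python
# Hypothetical helper, Python 3 assumed: str is already unicode here.
SMART_QUOTES = {
    0x201C: '"',  # left double quotation mark
    0x201D: '"',  # right double quotation mark
    0x2018: "'",  # left single quotation mark
    0x2019: "'",  # right single quotation mark
}

def straighten_quotes(text):
    """Replace curly quotes with their ASCII equivalents."""
    return text.translate(SMART_QUOTES)

print(straighten_quotes("a \u201csmart quote\u201d example"))
# prints: a "smart quote" example
```

Keys are code points (integers) and values are the replacement strings; a
value of None would delete the character, which is what dict.fromkeys()
produces in the session above.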
If necessary, convert the result back to the original encoding:
>>> clean = u.translate(dict.fromkeys([0x201c, 0x201d, 0x2018, 0x2019]))
>>> clean.encode("utf-8")
'a smart quote example'
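The whole decode, translate, encode round trip might be wrapped up roughly
like this. Again a sketch assuming Python 3, where the raw input is a bytes
object; strip_smart_quotes and QUOTE_CODES are hypothetical names, and UTF-8
is only the default guess, just as in the sessions above:

```python
# Hypothetical helper, Python 3 assumed; the encoding is a guess the caller
# should override when the real encoding of the data is known.
QUOTE_CODES = dict.fromkeys([0x201C, 0x201D, 0x2018, 0x2019])

def strip_smart_quotes(raw, encoding="utf-8"):
    """Decode bytes, drop the curly quotes, and encode back again."""
    text = raw.decode(encoding)
    return text.translate(QUOTE_CODES).encode(encoding)

raw = "a \u201csmart quote\u201d example".encode("utf-8")
print(strip_smart_quotes(raw))
# prints: b'a smart quote example'
```

If the HTML came from a Windows tool, the data may well be cp1252 rather than
UTF-8; in that case the same function can be called with encoding="cp1252".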
Peter