[Tutor] Convert XML codes to "normal" text?
Senthil Kumaran
orsenthil at gmail.com
Wed Mar 4 08:01:04 CET 2009
On Wed, Mar 4, 2009 at 11:13 AM, Eric Dorsey <dorseye at gmail.com> wrote:
> I know, for example, that the &gt; code means >, but what I don't know is
> how to convert it in all my data to show properly? I
Feedparser returns its output as HTML, so expect HTML tags and
entities in the output.
What you want is to unescape the HTML entities (see
http://effbot.org/zone/re-sub.htm#unescape-html ):

import feedparser
import re, htmlentitydefs

def unescape(text):
    # Replace numeric character references and named entities in text.
    def fixup(m):
        text = m.group(0)
        if text[:2] == "&#":
            # character reference
            try:
                if text[:3] == "&#x":
                    return unichr(int(text[3:-1], 16))
                else:
                    return unichr(int(text[2:-1]))
            except ValueError:
                pass
        else:
            # named entity
            try:
                text = unichr(htmlentitydefs.name2codepoint[text[1:-1]])
            except KeyError:
                pass
        return text # leave as is
    return re.sub("&#?\w+;", fixup, text)

d = feedparser.parse('http://snipt.net/dorseye/feed')
for entry in d['entries']:
    print unescape(entry.title)
    print unescape(entry.summary)
    print

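
The code above is written for Python 2 (unichr, htmlentitydefs, the print statement). If you are on Python 3.4 or later, the standard library's html.unescape() should cover the same cases, so here is a minimal sketch of that route. The names ENTITY_RE, unescape_text and sample are made up for illustration and are not part of feedparser or the effbot recipe; the sample string stands in for an entry title or summary.

```python
import html
import re

# Same pattern as in unescape() above: it matches named entities (&gt;),
# decimal references (&#233;) and hexadecimal references (&#x263A;).
ENTITY_RE = re.compile(r"&#?\w+;")


def unescape_text(text):
    """Return text with HTML entities replaced by the characters they name."""
    # html.unescape() is in the standard library from Python 3.4 onwards.
    return html.unescape(text)


# Hypothetical sample standing in for a feed entry's title or summary.
sample = "1 &gt; 0 &amp;&amp; caf&#233; &#x263A;"

# Show which substrings the pattern treats as entities, then the decoded text.
print(ENTITY_RE.findall(sample))
print(unescape_text(sample))
```

With feedparser you would pass each entry's title and summary through unescape_text() in the same loop as above.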
HTH,
Senthil