[ python-Bugs-1452246 ] htmllib doesn't properly substitute entities
SourceForge.net
noreply at sourceforge.net
Sat Apr 1 03:13:43 CEST 2006
Bugs item #1452246, was opened at 2006-03-17 03:57
Message generated for change (Comment added) made by rvernica
You can respond by visiting:
https://sourceforge.net/tracker/?func=detail&atid=105470&aid=1452246&group_id=5470
Please note that this message will contain a full copy of the comment thread,
including the initial issue submission, for this request,
not just the latest update.
Category: Python Library
Group: Python 2.4
Status: Open
Resolution: None
Priority: 5
Submitted By: Helmut Grohne (gnarfk)
Assigned to: Nobody/Anonymous (nobody)
Summary: htmllib doesn't properly substitute entities
Initial Comment:
I'd like to illustrate the problem and suggest a fix with
a simple Python file (named htmllib2.py, so you can
uncomment the import line in the doctest to see that my
fix works). It is more of a hack than a proper fix, though:
#!/usr/bin/env python2.4
"""
Use this instead of htmllib for having entitydefs
substituted in attributes,too.
Example:
>>> import htmllib
# >>> import htmllib2 as htmllib
>>> import formatter
>>> import StringIO
>>> s = StringIO.StringIO()
>>> p = htmllib.HTMLParser(formatter.AbstractFormatter(formatter.DumbWriter(s)))
>>> p.feed('<img alt="&lt;&gt;&amp;">')
>>> s.getvalue()
'<>&'
"""
__all__ = ("HTMLParser",)
import htmllib
from htmlentitydefs import name2codepoint as entitytable
entitytable = dict([(k, chr(v)) for k, v in
entitytable.items() if v < 256])
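def entitysub(s):
    """Replace named entities such as &amp; in s with their characters."""
    ret = ""
    state = ""  # pending entity text starting with '&'; empty outside one
    for c in s:
        if state.startswith('&'):
            if c == ';':
                # Unknown entity names are emitted unchanged.
                ret += entitytable.get(state[1:], '%s;' % state)
                state = ""
            elif c == '&':
                # The earlier '&' was not an entity: emit it and start over.
                ret += state
                state = c
            else:
                state += c
        elif c == '&':
            state = c
        else:
            ret += c
    # Keep an unterminated trailing '&...' instead of silently dropping it.
    return ret + state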
class HTMLParser(htmllib.HTMLParser):
    def handle_starttag(self, tag, method, attrs):
        """Repair attribute values."""
        attrs = [(k, entitysub(v)) for (k, v) in attrs]
        method(attrs)
if __name__ == '__main__':
    import doctest
    doctest.testmod()
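
For comparison, the same substitution can be sketched with a regular expression instead of a character loop. This is only an illustrative variant, not the code from the report or from any patch: it assumes Python 3 (where htmlentitydefs became html.entities and chr() returns Unicode, so the "v < 256" restriction is not needed), and the names _ENTITY_RE and repl are made up for the example.

```python
import re
from html.entities import name2codepoint

# Matches a named entity reference such as "&amp;": '&', a name, then ';'.
_ENTITY_RE = re.compile(r"&([A-Za-z][A-Za-z0-9]*);")


def entitysub(s):
    """Replace known named entities in s; leave all other text untouched."""
    def repl(match):
        codepoint = name2codepoint.get(match.group(1))
        # Unknown names are kept verbatim, as in the loop-based version.
        return match.group(0) if codepoint is None else chr(codepoint)
    return _ENTITY_RE.sub(repl, s)


if __name__ == "__main__":
    print(entitysub("&lt;&gt;&amp;"))
```

Because the pattern only matches a complete "&name;" sequence, a bare ampersand (as in "AT&T") or an unterminated reference passes through unchanged, and the single pass means "&amp;lt;" becomes "&lt;" rather than "<".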
----------------------------------------------------------------------
Comment By: Rares Vernica (rvernica)
Date: 2006-03-31 17:13
Message:
Logged In: YES
user_id=1491427
This bug has been fixed by patch #1462498.
Ray
----------------------------------------------------------------------