sgmlop: malformed charrefs?
Magnus Lie Hetland
mlh at selje.idi.ntnu.no
Thu Mar 17 10:33:55 EST 2005
In article <mailman.520.1111058896.1799.python-list at python.org>,
Fredrik Lundh wrote:
>Magnus Lie Hetland wrote:
[snip]
>with sgmlop 1.1, the following script
>
>class entity_handler:
>    def handle_entityref(self, entityref):
>        print "ENTITY", repr(entityref)
>
>parser = sgmlop.XMLParser()
>parser.register(entity_handler())
>parser.feed("&-10;&/()=?;")
>
>prints:
>
>ENTITY '-10'
>ENTITY '/()=?'
OK, thanks. I guess I just wasn't creative enough in my entity naming
:)
>> And another thing... For the case where a numeric reference is too
>> high (i.e. it can't be translated into a Unicode character) -- is it
>> possible to ignore it (or replace it, as with encode/decode)?
>
>if you don't do anything, it is ignored.
>
>if you specify a handle_charref hook, the part between &# and ; is passed
>to that method.
I see -- so there is only trouble when the default behaviour of
transforming it to text kicks in? (That makes sense, of course.)
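
For reference, here is a minimal sketch of what such a handle_charref hook
could look like, assuming (as described above) that the hook receives the
text between &# and ;. The helper name charref_to_text, the layout of the
Handler class, and the choice of U+FFFD as the replacement character are
illustrative assumptions, not part of sgmlop. The sketch is written for
Python 3; under Python 2, unichr would take the place of chr.

```python
import sys

# Assumption: U+FFFD (REPLACEMENT CHARACTER) stands in for bad references,
# similar in spirit to errors="replace" in encode/decode.
REPLACEMENT = "\ufffd"


def charref_to_text(ref, replacement=REPLACEMENT):
    """Turn the text between '&#' and ';' into a single character.

    Malformed or out-of-range references yield `replacement`
    instead of raising an exception.
    """
    try:
        if ref[:1] in ("x", "X"):
            code = int(ref[1:], 16)   # hexadecimal form, e.g. "x41"
        else:
            code = int(ref, 10)       # decimal form, e.g. "65"
    except ValueError:
        return replacement            # not a number at all
    if code < 0 or code > sys.maxunicode:
        return replacement            # cannot be a Unicode character
    return chr(code)


class Handler:
    """Hypothetical handler object that collects text, charrefs included."""

    def __init__(self):
        self.chunks = []

    def handle_data(self, data):
        self.chunks.append(data)

    def handle_charref(self, ref):
        # Handling the reference here means the parser's default
        # charref-to-text conversion is not relied upon.
        self.chunks.append(charref_to_text(ref))
```

An instance of this Handler would be passed to parser.register() in the same
way as the handlers elsewhere in this thread; passing replacement="" to
charref_to_text drops bad references instead of replacing them.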
>if you have a handle_entityref hook, but no handle_charref, the part between
>& and ; is passed to handle_entityref.
Strange. It doesn't seem to work that way for me... Here is an example:
......................................................................
from xml.parsers.sgmlop import SGMLParser, XMLParser, XMLUnicodeParser

class Handler:
    def handle_data(self, data):
        print 'DATA', data
    def handle_entityref(self, data):
        print 'ENTITY', data

for parser in [SGMLParser(), XMLParser(), XMLUnicodeParser()]:
    parser.register(Handler())
    try:
        parser.feed('�')
    except Exception, e:
        print e
......................................................................
When I run this, I get:
character reference � exceeds ASCII range
character reference � exceeds ASCII range
character reference � exceeds sys.maxunicode (0xffff)
If I remove the handle_data, nothing happens.
></F>
--
Magnus Lie Hetland Time flies like the wind. Fruit flies
http://hetland.org like bananas. -- Groucho Marx