sgmlop: malformed charrefs?

Magnus Lie Hetland mlh at selje.idi.ntnu.no
Thu Mar 17 10:33:55 EST 2005


In article <mailman.520.1111058896.1799.python-list at python.org>,
Fredrik Lundh wrote:
>Magnus Lie Hetland wrote:
[snip]
>with sgmlop 1.1, the following script
>
>class entity_handler:
>    def handle_entityref(self, entityref):
>        print "ENTITY", repr(entityref)
>
>parser = sgmlop.XMLParser()
>parser.register(entity_handler())
>parser.feed("&-10;&/()=?;")
>
>prints:
>
>ENTITY '-10'
>ENTITY '/()=?'

OK, thanks. I guess I just wasn't creative enough in my entity naming
:)

>> And another thing... For the case where a numeric reference is too
>> high (i.e. it can't be translated into a Unicode character) -- is it
>> possible to ignore it (or replace it, as with encode/decode)?
>
>if you don't do anything, it is ignored.
>
>if you specify a handle_charref hook, the part between &# and ; is passed
>to that method.

I see -- it's just if the default behaviour of transforming it to text
kicks in that there is trouble? (That makes sense, of course.)
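
A handle_charref hook could then do the replacing itself, much like the
'replace' error mode of encode/decode. A minimal sketch, assuming only what
is described above (the hook receives the text between &# and ;). The names
decode_charref and CharrefHandler are made up for illustration and are not
part of sgmlop, and the sketch does not depend on sgmlop being installed:

```python
import sys

try:
    unichr            # Python 2
except NameError:
    unichr = chr      # Python 3: chr() covers the full Unicode range

REPLACEMENT = u"\ufffd"   # U+FFFD REPLACEMENT CHARACTER


def decode_charref(ref, replacement=REPLACEMENT):
    """Turn the text between '&#' and ';' into a character.

    Returns `replacement` when the reference is not a number or is
    outside the range this Python build can represent.
    Surrogate code points are not treated specially here.
    """
    try:
        if ref[:1] in ("x", "X"):
            code = int(ref[1:], 16)   # hexadecimal form, e.g. 'x41'
        else:
            code = int(ref, 10)       # decimal form, e.g. '65'
    except ValueError:
        return replacement
    if code < 0 or code > sys.maxunicode:
        return replacement
    return unichr(code)


class CharrefHandler:
    """Hypothetical handler that collects text and decodes charrefs itself."""

    def __init__(self):
        self.parts = []

    def handle_data(self, data):
        self.parts.append(data)

    def handle_charref(self, ref):
        # Assumption: ref is the raw text between '&#' and ';'.
        self.parts.append(decode_charref(ref))
```

Registering an instance with parser.register() should then keep the parser's
default conversion out of the picture for character references, if the hook
behaves as described. Whether an out-of-range reference should become U+FFFD
or simply be dropped is a design choice; passing replacement=u"" to
decode_charref would drop it.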

>if you have a handle_entityref hook, but no handle_charref, the part between
>& and ; is passed to handle_entityref.

Strange. It doesn't seem to work that way for me... Here is an example:

......................................................................
from xml.parsers.sgmlop import SGMLParser, XMLParser, XMLUnicodeParser

class Handler:

    def handle_data(self, data):
        print 'DATA', data

    def handle_entityref(self, data):
        print 'ENTITY', data

for parser in [SGMLParser(), XMLParser(), XMLUnicodeParser()]:
    parser.register(Handler())
    try:
        parser.feed('&#x540be3ff;')
    except Exception, e:
        print e
......................................................................

When I run this, I get:

character reference &#x540be3ff; exceeds ASCII range
character reference &#x540be3ff; exceeds ASCII range
character reference &#x540be3ff; exceeds sys.maxunicode (0xffff)

If I remove the handle_data, nothing happens.

></F> 

-- 
Magnus Lie Hetland               Time flies like the wind. Fruit flies
http://hetland.org               like bananas.         -- Groucho Marx


