sgmlop: malformed charrefs?
Magnus Lie Hetland
mlh at selje.idi.ntnu.no
Thu Mar 17 11:48:39 CET 2005
According to The Sgmlop Module Handbook, the handle_entityref()
callback is called for "malformed character entities". What does that
mean, exactly? What is a malformed character entity? I've tried
misspelling them (e.g., dropping the semicolon), but then they're
(quite naturally) treated as text/data, with handle_data(). I've tried
to use a number that is too large, or (equivalently, it turns out) to
use names instead of numbers, such as &#foo;. In these cases, I only
get an exception, because the number is too high...
So -- how can I produce a malformed character entity? I've tried to
read the C code, but it didn't leave me any wiser on the subject; it
doesn't seem to have any special-casing for this that I can see.
And another thing... For the case where a numeric reference is too
high (i.e. it can't be translated into a Unicode character) -- is it
possible to ignore it (or replace it, as with encode/decode)? I'm
trying to write a parser that will accept *any* input text without
complaining -- but simply trapping this exception would seem to
disrupt the parsing process...
Magnus Lie Hetland
http://hetland.org
"Time flies like the wind. Fruit flies like bananas." -- Groucho Marx