[XML-SIG] unicode entity refs

Lars Marius Garshol larsga@ifi.uio.no
06 May 1999 10:58:13 +0200

* A. M. Kuchling
| (Hmm... I may have written too soon; what's the status of HTML i18n?
| Can you declare a Unicode encoding for an HTML document?)

HTML 4.0 declares the character set to be the first 17 planes of ISO
10646, meaning that character references up to 1114111 (U+10FFFF) are
allowed, although there are some holes in between (control characters
and the surrogate range 55296-57343).

That is, something like the following, except that it belongs in the
char ref handler, and I don't know what that should look like:

class BadHTMLError(Exception):
    pass

def unknown_charref(self, ref):
    ref = int(ref)  # the reference may arrive as a string
    if ref < 9 or ref in (11, 12) or 13 < ref < 32 or \
       126 < ref < 160 or 55295 < ref < 57344 or ref > 1114111:
        raise BadHTMLError('Illegal character reference: &#%d;' % ref)
    elif ref < 256:
        self.handle_data(chr(ref))  # accept and insert (Latin-1 range)
    else:
        raise BadHTMLError('Unsupported character reference: &#%d;' % ref)
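For comparison, here is a minimal sketch of what the char ref handler
could look like in a present-day setting, assuming Python 3's
`html.parser.HTMLParser` rather than the sgmllib parser of the time.
With `convert_charrefs=False`, the parser hands the text between `&#`
and `;` (for example `'233'` or `'xE9'`) to `handle_charref`. The class
name `CharRefChecker`, the `chunks` list, the helper
`is_legal_html4_charref` and the `BadHTMLError` exception are
hypothetical. The check rejects the same holes as the handler above,
plus anything beyond 1114111 (U+10FFFF). Since Python 3 strings are
Unicode, the "unsupported" branch disappears.

```python
from html.parser import HTMLParser


class BadHTMLError(Exception):
    """Hypothetical error for character references HTML 4.0 does not allow."""


def is_legal_html4_charref(n):
    # Same holes as the handler above, plus the upper limit of the
    # first 17 planes (0..1114111).
    return not (n < 9 or n in (11, 12) or 13 < n < 32 or
                126 < n < 160 or 55295 < n < 57344 or n > 1114111)


class CharRefChecker(HTMLParser):
    def __init__(self):
        # convert_charrefs=False makes the parser call handle_charref
        # instead of decoding &#...; itself.
        super().__init__(convert_charrefs=False)
        self.chunks = []

    def handle_data(self, data):
        self.chunks.append(data)

    def handle_charref(self, name):
        # name is the text between '&#' and ';', e.g. '233' or 'xE9'
        if name[:1] in ('x', 'X'):
            n = int(name[1:], 16)
        else:
            n = int(name)
        if not is_legal_html4_charref(n):
            raise BadHTMLError('Illegal character reference: &#%s;' % name)
        # Python 3 str is Unicode, so any legal reference can be inserted
        self.chunks.append(chr(n))


p = CharRefChecker()
p.feed('<p>caf&#233; &#x263A;</p>')
p.close()
text = ''.join(p.chunks)  # 'café ☺'
```

Feeding something like `x&#150;y` makes `feed` raise `BadHTMLError`,
since 150 falls in the 127..159 hole.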

I think we should look at the SGML declaration of HTML, outlaw the
characters not supported by HTML 4.0, and raise an exception whenever a
document references one of them.
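One way to follow that suggestion is to take the ranges straight from
the SGML declaration instead of hand-writing the comparisons. The sketch
below assumes the usable DESCSET entries of the HTML 4.0 SGML
declaration are as listed, written as (first code point, count) pairs;
the names `HTML4_CHARSET` and `is_html4_char` are hypothetical. The last
entry, 1056768 characters starting at 57344, is where the 17 planes end:
57344 + 1056768 - 1 = 1114111.

```python
from bisect import bisect_right

# Usable ranges of the HTML 4.0 document character set, assumed to come
# from the DESCSET of its SGML declaration, as (first, count) pairs.
# Anything not covered here is UNUSED, i.e. illegal in a reference.
HTML4_CHARSET = [
    (9, 2),            # TAB, LF
    (13, 1),           # CR
    (32, 95),          # printable ASCII, 32..126
    (160, 55136),      # 160..55295
    (57344, 1056768),  # 57344..1114111, i.e. up to U+10FFFF
]

_STARTS = [first for first, _ in HTML4_CHARSET]


def is_html4_char(n):
    """Return True if &#n; refers to a character HTML 4.0 allows."""
    # Find the last range starting at or below n, then check n is inside it.
    i = bisect_right(_STARTS, n) - 1
    if i < 0:
        return False
    first, count = HTML4_CHARSET[i]
    return n < first + count


print(is_html4_char(233), is_html4_char(150), is_html4_char(1114111),
      is_html4_char(1114112))
# True False True False
```

Keeping the ranges as data means they can be checked against the
declaration line by line, and the same table can drive the check in
whichever char ref handler ends up doing the work.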
| On a side note, the Unicode issue seems to be heading for using /F's
| Unicode type.  This would seem to be a good argument to drop MvL's
| Unicode type, which is currently in the XML tree, and replace it
| with /F's code.  Opinions?

Go for it!

--Lars M.