[XML-SIG] sgmlop and html parsing
"Martin v. Löwis"
martin at v.loewis.de
Wed Jan 14 14:03:35 EST 2004
Walter Dörwald wrote:
> Wouldn't it make sense to implement an SGMLParser that supports
> unicode?
No. In SGML, the SGML declaration defines the document character set, e.g.
    CHARSET
        BASESET  "ISO 646:1983//CHARSET International Reference Version
                  (IRV)//ESC 2/5 4/0"
        DESCSET    0   9  UNUSED
                   9   2  9
                  11   2  UNUSED
                  13   1  13
                  14  18  UNUSED
                  32  95  32
                 127   1  UNUSED
        BASESET  "ISO Registration Number 100//CHARSET ECMA-94 Right Part of
                  Latin Alphabet Nr. 1//ESC 2/13 4/1"
        DESCSET  128  32  UNUSED
                 160  96  32
So to understand a character reference, you have to know the SGML
declaration. The number in a reference is a Unicode code point only if
the declaration says
    CHARSET
        BASESET  "ISO Registration Number 177//CHARSET
                  ISO/IEC 10646-1:1993 UCS-4 with implementation
                  level 3//ESC 2/5 2/15 4/6"
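
To make the dependency concrete, here is a rough Python sketch of how the
DESCSET entries in the first declaration above might be used to resolve the
number in a character reference. The table is copied from that declaration;
the function name resolve() and the short base-set labels are made up for
illustration and are not part of sgmlop or sgmllib.

```python
# Hypothetical illustration only; not an sgmlop or sgmllib API.
UNUSED = None

# Labels for the two base sets (names chosen here for readability).
ISO_646_IRV = "ISO 646 IRV"
ISO_IR_100 = "ISO-IR 100 (right part of Latin-1)"

# (base set, described-set start, count, base-set start or UNUSED),
# copied from the first declaration above.
DESCSET = [
    (ISO_646_IRV, 0, 9, UNUSED),
    (ISO_646_IRV, 9, 2, 9),
    (ISO_646_IRV, 11, 2, UNUSED),
    (ISO_646_IRV, 13, 1, 13),
    (ISO_646_IRV, 14, 18, UNUSED),
    (ISO_646_IRV, 32, 95, 32),
    (ISO_646_IRV, 127, 1, UNUSED),
    (ISO_IR_100, 128, 32, UNUSED),
    (ISO_IR_100, 160, 96, 32),
]


def resolve(number, descset=DESCSET):
    """Map a character number to (base set, position in that base set).

    Returns None if the number is declared UNUSED or is not described
    by any entry at all.
    """
    for baseset, start, count, base in descset:
        if start <= number < start + count:
            if base is UNUSED:
                return None
            return (baseset, base + (number - start))
    return None
```

With this table, resolve(233) gives position 105 in the second base set, and
resolve(130) gives None because 128 to 159 are UNUSED. A different
declaration would need a different table, so the same number could resolve
to a different character.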
Regards,
Martin