[XML-SIG] (Py)DOM: Character References
Carsten Oberscheid
co@daisybytes.su.uunet.de
Fri, 19 Mar 1999 10:52:18 +0100
>
> * Carsten Oberscheid
> |
> | Ok, since charrefs encode only characters from the document's base
> | character set (Unicode for XML, ASCII for SGML -- is that right?)
>
> No. XML uses Unicode, but since XML is SGML (an SGML application
> profile, to be correct), it follows that this isn't true. And in fact
> SGML as a meta-language does not have a fixed document character set.
> In fact, the SGML declaration allows you to define your own character
> set in terms of well-known character sets.
All right, I should have said "SGML according to the standard declaration,
a.k.a. the reference concrete syntax" ;^)
>
> So, SGML can use Unicode/ISO 10646, as for example HTML 4.0 does[1],
> but it can also use any other character set which consists of
> well-known characters. It also has standard ways of handling
> characters that are not in the character sets. However, I don't think
> it can handle every character encoding, but I might be wrong.
But that leads me back to my original train of thought. Suppose I'm processing
SGML/XML/HTMLx.x documents on a system that can't cope with the documents' full
character set, e.g. it can display ASCII only. Since the source and the target
systems are not limited that way, I don't want to restrict the character set
itself. I just want, in my intermediate processing, to consistently represent
the non-ASCII characters as character references.
As far as I can see from my zen level (I'm down hee-eeere!!), the DOM doesn't
know about charrefs, and PyDOM expects them to be resolved (which xmlproc, for
example, silently does). All I can do is to tell the XML lineariser to
translate certain characters back to charrefs on output. But as I type this
(learning by chatting away, hope you don't mind...) I see that this should be
ok, since, to be XML (or SGML) conformant, my system (and the DOM
implementation and the parser and so on) MUST be able to cope with the full
charset internally.
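
To make that concrete, here is a minimal sketch of the round trip: charrefs are
resolved on parsing and put back on output. It assumes present-day Python and
the standard library's minidom, not PyDOM or its lineariser, and the helper
name to_ascii_xml is made up for illustration.

```python
from xml.dom import minidom
from xml.sax.saxutils import escape


def to_ascii_xml(text):
    """Return text as ASCII-only XML character data (hypothetical helper)."""
    # Escape &, < and > first, so the ampersands of the charrefs
    # generated below are not escaped a second time.
    escaped = escape(text)
    # The "xmlcharrefreplace" error handler turns every character that
    # does not fit into ASCII into a decimal reference such as &#252;.
    return escaped.encode("ascii", "xmlcharrefreplace").decode("ascii")


# The parser resolves the charref, so the DOM only sees the character itself.
doc = minidom.parseString("<p>Gr&#252;n &amp; Blau</p>")
resolved = "".join(node.data for node in doc.documentElement.childNodes)

# On output, the non-ASCII character is written as a charref again.
ascii_only = to_ascii_xml(resolved)
```

Here resolved holds the real character internally, while ascii_only is safe to
pass through an ASCII-only stage. The point is that the restriction lives only
in the output step, not in the DOM.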
Hope I got this right now in my small brain, and thanks for making me think
about it again.
>
> --Lars M.
.co.
+------------------------------------------------------- daisy bytes! --------+
Carsten Oberscheid
co@daisybytes.su.uunet.de digital document processing
http://www.pweb.de/daisybytes.su electronic publishing