[XML-SIG] 2 Qs: encoding & entities with xmlproc

Lars Marius Garshol larsga@ifi.uio.no
08 Jun 1999 12:40:09 +0200


* Dan Libby
| 
| [charconv.py]
|
| I would appreciate that.  (Consider this 'demand') 

OK. This is a very simple change, and I've written the code before, so
I should be able to do this in a couple of days (am very busy at the
moment, and only write email while waiting for compiles and such).

| If it is a simple change, perhaps you can just send us a diff or
| something?

You'll get a ZIP file with 0.61.1 in it. (Easier, I think.)
 
* Lars Marius Garshol
|
| Hmmm. The cleanest solution to this (from an XML/SGML point of view)
| is probably to use string.replace to escape all '<'s in character
| data when it is passed to you from the parser. That would also let
| you retain parser independence and is cleaner in the sense that it
| becomes more obvious what you're really doing.
 
* Dan Libby
|
| Yes, that is actually the solution I came up with also.  It doesn't
| really seem that clean to me, because if there is a character above
| 127 that we want to replace with an entity, it gets funny depending
| on which encoding is in use. 

Well, you control the encoding (after it's gone through xmlproc), so
this shouldn't be a problem.

| Whereas in the old model, we simply had a map from eg "180" to
| "&#180;" that we returned to the parser and similarly things like
| "quot" to "&amp;quot;".

What you're doing here is letting code control the interpretation of
the document, which isn't really all that clean. The same document
would parse differently with and without the custom code.

Simply remapping characters in the output is IMHO a lot cleaner in
that the separation between code and document is clear.
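
A minimal sketch of such an output remapping, assuming the character
data has already been decoded to a Unicode string by the time the
application sees it. The function name escape_char_data is made up for
this illustration and is not part of xmlproc:

```python
def escape_char_data(text):
    """Escape markup and non-ASCII characters in character data for output.

    Hypothetical helper, not part of xmlproc; it assumes `text` is an
    already-decoded Unicode string.
    """
    # Escape '&' first so the references produced below are not re-escaped.
    text = text.replace("&", "&amp;").replace("<", "&lt;")
    out = []
    for ch in text:
        if ord(ch) > 127:
            # Numeric character reference; independent of the output encoding.
            out.append("&#%d;" % ord(ch))
        else:
            out.append(ch)
    return "".join(out)
```

Because the remapping works on code points rather than bytes, the
result should not depend on which encoding the document arrived in.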
 
| I tried doing this with entity declarations in the DTD and xmlproc
| just for kicks.  It would allow it for character based entity names,
| but didn't allow any names starting with a numeric.

This is because &#60; is not an entity reference but a character
reference: it refers directly to the Unicode character U+003C ('<').
Entity names must also be XML names, which can't start with a digit,
so it's no wonder that you're not allowed to define such an entity.
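
To make the distinction concrete, here is a small sketch that resolves
a numeric character reference by hand. The helper name
char_ref_to_char is made up for this illustration; a real parser does
this internally:

```python
import re

def char_ref_to_char(ref):
    """Resolve a numeric character reference such as '&#60;' or '&#x3C;'.

    Hypothetical helper for illustration only.
    """
    match = re.fullmatch(r"&#(?:x([0-9A-Fa-f]+)|([0-9]+));", ref)
    if match is None:
        raise ValueError("not a numeric character reference: %r" % ref)
    if match.group(1) is not None:
        # Hexadecimal form: &#xHHHH;
        return chr(int(match.group(1), 16))
    # Decimal form: &#DDDD;
    return chr(int(match.group(2)))
```

The number is a code point, not a name, so nothing in the DTD is
looked up when the reference is resolved.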

| I like being parser independent.  ;-)

Good! It bothers me that most people seem to prefer being chained to
whatever product they're using (parser, database, whatever).
 
* Lars Marius Garshol
|
| Also: do you need an option to disallow element and attribute
| declarations in the internal subset?
 
* Dan Libby
|
| Sorry, I'm not sure what this means.  What is the internal subset?

Here's an example:

<!DOCTYPE rdf:RDF PUBLIC "..." "..." [
  <!-- The line above references the external subset, while what
       appears between the [ and the ]> is the internal subset -->

  <!ELEMENT channel (fiskepudding, kumle, lakrisbåter)>
]>

<rdf:RDF>
  <channel>
    <fiskepudding>Vondt!</fiskepudding>
    <kumle>OK!</kumle>
    <lakrisbåter>Godt!</lakrisbåter>
  </channel>

  ...
</rdf:RDF>
 

xmlproc and all other validating parsers would let this pass with no
complaints at all. I suppose you may not want that.
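
If you did want to reject such documents yourself, one possible
approach is to watch for element declarations while parsing. The
sketch below uses the expat binding from today's Python standard
library rather than xmlproc (an assumption made only to keep the
example self-contained), and the function name is made up. By default
expat does not read the external subset, so any declarations it
reports here come from the internal subset:

```python
import xml.parsers.expat

def internal_subset_element_decls(document):
    """Return the element names declared in the document's internal subset.

    Hypothetical helper; relies on expat's default of not loading the
    external DTD subset.
    """
    declared = []
    parser = xml.parsers.expat.ParserCreate()

    def on_element_decl(name, model):
        # Called once for each <!ELEMENT ...> declaration that expat reads.
        declared.append(name)

    parser.ElementDeclHandler = on_element_decl
    parser.Parse(document, True)
    return declared

# Placeholder identifiers; a shortened version of the example above.
sample = """<!DOCTYPE rdf:RDF PUBLIC "x" "y" [
  <!ELEMENT channel (fiskepudding, kumle)>
]>
<rdf:RDF><channel><fiskepudding>Vondt!</fiskepudding><kumle>OK!</kumle></channel></rdf:RDF>"""

names = internal_subset_element_decls(sample)
```

An application could then refuse any document for which the returned
list is not empty.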

Oh, and BTW, can I list My Netscape on the xmlproc page as xmlproc
users? (I'm sure you're desperate for the extra hits. :)

--Lars M.