[XML-SIG] XBEL DTD as a meta-dtd

Walter Underwood wunder@infoseek.com
Wed, 16 Sep 1998 09:41:04 -0700

At 05:02 PM 9/16/98 +0200, Marc van Grootel wrote:
>I looked through my bookmark list and there were several url's that
>looked like:
>  http://someserver/somepage.html&var=x
> [...]
>The solution might be to url-encode url's. So the above url
>  http:%3A%2F%2Fsomeserver%2Fsomepage.html%26var%3Dx

Use XML entities. Using two different kinds of escaping (XML and
HTTP) in the same file is unnecessary and confusing.

I've been saving URLs in XML in my product, and entities work
fine. It turns out that you need the entities in other text too,
since someone might use them in a bookmark name ("Arts & Crafts",
"O'Reilly Books"). So just entify them. Here is a snippet of re-hackery
to entify a string:

import re

# This pattern and replacement function are used to map characters
# in a string to XML entities, like this:  entities.sub(entsub,s)
entities = re.compile('[&<>"\']')
def entsub(matchobj):
    c = matchobj.group()
    if   c == '&': return '&amp;'
    elif c == '<': return '&lt;'
    elif c == '>': return '&gt;'
    elif c == "'": return '&apos;'
    elif c == '"': return '&quot;'
    else:          return ''      # logs a message here in my application

Always, always entify strings as you generate XML. If you slip
in an unescaped special character, you can lose a whole file's
worth of data by making it un-parseable (or make someone
manually edit it to get it back).
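
To make the point concrete, here is a minimal, self-contained sketch of
the round trip. The names entify and bookmark_xml are made up for this
example, and the bookmark/href/title layout is only an illustration, not
a statement of what the XBEL DTD requires. It uses the parser from
Python's standard library (xml.etree.ElementTree), which is just one
convenient way to check the result.

```python
import re
import xml.etree.ElementTree as ET

# The five characters that have predefined entities in XML.
_SPECIALS = re.compile('[&<>"\']')
_ENTITIES = {
    '&': '&amp;',
    '<': '&lt;',
    '>': '&gt;',
    "'": '&apos;',
    '"': '&quot;',
}

def entify(s):
    """Replace XML special characters in s with their entities."""
    return _SPECIALS.sub(lambda m: _ENTITIES[m.group()], s)

def bookmark_xml(href, title):
    """Build one bookmark element (hypothetical layout), entifying both values."""
    return '<bookmark href="%s"><title>%s</title></bookmark>' % (
        entify(href), entify(title))

url = 'http://someserver/somepage.html&var=x'

# Entified: the parser hands back the original strings unchanged.
elem = ET.fromstring(bookmark_xml(url, 'Arts & Crafts'))
assert elem.get('href') == url
assert elem.find('title').text == 'Arts & Crafts'

# Not entified: the bare '&' makes the document un-parseable.
try:
    ET.fromstring('<bookmark href="%s"/>' % url)
    parsed_raw = True
except ET.ParseError:
    parsed_raw = False
assert not parsed_raw
```

The URL goes into the file with only XML escaping applied, and the
parser undoes that escaping on the way back out, so no second layer of
URL-encoding is needed to store it.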

Finally, XBEL is doing things that are also done by the Resource
Description Framework (RDF). Though the RDF spec is hard to read,
and may fail just because it is drowning in AI-speak rather than
being useful, it is worth taking a look at.


Walter R. Underwood
wunder@best.com (home)