[XML-SIG] XBEL DTD as a meta-dtd
Marc van Grootel
bwaumg@urc.tue.nl
Wed, 16 Sep 1998 17:02:50 +0200
Hi,
So the consensus is, more or less, that 'less is more'. I can agree
to that. My experiment with architectural forms may have led me too far
from the goals of XBEL. I agree with Greg that such URL extraction is
better left to another DTD.
So the scope is 'hierarchical storage for bookmarks'? But is
lossless conversion between XBEL and Netscape bookmark files still a goal? If not,
I think 'separator' should go, since it serves no real purpose.
Even with something that looks this simple, important issues tend to
show up only after people start implementing applications with it.
In my opinion we need an escape hatch for some of these as yet unknown
applications. This is what I had in mind:
<info>
<meta name=".." content=".."/>
...
</info>
Or as Fred suggested:
<info>
<meta name="..">...</meta>
...
</info>
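Here is a minimal sketch of what an application has to do to read such an <info> block: iterate over the meta elements and pick out the names. It assumes Python 3 and its standard xml.etree.ElementTree module. The function name read_meta and the sample names and values are made up for illustration. It accepts both proposed forms, a 'content' attribute or the value as element content.

```python
import xml.etree.ElementTree as ET

# Hypothetical <info> fragment mixing both proposed <meta> forms.
INFO = """<info>
  <meta name="owner" content="mvg"/>
  <meta name="comment">Checked &lt;weekly&gt;</meta>
</info>"""

def read_meta(info_xml):
    """Collect name -> value pairs from an <info> element.

    Uses the 'content' attribute if present, otherwise the element text.
    """
    info = ET.fromstring(info_xml)
    result = {}
    for meta in info.iter("meta"):
        value = meta.get("content")
        if value is None:
            value = meta.text or ""
        result[meta.get("name")] = value
    return result

print(read_meta(INFO))  # {'owner': 'mvg', 'comment': 'Checked <weekly>'}
```

With an explicit <owner> element the same lookup would be a single find("owner") call instead of a loop; that is the trade-off under discussion.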
Greg Stein wrote:
> This is kind of silly. XML is intended to encode the "name" as the
> actual tag. Why push this down another level? Using an "owner" tag, you
> can extract this information directly from the parse tree. Using a
> "meta" tag like above, now the software has to iterate through the meta
> tags looking for the information.
>
> XML is enough of an abstraction; you don't want to start creating
> additional layers in there. The tendency should be towards additional
> tags and less "control" type elements. It does not hurt anything to
> specify an optional tag, yet it can make many things easier.
I think such a catch-all can extend the lifetime of the DTD. Common
conventions could later make it into the DTD as explicit elements.
That is better than defining only a few explicit elements for info,
which can lead to tag abuse by different authors and applications.
Catch-all mechanisms like this are not uncommon, and I don't think they
violate the idea of XML. I would rather have one well-crafted DTD than
multiple DTDs with only minor differences.
If info like 'owner' is so important that it should be declared
explicitly it can also be an (optional) attribute of the elements to
which it belongs (folder and bookmark).
As to the form of the meta element:
Maybe the 'name' attribute should be declared as NMTOKEN to restrict
it to a name token. With <meta name="..">my data</meta> the content is
#PCDATA, so markup characters in the data must be escaped ('<' => '&lt;',
'&' => '&amp;', etc.). In a 'content' attribute '>' can stay as it is, but
'<' must still be escaped, since XML (unlike SGML) does not allow a literal
'<' in an attribute value; and watch out for '&' (see below).
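A sketch of those escaping rules, assuming Python 3's standard xml.sax.saxutils (escape and quoteattr) and ElementTree for the round trip. In XML 1.0 both '<' and '&' must be escaped in element content and in attribute values; '>' only strictly needs escaping in the sequence ']]>', although escape() always escapes it. The sample value is made up.

```python
import xml.etree.ElementTree as ET
from xml.sax.saxutils import escape, quoteattr

value = "if a < b & c > d"

# Element content: escape() replaces '&', '<' and '>' with entity references.
as_content = '<meta name="note">%s</meta>' % escape(value)
# Attribute value: quoteattr() escapes the same characters and adds the quotes.
as_attr = '<meta name="note" content=%s/>' % quoteattr(value)

print(as_content)  # <meta name="note">if a &lt; b &amp; c &gt; d</meta>
print(as_attr)     # <meta name="note" content="if a &lt; b &amp; c &gt; d"/>

# Both forms parse back to the original string.
assert ET.fromstring(as_content).text == value
assert ET.fromstring(as_attr).get("content") == value
```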
Where to put the URLs?
This may seem like nitpicking, but I don't think it is.
One reason for putting the URL itself in an attribute would be the
stricter constraints of a CDATA attribute and being able to make it
#REQUIRED. As element content, the parser cannot check whether the
element actually contains a value:
<url></url> looks fine to the parser.
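A non-validating parser such as Python's ElementTree (assumed here for illustration) accepts an empty <url> without complaint, so any "must have a value" rule ends up as an application-level check. Even for an attribute, #REQUIRED is only enforced by a validating parser, and no DTD content model can demand non-empty #PCDATA. The helper require_text is hypothetical.

```python
import xml.etree.ElementTree as ET

# ElementTree does not validate, and (#PCDATA) allows empty content anyway,
# so this parses without complaint.
doc = ET.fromstring("<bookmark><url></url></bookmark>")
url = doc.find("url")
print(repr(url.text))  # None

def require_text(elem, tag):
    """Hypothetical check: return the stripped text of <tag>, or raise if missing/empty."""
    child = elem.find(tag)
    if child is None or not (child.text or "").strip():
        raise ValueError("<%s> is missing or empty" % tag)
    return child.text.strip()
```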
There's another reason, though.
I looked through my bookmark list and found several URLs that
looked like:
http://someserver/somepage.html&var=x
A parser will complain when it sees this, since '&' followed by a
name character starts a general entity reference, and that entity is
probably not defined. Then it encounters the '=', which is an error,
since an entity reference must end with ';'.
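A quick check with Python's expat-based ElementTree (assumed for illustration) shows the raw URL is rejected as not well-formed, while the same URL with '&' written as '&amp;' parses and comes back unchanged. The helper is_well_formed is made up for this sketch.

```python
import xml.etree.ElementTree as ET

def is_well_formed(text):
    """Hypothetical helper: True if text parses as XML, False on a ParseError."""
    try:
        ET.fromstring(text)
    except ET.ParseError:
        return False
    return True

raw = "<url>http://someserver/somepage.html&var=x</url>"
escaped = "<url>http://someserver/somepage.html&amp;var=x</url>"

print(is_well_formed(raw))          # False: '&var' starts an unterminated entity reference
print(is_well_formed(escaped))      # True
print(ET.fromstring(escaped).text)  # http://someserver/somepage.html&var=x
```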
I thought it would be safe to put the URL in a CDATA attribute. Alas,
it turns out that even in a CDATA attribute a parser still tries
to resolve a general entity. In David Megginson's book (Structuring
XML Documents, p. 19) I found the following explanation:
CDATA attribute type:
Note that an attribute type applies to the value of the attribute
*after* the attribute string has been normalized - general entities
will still be recognized as part of that normalization process.
So, although I thought putting URLs in a CDATA attribute was safe, it
is not.
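The same holds inside an attribute, as this sketch (again Python 3's ElementTree, assumed for illustration) shows. Escaping the '&' as '&amp;' is the standard XML way around it: attribute-value normalization turns '&amp;' back into '&', so the application sees the original URL. quoteattr from xml.sax.saxutils does that escaping and adds the quotes; the helper parses is hypothetical.

```python
import xml.etree.ElementTree as ET
from xml.sax.saxutils import quoteattr

url = "http://someserver/somepage.html&var=x"

def parses(text):
    """Hypothetical helper: True if text is well-formed XML."""
    try:
        ET.fromstring(text)
    except ET.ParseError:
        return False
    return True

raw = '<bookmark href="%s"/>' % url              # bare '&' inside the attribute
escaped = "<bookmark href=%s/>" % quoteattr(url)  # '&' written as '&amp;'

print(parses(raw))  # False
print(escaped)      # <bookmark href="http://someserver/somepage.html&amp;var=x"/>
print(ET.fromstring(escaped).get("href") == url)  # True
```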
One solution might be to URL-encode the URLs. The above URL would
then become:
http%3A%2F%2Fsomeserver%2Fsomepage.html%26var%3Dx
Hmmm. Not a pretty sight.
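For comparison, a sketch of the URL-encoding approach using Python's urllib.parse (assumed here; quote with safe="" escapes ':', '/', '&' and '=' alike). It round-trips, but unlike '&amp;' escaping, which the XML parser undoes for you, every consumer of the file would have to decode the value itself.

```python
from urllib.parse import quote, unquote

url = "http://someserver/somepage.html&var=x"

# safe="" means even '/' and ':' are percent-encoded.
encoded = quote(url, safe="")
print(encoded)  # http%3A%2F%2Fsomeserver%2Fsomepage.html%26var%3Dx

# The application must decode explicitly to get the usable URL back.
assert unquote(encoded) == url
```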
Maybe a structure like:
<bookmark id=".." href=".." visited=".." ...>
<title>..</title>
<desc>..</desc>
</bookmark>
is not so bad (maybe even with an optional info element?).
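A minimal sketch of building such a bookmark element with ElementTree (Python 3 assumed). make_bookmark is a hypothetical helper, and 'visited' is kept as an opaque string since its format isn't specified here. The serializer escapes the '&' in href automatically.

```python
import xml.etree.ElementTree as ET

def make_bookmark(bid, href, title, desc=None, visited=None):
    """Hypothetical helper: <bookmark id href [visited]><title/>[<desc/>]</bookmark>."""
    bm = ET.Element("bookmark", id=bid, href=href)
    if visited is not None:
        bm.set("visited", visited)
    ET.SubElement(bm, "title").text = title
    if desc is not None:
        ET.SubElement(bm, "desc").text = desc
    return bm

bm = make_bookmark("b1", "http://someserver/somepage.html&var=x", "Some page")
xml_text = ET.tostring(bm, encoding="unicode")
print(xml_text)  # the href is written with '&amp;var=x'
```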
Finally, what about the main level? Forest or Tree?
<xbel>
<folder>..</folder>
<folder>..</folder>
<bookmark>..</bookmark>
</xbel>
Or:
<xbel>
<folder>
<folder>..</folder>
<bookmark>..</bookmark>
</folder>
</xbel>
I like Fred's suggestion that in the latter an info element directly
under xbel (so outside a folder) could convey different info than the info
elements inside a folder (or maybe even a bookmark). Maybe this even
warrants naming that specific element differently ('header'?).
Should we fix a limit on the depth of folder nesting, or should this be
left to each application? Maybe we should say that an XBEL
application must be able to handle a nesting depth of at least x folders.
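If the DTD does state a minimum depth, an application could check a file against it with something like the following sketch (Python 3's ElementTree assumed). It walks the tree with an explicit stack rather than recursion, so deep nesting doesn't run into Python's recursion limit. max_folder_depth and MIN_SUPPORTED_DEPTH are hypothetical names, with 16 standing in for the unspecified x.

```python
import xml.etree.ElementTree as ET

MIN_SUPPORTED_DEPTH = 16  # hypothetical stand-in for "x"

def max_folder_depth(root):
    """Return the deepest chain of nested <folder> elements under root."""
    deepest = 0
    stack = [(root, 0)]  # (element, number of enclosing folders so far)
    while stack:
        elem, depth = stack.pop()
        if elem.tag == "folder":
            depth += 1
            deepest = max(deepest, depth)
        for child in elem:
            stack.append((child, depth))
    return deepest

doc = ET.fromstring(
    "<xbel><folder><folder><bookmark/></folder></folder><folder/></xbel>")
print(max_folder_depth(doc))                         # 2
print(max_folder_depth(doc) <= MIN_SUPPORTED_DEPTH)  # True
```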
Marc