[XML-SIG] Another Sax2 Enhancement: dataSource

W. Eliot Kimber eliot@isogen.com
Fri, 27 Apr 2001 13:28:01 -0500


"Martin v. Loewis" wrote:
> 
> > I need to be able to get from a DOM the filename or URI of the file it
> > was constructed from (if it was constructed from a file). I didn't see
> > anything in the code that preserved this data, so I've hacked my local
> > copy of the code to add an optional "dataSource" parameter to the
> > FromFile/URI/Stream methods that propagates to the Sax reader. This is
> > set as the friend property _dataSource on the resulting DOM node.
> >
> > Is this an appropriate solution?
> 
> I would not think so. Are you sure you need this on every node?
> That seems to be quite expensive for a rarely-used extension.

It's only set on the document node, not every node in the DOM. I think
my comment above was misleading.
 
> If you really only put the parameter to FromFile into the tree, isn't
> putting it into the document sufficient?
> 
> But then, aren't you interested in the data sources of the elements
> originating from an external parsed entity?

Since I don't recognize the useful existence of external parsed
entities, I would never need to know anything about them :-)

Possibly, but since an external parsed entity would be relative to the
document that declares it, having the document's path and the entity's
relative path would be sufficient (as it is for unparsed entities).
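
As a rough sketch of what I mean by "sufficient" (this uses the
standard library's urljoin rather than anything in the DOM code, and
the function name and file URIs are made up for illustration):

```python
from urllib.parse import urljoin


def resolve_system_id(document_uri, system_id):
    """Resolve an entity's system ID against the declaring document's URI.

    Hypothetical helper: `document_uri` is whatever location the document
    was loaded from, `system_id` is the value exactly as declared.
    """
    return urljoin(document_uri, system_id)


# A relative system ID is resolved against the declaring document.
resolved = resolve_system_id("file:///data/hyperdoc/main.xml",
                             "chapters/ch01.xml")

# A system ID with its own non-hierarchical scheme (e.g. a URN) is
# returned unchanged, so nothing declared in the document is lost.
urn = resolve_system_id("file:///data/hyperdoc/main.xml",
                        "urn:x-example:doc42")
```

The declared system ID stays untouched in the DOM; the absolute
location is computed only when it is needed.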
 
> > I need this because unparsed entity resolution does not preserve the
> > absolute path of the entities, only the system ID as specified in the
> > declaration. As I am using relative paths, I must have the path of the
> > declaring document in order to be able to construct new DOMs from
> > referenced XML document entities.
> 
> I'm not sure I understand. Are you saying that the DOM is required to
> store the relative path in Entity::systemId? Where does the spec say
> so?

I don't think it explicitly requires it, but I would be very upset if a
DOM parser changed the value of the original system ID with no way to
get it back. I would consider that to be egregious destruction of
data--what if the system ID is a URN that I need to be able to
interrogate after the entity is resolved, or what if I want to rewrite
the entity declarations as originally specified?

Thus I would not, personally, suggest that the system ID value be the
fully-resolved location of the file.
 
> > My business problem is processing a hyperdocument consisting of many
> > subordinate documents, where each document is declared as an unparsed
> > entity with a relative path (this hyperdocument is generated by an
> > automatic process and the intent is for the resulting package of
> > documents to be self contained so that it can be packaged and moved
> > without the need to rework any external identifiers or catalog files).
> 
> To solve this problem, isn't it sufficient to carry the document's
> system ID along with the document, instead of putting it *into* the
> document?

I could, but then I'm requiring processing applications to do this, at
some significant cost in complexity. For example, in my code, I may be
receiving a node from any document in the hyperdocument. Thus I would
have to maintain a global mapping from document nodes to system IDs.
Doable, but why force all applications to do this when I should be able
to just ask the DOM "where did you come from?"
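
For illustration, the bookkeeping every application would have to carry
looks roughly like this. It is only a sketch: it uses xml.dom.minidom
from the standard library as a stand-in for the DOM implementation, and
the names `_sources`, `load`, and `source_of` are hypothetical.

```python
from xml.dom.minidom import Node, parseString

# Application-level side table: document node -> system ID it came from.
_sources = {}


def load(xml_text, system_id):
    """Parse a document and remember where it came from."""
    doc = parseString(xml_text)
    _sources[doc] = system_id
    return doc


def source_of(node):
    """Look up the system ID of the document that owns `node`."""
    if node.nodeType == Node.DOCUMENT_NODE:
        doc = node
    else:
        doc = node.ownerDocument
    return _sources.get(doc)


doc = load("<book><chapter/></book>", "file:///data/hyperdoc/main.xml")
chapter = doc.documentElement.firstChild
origin = source_of(chapter)
```

Every piece of code that receives a node also needs access to that
table, which is the complexity I would rather not push onto each
application.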

Given that the info is only stored on the document node, I think it's a
relatively small cost. It was certainly easier for me to patch the DOM
implementation than to implement my own dictionary (although not that
much easier). But now nobody else has to think about it.
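
In outline, the effect of the patch is something like the following.
This is not the actual patch: it is a minimal sketch against
xml.dom.minidom instead of the FromFile/URI/Stream readers, and
`parse_with_source` and `data_source_of` are names invented here.

```python
from xml.dom.minidom import Node, parseString


def parse_with_source(xml_text, data_source=None):
    """Parse a document and record its origin on the document node only."""
    doc = parseString(xml_text)
    # Assumption: the implementation lets us hang an extra "friend"
    # property on the document node (minidom documents do).
    doc._dataSource = data_source
    return doc


def data_source_of(node):
    """Ask any node where its document came from."""
    if node.nodeType == Node.DOCUMENT_NODE:
        doc = node
    else:
        doc = node.ownerDocument
    return getattr(doc, "_dataSource", None)


doc = parse_with_source("<book><chapter/></book>",
                        data_source="file:///data/hyperdoc/main.xml")
where_from = data_source_of(doc.documentElement.firstChild)
```

One extra attribute per document, and any node can answer the question
through its owner document.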

Cheers,

E.

-- 
. . . . . . . . . . . . . . . . . . . . . . . .

W. Eliot Kimber | Lead Brain

1016 La Posada Dr. | Suite 240 | Austin TX  78752
    T 512.656.4139 |  F 512.419.1860 | eliot@isogen.com

w w w . d a t a c h a n n e l . c o m