[XML-SIG] Using PyExpat.py

Uche Ogbuji uche.ogbuji@fourthought.com
Mon, 19 Feb 2001 15:48:04 -0700

> > > I'd like to drop support for URLs; I don't think the typical computer
> > > is sufficiently networked to make this work well.
> > 
> > In this case, the typical computer user will have a great deal of trouble 
> > using any XML application in any language.  Almost all of them use URIs as 
> > their basis, and for good reason.  Special support for local files is almost 
> > universally a mere convenience.
> > 
> > Most XML processing specifications mandate that the URI of the XML
> > entity that contains an infoset node is used as the basis for
> > further processing.  To me, this argues strongly for dropping local
> > files rather than URIs if we must choose.  Some XML specs would be
> > very difficult to implement properly if the low-level tools became
> > file-system-only readers.
> 
> Can you give more details of how this is used?  I've got very limited
> XML experience, and so far it all falls in the category of "here's a
> file; give me a DOM tree for it" or "here's a DOM tree, write it to a
> file".  There are no URLs anywhere.  Sometimes instead of a file it'll
> be text data read from or written to a database.  But no URLs.


It comes down to what you do with the DOM afterward, and especially how
attributes, system identifiers and other such creatures are interpreted.

A call to parseFile or parseUri on a top-level document is typically only a
small part of the usage pattern of any XML processor.  Other facilities, such
as stylesheet processing, XInclude, xml:base, RDF, and pretty much anything
else, receive these strings and are *required* to interpret them as URIs.

If the top-level name was originally interpreted purely as a file name, then
all the points of confusion you raised are immediately compounded as the
system tries to resolve relative URIs against a "base URI" that is actually a
file-system path.
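
To make the resolution step concrete, here is a minimal sketch.  It assumes
Python 3's standard urllib.parse.urljoin rather than anything in PyXML, and
the example.org and /data names are made up for illustration.  It only shows
that once the base is a URI, http: and file: documents go through the same
resolution code.

```python
from urllib.parse import urljoin


def resolve(base_uri, reference):
    """Resolve a (possibly relative) URI reference against a base URI."""
    return urljoin(base_uri, reference)


# A document fetched over HTTP: relative references stay on that server.
http_base = "http://example.org/docs/main.xml"
print(resolve(http_base, "chapters/one.xml"))

# A local document expressed as a file: URI (hypothetical path): the same
# resolution code applies, with no special case for the file system.
file_base = "file:///data/docs/main.xml"
print(resolve(file_base, "../shared/common.ent"))

# An absolute reference ignores the base entirely.
print(resolve(file_base, "http://example.org/dtd/spam.dtd"))
```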

This is actually a problem that I have seen people run into far more often 
than any worries about computers not having network connections.  I've been 
sorely tempted to remove file support simply because doing so would eliminate 
confusion with the large body of XML processing that requires relative URI 
normalization and resolution.
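
One way to keep a file entry point without a second naming scheme is to make
it a thin wrapper that converts the path to a file: URI and hands off to the
URI-based entry point.  The sketch below is only an illustration under that
assumption: parse_uri and parse_file are hypothetical names, not the PyXML
API, and it uses Python 3's pathlib, urllib.request and xml.dom.minidom.

```python
from pathlib import Path
from urllib.request import urlopen
from xml.dom import minidom


def parse_uri(uri):
    """Hypothetical core entry point: fetch by URI and keep the base URI."""
    with urlopen(uri) as stream:
        document = minidom.parse(stream)
    # Return the base URI alongside the tree so later steps can resolve
    # relative references against it.
    return uri, document


def parse_file(filename):
    """Hypothetical convenience wrapper: turn the path into a file: URI."""
    return parse_uri(Path(filename).resolve().as_uri())
```

With this shape, everything downstream of the entry point sees only URIs,
whatever the caller started with.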

> > The Mac people should have spoken to the IETF a decade ago when URLs
> > emerged, or a bit later when URIs came out.  I suspect, again that
> > if this is the case, they suffer much more pain in XML processing
> > than is inflicted on them by PyXML.
> That's a pretty intolerant attitude you're displaying there.  They
> need not suffer at all if at all times it is clear whether a name is a
> URL or a filename.  It's trying to fold the two namespaces into one
> that I'm fighting here.

Not my intention.  My point is that I can't imagine PyXML is an outstanding 
problem for XML developers on a platform that uses colons as path separators.

It's a purely technical argument.  I don't know a thing about the Mac.

> > > I would suggest to have separate APIs depending on the argument type,
> > > e.g. p.parseFile(filename), p.parseURL(url),
> > > p.parseStream(InputSource), p.parseString(text).  (And no, Java
> > > overloading wouldn't help much here, since three out of four APIs have
> > > string arguments.)
> > 
> > Sure, one can add a parseFile, but what do you do with
> > 
> > <?xml version='1.0'?>
> > <!DOCTYPE spam [
> >   <!ENTITY foo SYSTEM 'foo.bar'>
> > ]>
> > <spam>&foo;</spam>
> > 
> > URI or file?
> > 
> > Note that this is a trick question, and the "trick" is *exactly* my point.
> So explain the trick.  I don't know enough XML to understand what it
> means.  I don't even know which thing you are asking about!  spam?
> foo?  foo.bar?  &foo;?

"foo.bar", the system identifier of the entity.  I think I explained it better
in my follow-up message.
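
For readers who want to see where "foo.bar" surfaces, here is a small sketch
using Python 3's xml.parsers.expat.  The external_refs helper and the base
URIs are made up for illustration; SetBase and ExternalEntityRefHandler are
the expat hooks involved.  The parser hands over the system identifier as
written, together with whatever base was set, and it is up to the application
to combine the two.

```python
from urllib.parse import urljoin
from xml.parsers import expat

DOC = b"""<?xml version='1.0'?>
<!DOCTYPE spam [
  <!ENTITY foo SYSTEM 'foo.bar'>
]>
<spam>&foo;</spam>
"""


def external_refs(doc, base_uri):
    """Return the resolved URI of each external entity referenced in doc."""
    seen = []
    parser = expat.ParserCreate()
    parser.SetBase(base_uri)

    def on_external_entity(context, base, system_id, public_id):
        # system_id arrives exactly as written ('foo.bar'); resolving it
        # against the base is the application's job.
        seen.append(urljoin(base, system_id))
        return 1  # non-zero tells expat to carry on parsing

    parser.ExternalEntityRefHandler = on_external_entity
    parser.Parse(doc, True)
    return seen


print(external_refs(DOC, "http://example.org/docs/spam.xml"))
print(external_refs(DOC, "file:///data/docs/spam.xml"))
```

The same document yields a different location for the entity depending only
on the base URI it was loaded with.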

Uche Ogbuji                               Principal Consultant
uche.ogbuji@fourthought.com               +1 303 583 9900 x 101
Fourthought, Inc.                         http://Fourthought.com 
4735 East Walnut St, Ste. C, Boulder, CO 80301-2537, USA
Software-engineering, knowledge-management, XML, CORBA, Linux, Python