[XML-SIG] Using PyExpat.py

Uche Ogbuji uche.ogbuji@fourthought.com
Sat, 10 Feb 2001 17:41:24 -0700


> > xml_dom_object = reader.fromUri(filename)  #should work for either URL or file
> 
> Let's talk about this comment.  Is it really a good idea to build URL
> access right into the API here?  For apps that need this, it's trivial
> to write as long as the reader takes an open file object ("stream") as
> an alternative to a filename: just call urllib.urlopen(uri) and pass
> it as the argument.

Yes, but XML's interactions with URIs are by no means straightforward.  The
reason that URIs are built into so many APIs side-by-side with stream APIs
(and this is the case in all implementations I know of, Python or not) is to
allow a smooth interface to all the URI complications XML brings about, mainly
the network of rules for look-up according to base-URI resolution.  Basically,
in XML just about everything is a URI.  Some implementations (such as PySAX)
resolve to local file names merely as a convenience to the user.
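
To make the base-URI point concrete, here is a minimal sketch of resolving
system identifiers against a base URI.  It uses the present-day standard
library (urllib.parse); the helper name resolve() and the example.org URIs
are made up for illustration and are not taken from any of the XML libraries
discussed here.

```python
from urllib.parse import urljoin


def resolve(base_uri, system_id):
    """Resolve a (possibly relative) system identifier against a base URI.

    Hypothetical helper for illustration only.
    """
    return urljoin(base_uri, system_id)


# Illustrative base URI; not from any real document.
base = "http://example.org/docs/book.xml"

relative = resolve(base, "chapters/ch1.xml")
parent = resolve(base, "../dtd/book.dtd")
absolute = resolve(base, "http://other.example/x.dtd")
urn = resolve(base, "urn:isbn:0451450523")
local = resolve("file:///home/user/docs/book.xml", "ch1.xml")

print(relative)  # http://example.org/docs/chapters/ch1.xml
print(parent)    # http://example.org/dtd/book.dtd
print(absolute)  # http://other.example/x.dtd
print(urn)       # urn:isbn:0451450523
print(local)     # file:///home/user/docs/ch1.xml
```

A relative reference takes its meaning from the base, while an absolute URI
(including a URN) passes through untouched.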

And, for instance, there is the matter that URIs are a superset of URLs, and
esoterica such as URNs actually do exist in the XML fairy land.

> Case in point: I found this bit in saxutils.py:
> 
>         if os.path.isfile(sysid):
>             basehead = os.path.split(os.path.normpath(base))[0]
>             source.setSystemId(os.path.join(basehead, sysid))
>             f = open(sysid, "rb")
>         else:
>             source.setSystemId(urlparse.urljoin(base, sysid))
>             f = urllib.urlopen(source.getSystemId())
> 
> Now I don't know under which circumstances this gets triggered (the
> context is obscure), but I'd say it's a bad idea to just try to open a
> URL when a string isn't a local file.  Maybe *you* live in a world
> where the network is "always on" (and I do too!), but for plenty of
> folks, it's rather annoying to find that their modem starts dialing
> out each time they make a typo in a filename.

I think this is a good point in general, but the attitude embodied in many
XML practices is just this "always on" mentality.  This matter is the subject
of debate every month or so on XML-DEV.  In fact, there are far nastier
implications of XML's URI-happiness than just the modem dialing example.

But I must say: unless urllib is broken, I don't see why this would cause any 
modem dialing in any environment other than Windows, where unfortunately drive 
specifiers look like URL schemes.

And even in Windows, why would this cause dialing in a case other than when
someone has ill-advisedly set up a shared drive called http: or ftp:?
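
As a rough illustration of the drive-specifier ambiguity, the sketch below
asks a generic URL parser what scheme it sees in a few names.  It uses
urllib.parse.urlsplit from the present-day standard library; the helper name
apparent_scheme() and the sample paths are assumptions for illustration.

```python
from urllib.parse import urlsplit


def apparent_scheme(name):
    """Return whatever a generic URL parser takes to be the scheme of name.

    Hypothetical helper for illustration only.
    """
    return urlsplit(name).scheme


windows_path = apparent_scheme("C:\\docs\\book.xml")
posix_path = apparent_scheme("/home/user/book.xml")
bare_name = apparent_scheme("book.xml")
real_url = apparent_scheme("http://spam.com/hello%20world")

print(repr(windows_path))  # 'c'
print(repr(posix_path))    # ''
print(repr(bare_name))     # ''
print(repr(real_url))      # 'http'
```

Only the Windows-style path is mistaken for something with a scheme, and the
"scheme" it yields is a single letter, not http or ftp.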

> Besides, the syntax for local filenames and URLs is not the same;

I didn't know there was any universal syntax for local filenames.

> the quoting conventions are different and it's quite possible to find that
> the same name could be either a URL or a filename, with vastly
> different interpretations.

I don't see where this is a problem.  If someone wants file "hello\\ world" on
his local drive, he can just specify it as such, and if someone wants
"http://spam.com/hello%20world", he can just specify it as such.  If he tries
to resolve "http://spam.com/hello\\ world", he should get a malformed URL
error from his user agent or library.

The solution is to use URL quoting if you want a URL, and your local quoting 
convention if you want a local file.
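
A small sketch of the two conventions, using quote() and unquote() from the
present-day urllib.parse module.  The names below are illustrative only.

```python
from urllib.parse import quote, unquote

# The name as a user would think of it, with a literal space.
plain_name = "hello world"

# URL convention: the space must be percent-encoded.
url_form = "http://spam.com/" + quote(plain_name)

# Read as a URL path segment, "hello%20world" decodes to "hello world".
decoded = unquote("hello%20world")

# Read as a local file name, "hello%20world" is taken literally; nothing is
# decoded, so it names a different file from "hello world".
literal_local_name = "hello%20world"

print(url_form)                          # http://spam.com/hello%20world
print(decoded)                           # hello world
print(literal_local_name == plain_name)  # False
```

The same string can mean different things under the two readings, which is
why the caller has to pick a convention and quote accordingly.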

> (See nturl2path.)

Ah.  I don't claim to be able to speak intelligently about Windows NT.

> Without more context,
> it's unclear which syntax should be tried first. The application
> knows this, but the library doesn't.  It's also fine to have an
> alternative API that takes a URL instead of a local filename -- but
> it's not okay to attempt to overlap the two namespaces.

Actually, the library does know.  There is very little about XML that has
anything to do with file names.  Pretty much everything is a URI.  In most
cases, the library's trying to resolve a file name first is merely a
convenience to the user so that he doesn't need to deal with URI arcana for
local resources, say by typing "file:" before every path.  If anything is to
be done, I'd say this convenience should be taken away.  But I don't see a
problem big enough to warrant doing so.
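
For what the explicit "file:" form looks like, here is a minimal sketch that
turns an absolute POSIX path into a file: URI.  The helper name
posix_path_to_file_uri() and the sample path are hypothetical, and the sketch
assumes POSIX-style absolute paths only; Windows paths need different
handling.

```python
from urllib.parse import quote, urlsplit


def posix_path_to_file_uri(path):
    """Build a file: URI from an absolute POSIX path.

    Hypothetical helper; assumes POSIX paths, not Windows ones.
    """
    if not path.startswith("/"):
        raise ValueError("expected an absolute POSIX path: %r" % path)
    # quote() percent-encodes characters such as spaces and leaves "/" alone.
    return "file://" + quote(path)


uri = posix_path_to_file_uri("/home/user/docs/hello world.xml")
scheme = urlsplit(uri).scheme

print(uri)     # file:///home/user/docs/hello%20world.xml
print(scheme)  # file
```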


-- 
Uche Ogbuji                               Principal Consultant
uche.ogbuji@fourthought.com               +1 303 583 9900 x 101
Fourthought, Inc.                         http://Fourthought.com 
4735 East Walnut St, Ste. C, Boulder, CO 80301-2537, USA
Software-engineering, knowledge-management, XML, CORBA, Linux, Python