[XML-SIG] Using PyExpat.py

Martin v. Loewis martin@loewis.home.cs.tu-berlin.de
Mon, 19 Feb 2001 22:11:47 +0100

> > xml_dom_object = reader.fromUri(filename)  # should work for either URL or file

> Let's talk about this comment.  Is it really a good idea to build URL
> access right into the API here?

I can't tell whether this has been settled. Did you propose to drop
the support for URLs in the API, or the support for local files?

We just had a report where urllib apparently decided to use "c" as the
protocol name; I'm not entirely sure what the exact cause was.
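One plausible cause, offered only as a guess, is a Windows path with a
drive letter being handed to URL parsing. The sketch below uses the
Python 3 urllib.parse module (not the urllib of the original report),
and the path is made up for illustration:

```python
from urllib.parse import urlsplit

# Hypothetical Windows-style path; not taken from the original report.
windows_path = r"c:\temp\doc.xml"

# Everything before the first ":" looks like a valid URL scheme,
# so the drive letter is reported as the scheme.
drive_as_scheme = urlsplit(windows_path).scheme

# A plain relative file name has no scheme at all.
relative_scheme = urlsplit("doc.xml").scheme

print(repr(drive_as_scheme), repr(relative_scheme))
```

Any code that dispatches on the scheme would then go looking for a
handler for the "c" protocol.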

> Case in point: I found this bit in saxutils.py:
>         if os.path.isfile(sysid):
>             basehead = os.path.split(os.path.normpath(base))[0]
>             source.setSystemId(os.path.join(basehead, sysid))
>             f = open(sysid, "rb")
>         else:
>             source.setSystemId(urlparse.urljoin(base, sysid))
>             f = urllib.urlopen(source.getSystemId())
> Now I don't know under which circumstances this gets triggered (the
> context is obscure)

prepare_input_source is invoked by every parser when processing the
argument to .parse(), so the common usage is

  p = make_parser()
  p.parse(filename)

Instead of a filename, you can pass URLs, streams, and InputSource
objects (the Java API only supports InputSource here).
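As a rough illustration of those argument kinds, here is a sketch
against the xml.sax package in today's Python 3 standard library. The
handler class, the helper function, and the sample document are
invented for the example; the URL case is left out so the sketch runs
without a network.

```python
import io
import os
import tempfile
from xml.sax import make_parser
from xml.sax.handler import ContentHandler
from xml.sax.xmlreader import InputSource


class NameCollector(ContentHandler):
    """Hypothetical handler that records element names in document order."""

    def __init__(self):
        super().__init__()
        self.names = []

    def startElement(self, name, attrs):
        self.names.append(name)


def element_names(source):
    """Parse `source` (file name, stream, or InputSource) and return the names."""
    handler = NameCollector()
    p = make_parser()
    p.setContentHandler(handler)
    p.parse(source)
    return handler.names


xml_bytes = b"<doc><item/></doc>"

# 1. A file name: the parser opens the file itself.
with tempfile.NamedTemporaryFile(suffix=".xml", delete=False) as tmp:
    tmp.write(xml_bytes)
    filename = tmp.name
try:
    from_filename = element_names(filename)
finally:
    os.remove(filename)

# 2. An already opened stream.
from_stream = element_names(io.BytesIO(xml_bytes))

# 3. An explicit InputSource wrapping a byte stream.
input_source = InputSource()
input_source.setByteStream(io.BytesIO(xml_bytes))
from_input_source = element_names(input_source)

print(from_filename, from_stream, from_input_source)
```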

> but I'd say it's a bad idea to just try to open a URL when a string
> isn't a local file.  Maybe *you* live in a world where the network
> is "always on" (and I do too!), but for plenty of folks, it's rather
> annoying to find that their modem starts dialing out each time they
> make a typo in a filename.

But would the modem actually start dialling? Wouldn't it rather
determine that the protocol is "file" and then report that the file is
missing? So I think it would either report an unknown url type, or an
ENOENT. What kind of typo did you think of?
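For what it is worth, the two outcomes can be checked without any
network access. This sketch uses urllib.request from Python 3, which
may not behave exactly like the urllib.urlopen in the quoted code; the
misspelled file name and the helper function are made up for the
example.

```python
import pathlib
import tempfile
import urllib.error
import urllib.request


def try_open(name):
    """Return the name of the exception urlopen raises for `name`, or "opened"."""
    try:
        with urllib.request.urlopen(name):
            return "opened"
    except ValueError as exc:
        # Raised for a string with no scheme ("unknown url type").
        return type(exc).__name__
    except urllib.error.URLError as exc:
        # Raised by the file: handler when the local file does not exist.
        return type(exc).__name__


# A mistyped bare file name has no scheme, so nothing is fetched.
typo_result = try_open("docment.xml")

# The same typo as an explicit file: URL fails on the local lookup.
with tempfile.TemporaryDirectory() as tmpdir:
    missing_url = pathlib.Path(tmpdir, "docment.xml").as_uri()
    file_result = try_open(missing_url)

print(typo_result, file_result)
```

In neither case does the library try to contact a remote host; that
would only happen if the typo produced a string with a network scheme
such as "http".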

> The application knows this, but the library doesn't.  It's also fine
> to have an alternative API that takes a URL instead of a local
> filename -- but it's not okay to attempt to overlap the two
> namespaces.

The application can always make sure that the right thing is processed
by opening it itself, and then passing that to the parser.
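A minimal sketch of that approach, written against the Python 3
xml.sax module: the application opens the local file with open(), so a
mistyped name fails with ENOENT before the parser is involved, and no
URL lookup is ever attempted. The function name, the handler, and the
file names are hypothetical.

```python
import errno
import os
import tempfile
from xml.sax import make_parser
from xml.sax.handler import ContentHandler


class ElementCounter(ContentHandler):
    """Hypothetical handler that counts start tags."""

    def __init__(self):
        super().__init__()
        self.count = 0

    def startElement(self, name, attrs):
        self.count += 1


def parse_local_file(path, handler):
    """Open `path` as a local file only, then hand the stream to the parser."""
    with open(path, "rb") as stream:  # raises FileNotFoundError on a typo
        parser = make_parser()
        parser.setContentHandler(handler)
        parser.parse(stream)


with tempfile.TemporaryDirectory() as tmpdir:
    good_path = os.path.join(tmpdir, "doc.xml")
    with open(good_path, "wb") as out:
        out.write(b"<doc><item/></doc>")

    counter = ElementCounter()
    parse_local_file(good_path, counter)
    element_count = counter.count

    # A mistyped name in the same directory fails locally.
    try:
        parse_local_file(os.path.join(tmpdir, "docment.xml"), ElementCounter())
        missing_errno = None
    except FileNotFoundError as exc:
        missing_errno = exc.errno

print(element_count, missing_errno == errno.ENOENT)
```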