[XML-SIG] Ideas for web/ package
Fred L. Drake, Jr.
fdrake@acm.org
Fri, 15 Feb 2002 13:14:59 -0500
Andrew Kuchling writes:
> As part of the RELAX NG stuff, I've discovered that urlparse() is
> really lenient in its parsing. For example, the fragment value is ''
> if no fragment is supplied, so you can't distinguish between
> http://www.amk.ca and http://www.amk.ca# . Unfortunately this can't
It's not clear that the RFC treats that distinction as meaningful, as
best I can recall (it's been a couple of months since I looked at it).
> really be fixed without changing the API of urlparse() and breaking
> old code.
That's a big issue. I added some new functions in Python 2.2
(urlsplit() and urlunsplit()), but they won't address your concern
about fragments.
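
To make the fragment point concrete, here is a minimal sketch. It assumes
a current Python 3, where urlsplit() lives in urllib.parse rather than in
the urlparse module of Python 2.2. The variable names are only
illustrative. Called with default arguments, both URLs come back with an
empty-string fragment:

```python
from urllib.parse import urlsplit

# The two URLs from the quoted message: one without '#', one with a bare '#'.
without_hash = urlsplit("http://www.amk.ca")
with_hash = urlsplit("http://www.amk.ca#")

# Each result reports its fragment as the empty string, so the parsed
# tuples alone cannot tell the two inputs apart.
print(repr(without_hash.fragment))
print(repr(with_hash.fragment))
print(without_hash == with_hash)
```

The split results compare equal, which is why the newer functions do not
help with this particular concern.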
> 1) a stricter URL parser, and
You'll have to be more specific about requirements than this! You're
asking for lexical information about the URL rather than logical
information; I'm not sure that's even come up before.
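
One way to read "lexical information" here is whether the '#' delimiter
was present at all, independent of what follows it. A minimal sketch,
assuming a hypothetical helper named split_fragment() (it is not part of
urlparse or any standard module), which returns None when there is no '#'
and '' when the fragment is present but empty:

```python
def split_fragment(url):
    """Split a URL at its first '#'.

    Hypothetical helper for illustration only. Returns (rest, fragment),
    where fragment is None if the URL contains no '#' at all, and ''
    if a '#' is present with nothing after it.
    """
    rest, delimiter, fragment = url.partition("#")
    if not delimiter:
        # No '#' in the input: report the fragment as absent, not empty.
        return rest, None
    return rest, fragment


print(split_fragment("http://www.amk.ca"))
print(split_fragment("http://www.amk.ca#"))
print(split_fragment("http://www.amk.ca#top"))
```

Whether a stricter parser should expose this as None versus '' or in some
other way is exactly the sort of requirement that would need spelling out.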
> 2) the skeleton of a Web client that
> handles cookies and caching sensibly (so you could write
> screen-scraping applications on top of it).
This would be *really* nice to have!
-Fred
--
Fred L. Drake, Jr. <fdrake at acm.org>
PythonLabs at Zope Corporation