[Python-bugs-list] [ python-Bugs-516299 ] urlparse can get fragments wrong
noreply@sourceforge.net
Wed, 20 Feb 2002 05:56:46 -0800
Bugs item #516299, was opened at 2002-02-11 20:10
You can respond by visiting:
http://sourceforge.net/tracker/?func=detail&atid=105470&aid=516299&group_id=5470
Category: Python Library
Group: None
Status: Open
Resolution: None
Priority: 5
Submitted By: A.M. Kuchling (akuchling)
Assigned to: Fred L. Drake, Jr. (fdrake)
Summary: urlparse can get fragments wrong
Initial Comment:
urlparse.urlparse() goes wrong on a URL such as 'http://amk.ca#foo',
where there's a fragment identifier and the hostname isn't followed
by a slash. It returns 'amk.ca#foo' as the hostname portion of the URL.
While looking at that, I realized that test_urlparse() only tests
urljoin(), not urlparse() or urlunparse(). The attached patch also
adds a minimal test suite for urlparse(), but it should be made more
comprehensive. Unfortunately the RFC doesn't include test cases, so
I haven't done this yet.
(Assigned to you at random, Michael; feel free to unassign it if you
lack the time.)
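A minimal sketch of the kind of urlparse() and urlunparse() tests the report asks for. It assumes Python 3's `urllib.parse` (the successor to the `urlparse` module discussed here), and the expected values reflect how current versions split these URLs, not the buggy 2002 behavior. The test class name and sample URLs are illustrative assumptions, not part of the attached patch.

```python
import unittest
# Assumes Python 3's urllib.parse; the module in this report was named urlparse.
from urllib.parse import urlparse, urlunparse


class UrlparseFragmentTests(unittest.TestCase):
    """Hypothetical test case covering the reported fragment bug."""

    def test_fragment_without_path(self):
        # The case from the report: no slash between the host and '#'.
        parts = urlparse('http://amk.ca#foo')
        self.assertEqual(parts.scheme, 'http')
        self.assertEqual(parts.netloc, 'amk.ca')  # must not include '#foo'
        self.assertEqual(parts.path, '')
        self.assertEqual(parts.fragment, 'foo')

    def test_round_trip(self):
        # urlunparse() should rebuild each of these URLs exactly.
        for url in ['http://amk.ca#foo',
                    'http://amk.ca/a/b;p?q=1#frag',
                    'ftp://ftp.example.org/pub/']:
            self.assertEqual(urlunparse(urlparse(url)), url)
```

Saved as a module, it can be run with `python -m unittest`.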
----------------------------------------------------------------------
Comment By: Richard Brodie (leogah)
Date: 2002-02-20 05:56
Message:
Logged In: YES
user_id=356893
The current version of the URI specification (RFC 2396) includes a
regexp for parsing URIs. For evil edge cases, I usually cut and paste
it directly into the re module.
Would it be a good idea just to incorporate it rather than hammer the
kinks out of the ad-hoc parser? If so, I'll hack on it.
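A rough sketch of the approach suggested above, assuming Python 3's `re` module. The pattern is the one given in RFC 2396, Appendix B; the helper `rfc2396_split` and the `URIParts` tuple are hypothetical names for illustration, not part of the urlparse module. Unlike urlparse(), it only separates scheme, authority, path, query and fragment, and it does no validation.

```python
import re
from typing import NamedTuple, Optional

# Regular expression from RFC 2396, Appendix B.
# Groups: 2 = scheme, 4 = authority, 5 = path, 7 = query, 9 = fragment.
RFC2396_URI_RE = re.compile(
    r'^(([^:/?#]+):)?(//([^/?#]*))?([^?#]*)(\?([^#]*))?(#(.*))?')


class URIParts(NamedTuple):
    """Hypothetical result type; None means the component was absent."""
    scheme: Optional[str]
    authority: Optional[str]
    path: str
    query: Optional[str]
    fragment: Optional[str]


def rfc2396_split(uri: str) -> URIParts:
    """Hypothetical helper: split a URI reference with the RFC 2396 regex."""
    # Every group is optional or may match the empty string, so the
    # pattern always matches and m is never None.
    m = RFC2396_URI_RE.match(uri)
    return URIParts(
        scheme=m.group(2),
        authority=m.group(4),
        path=m.group(5),
        query=m.group(7),
        fragment=m.group(9),
    )
```

Because the authority group excludes '/', '?' and '#', rfc2396_split('http://amk.ca#foo') yields the authority 'amk.ca' and the fragment 'foo' even though no slash follows the hostname.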
----------------------------------------------------------------------
Comment By: Michael Hudson (mwh)
Date: 2002-02-13 02:45
Message:
Logged In: YES
user_id=6656
Sorry, don't know *anything* about URLs and don't really have the
time to learn now...
----------------------------------------------------------------------