[Python-bugs-list] [ python-Bugs-516299 ] urlparse can get fragments wrong

Wed, 20 Feb 2002 05:56:46 -0800

Bugs item #516299, was opened at 2002-02-11 20:10
You can respond by visiting: 
http://sourceforge.net/tracker/?func=detail&atid=105470&aid=516299&group_id=5470

Category: Python Library
Group: None
Status: Open
Resolution: None
Priority: 5
Submitted By: A.M. Kuchling (akuchling)
Assigned to: Fred L. Drake, Jr. (fdrake)
Summary: urlparse can get fragments wrong

Initial Comment:
urlparse.urlparse() goes wrong on a URL such as 'http://amk.ca#foo',
where there's a fragment identifier and the hostname isn't followed by
a slash.  It returns 'amk.ca#foo' as the hostname portion of the URL.
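
For reference, a minimal sketch of the split one would expect for the URL above. It assumes a current Python 3, where the module lives at urllib.parse rather than urlparse; that import path and the variable name `parts` are not part of this report.

```python
from urllib.parse import urlparse  # assumption: Python 3 module location

# The URL from the report: a fragment, and no slash after the hostname.
parts = urlparse('http://amk.ca#foo')

print(parts.netloc)    # hostname portion; should be 'amk.ca', not 'amk.ca#foo'
print(parts.fragment)  # fragment identifier; should be 'foo'
```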

While looking at that, I realized that test_urlparse() only tests
urljoin(), not urlparse() or urlunparse().  The attached patch also
adds a minimal test suite for urlparse(), but it should be still more
comprehensive.  Unfortunately the RFC doesn't include test cases, so I
haven't done this yet.
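
A table-driven check of the kind described might look roughly like the following. This is only a sketch and not the attached patch: the case list and the helper name `check_roundtrip` are hypothetical, and it assumes Python 3's urllib.parse.

```python
from urllib.parse import urlparse, urlunparse  # assumption: Python 3

# Hypothetical cases: (url, expected 6-tuple of
# scheme, netloc, path, params, query, fragment).
CASES = [
    ('http://amk.ca#foo',
     ('http', 'amk.ca', '', '', '', 'foo')),
    ('http://www.python.org/doc/#frag',
     ('http', 'www.python.org', '/doc/', '', '', 'frag')),
    ('http://a/b/c/d;p?q#f',
     ('http', 'a', '/b/c/d', 'p', 'q', 'f')),
]

def check_roundtrip(cases=CASES):
    """Check urlparse() against each expected tuple, then check that
    urlunparse() rebuilds the original URL. Returns the case count."""
    for url, expected in cases:
        result = urlparse(url)
        assert tuple(result) == expected, (url, tuple(result))
        assert urlunparse(result) == url, (url, urlunparse(result))
    return len(cases)

check_roundtrip()
```

Checking both directions means a parse that is wrong but self-consistent still fails against the expected tuple.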

(Assigned to you at random, Michael; feel free to unassign it if you
lack the time.)

----------------------------------------------------------------------

Comment By: Richard Brodie (leogah)
Date: 2002-02-20 05:56

Message:
Logged In: YES 
user_id=356893

The current version of the URI specification (RFC2396) 
includes a regexp for parsing URIs. For evil edge cases, I 
usually cut and paste directly into re.

Would it be an idea just to incorporate it rather than 
hammer the kinks out of the ad-hoc parser? If so, I'll hack 
on it.
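
For concreteness, the regexp in question is the one given in Appendix B of RFC 2396. A minimal sketch of using it directly with re follows; the names `URI_RE` and `split_uri` are made up for illustration. Note that it yields five components and does not split out ";params" the way urlparse() does.

```python
import re

# Regular expression from RFC 2396, Appendix B.
# Group 2 = scheme, 4 = authority, 5 = path, 7 = query, 9 = fragment.
URI_RE = re.compile(
    r'^(([^:/?#]+):)?(//([^/?#]*))?([^?#]*)(\?([^#]*))?(#(.*))?'
)

def split_uri(uri):
    """Return (scheme, authority, path, query, fragment).

    Components that are absent from the URI come back as None;
    the path is always a string, possibly empty.
    """
    m = URI_RE.match(uri)  # every part is optional, so this always matches
    return m.group(2), m.group(4), m.group(5), m.group(7), m.group(9)

print(split_uri('http://amk.ca#foo'))
# ('http', 'amk.ca', '', None, 'foo')
```

An absent query (None) is distinguishable here from an empty one (''), which the tuple returned by urlparse() does not preserve.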

----------------------------------------------------------------------

Comment By: Michael Hudson (mwh)
Date: 2002-02-13 02:45

Message:
Logged In: YES 
user_id=6656

Sorry, don't know *anything* about URLs and don't really
have the time to learn now...

----------------------------------------------------------------------
