[Python-bugs-list] [ python-Bugs-516299 ] urlparse can get fragments wrong

Mon, 18 Mar 2002 04:36:58 -0800

Bugs item #516299, was opened at 2002-02-12 04:10
You can respond by visiting: 
http://sourceforge.net/tracker/?func=detail&atid=105470&aid=516299&group_id=5470

Category: Python Library
>Group: Python 2.2.1 candidate
Status: Open
Resolution: None
Priority: 5
Submitted By: A.M. Kuchling (akuchling)
>Assigned to: Michael Hudson (mwh)
Summary: urlparse can get fragments wrong

Initial Comment:
urlparse.urlparse() goes wrong on a URL such as 'http://amk.ca#foo',
where there's a fragment identifier and the hostname isn't followed
by a slash.  It returns 'amk.ca#foo' as the hostname portion of the
URL.
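
For reference, a minimal sketch of the behaviour the report asks for.
It assumes a current Python 3, where the module lives at urllib.parse
rather than urlparse; the helper name split_url is hypothetical and
only used for illustration.

```python
from urllib.parse import urlparse


def split_url(url):
    """Return the six urlparse() components as a plain tuple."""
    return tuple(urlparse(url))


# The fragment should be split off even though no slash follows the
# hostname; the hostname component must not contain '#foo'.
parts = split_url("http://amk.ca#foo")
print(parts)  # ('http', 'amk.ca', '', '', '', 'foo')
```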

While looking at that, I realized that test_urlparse() only tests
urljoin(), not urlparse() or urlunparse().  The attached patch also
adds a minimal test suite for urlparse(), but it should be more
comprehensive still.  Unfortunately the RFC doesn't include test
cases, so I haven't done this yet.
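
A rough sketch of what such a minimal suite could look like, not the
attached patch itself.  The four URLs and expected tuples are the ones
that show up in the test output quoted later in this thread; the names
CASES and run_cases are hypothetical, and the import assumes Python 3's
urllib.parse.

```python
from urllib.parse import urlparse, urlunparse

# (url, expected 6-tuple) pairs taken from the test output in this report.
CASES = [
    ("http://www.python.org",
     ("http", "www.python.org", "", "", "", "")),
    ("http://www.python.org#abc",
     ("http", "www.python.org", "", "", "", "abc")),
    ("http://www.python.org/#abc",
     ("http", "www.python.org", "/", "", "", "abc")),
    ("http://a/b/c/d;p?q#f",
     ("http", "a", "/b/c/d", "p", "q", "f")),
]


def run_cases(cases=CASES):
    """Check urlparse() and the urlunparse() round trip; return failures."""
    failures = []
    for url, expected in cases:
        parsed = tuple(urlparse(url))
        if parsed != expected:
            failures.append((url, "urlparse", parsed))
        rebuilt = urlunparse(parsed)
        if rebuilt != url:
            failures.append((url, "urlunparse", rebuilt))
    return failures


if __name__ == "__main__":
    for failure in run_cases():
        print("FAILED:", failure)
```

Checking the urlunparse() round trip as well covers the other function
that test_urlparse() did not exercise.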

(Assigned to you at random, Michael; feel free to unassign it if you
lack the time.)

----------------------------------------------------------------------

>Comment By: Michael Hudson (mwh)
Date: 2002-03-18 12:36

Message:
Logged In: YES 
user_id=6656

I'll get to this in a minute.

----------------------------------------------------------------------

Comment By: A.M. Kuchling (akuchling)
Date: 2002-03-15 13:34

Message:
Logged In: YES 
user_id=11375

Oops, sorry.  Revised version of the patch attached, which just adds
the diffs for test_urlparse.

This would be a 2.2.1 candidate, assuming my fix is otherwise correct.

----------------------------------------------------------------------

Comment By: Michael Hudson (mwh)
Date: 2002-03-15 10:03

Message:
Logged In: YES 
user_id=6656

Well, make test now says this:

test test_urlparse produced unexpected output:
**********************************************************************
*** lines 2-6 of actual output doesn't appear in expected output after line 1:
+ http://www.python.org = ('http', 'www.python.org', '', '', '', '')
+ http://www.python.org#abc = ('http', 'www.python.org', '', '', '', 'abc')
+ http://www.python.org/#abc = ('http', 'www.python.org', '/', '', '', 'abc')
+ http://a/b/c/d;p?q#f = ('http', 'a', '/b/c/d', 'p', 'q', 'f')
+
**********************************************************************

did you just forget to update output/test_urlparse?

Is this a 2.2.1 candidate?

----------------------------------------------------------------------

Comment By: A.M. Kuchling (akuchling)
Date: 2002-03-14 17:52

Message:
Logged In: YES 
user_id=11375

Unassigning -- anyone want to review my bug fix so I can check it in?

(leogah's idea of using the regex from RFC2396 is a good one, but that
large a change should probably go into 2.3, not a .1 release.)

----------------------------------------------------------------------

Comment By: Richard Brodie (leogah)
Date: 2002-02-20 13:56

Message:
Logged In: YES 
user_id=356893

The current version of the URI specification (RFC2396) 
includes a regexp for parsing URIs. For evil edge cases, I 
usually cut and paste directly into re.

Would it be an idea just to incorporate it rather than 
hammer the kinks out of the ad-hoc parser? If so, I'll hack 
on it.
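
To make the suggestion concrete, here is a small sketch using the
regular expression given in Appendix B of RFC 2396.  The wrapper name
rfc2396_split is hypothetical, and this is not a proposed replacement
for the urlparse module: the regex only yields five components and
does not split path parameters (';p') out of the path.

```python
import re

# Regular expression from RFC 2396, Appendix B.
URI_RE = re.compile(
    r"^(([^:/?#]+):)?(//([^/?#]*))?([^?#]*)(\?([^#]*))?(#(.*))?"
)


def rfc2396_split(uri):
    """Return (scheme, authority, path, query, fragment).

    Components that are absent from the URI come back as None; the
    path is always a string, possibly empty.
    """
    match = URI_RE.match(uri)  # every group is optional, so this always matches
    return (match.group(2), match.group(4), match.group(5),
            match.group(7), match.group(9))


print(rfc2396_split("http://amk.ca#foo"))
# ('http', 'amk.ca', '', None, 'foo')
```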

----------------------------------------------------------------------

Comment By: Michael Hudson (mwh)
Date: 2002-02-13 10:45

Message:
Logged In: YES 
user_id=6656

Sorry, don't know *anything* about URLs and don't really
have the time to learn now...

----------------------------------------------------------------------
