[Python-bugs-list] [ python-Bugs-516299 ] urlparse can get fragments wrong
noreply@sourceforge.net
Mon, 18 Mar 2002 04:36:58 -0800
Bugs item #516299, was opened at 2002-02-12 04:10
You can respond by visiting:
http://sourceforge.net/tracker/?func=detail&atid=105470&aid=516299&group_id=5470
Category: Python Library
>Group: Python 2.2.1 candidate
Status: Open
Resolution: None
Priority: 5
Submitted By: A.M. Kuchling (akuchling)
>Assigned to: Michael Hudson (mwh)
Summary: urlparse can get fragments wrong
Initial Comment:
urlparse.urlparse() goes wrong on a URL such as 'http://amk.ca#foo',
where there's a fragment identifier and the hostname isn't followed by
a slash. It returns 'amk.ca#foo' as the hostname portion of the URL.
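To make the expected behaviour concrete, here is a small sketch. It
assumes a current Python 3, where the same function lives in
urllib.parse rather than in the urlparse module discussed in this
report; the variable name is illustrative only.

```python
from urllib.parse import urlparse

# The fragment should be split off even though no slash follows the host.
parts = urlparse('http://amk.ca#foo')

print(parts.netloc)    # hostname portion, expected to be 'amk.ca'
print(parts.fragment)  # fragment identifier, expected to be 'foo'
```

The buggy behaviour described above would instead put 'amk.ca#foo' in
the hostname field and leave the fragment empty.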
While looking at that, I realized that test_urlparse() only tests
urljoin(), not urlparse() or urlunparse(). The attached patch also adds
a minimal test suite for urlparse(), but it should be still more
comprehensive. Unfortunately the RFC doesn't include test cases, so I
haven't done this yet.
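A test of that kind could be sketched roughly as follows. This is not
the attached patch: it is an illustration written against Python 3's
urllib.parse, the names CASES and check_urlparse_cases are hypothetical,
and the four URLs are the ones that appear in the test output quoted
later in this thread.

```python
from urllib.parse import urlparse, urlunparse

# (url, expected 6-tuple) pairs; hypothetical table for illustration.
CASES = [
    ('http://www.python.org',
     ('http', 'www.python.org', '', '', '', '')),
    ('http://www.python.org#abc',
     ('http', 'www.python.org', '', '', '', 'abc')),
    ('http://www.python.org/#abc',
     ('http', 'www.python.org', '/', '', '', 'abc')),
    ('http://a/b/c/d;p?q#f',
     ('http', 'a', '/b/c/d', 'p', 'q', 'f')),
]

def check_urlparse_cases(cases=CASES):
    """Check that each URL parses as expected and round-trips."""
    for url, expected in cases:
        result = urlparse(url)
        assert tuple(result) == expected, (url, tuple(result))
        # urlunparse() should rebuild the original string.
        assert urlunparse(result) == url, (url, urlunparse(result))

check_urlparse_cases()
```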
(Assigned to you at random, Michael; feel free to unassign it if you
lack the time.)
----------------------------------------------------------------------
>Comment By: Michael Hudson (mwh)
Date: 2002-03-18 12:36
Message:
Logged In: YES
user_id=6656
I'll get to this in a minute.
----------------------------------------------------------------------
Comment By: A.M. Kuchling (akuchling)
Date: 2002-03-15 13:34
Message:
Logged In: YES
user_id=11375
Oops, sorry. Revised version of the patch attached, that
just adds the diffs for test_urlparse.
This would be a 2.2.1 candidate, assuming my fix is
otherwise correct.
----------------------------------------------------------------------
Comment By: Michael Hudson (mwh)
Date: 2002-03-15 10:03
Message:
Logged In: YES
user_id=6656
Well, make test now says this:
test test_urlparse produced unexpected output:
**********************************************************************
*** lines 2-6 of actual output doesn't appear in expected output after line 1:
+ http://www.python.org = ('http', 'www.python.org', '', '', '', '')
+ http://www.python.org#abc = ('http', 'www.python.org', '', '', '', 'abc')
+ http://www.python.org/#abc = ('http', 'www.python.org', '/', '', '', 'abc')
+ http://a/b/c/d;p?q#f = ('http', 'a', '/b/c/d', 'p', 'q', 'f')
+
**********************************************************************
Did you just forget to update output/test_urlparse?
Is this a 2.2.1 candidate?
----------------------------------------------------------------------
Comment By: A.M. Kuchling (akuchling)
Date: 2002-03-14 17:52
Message:
Logged In: YES
user_id=11375
Unassigning -- anyone want to review my bug fix so I can check it in?
(leogah's idea of using the regex from RFC2396 is a good one, but that
large a change should probably go into 2.3, not a .1 release.)
----------------------------------------------------------------------
Comment By: Richard Brodie (leogah)
Date: 2002-02-20 13:56
Message:
Logged In: YES
user_id=356893
The current version of the URI specification (RFC2396)
includes a regexp for parsing URIs. For evil edge cases, I
usually cut and paste directly into re.
Would it be an idea just to incorporate it rather than
hammer the kinks out of the ad-hoc parser? If so, I'll hack
on it.
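For reference, the regular expression in question is the one given in
Appendix B of RFC 2396. A minimal sketch of using it directly might
look like this; the names URI_RE and split_uri are hypothetical, and
unlike urlparse() the sketch does not separate ;params from the path.

```python
import re

# Regular expression from RFC 2396, Appendix B.
# Groups: 2 = scheme, 4 = authority, 5 = path, 7 = query, 9 = fragment.
URI_RE = re.compile(
    r'^(([^:/?#]+):)?(//([^/?#]*))?([^?#]*)(\?([^#]*))?(#(.*))?'
)

def split_uri(uri):
    """Split a URI reference into its five RFC 2396 components.

    Components that are absent come back as None; the path is always
    a string, possibly empty.
    """
    # Every part of the pattern is optional, so match() always succeeds.
    match = URI_RE.match(uri)
    return match.group(2, 4, 5, 7, 9)

print(split_uri('http://amk.ca#foo'))
```

Because the authority group stops at the first '/', '?' or '#', the
fragment in 'http://amk.ca#foo' is not absorbed into the hostname.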
----------------------------------------------------------------------
Comment By: Michael Hudson (mwh)
Date: 2002-02-13 10:45
Message:
Logged In: YES
user_id=6656
Sorry, don't know *anything* about URLs and don't really
have the time to learn now...
----------------------------------------------------------------------