[Python-bugs-list] [ python-Bugs-535285 ] urllib, fragment identifiers and 404s
noreply@sourceforge.net
Wed, 27 Mar 2002 01:05:46 -0800
Bugs item #535285, was opened at 2002-03-26 17:00
You can respond by visiting:
http://sourceforge.net/tracker/?func=detail&atid=105470&aid=535285&group_id=5470
Category: Python Library
Group: Not a Bug
Status: Closed
Resolution: Invalid
Priority: 5
Submitted By: Tadgh O'Leary (tadgher)
Assigned to: Nobody/Anonymous (nobody)
Summary: urllib, fragment identifiers and 404s
Initial Comment:
URLopener raises a 404 IOError when accessing a URL with a non-existent fragment
identifier on certain web servers (in fact, all that I've tested
except Apache).
I couldn't find any user-agent guidelines, but every user-agent I've tested
returns the document with a 200 response code (including lynx).
To repeat:
Python 2.2 (#1, Jan 18 2002, 09:22:45)
[GCC 2.95.3 20010315 (release) [FreeBSD]] on freebsd4
Type "help", "copyright", "credits" or "license" for more information.
>>> import urllib
>>> urllib.URLopener().open('http://www.apache.org/#fake')
<addinfourl at 136169196 whose fp = <open file '<socket>', mode 'rb' at 0x81e7480>>
>>> urllib.URLopener().open('http://www.microsoft.com/#fake')
Traceback (most recent call last):
(snipped)
>>> urllib.URLopener().open('http://www.sun.com/#fake')
Traceback (most recent call last):
(snipped)
>>> urllib.URLopener().open('http://www.zeus.com/#fake')
Traceback (most recent call last):
(snipped)
>>> urllib.URLopener().open('http://www.lotus.com/#fake')
Traceback (most recent call last):
(snipped)
----------------------------------------------------------------------
>Comment By: Tadgh O'Leary (tadgher)
Date: 2002-03-27 09:05
Message:
Logged In: YES
user_id=497284
>the way it works for all web browsers is that the URL without the fragment
>identifier is used to retrieve the document
Fair enough. I would have seen urllib in the role of "web browser" in this
situation, though
>In other words, you are not supposed to send it to the server.
The problem here is that URLopener is sending the fragment identifier. If the
fix is that the user should *always* remove it, then why not let the module
do the work?
I would have thought most users would expect urllib to behave as other
user-agents do.
----------------------------------------------------------------------
Comment By: Sjoerd Mullender (sjoerd)
Date: 2002-03-26 20:35
Message:
Logged In: YES
user_id=43607
The fragment identifier is for "local consumption" only: the way it works for all web browsers is that the URL without
the fragment identifier is used to retrieve the document, and the identifier is then used to position the browser to the
correct position in the document. In other words, you are not supposed to send it to the server.
See section 4.1 of RFC 2396 (http://www.ietf.org/rfc/rfc2396.txt) which is the current specification.
I'd say this is "Not a bug", so I am closing this bug report.
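For illustration, the caller-side workaround implied above (request the URL without the fragment and keep the fragment for local use) might be sketched as follows. This is a minimal sketch, not part of the original report: it assumes a modern Python 3, where the splitting helper lives in urllib.parse rather than the Python 2.2 module layout used in the session, and split_for_request is a hypothetical helper name. It makes no network request.

```python
from urllib.parse import urldefrag


def split_for_request(url):
    """Return (url_to_request, fragment) for a URL.

    Hypothetical helper: the first element is what would be sent to the
    server, the second is kept by the client for positioning in the document.
    """
    result = urldefrag(url)
    return result.url, result.fragment


# URLs taken from the session in the report.
for u in ("http://www.apache.org/#fake", "http://www.microsoft.com/#fake"):
    request_url, fragment = split_for_request(u)
    print(request_url, fragment)
```

The request_url value is what would then be passed to the opener; a URL with no fragment comes back unchanged with an empty fragment string.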
----------------------------------------------------------------------