[Python-bugs-list] [ python-Bugs-535285 ] urllib, fragment identifiers and 404s

Wed, 27 Mar 2002 01:05:46 -0800

Bugs item #535285, was opened at 2002-03-26 17:00
You can respond by visiting: 
http://sourceforge.net/tracker/?func=detail&atid=105470&aid=535285&group_id=5470

Category: Python Library
Group: Not a Bug
Status: Closed
Resolution: Invalid
Priority: 5
Submitted By: Tadgh O'Leary (tadgher)
Assigned to: Nobody/Anonymous (nobody)
Summary: urllib, fragment identifiers and 404s

Initial Comment:
URLopener raises a 404 IOError when accessing a non-existent fragment
identifier on certain web servers (in fact, all that I've tested,
except Apache).

I couldn't find any user-agent guidelines, but every user-agent I've
tested returns the document with a 200 response code (including lynx).

To repeat:
Python 2.2 (#1, Jan 18 2002, 09:22:45) 
[GCC 2.95.3 20010315 (release) [FreeBSD]] on freebsd4
Type "help", "copyright", "credits" or "license" for more information.
>>> import urllib
>>> urllib.URLopener().open('http://www.apache.org/#fake')
<addinfourl at 136169196 whose fp = <open file '<socket>', mode 'rb' at 0x81e7480>>
>>> urllib.URLopener().open('http://www.microsoft.com/#fake')
Traceback (most recent call last):
(snipped)
>>> urllib.URLopener().open('http://www.sun.com/#fake')
Traceback (most recent call last):
(snipped)
>>> urllib.URLopener().open('http://www.zeus.com/#fake')
Traceback (most recent call last):
(snipped)
>>> urllib.URLopener().open('http://www.lotus.com/#fake')
Traceback (most recent call last):
(snipped)

----------------------------------------------------------------------

>Comment By: Tadgh O'Leary (tadgher)
Date: 2002-03-27 09:05

Message:
Logged In: YES 
user_id=497284

>the way it works for all web browsers is that the URL without the
>fragment identifier is used to retrieve the document

Fair enough. I would have seen urllib in the role of "web browser" in
this situation, though.

>In other words, you are not supposed to send it to the server.

The problem here is that URLopener is sending the fragment identifier.
If the fix is that the user should *always* remove it, then why not let
the module do the work?

I would have thought most users would expect urllib to behave as other
user-agents do.
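
A minimal sketch of what "letting the module do the work" could look
like. It is written for modern Python 3, where urldefrag lives in
urllib.parse (an assumption; the session in this report is Python 2.2).
The helper name open_without_fragment and its opener parameter are
hypothetical and are not part of urllib.

```python
from urllib.parse import urldefrag


def open_without_fragment(url, opener):
    """Strip any '#fragment' from url, then call opener(stripped_url).

    Hypothetical helper: 'opener' is any callable that takes a URL string,
    for example urllib.request.urlopen. The fragment is returned alongside
    the response so the caller can still use it locally.
    """
    stripped, fragment = urldefrag(url)
    return opener(stripped), fragment


# Example use (performs a real network request, so it is left commented out):
# import urllib.request
# response, fragment = open_without_fragment(
#     "http://www.apache.org/#fake", urllib.request.urlopen)
```

Passing the opener in as a parameter keeps the stripping step separate
from the network call, so it can be checked without contacting a server.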

----------------------------------------------------------------------

Comment By: Sjoerd Mullender (sjoerd)
Date: 2002-03-26 20:35

Message:
Logged In: YES 
user_id=43607

The fragment identifier is for "local consumption" only: the way it works for all web browsers is that the URL without 
the fragment identifier is used to retrieve the document, and the identifier is then used to position the browser to the 
correct position in the document.  In other words, you are not supposed to send it to the server.

See section 4.1 of RFC 2396 (http://www.ietf.org/rfc/rfc2396.txt) which is the current specification.
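
To illustrate the split described above, here is a small sketch of how a client might separate the part of a URL that goes to the server from the fragment it keeps for itself. It assumes modern Python 3 (urllib.parse.urlsplit); the function name request_target is hypothetical and is not something urllib provides.

```python
from urllib.parse import urlsplit


def request_target(url):
    """Return (host, target, fragment) for an HTTP URL.

    'target' is the path plus optional query that a client would place on
    the HTTP request line; the fragment stays on the client side.
    """
    parts = urlsplit(url)
    target = parts.path or "/"  # an empty path is requested as "/"
    if parts.query:
        target += "?" + parts.query
    return parts.netloc, target, parts.fragment


host, target, fragment = request_target("http://www.apache.org/#fake")
print("host=%s target=%s fragment=%s" % (host, target, fragment))
```

Under this sketch the server only ever sees the target; the fragment is available afterwards for positioning within the retrieved document.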

I'd say this is "Not a bug", so I am closing this bug report.

----------------------------------------------------------------------
