[ python-Bugs-626543 ] urllib2 doesn't do HTTP-EQUIV & Refresh
SourceForge.net
noreply at sourceforge.net
Wed Feb 1 21:31:13 CET 2006
Bugs item #626543, was opened at 2002-10-21 21:57
Message generated for change (Settings changed) made by jjlee
You can respond by visiting:
https://sourceforge.net/tracker/?func=detail&atid=105470&aid=626543&group_id=5470
Please note that this message will contain a full copy of the comment thread,
including the initial issue submission, for this request,
not just the latest update.
Category: Python Library
Group: None
>Status: Closed
Resolution: None
Priority: 5
Submitted By: John J Lee (jjlee)
Assigned to: Nobody/Anonymous (nobody)
Summary: urllib2 doesn't do HTTP-EQUIV & Refresh
Initial Comment:
I just added support for HTML's META HTTP-EQUIV and
zero-time Refresh HTTP headers to my 'ClientCookie'
package (which exports essentially a clone of the
urllib2 interface that knows about cookies, making use
of urllib2 in the implementation). I didn't make a
patch for urllib2 itself but it would be easy to do so.
I don't plan to do this immediately, but will
eventually (assuming Jeremy thinks it's advisable) -- I
just wanted to register this fact to prevent
duplication of effort.
[BTW, this version of ClientCookie isn't on my web page
yet -- my motherboard just died.]
I'm sure you know this already, but: HTTP-EQUIV is just
a way of putting headers in the HEAD section of an HTML
document; Refresh is a Netscape 1.1 header that
indicates that a browser should redirect after a
specified time. Refresh headers with zero time act
like redirections.
The net result of the code I just wrote is that if you
urlopen a URL that points to an HTML document like
this:
<HTML><HEAD>
<META HTTP-EQUIV="Refresh" CONTENT="0;
URL=http://acme.com/new_url.htm">
</HEAD></HTML>
you're automatically redirected to
"http://acme.com/new_url.htm". Same thing happens if
the Refresh is in the HTTP headers, because all the
HTTP-EQUIV headers are treated like real HTTP headers.
Refresh with non-zero delay time is ignored (the
urlopen returns the document body unchanged and does
not redirect, but does still add the Refresh header to
the HTTP headers).
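
To illustrate the behaviour described above, here is a minimal sketch in present-day Python 3. It is not the ClientCookie code, which does not appear in this thread. It uses html.parser.HTMLParser to collect HTTP-EQUIV pairs from META tags and then splits a Refresh value into a delay and a target URL. The names MetaHeaderParser and parse_refresh are hypothetical, and the parsing is deliberately simplistic.

```python
from html.parser import HTMLParser


class MetaHeaderParser(HTMLParser):
    """Collect (name, value) pairs from <META HTTP-EQUIV=... CONTENT=...> tags.

    Hypothetical helper for illustration only.
    """

    def __init__(self):
        super().__init__()
        self.headers = []

    def handle_starttag(self, tag, attrs):
        # HTMLParser lowercases tag and attribute names before calling this.
        if tag != "meta":
            return
        attrs = dict(attrs)
        name = attrs.get("http-equiv")
        content = attrs.get("content")
        if name and content is not None:
            self.headers.append((name, content))


def parse_refresh(value):
    """Split a Refresh value such as '0; URL=http://...' into (delay, url).

    url is None when the value carries only a delay.
    """
    delay, _, rest = value.partition(";")
    rest = rest.strip()
    url = None
    if rest[:4].lower() == "url=":
        url = rest[4:].strip().strip("'\"")
    return float(delay.strip()), url
```

A caller would feed the document body to MetaHeaderParser, look for a header named "Refresh" (compared case-insensitively), and follow the URL only when the parsed delay is zero.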
A few issues:
0) AFAIK, the Refresh header is not specified in any
RFC, but only here:
http://wp.netscape.com/assist/net_sites/pushpull.html
(HTTP-EQUIV seems to be in the HTML 4.0 standard, maybe
earlier ones too)
1) Infinite loops should be detected, as for HTTP 30x?
Presumably yes.
2) Should HTTP-EQUIV headers be added to the response object, or
just treated like headers internally? Perhaps it
should be possible to get both behaviours?
3) Bug in my implementation: it reads body data greedily
from httplib's file object.
John
----------------------------------------------------------------------
>Comment By: John J Lee (jjlee)
Date: 2006-02-01 20:31
Message:
Logged In: YES
user_id=261020
Closing since I no longer intend to contribute this.
(I don't want to get involved with HTML parsing in the stdlib!)
----------------------------------------------------------------------
Comment By: John J Lee (jjlee)
Date: 2003-10-29 23:27
Message:
Logged In: YES
user_id=261020
Just an update:
- this could now be implemented as a handler (and already is,
in my ClientCookie package) using RFE 759792, rather than
having to be mixed in with HTTPHandler
- the issues I listed in my initial comment, and the
backwards-compatibility issue raised by MvL are now
resolved
- it needs reimplementing using HTMLParser (currently uses
htmllib) if it's to go in the standard library; I plan to do this in
time for 2.4
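
A rough sketch of the handler approach mentioned in the first point, written against Python 3's urllib.request (the successor to urllib2), where any handler may define an http_response method. This is not the ClientCookie implementation. The class name RefreshProcessor is hypothetical, it only looks at a real Refresh HTTP header (not META tags), and handing the result to the stock 302 machinery is an assumption about one workable design.

```python
import urllib.request


class RefreshProcessor(urllib.request.BaseHandler):
    """Hypothetical handler: treat a zero-delay Refresh header as a redirect."""

    def http_response(self, request, response):
        headers = response.info()
        value = headers.get("Refresh")
        if response.code != 200 or not value:
            return response
        delay, _, rest = value.partition(";")
        rest = rest.strip()
        try:
            zero_delay = float(delay) == 0
        except ValueError:
            # Unparseable delay: leave the response alone.
            return response
        if not zero_delay or rest[:4].lower() != "url=":
            # Non-zero delays are ignored, as described in the initial comment.
            return response
        # Present the target as a Location header and let the opener's
        # existing 302 handling (including its loop detection) follow it.
        headers["Location"] = rest[4:].strip()
        return self.parent.error(
            "http", request, response, 302, "refresh", headers
        )

    https_response = http_response
```

It would be installed with urllib.request.build_opener(RefreshProcessor), which leaves the normal HTTP handlers in place.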
----------------------------------------------------------------------
Comment By: Martin v. Löwis (loewis)
Date: 2002-10-26 14:30
Message:
Logged In: YES
user_id=21627
I would try to subclass HTTPHandler, and then provide a
build_opener wrapper that installs this handler instead of
the normal http handler (the latter is optional, since the
user could just do build_opener(HTTPRefreshHandler)).
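
The wiring suggested here can be sketched as follows, using Python 3's urllib.request in place of urllib2. HTTPRefreshHandler is only the name proposed above, and the refresh logic is left as a placeholder. The point is that build_opener drops the default HTTPHandler when it is given a subclass of it.

```python
import urllib.request


class HTTPRefreshHandler(urllib.request.HTTPHandler):
    """Placeholder subclass; the actual refresh handling is omitted."""

    def http_open(self, req):
        response = super().http_open(req)
        # A real implementation would inspect the response for Refresh here.
        return response


# Passing the class is enough: build_opener instantiates it and skips the
# default HTTPHandler because a subclass was supplied.
opener = urllib.request.build_opener(HTTPRefreshHandler)
```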
----------------------------------------------------------------------
Comment By: John J Lee (jjlee)
Date: 2002-10-24 00:20
Message:
Logged In: YES
user_id=261020
What do you think the solution to the backwards-compatibility
problem is? Leave urllib2 as-is? Add a switch to turn it on?
Something else?
At the moment, I just deal with it in AbstractHTTPHandler.
It would be nice to treat it like the other redirections, by
writing a RefreshHandler -- this would solve the
backwards-compatibility issue. However, OpenerDirector.error always
calls http_error_xxx ATM (where xxx is the HTTP error code),
so without changing that, I don't think a RefreshHandler is
really possible. I suppose the sensible solution is just to
make a new HTTPHandler and HTTPSHandler?
Can you think of any way in which supporting HTTP-EQUIV
would mess up backwards compatibility, assuming the body is
unchanged but the headers do have the HTTP-EQUIV headers
added?
John
----------------------------------------------------------------------
Comment By: Martin v. Löwis (loewis)
Date: 2002-10-23 14:54
Message:
Logged In: YES
user_id=21627
In addition to the issues you have mentioned, there is also
the backwards compatibility issue: Some applications might
expect to get a meta-refresh document from urllib, then parse
it and retry themselves. Those applications would break with
such a change.
----------------------------------------------------------------------