[Patches] [ python-Patches-852995 ] tests and processors patch for
urllib2
SourceForge.net
noreply at sourceforge.net
Sun Jan 4 18:51:55 EST 2004
Patches item #852995, was opened at 2003-12-03 00:53
Message generated for change (Comment added) made by jjlee
You can respond by visiting:
https://sourceforge.net/tracker/?func=detail&atid=305470&aid=852995&group_id=5470
Category: Library (Lib)
Group: Python 2.4
>Status: Closed
Resolution: Accepted
Priority: 5
Submitted By: John J Lee (jjlee)
Assigned to: Jeremy Hylton (jhylton)
Summary: tests and processors patch for urllib2
Initial Comment:
Here are some unit tests for urllib2 and a revised version
of my urllib2 "processors" patch (originally posted as
RFE 759792 -- I'm posting it here since it is a patch, not
just a wish). The tests depend on the patch, but test
more than just the changes introduced by the patch.
A fuller discussion is in the original RFE tracker item, but
briefly: the patch makes it possible to implement
functionality like HTTP cookie handling, Refresh
handling, etc. etc. using handler objects. At the moment
urllib2's handler objects aren't quite up to the job, which
results in a lot of cut-n-paste and subclassing. I believe
the changes are backwards-compatible, with the
exception of people who've reimplemented
build_opener()'s functionality -- those people would need
to call opener.add_handler(HTTPErrorProcessor).
The main change is allowing handlers to implement
methods like:
http_request(request)
http_response(request, response)
In addition to the usual
http_open(request)
http_error{_*}(...)
I call handlers that implement these methods
"processors". These methods get called for *every*
processor (in contrast to the ordinary handler methods,
where the OpenerDirector stops calling the methods as
soon as the first handler handles the request by returning
a response) to pre-process requests and post-process
responses.
If this is accepted, I can submit patches for handlers
(processors) that do HTTP Refresh redirection, cookie
handling etc.
Changes in the patch:
-OpenerDirector changes to call new <protocol>_request
and <protocol>_response methods. I haven't put all the
documentation for this interface in this set of patches
because there's no obvious place for it: handlers aren't
really documented either. The urllib2 docs need a
cleanup, but I'll do that in a separate patch.
-Added .unredirected_hdrs dict to Request, together
with .add_unredirected_headers() and .has_header()
methods. These headers don't get copied to redirected
requests. I didn't add this as a feature for people calling
urlopen on a Request. Rather, the motivation comes from
the fact that processors need to explicitly add headers to
Requests (Cookie, Referer, Content-Length, etc.), rather
than directly sending them over the wire. The problem is,
if they add them to the regular .headers attribute of
requests, processors will end up clobbering headers
added by the user who called urlopen (which would
break backwards-compatibility). Having processors use
a separate set of headers that never get redirected
makes this problem go away: users can add headers
(with either .add_header() or .add_unredirected_header(),
since processors don't clobber either) and know that they
won't get clobbered by any handler.
-HTTPErrorProcessor is necessary to allow response
processors to see responses before redirections &c
happen, by moving the call to parent.error() out of
AbstractHTTPHandler.do_open(). It has the side-effect of
stopping people grumbling that 200 is not the only
success code in HTTP <0.5 wink>, since it makes it
feasible to override urllib2's behaviour of raising an
exception unless the HTTP code == 200.
-Split part of AbstractHTTPHandler.do_open (which
implements http_open / https_open in the
HTTP/HTTPSHandler subclasses) into a new .do_request
(which implements http_request in the subclasses). Just
because I could, really (with the new *_request methods).
It seems clearer to me.
-Single string-formatting-character change to
OpenerDirector.error() to allow "refresh" as an error
code.
-Added .code and .msg attributes to HTTP response
objects, so that processors can know what the response
code and message are. I haven't documented these,
because they're HTTP-specific.
-Renamed HTTPRedirectHandler.error_302_dict
--> .redirect_dict
-Finally, there's one bugfix to HTTPRedirectHandler
included in the patch, because the tests test for it:
multiple visits to a single URL with different redirect codes
is no longer erroneously detected as a loop.
http://a.com/a --> 302 --> http://a.com/b --> Refresh -->
http://a.com/a
Yes, I have seen a site where this really happens!
There are a few other bugs that turned up while writing
the tests, and those tests are commented out ATM. I'll
file bug reports for those separately after this one is
sorted out.
----------------------------------------------------------------------
>Comment By: John J Lee (jjlee)
Date: 2004-01-04 23:51
Message:
Logged In: YES
user_id=261020
Fixed tests in patch 870606.
----------------------------------------------------------------------
Comment By: Raymond Hettinger (rhettinger)
Date: 2004-01-01 04:00
Message:
Logged In: YES
user_id=80475
FWIW, the test for urllib2 is currently failing and needs to be
fixed.
----------------------------------------------------------------------
Comment By: John J Lee (jjlee)
Date: 2003-12-14 16:11
Message:
Logged In: YES
user_id=261020
Reopening to add a note to Misc/NEWS about the
backwards-incompatibility for people who don't use
build_opener(). I presume that's the right place for this note?
----------------------------------------------------------------------
Comment By: John J Lee (jjlee)
Date: 2003-12-14 15:17
Message:
Logged In: YES
user_id=261020
Just a note that my justification of
Request.add_unredirected_hdrs above makes no sense. :-/ I
mis-remembered my own reasons for adding this. My original
comment from 759792 is correct:
Some headers (Content-Length, Referer, ...) mustn't be copied
to requests for a redirected URL. This requires the addition of a
new dict to Request. I've added an add_unredirected_headers
method, too. The old implementation just sends these headers
directly, but that's not possible if you want to use procesors to
implement these things.
----------------------------------------------------------------------
Comment By: Jeremy Hylton (jhylton)
Date: 2003-12-14 05:28
Message:
Logged In: YES
user_id=31392
Committed as rev 1.57 of urllib2.py
----------------------------------------------------------------------
Comment By: John J Lee (jjlee)
Date: 2003-12-08 00:04
Message:
Logged In: YES
user_id=261020
Oops, I left a couple of lines in urllib2.py that add an HTTP
debugging method and constructor arg. Please ignore that
part of the patch.
----------------------------------------------------------------------
You can respond by visiting:
https://sourceforge.net/tracker/?func=detail&atid=305470&aid=852995&group_id=5470
More information about the Patches
mailing list