[Patches] [ python-Patches-852995 ] tests and processors patch for urllib2

SourceForge.net noreply at sourceforge.net
Tue Dec 2 19:53:50 EST 2003

Patches item #852995, was opened at 2003-12-03 00:53
Message generated for change (Tracker Item Submitted) made by Item Submitter
You can respond by visiting: 

Category: Library (Lib)
Group: Python 2.4
Status: Open
Resolution: None
Priority: 5
Submitted By: John J Lee (jjlee)
Assigned to: Nobody/Anonymous (nobody)
Summary: tests and processors patch for urllib2

Initial Comment:
Here are some unit tests for urllib2 and a revised version 
of my urllib2 "processors" patch (originally posted as 
RFE 759792 -- I'm posting it here since it is a patch, not 
just a wish).  The tests depend on the patch, but test 
more than just the changes introduced by the patch. 
A fuller discussion is in the original RFE tracker item, but 
briefly: the patch makes it possible to implement 
functionality like HTTP cookie handling, Refresh 
handling, etc. etc. using handler objects.  At the moment 
urllib2's handler objects aren't quite up to the job, which 
results in a lot of cut-n-paste and subclassing.  I believe 
the changes are backwards-compatible, with the 
exception of people who've reimplemented 
build_opener()'s functionality -- those people would need 
to call opener.add_handler(HTTPErrorProcessor). 
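A hand-assembled opener of the kind mentioned above might look like the following sketch. It is written against urllib.request in current Python 3, on the assumption that the interface there matches the one proposed here; it is not taken from the patch itself. It also passes an HTTPErrorProcessor instance rather than the class. FakeResponse, CannedHTTPHandler, Record404 and build() are hypothetical stand-ins so that nothing touches the network.

```python
from urllib.request import BaseHandler, HTTPErrorProcessor, OpenerDirector


class FakeResponse:
    """Hypothetical stand-in for an HTTP response; no network involved."""

    def __init__(self, code, msg):
        self.code = code
        self.msg = msg

    def info(self):
        return {}  # headers are not needed for this sketch


class CannedHTTPHandler(BaseHandler):
    """Hypothetical handler that answers every http request with a 404."""

    def http_open(self, request):
        return FakeResponse(404, "Not Found")


class Record404(BaseHandler):
    """Hypothetical error handler: notes the 404 and hands the response back."""

    def __init__(self):
        self.seen = []

    def http_error_404(self, request, fp, code, msg, hdrs):
        self.seen.append(code)
        return fp


def build(with_error_processor):
    """Assemble an opener by hand, as a build_opener() replacement would."""
    opener = OpenerDirector()
    recorder = Record404()
    opener.add_handler(CannedHTTPHandler())
    opener.add_handler(recorder)
    if with_error_processor:
        # The extra call that hand-built openers would need.
        opener.add_handler(HTTPErrorProcessor())
    return opener, recorder


plain, plain_rec = build(False)
plain_resp = plain.open("http://example.invalid/")

full, full_rec = build(True)
full_resp = full.open("http://example.invalid/")
```

In this sketch the opener without the error processor returns the 404 response untouched and never consults http_error_404; with the processor added, the error handler is called.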
The main change is allowing handlers to implement 
methods like: 
 http_response(request, response) 
in addition to the usual handler methods such as http_open. 
I call handlers that implement these methods 
"processors".  These methods get called for *every* 
processor (in contrast to the ordinary handler methods, 
where the OpenerDirector stops calling the methods as 
soon as the first handler handles the request by returning 
a response) to pre-process requests and post-process 
responses. 
If this is accepted, I can submit patches for handlers 
(processors) that do HTTP Refresh redirection, cookie 
handling etc. 
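The difference between processors and ordinary handlers can be sketched as below. This assumes the interface as it exists in urllib.request in current Python 3, and uses a made-up "dummy" URL scheme so that no network access is needed. TagProcessor, DummyHandler and the calls list are hypothetical names for illustration only.

```python
from urllib.request import BaseHandler, OpenerDirector

calls = []  # records the order in which methods are invoked


class TagProcessor(BaseHandler):
    """Hypothetical processor for the made-up 'dummy' scheme."""

    def __init__(self, name, order):
        self.name = name
        self.handler_order = order  # lower values are called first

    def dummy_request(self, request):
        calls.append(self.name + ".request")
        return request

    def dummy_response(self, request, response):
        calls.append(self.name + ".response")
        return response


class DummyHandler(BaseHandler):
    """Hypothetical ordinary handler; returning None declines the request."""

    def __init__(self, name, order, reply):
        self.name = name
        self.handler_order = order
        self.reply = reply

    def dummy_open(self, request):
        calls.append(self.name + ".open")
        return self.reply


opener = OpenerDirector()
opener.add_handler(TagProcessor("p1", 100))
opener.add_handler(TagProcessor("p2", 200))
opener.add_handler(DummyHandler("h1", 100, None))             # declines
opener.add_handler(DummyHandler("h2", 200, "reply from h2"))  # answers
opener.add_handler(DummyHandler("h3", 300, "reply from h3"))  # never reached

result = opener.open("dummy://example/")
```

Both processors see the request and the response, whereas the chain of dummy_open calls stops at h2, the first handler to return something other than None.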
Changes in the patch: 
-OpenerDirector changes to call new <protocol>_request 
and <protocol>_response methods.  I haven't put all the 
documentation for this interface in this set of patches 
because there's no obvious place for it: handlers aren't 
really documented either.  The urllib2 docs need a 
cleanup, but I'll do that in a separate patch. 
-Added .unredirected_hdrs dict to Request, together 
with .add_unredirected_header() and .has_header() 
methods.  These headers don't get copied to redirected 
requests.  I didn't add this as a feature for people calling 
urlopen on a Request.  Rather, the motivation comes from 
the fact that processors need to explicitly add headers to 
Requests (Cookie, Referer, Content-Length, etc.), rather 
than directly sending them over the wire.  The problem is, 
if they add them to the regular .headers attribute of 
requests, processors will end up clobbering headers 
added by the user who called urlopen (which would 
break backwards-compatibility).  Having processors use 
a separate set of headers that never get redirected 
makes this problem go away: users can add headers 
(with either .add_header() or .add_unredirected_header(), 
since processors don't clobber either) and know that they 
won't get clobbered by any handler. 
-HTTPErrorProcessor is necessary to allow response 
processors to see responses before redirections &c 
happen, by moving the call to parent.error() out of 
AbstractHTTPHandler.do_open().  It has the side-effect of 
stopping people grumbling that 200 is not the only 
success code in HTTP <0.5 wink>, since it makes it 
feasible to override urllib2's behaviour of raising an 
exception unless the HTTP code == 200. 
-Split part of AbstractHTTPHandler.do_open (which 
implements http_open / https_open in the 
HTTP/HTTPSHandler subclasses) into a new .do_request 
(which implements http_request in the subclasses).  Just 
because I could, really (with the new *_request methods).  
It seems clearer to me. 
-Single string-formatting-character change to 
OpenerDirector.error() to allow "refresh" as an error 
code. 
-Added .code and .msg attributes to HTTP response 
objects, so that processors can know what the response 
code and message are.  I haven't documented these, 
because they're HTTP-specific. 
-Renamed HTTPRedirectHandler.error_302_dict 
--> .redirect_dict 
-Finally, there's one bugfix to HTTPRedirectHandler 
included in the patch, because the tests test for it: 
multiple visits to a single URL with different redirect codes 
are no longer erroneously detected as a loop. 
 http://a.com/a --> 302 --> http://a.com/b --> Refresh --> 
 Yes, I have seen a site where this really happens! 
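The unredirected-headers item in the list above can be sketched as follows, again assuming the Request API as found in urllib.request in current Python 3. The add_cookie() function is a hypothetical stand-in for what a cookie processor might do in its http_request method; the URLs and header values are made up.

```python
from urllib.request import Request


def add_cookie(request, value):
    """Hypothetical processor step: add a Cookie header without clobbering
    one the caller has already set."""
    if not request.has_header("Cookie"):
        request.add_unredirected_header("Cookie", value)
    return request


# The caller sets an ordinary header; the processor adds an unredirected one.
req = Request(
    "http://example.invalid/",
    headers={"Referer": "http://example.invalid/start"},
)
add_cookie(req, "session=abc")

# The caller already chose a Cookie value, so the processor leaves it alone.
own = Request("http://example.invalid/")
own.add_header("Cookie", "mine=1")
add_cookie(own, "session=abc")
```

The processor's header lands in .unredirected_hdrs and the caller's headers stay in .headers, so neither side overwrites the other.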
There are a few other bugs that turned up while writing 
the tests, and those tests are commented out ATM.  I'll 
file bug reports for those separately after this one is 
sorted out. 

