[Web-SIG] [stdlib-sig] Choosing one of two options for url* in the stdlib reorg

Brett Cannon brett at python.org
Sun Mar 2 21:11:52 CET 2008


On Sun, Mar 2, 2008 at 5:48 AM, M.-A. Lemburg <mal at egenix.com> wrote:
> On 2008-03-01 21:13, Brett Cannon wrote:
>  > On Sat, Mar 1, 2008 at 4:34 AM, M.-A. Lemburg <mal at egenix.com> wrote:
>  >> On 2008-03-01 05:06, Brett Cannon wrote:
>  >>  > Seriously, I just don't want to support two different approaches to
>  >>  > the same problem.
>  >>
>  >>  Then what makes you believe that the urllib2 approach is the
>  >>  better one?
>  >>
>  >>  Why not move urllib2 to PyPI and keep urllib ?
>  >>
>  >
>  > Well, I have personal experience where urllib2 was much easier to use
>  > for some custom fetching than urllib.
>  >
>  > But I get your point. If it comes down to preference then your
>  > argument is to choose the one that is used more widely.
>
>  Right.
>
>  I also believe that having a choice is more useful than trying
>  to invent the One Right Way. This may exist for simple problems,
>  but as soon as things get more complicated limiting yourself to
>  just one path on the search tree is bound to cause problems.
>
>
>  >>  >>  It's not really an argument for dropping the more used module in
>  >>  >>  favor of a different module without any real benefit.
>  >>  >
>  >>  > Benefit to old users, no. Benefit to the developers, definitely.
>  >>  > Benefit to new users, yes, as there will be less to deal with.
>  >>
>  >>  Same question as above.
>  >>
>  >>
>  >>  >>  You have to ask yourself whether
>  >>  >>  it's ok to ask the maintainers of those ~1000 code modules
>  >>  >>  using urllib for subclassing from the two main classes
>  >>  >>  URLopener and FancyURLopener to download an external dependency
>  >>  >>  from PyPI or ship the module with their code.
>  >>  >
>  >>  > Well, I obviously think it is.
>  >>
>  >>  Please explain. I have yet to see a single comment explaining why
>  >>  urllib2 would be the better choice - if there's really a need to
>  >>  decide (which I don't think there really is).
>  >>
>  >>  If you can put up some sound arguments for why urllib2 is better
>  >>  than urllib, we could move the discussion forward. If not, then
>  >>  I don't really see any benefit in having the discussion at all.
>  >
>  > Well, look at the docs for urllib. There is a list of restrictions
>  > (e.g., does not support the use of proxies which require
>  > authentication). From what I can tell, those items on the list that
>  > are an actual restriction do not carry over to urllib2.
>
>  I'm not sure I follow you: urllib *does* support proxies that
>  require authentication (see the .open_http() method).
>

According to http://docs.python.org/dev/library/urllib.html#module-urllib:

"This module does not support the use of proxies which require authentication".

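For reference, the urllib2 way to use a proxy that requires authentication is to chain a ProxyHandler with a ProxyBasicAuthHandler. The sketch below uses Python 3's urllib.request, the module urllib2 was folded into by the Python 3 reorganisation. The proxy address, user name and password are hypothetical placeholders, and nothing is actually fetched:

```python
import urllib.request

# Hypothetical proxy and credentials; substitute real values.
PROXY_URL = "http://proxy.example.com:3128"
PROXY_USER = "alice"
PROXY_PASSWORD = "secret"


def build_proxy_opener(proxy_url, user, password):
    """Return an opener that routes HTTP(S) through an authenticating proxy."""
    # Send both http and https requests through the proxy.
    proxy_handler = urllib.request.ProxyHandler(
        {"http": proxy_url, "https": proxy_url}
    )

    # Answer "407 Proxy Authentication Required" challenges with Basic auth.
    # realm=None makes these credentials the default for any realm on the proxy.
    password_mgr = urllib.request.HTTPPasswordMgrWithDefaultRealm()
    password_mgr.add_password(None, proxy_url, user, password)
    proxy_auth_handler = urllib.request.ProxyBasicAuthHandler(password_mgr)

    return urllib.request.build_opener(proxy_handler, proxy_auth_handler)


opener = build_proxy_opener(PROXY_URL, PROXY_USER, PROXY_PASSWORD)
# opener.open("http://www.python.org/") would now go through the proxy.
```

Embedding the credentials in the proxy URL (http://user:password@host:port) also works with ProxyHandler, which then adds the Proxy-Authorization header itself.
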
>
>  > Another thing,
>  > how do you add a custom line to the header for the request in urllib
>  > (e.g., Referer)? The docs for URLopener don't seem to provide a way.
>  > urllib2, on the other hand, has a very specific way to add headers.
>
>  That's easy:
>
>  import urllib
>
>  class URLReader(urllib.URLopener):
>
>     # Crawler name
>     agentname = 'mxHTMLTools-Crawler'
>
>     def __init__(self, *args, **kwargs):
>
>         """ Add a user-agent header to the HTTP requests.
>         """
>         urllib.URLopener.__init__(self, *args, **kwargs)
>         # Override the default settings for self.addheaders
>         # (URLopener starts out with a single User-Agent entry).
>         # HTMLTools is assumed to be imported elsewhere (not shown).
>         assert len(self.addheaders) == 1
>         self.addheaders = [
>             ('user-agent', '%s/%s' % (self.agentname, HTMLTools.__version__)),
>             ]
>     ...
>

But none of that is documented. So if the classes do stay, they
really need to have their documentation fleshed out (along with making
sure they have the proper unit tests for those exposed APIs, of
course).

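For comparison with the URLopener subclass quoted above, this is roughly what the documented urllib2 route looks like, again sketched with Python 3's urllib.request. A header for a single request goes on a Request object via add_header(); default headers for everything an opener fetches go in the opener's addheaders list; and a BaseHandler subclass with an http_request() method can rewrite every outgoing request. The URLs, the agent string and the RefererHandler class are hypothetical, and no network access happens:

```python
import urllib.request

AGENT = "example-crawler/0.1"  # hypothetical user-agent string

# 1. A one-off header on a single request.
req = urllib.request.Request("http://www.example.com/page")
req.add_header("Referer", "http://www.example.com/")

# 2. Default headers for every request made through an opener
#    (the counterpart of overriding URLopener.addheaders).
opener = urllib.request.build_opener()
opener.addheaders = [("User-Agent", AGENT)]


# 3. A hypothetical handler that stamps a Referer on every HTTP(S) request.
#    The opener calls <protocol>_request methods to preprocess each request.
class RefererHandler(urllib.request.BaseHandler):
    def __init__(self, referer):
        self.referer = referer

    def http_request(self, request):
        # Leave an explicitly set Referer alone.
        if not request.has_header("Referer"):
            request.add_header("Referer", self.referer)
        return request

    https_request = http_request


stamping_opener = urllib.request.build_opener(
    RefererHandler("http://www.example.com/")
)
# opener.open(req) or stamping_opener.open(url) would perform the fetch.
```

Note that Request.add_header() normalises header names with str.capitalize(), so "User-Agent" is stored as "User-agent"; has_header() and get_header() expect that form.
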
>
>  > But as I said in my last email, I am happy to include URLopener if
>  > some other people are willing to back the idea up.
>
>  Fair enough.

I will send a separate email to the SIG since people have probably
stopped following most of this thread. =)

-Brett

