urllib and proxy

Andy Gimblett gimbo at ftech.net
Tue Feb 26 05:01:36 EST 2002


On Mon, Feb 25, 2002 at 11:03:07PM +0100, Marek Augustyn wrote:

> Is it possible to disable proxy for some urllib.urlopen() call?

Not directly but hey, this is open source so we can make it happen.
:-)

I've attached my_urllib.py which contains a version of urlopen() with
the following behaviour:

    - It takes an optional proxies parameter.

    - Its default behaviour is to not use any proxies.

    - You can explicitly tell it which proxies to use.

    - You can explicitly tell it to do what urllib.urlopen() does, ie
      get the proxy settings from the environment, registry, or
      wherever.

Let's have a look at how I worked out how to do this (which I only did
in the last 20 mins in order to answer your question).  Basically I
looked at urllib.py and saw how that handled proxies, and went from
there...

urllib.urlopen() just creates and utilises a urllib.FancyURLopener
instance.  Note that it keeps the FancyURLopener instance cached in a
global variable for efficiency, but that my_urllib.urlopen() doesn't
do this, for simplicity.

_urlopener = None
def urlopen(url, data=None):
    """urlopen(url [, data]) -> open file-like object"""
    global _urlopener
    if not _urlopener:
        _urlopener = FancyURLopener()
    if data is None:
        return _urlopener.open(url)
    else:
        return _urlopener.open(url, data)

Looking at the constructors of FancyURLopener and URLopener (its
superclass), we have:

class FancyURLopener(URLopener):

    def __init__(self, *args):
        apply(URLopener.__init__, (self,) + args)
        self.auth_cache = {}
        self.tries = 0
        self.maxtries = 10

class URLopener:

    def __init__(self, proxies=None, **x509):
        if proxies is None:
            proxies = getproxies()
        assert hasattr(proxies, 'has_key'), "proxies must be a mapping"
        self.proxies = proxies

        # ... other stuff not important here

So, you can tell URLopener (and hence FancyURLopener) which proxies to
use, but if you don't pass it any, it calls urllib.getproxies() which
gets the proxy settings from the environment, or registry, or whatever
(depending on platform).
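
To make that fallback concrete, here is a minimal sketch of the same
decision as a standalone function.  It is not code from urllib.py:
resolve_proxies is a hypothetical name, and the sketch assumes Python 3,
where getproxies() lives in urllib.request rather than urllib.

```python
import urllib.request


def resolve_proxies(proxies=None):
    """Mimic how URLopener.__init__ picks its proxy mapping (a sketch)."""
    if proxies is None:
        # Nothing was passed: fall back to the system/environment settings.
        proxies = urllib.request.getproxies()
    # The original asserts on has_key; here we check for keys() instead.
    assert hasattr(proxies, 'keys'), "proxies must be a mapping"
    return proxies
```

Note that an empty dictionary is not None, so it is returned untouched
and getproxies() is never consulted.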

So what form do these proxy settings take?  Well, as the assert says,
they have to be a mapping, and if we look at urllib.getproxies() we see
that in fact it's "a dictionary of scheme -> proxy server URL
mappings."

On my box, this looks like this:

h[1] >>> import urllib
h[1] >>> urllib.getproxies()
{'http': 'http://127.0.0.1:8080/'}
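
To see where such a dictionary can come from, the sketch below sets an
environment variable and reads it back.  It assumes Python 3
(urllib.request rather than urllib) and uses getproxies_environment(),
which only looks at <scheme>_proxy environment variables; the proxy
address is simply the one from the output above.

```python
import os
import urllib.request

# Hypothetical setting, matching the example output above.
os.environ['http_proxy'] = 'http://127.0.0.1:8080/'

# Only environment variables are consulted here, not the registry.
proxies = urllib.request.getproxies_environment()
print(proxies.get('http'))
```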

So, to tell FancyURLopener (or URLopener) _not_ to use any proxies,
and also to _not_ get the "system" proxies from the environment or
wherever, we simply pass it an empty dictionary as the proxies
parameter.
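
For comparison, the same trick can be sketched against the handler-based
API in Python 3's urllib.request, where ProxyHandler({}) plays the role
of the empty proxies dictionary.  This is an assumption about a newer
API rather than anything in my_urllib.py, and build_no_proxy_opener is a
hypothetical name.

```python
import urllib.request


def build_no_proxy_opener():
    """Return an opener that ignores any system proxy settings."""
    # An empty mapping means "no proxies", and passing our own
    # ProxyHandler stops the default one (which reads the environment)
    # from being installed.
    return urllib.request.build_opener(urllib.request.ProxyHandler({}))


opener = build_no_proxy_opener()
# opener.open(url) would now connect directly; no request is made here.
```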

That is what my_urllib.urlopen() does by default.

I haven't _extensively_ tested the code, but it looks right to me,
assuming I haven't made a mistake in the reasoning outlined above.
All comments/feedback appreciated.

Hope this helps,

Andy

-- 
Andy Gimblett - Programmer - Frontier Internet Services Limited
Tel: 029 20 820 044 Fax: 029 20 820 035 http://www.frontier.net.uk/
Statements made are at all times subject to Frontier's Terms and
Conditions of Business, which are available upon request.
-------------- next part --------------
#!/usr/bin/env python
#
# my_urllib.py
#
# urlopen() with optional proxying.
#
# Changelog:
#   2002.02.26.0945 AMG     v1.0
#                           Created.

"""my_urllib.py -- urlopen() with optional proxying."""

__version__ = "1.0"
__author__ = "Andy Gimblett"

import urllib

def urlopen(url, proxies={}, data=None):

    """Adaptation of urllib.urlopen() with optional proxying.

    Parameters and behaviour are as urllib.urlopen() except for the
    optional proxies parameter.  Note that proxies comes before data
    in the argument list, so pass data by keyword.

    proxies

        By default this is an empty dictionary, which results in no
        proxy being used.

        Alternatively it can be a dictionary of scheme -> proxy server
        URL mappings, eg {'http': 'http://127.0.0.1:8080/'}

        Alternatively it can be None, in which case urllib.getproxies()
        is called by URLopener.__init__() to obtain proxy settings
        from the environment/registry/whatever.

    """

    # nb: Unlike urllib.urlopen(), we create a new FancyURLopener
    # instance for each call.  This is less efficient but easier to
    # grok.  Adapt as you see fit.
    
    urlopener = urllib.FancyURLopener(proxies)
    if data is None:
        return urlopener.open(url)
    else:
        return urlopener.open(url, data)

