[Python-Dev] Urllib code or the docs appear wrong

Skip Montanaro skip at pobox.com
Mon Mar 7 21:43:01 CET 2005


It seems to me that either urllib's docs are wrong or its code is wrong
w.r.t. how the User-agent header is handled.  In part, the docs say:

    By default, the URLopener class sends a User-Agent: header of
    "urllib/VVV", where VVV is the urllib version number. Applications can
    define their own User-Agent: header by subclassing URLopener or
    FancyURLopener and setting the instance attribute version to an
    appropriate string value before the open() method is called.

Looking at the code it seems to me that the User-agent header is fixed at
instantiation time:

    version = "Python-urllib/%s" % __version__

    # Constructor
    def __init__(self, proxies=None, **x509):
        ...
        self.addheaders = [('User-agent', self.version)]
        ...

and that when open_http() is called, it simply calls putheader() for each
element of addheaders.  Setting the version attribute on an instance after
it has been constructed will therefore have no effect.  If I managed to add
another User-agent header before open_http() was called, the request would
wind up with two copies of the header, which is probably not desirable
either.
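
To make the behaviour concrete, here is a minimal sketch of the pattern
quoted above.  SketchOpener, headers_sent() and the version strings are
hypothetical stand-ins of mine, not the real urllib code; headers_sent()
only plays the role of open_http() walking addheaders.

```python
# Hypothetical stand-in that mirrors the quoted pattern; this is NOT the
# real urllib.URLopener, just enough structure to show the early binding.
class SketchOpener:
    version = "Python-urllib/0.0"  # placeholder version string (assumed)

    def __init__(self):
        # The header value is copied here, once, at instantiation time.
        self.addheaders = [("User-agent", self.version)]

    def headers_sent(self):
        # Stand-in for open_http(), which calls putheader() per entry.
        return list(self.addheaders)


opener = SketchOpener()
opener.version = "MyApp/1.0"  # what the docs suggest doing
late_headers = opener.headers_sent()  # still carries the default value

# Appending a second User-agent entry leaves both copies in the list.
opener.addheaders.append(("User-agent", opener.version))
duplicated_headers = opener.headers_sent()
```

In this sketch the instance attribute is set too late to matter, and
appending to addheaders leaves two User-agent entries in the list.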

I can see a couple of ways around this:

    * Just change the docs to match the current implementation.  Users
      wishing to override the User-agent header would then have to subclass
      FancyURLopener and set the version class attribute.

    * Defer decisions about the value of the User-agent until open_http() is
      called.
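
The two options above might look roughly like this.  Again, every class
name here is a hypothetical stand-in rather than the real urllib code, and
headers_sent() only plays the role of open_http(); treat it as a sketch of
the idea, not a proposed patch.

```python
# Stand-in for the current (early-binding) behaviour; names are assumed.
class EarlyBoundOpener:
    version = "Python-urllib/0.0"  # placeholder version string

    def __init__(self):
        self.addheaders = [("User-agent", self.version)]

    def headers_sent(self):
        return list(self.addheaders)


# Option 1: keep the code, fix the docs.  Users override the *class*
# attribute in a subclass, so __init__ picks up the new value.
class MyAppOpener(EarlyBoundOpener):
    version = "MyApp/1.0"


# Option 2: defer the decision.  The User-agent value is looked up when
# the request is made, so setting the instance attribute works.
class LateBoundOpener:
    version = "Python-urllib/0.0"

    def __init__(self):
        self.addheaders = []  # only extra headers are stored here

    def headers_sent(self):
        return [("User-agent", self.version)] + list(self.addheaders)


option1_headers = MyAppOpener().headers_sent()

late = LateBoundOpener()
late.version = "MyApp/1.0"  # set after construction, before "open"
option2_headers = late.headers_sent()
```

Both variants end up sending a single User-agent header with the
application's value; they differ only in when the value is read.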

It appears the OpenerDirector class in urllib2 has a similar "early binding"
problem.
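
A quick way to see the same pattern, sketched here against Python 3's
urllib.request (the successor to urllib2; using it is my assumption, as is
the "MyApp/1.0" string).  No request is made, the snippet only inspects the
addheaders list on a freshly built OpenerDirector.

```python
import urllib.request

opener = urllib.request.OpenerDirector()

# The default User-agent entry is already present right after construction.
default_headers = list(opener.addheaders)

# Replacing the list (rather than appending to it) avoids ending up with
# two User-agent entries.
opener.addheaders = [("User-agent", "MyApp/1.0")]
```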

I don't particularly care how this is solved, but it appears to need
solving.

Skip

