[ python-Bugs-874842 ] httplib fails on Akamai URLs

SourceForge.net noreply at sourceforge.net
Thu May 13 20:29:57 EDT 2004


Bugs item #874842, was opened at 2004-01-11 06:16
Message generated for change (Comment added) made by gvanrossum
You can respond by visiting: 
https://sourceforge.net/tracker/?func=detail&atid=105470&aid=874842&group_id=5470

Category: Python Library
Group: Python 2.3
Status: Open
Resolution: Accepted
Priority: 5
Submitted By: Leif Hedstrom (zwoop)
Assigned to: Jeremy Hylton (jhylton)
Summary: httplib fails on Akamai URLs

Initial Comment:
Using Python 2.3.2 and httplib, reading from Akamai
URLs will always hang at the end of the transaction.
As common as this must be, I couldn't find anything
related to it on any search engines, nor on the bug
list here.

The problem is that Akamai returns an HTTP/1.0
response, with a header like:

   Connection: keep-alive


httplib does not recognize this response properly (the
Connection: header parsing is only done for HTTP/1.1
responses). I'm not sure exactly what the right
solution is, but I'm supplying one alternative solution
that does solve the problem. I'm attaching a diff
against httplib.py.
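
To make the behaviour described above concrete, here is a minimal sketch of the decision involved. It is a simplified model, not the actual httplib.py code and not the attached diff; the function name will_close and the flag honor_http10_keepalive are made up for illustration.

```python
def will_close(version, connection_header, honor_http10_keepalive=False):
    """Return True if the client should read the body until the server closes.

    version is 10 for HTTP/1.0 and 11 for HTTP/1.1.
    connection_header is the value of the Connection: header, or None.
    honor_http10_keepalive is a hypothetical switch modelling the proposed fix.
    """
    conn = (connection_header or "").lower()
    if version == 11:
        # HTTP/1.1 connections stay open unless the server says "close".
        return "close" in conn
    if honor_http10_keepalive and "keep-alive" in conn:
        # Proposed behaviour: trust "Connection: keep-alive" on HTTP/1.0 too.
        return False
    # Behaviour reported in this bug: the header is ignored for HTTP/1.0,
    # so the client waits for a close that the server never performs.
    return True
```

With the flag off, an HTTP/1.0 response carrying "Connection: keep-alive" is still treated as "read until close", which is the hang reported here.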

----------------------------------------------------------------------

>Comment By: Guido van Rossum (gvanrossum)
Date: 2004-05-13 20:29

Message:
Logged In: YES 
user_id=6380

Looks like nothing happened in CVS... :-(  It's too late for
2.3.4 now, Anthony issued the release candidate already.
There will be a 2.3.5 though.

----------------------------------------------------------------------

Comment By: Leif Hedstrom (zwoop)
Date: 2004-05-13 13:15

Message:
Logged In: YES 
user_id=480913

Hate to beat on a dead horse here, but what was the final
outcome of this discussion? Anything I can do to help
produce a better patch, documentation or anything?

----------------------------------------------------------------------

Comment By: Guido van Rossum (gvanrossum)
Date: 2004-04-20 12:53

Message:
Logged In: YES 
user_id=6380

It's great that the ASF can wield such power over the likes
of AOL. But I don't want to presume the same for Python
(we're not the #1 web language, not even #2).

I'd be more concerned if adding this hack would *break*
anything, but that doesn't seem to be the case. So I still
think Jeremy can check it in.

----------------------------------------------------------------------

Comment By: Leif Hedstrom (zwoop)
Date: 2004-04-20 11:36

Message:
Logged In: YES 
user_id=480913

Yeah, the second solution is what I ended up doing, although
it's definitely not obvious for anyone using httplib.py that
this is required to support Akamai (see my original blog
post at
http://www.ogre.com/tiki-view_blog_post.php?blogId=3&postId=30
for both alternative solutions).

At a minimum, I think we should provide the
AkamaiHTTPResponse class in one way or another, and clearly
document that this is required for correct support of Akamai
URLs.   My vote would probably be to "hack" the original
HTTPResponse class, since anyone using HTTPlib for anything
that might hit Akamai (perhaps as a referral/redirect) will
have to use the fixed version anyways.

Unfortunately I don't have any contacts left at Akamai, so
I'm not sure how to inform them of their problems. I
completely agree that we need to inform them about this
problem, my point was that since Akamai works with pretty
much everything else (browsers, other modules etc.), I think
it'll be quite slow to get them to change. And until then,
we're stuck with a module that is effectively semi-broken.

Thanks,

-- Leif


----------------------------------------------------------------------

Comment By: Greg Stein (gstein)
Date: 2004-04-20 02:44

Message:
Logged In: YES 
user_id=6501

Falling into line with "oh, but they won't change it" is why
we end up with a whole bunch of bad implementations. If
everybody said that, then we wouldn't get anywhere. A long
while back, AOL came out with a busted proxy implementation
which didn't work with Apache servers. The ASF said, "sorry
AOL: you're wrong. fix your proxies." And they did. If we
put a hack in for every busted thing that came out over the
next ten years, then imagine the craphole we'd be in... :-)

That said: yes, you can workaround the issue with a subclass
of HTTPResponse which overrides the _check_close() method.
You can then create an HTTPConnection subclass which
overrides the class variable 'response_class', or you can
set that field in an HTTPConnection instance as an instance
variable. For example:

  conn = HTTPConnection(...)
  conn.response_class = AkamaiBugHandler

When the response arrives, the HTTPConnection class uses
self.response_class, so there are a few options to get your
custom response class into the chain of events.
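
A rough sketch of the workaround described above could look like the following. Only the names AkamaiBugHandler, _check_close and response_class come from this thread; the body of the override and the helper make_connection are assumptions, and the import fallback is there only because httplib is called http.client in Python 3.

```python
try:
    import httplib                     # Python 2 name, as used in this thread
except ImportError:
    import http.client as httplib      # same module under its Python 3 name


class AkamaiBugHandler(httplib.HTTPResponse):
    """Response class that honours Connection: keep-alive on HTTP/1.0."""

    def _check_close(self):
        # Assumed logic: if an HTTP/1.0 server announces keep-alive, do not
        # wait for it to close the connection.
        conn = self.msg.get("connection")
        if self.version == 10 and conn and "keep-alive" in conn.lower():
            return False
        # Everything else goes through the stock decision.
        return httplib.HTTPResponse._check_close(self)


def make_connection(host):
    """Hypothetical helper: a connection that uses the handler above."""
    conn = httplib.HTTPConnection(host)
    conn.response_class = AkamaiBugHandler   # per-instance override
    return conn
```

Setting response_class on the instance keeps the change local to connections that are known to hit Akamai; setting it on an HTTPConnection subclass would apply it to every connection made through that subclass.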

----------------------------------------------------------------------

Comment By: Leif Hedstrom (zwoop)
Date: 2004-04-19 20:57

Message:
Logged In: YES 
user_id=480913

As I said, no matter what we do, it's a hack on something
that's broken on the web (now there's a shocker :-). I don't
feel terribly strongly on this issue, I merely filed the bug
report because I had this problem, and it took me several
hours to figure out why my daemon would stall on Akamai
URLs. I'm guessing other users of httplib.py might run into
the same problem.

As for the patch, the comments would of course have to
change, I didn't want to impose more changes in the diff
than necessary.

Besides the suggested patch, an alternative solution is to
provide a specialized implementation of the HTTPResponse
class, which works with Akamai. The users of the httplib.py
module would then have to explicitly request that
httplib.HTTPConnection should instantiate that class instead
of the default one. Preferably this would be passed as a new
argument to the constructor for HTTPConnection.
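
As an illustration of that last idea, a constructor argument could be sketched as a subclass like this. The class names and the response_class keyword are hypothetical; nothing here is an existing httplib interface beyond HTTPConnection, HTTPResponse and the response_class attribute.

```python
try:
    import httplib                     # Python 2 name, as used in this thread
except ImportError:
    import http.client as httplib      # same module under its Python 3 name


class ConfigurableHTTPConnection(httplib.HTTPConnection):
    """Hypothetical connection class that takes the response class as an argument."""

    def __init__(self, host, port=None, response_class=None, **kwargs):
        httplib.HTTPConnection.__init__(self, host, port, **kwargs)
        if response_class is not None:
            # Shadows the class-level default for this instance only.
            self.response_class = response_class


class CustomResponse(httplib.HTTPResponse):
    """Placeholder for a specialized response class (details omitted)."""
```

A caller would then write ConfigurableHTTPConnection("www.akamai.com", response_class=CustomResponse) and leave every other connection on the default class.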

And I agree that it's a hack to have to code around poor
server implementations. But I'm not sure what our odds are of
getting Akamai to fix their servers any time soon, since pretty
much any web browser in existence works with their broken
implementation.

Cheers,

-- leif


----------------------------------------------------------------------

Comment By: Guido van Rossum (gvanrossum)
Date: 2004-04-19 18:32

Message:
Logged In: YES 
user_id=6380

I won't reject the patch on that basis. Like HTML, it's more
useful to be able to handle what we see in the real world
than to stick to the standard. Clearly the OP needs to be
able to access Akamai servers. He doesn't have the power to
fix the Akamai servers, so saying "the server is wrong"
doesn't do him any good. (The comment should state clearly
that Akamai *is* wrong though!)

Or do you have a different suggestion for how the poster can
work around the problem?

----------------------------------------------------------------------

Comment By: Greg Stein (gstein)
Date: 2004-04-19 18:26

Message:
Logged In: YES 
user_id=6501

I have a philosophical problem with compensating for servers
that obviously break protocols. The server should be fixed,
not *every* client on the planet. From that standpoint, this
problem/fix should be rejected, though I defer to Guido on
that choice.

That said, the comment right above the patch should be
fixed. The whole point of that comment is, "the header
shouldn't be there, so we shouldn't bother to examine the
thing." Obviously, the new code does, so the two comments
should be merged. The comment about Akamai should also be
strengthened to note that it is violating the HTTP protocol
(see section 8.1.2.1 of RFC 2616).

Summary: I'd reject it, but will leave that to Guido to
choose (i.e. "we'll help users even tho it violates
protocols"). If he wants it, then +1 if the comments are
fixed up.

----------------------------------------------------------------------

Comment By: Jeremy Hylton (jhylton)
Date: 2004-04-15 17:59

Message:
Logged In: YES 
user_id=31392

Looks good to me.  I want to see if I can come up with a
simple test module for httplib with the network resource
enabled.  I'll see if I can do that tonight.


----------------------------------------------------------------------

Comment By: Leif Hedstrom (zwoop)
Date: 2004-04-12 16:54

Message:
Logged In: YES 
user_id=480913

Heh, yeah, I'm pretty sure that's the problem, Akamai being
confused about protocols. They claim to be a v1.0 HTTP
proxy, yet they use v1.1 HTTP headers :-/. This is why I
mentioned I wasn't sure exactly what the right solution is.
And no matter what we do, it'll be a hack.  Maybe the
original author of the module has some insight?

Unfortunately, there's a lot of Akamai content out there
that is affected by this.

Cheers,

-- Leif


----------------------------------------------------------------------

Comment By: Guido van Rossum (gvanrossum)
Date: 2004-04-12 16:32

Message:
Logged In: YES 
user_id=6380

Hmm...  Indeed. read() checks will_close and apparently
setting that to False will do the right thing.

I don't know HTTP and this code well enough to approve this
fix though. Also, the comment right above your patch should
probably be fixed; it claims that connection headers on
HTTP/1.0 are due to confused proxies. (Maybe that's what
Akamai servers are? :-)
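
Why the will_close flag matters can be shown with a small self-contained model. This is not the httplib code, only a sketch under the assumption that a "will close" response is read until end-of-file while any other response is read for exactly Content-Length bytes; read_body and demo are made-up names, and a short socket timeout stands in for the hang.

```python
import socket


def read_body(sock, length, will_close, timeout=0.5):
    """Read a response body; return None if the read stalls."""
    sock.settimeout(timeout)
    body = b""
    try:
        if will_close:
            # Read until the peer closes the connection.
            while True:
                data = sock.recv(4096)
                if not data:
                    return body
                body += data
        # Otherwise read exactly `length` bytes and stop.
        while len(body) < length:
            data = sock.recv(length - len(body))
            if not data:
                break
            body += data
        return body
    except socket.timeout:
        return None


def demo():
    """A 'server' that sends a body and then keeps the connection open."""
    server, client = socket.socketpair()
    try:
        server.sendall(b"hello")
        stalled = read_body(client, 5, will_close=True)
        server.sendall(b"hello")
        complete = read_body(client, 5, will_close=False)
        return stalled, complete
    finally:
        server.close()
        client.close()
```

In the first read the peer never closes, so waiting for end-of-file can only end in a timeout; in the second the reader stops after the announced length and returns at once.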

----------------------------------------------------------------------

Comment By: Leif Hedstrom (zwoop)
Date: 2004-04-12 16:13

Message:
Logged In: YES 
user_id=480913

Yeah, that works for me too. But the problem is in the
HTTPResponse class from the httplib.py module. For example,
this code (butchered from my application) will hang on
Akamai URLs:

#!/usr/bin/python

import httplib
import logging
import socket

# Module-level logger; the original snippet used self._log from a class.
log = logging.getLogger("testHTTPlib")


def testHTTPlib(host, url):
    http = httplib.HTTPConnection(host)
    try:
        http.request('GET', url)
        response = http.getresponse()
    # socket.timeout is a subclass of socket.error, so it is caught first.
    except socket.timeout:
        log.warning("Timeout connecting to %s", url)
        return False
    except socket.error:
        log.error("Socket error retrieving %s", url)
        return False
    except IOError:
        log.warning("Can't connect to %s", url)
        return False
    else:
        try:
            # read() is where the stall on Akamai URLs shows up.
            data = response.read()
            return True
        except socket.timeout:
            log.warning("Timeout reading from %s", url)
            return False

print testHTTPlib("www.ogre.com", "/")
print testHTTPlib("www.akamai.com", "/")


Granted, I think Akamai aren't strictly following the
protocols, but it's inconvenient that this piece of code
stalls here (and only for akamai.com domains, I've tried a
lot of them).

Thanks!

-- Leif


----------------------------------------------------------------------

Comment By: Guido van Rossum (gvanrossum)
Date: 2004-04-12 15:36

Message:
Logged In: YES 
user_id=6380

Can you give a complete program that reproduces this? I've 
tried this:

>>> import urllib
>>> urllib.urlopen("http://www.akamai.com").read()

and it doesn't hang for me. I tried a number of Python 
versions from 2.2 through 2.4a0.

----------------------------------------------------------------------

Comment By: Leif Hedstrom (zwoop)
Date: 2004-01-11 14:37

Message:
Logged In: YES 
user_id=480913

Oh, I forgot, this is most easily reproduced by simply
requesting the URL

   http://www.akamai.com/


Fortunately they Akamai their home page as well. :-)

----------------------------------------------------------------------
