[Twisted-Python] Twisted HTTP client supporting failover for multiple A records?
Hi all,

Is there any existing support for any Twisted HTTP client to simulate the behaviour of all modern browsers in that -- if an address returns multiple A records -- and if one IP fails (connection refused, etc) then the client attempts a number of the other IPs before giving up? If not, where should I start? I understand that client.Agent is more modern than client.getPage.

Thanks for an awesome framework!

--
Best Regards,
Luke Marsden
Hybrid Logic Ltd.
Web: http://www.hybrid-cluster.com/
Hybrid Web Cluster - cloud web hosting based on FreeBSD and ZFS
Mobile: +447791750420
Hi Luke,
Is there any existing support for any Twisted HTTP client to simulate the behaviour of all modern browsers in that -- if an address returns multiple A records -- and if one IP fails (connection refused, etc) then the client attempts a number of the other IPs before giving up?
As for connecting to hosts that resolve to multiple A records - I presume as a means of load balancing via DNS round robin - I'm not quite sure this is natively supported in Twisted. I believe that, since all TCP connections are mediated via connectTCP, hostnames are ultimately resolved via socket.gethostbyname. I think you really want the support provided by socket.gethostbyname_ex (http://docs.python.org/library/socket.html#socket.gethostbyname_ex).

It's a good question though. I'm sure a core dev will come along and give a proper answer soon ;-)

Cheers,
Reza

--
Reza Lotun
mobile: +44 (0)7521 310 763
email: rlotun@gmail.com
work: reza@tweetdeck.com
twitter: @rlotun
On Thu, 2010-07-15 at 10:46 +0100, Reza Lotun wrote:
I believe that, since all TCP connections are mediated via connectTCP, hostnames are ultimately resolved via socket.gethostbyname.
Twisted uses a thread pool to do DNS lookups by default, so this shouldn't block anything.
On Thu, 2010-07-15 at 10:46 +0100, Reza Lotun wrote:
As for connecting to hosts that resolve to multiple A records - I presume as a means of load balancing via DNS round robin - I'm not quite sure this is natively supported in Twisted. I believe that, since all TCP connections are mediated via connectTCP, hostnames are ultimately resolved via socket.gethostbyname. I think you really want the support provided by socket.gethostbyname_ex (http://docs.python.org/library/socket.html#socket.gethostbyname_ex).
It's a good question though. I'm sure a core dev will come along and give a proper answer soon ;-)
Gar. I should read better. Twisted uses a thread pool calling gethostbyname by default, but you can plug in your own resolver (e.g. you can use twisted.names): http://twistedmatrix.com/documents/10.1.0/api/twisted.internet.interfaces.IR...

The question is whether the client code re-resolves on each re-connect, and whether the current lookup interface is sufficient for this use case. Alas, I'm pretty sure the answer is no.

You could, however, always just do the DNS lookup yourself, passing the resulting IP to connectTCP; just make sure you don't block (e.g. by using deferToThread to call gethostbyname_ex).
On Thu, 2010-07-15 at 08:06 -0400, Itamar Turner-Trauring wrote:
On Thu, 2010-07-15 at 10:46 +0100, Reza Lotun wrote:
As for connecting to hosts that resolve to multiple A records - I presume as a means of load balancing via DNS round robin
We're actually using it to provide redundancy in this instance. In our application any request for any site can be made to any (live) server, so having dead servers in the pool of A records doesn't matter so long as real web browsers fail over to some other A record within a second, which they do! http://crypto.stanford.edu/dns/dns-rebinding.pdf

The problem is that my test application uses client.getPage which, because it uses the reactor's standard DNS lookup mechanism, picks just one A record and sticks to it. So, it reports connection errors (some fraction of the time, as A records are randomised) even when the user of a "real" web browser would not experience them. These errors go away when the dead server(s) drop out of the DNS pool and the reactor's lookups stop returning the dead IP, but this takes some time.
Gar. I should read better. Twisted uses a thread pool calling gethostbyname by default, but you can plug in your own resolver (e.g. you can use twisted.names):
http://twistedmatrix.com/documents/10.1.0/api/twisted.internet.interfaces.IR...
The question is whether the client code re-resolves on each re-connect, and whether the current lookup interface is sufficient for this use case.
Alas, I'm pretty sure the answer is no.
You could, however, always just do the DNS lookup yourself, passing the resulting IP to connectTCP; just make sure you don't block (e.g. by using deferToThread to call gethostbyname_ex).
Thanks Itamar, this is massively useful. I'll try subclassing twisted.web.client.Agent to do its own DNS lookups with twisted.names so as to be aware of the full list of A records returned. It would then attempt all the IP addresses in turn until it finds one which works, giving up only if all the IPs yield connection errors. This should mirror the behaviour of the majority of web browsers "in the wild".

Would you be interested in having this code contributed back to Twisted if I can get it working? It might be a useful addition to the Agent.

--
Best Regards,
Luke Marsden
Hybrid Logic Ltd.
Web: http://www.hybrid-cluster.com/
Hybrid Web Cluster - cloud web hosting based on FreeBSD and ZFS
Mobile: +447791750420
On Thu, 2010-07-15 at 13:33 +0100, Luke Marsden wrote:
Thanks Itamar, this is massively useful. I'll try subclassing twisted.web.client.Agent to do its own DNS lookups with twisted.names so as to be aware of the full list of A records returned. It would then attempt all the IP addresses in turn until it finds one which works, giving up only if all the IPs yield connection errors. This should mirror the behaviour of the majority of web browsers "in the wild".
I suspect you can do this without subclassing... pass in IP address, and just make sure you pass correct Host header. I forget the exact API though.
I suspect you can do this without subclassing... pass in IP address, and just make sure you pass correct Host header. I forget the exact API though.
Yeah, I was about to say, why not just call socket.gethostbyname_ex in deferToThread and in the callback do a regular Agent.request?

Reza

--
Reza Lotun
mobile: +44 (0)7521 310 763
email: rlotun@gmail.com
work: reza@tweetdeck.com
twitter: @rlotun
On Thu, 2010-07-15 at 14:28 +0100, Reza Lotun wrote:
I suspect you can do this without subclassing... pass in IP address, and just make sure you pass correct Host header. I forget the exact API though.
This makes sense. Conceptually I had considered it to be the responsibility of the web client itself to handle the reconnection, not the calling code, hence my plan for a subclass. But a separate class which uses the Agent's API makes a lot more sense, and it can equally provide the same interface as Agent so that any code which uses Agent can use it without modifications.
Yeah, I was about to say, why not just call socket.gethostbyname_ex in deferToThread and in the callback do a regular Agent.request?
Sounds like a plan! Thanks guys.

--
Best Regards,
Luke Marsden
Hybrid Logic Ltd.
Web: http://www.hybrid-cluster.com/
Hybrid Web Cluster - cloud web hosting based on FreeBSD and ZFS
Mobile: +447791750420
On 01:54 pm, luke-lists@hybrid-logic.co.uk wrote:
On Thu, 2010-07-15 at 14:28 +0100, Reza Lotun wrote:
I suspect you can do this without subclassing... pass in IP address, and just make sure you pass correct Host header. I forget the exact API though.
[snip] But a separate class which uses the Agent's API makes a lot more sense, and it can equally provide the same interface as Agent so that any code which uses Agent can use it without modifications.
Hooray. This is exactly the intended approach to extending Agent. I'm glad you figured it out on your own. :) Jean-Paul
Luke Marsden wrote:
We're actually using it to provide redundancy in this instance. In our application any request for any site can be made to any (live) server, so having dead servers in the pool of A records doesn't matter so long as real web browsers fail over to some other A record within a second, which they do! http://crypto.stanford.edu/dns/dns-rebinding.pdf
Be aware that the time to fail over to an alternate A record need not be that fast, depending on the sort of failure involved. Failover can only occur quickly as long as the outage (network unreachable, port no longer active on the host, etc.) is such that the connection attempt is explicitly rejected by the target host or a router along the way. If it's a more complicated outage (e.g., a routing loop or total machine failure) for which no explicit failure response will be received by the client, you'll be subject to the client's connect timeout (one per connection attempt to each failing address). These may vary by client and/or platform, but can easily be 30-60s - certainly long enough for the human involved to potentially want to give up. Also, since web browsers typically cache DNS responses, if a bad address is early in the list, a timeout will be encountered for each individual browser request generated.

I did a quick test with a stock Firefox 3.6 under OS X, and with a bad initial A record (non-existent host) it took about 75s to fail over to the next A record. In my test case even that was unusable, since the host I was referencing had other references to itself needed to load the home page, and each of those references took another 75 seconds to time out. So it took more than 2 minutes for me to see the page I wanted, which I presume most people would give up on.

That's not to say using multiple A records isn't a helpful practice for many sorts of outages (especially to permit controlled maintenance). Just don't expect it to necessarily be sufficient in all failure modes depending on the behavior you want clients to experience.
If this is strictly limited to a client you control, it's much less of an issue, since you can drop the TCP connect timeout much lower than what it defaults to, though you still probably can't match how fast it can happen for rejected connections, since you'll want to leave enough room for occasional latency or response time issues without immediately failing over. But you can do a lot better than the system defaults.

-- David
Hi David,
That's not to say using multiple A records isn't a helpful practice for many sorts of outages (especially to permit controlled maintenance). Just don't expect it to necessarily be sufficient in all failure modes depending on the behavior you want clients to experience.
Indeed, in our application it's considered an optimisation over DNS failover. This is why we also use a low TTL (30 seconds) to purge the bad A records out of the pool as soon as possible.
If this is strictly limited to a client you control, it's much less of an issue, since you can drop the TCP connect timeout much lower than what it defaults to, though you still probably can't match how fast it can happen for rejected connections, since you'll want to leave enough room for occasional latency or response time issues without immediately failing over. But you can do a lot better than the system defaults.
Unfortunately we have no control over the clients' configuration (this is a LAMP web hosting environment). But 30 seconds is considered much more acceptable than the days a manual repair job can often take if a server goes down.

--
Best Regards,
Luke Marsden
Hybrid Logic Ltd.
Web: http://www.hybrid-cluster.com/
Hybrid Web Cluster - cloud web hosting based on FreeBSD and ZFS
Mobile: +447791750420
participants (5)
- David Bolen
- exarkun@twistedmatrix.com
- Itamar Turner-Trauring
- Luke Marsden
- Reza Lotun