[Twisted-Python] Running a HTTP client connection through a SOCKS proxy
Hey,

I'm working on an app at the moment, part of which needs to grab a page from a website and parse the results. It will have to work in environments where connections have to be routed through a SOCKSv4 proxy, but I can't find any way to specify a proxy using any of the Twisted classes. Is this possible? I assumed it was, seeing as it is a fairly common situation, but the documentation and API haven't provided any useful information.

Cheers,
nnp

Btw, my code currently looks like this:

    from twisted.web.client import HTTPClientFactory
    from twisted.internet import ssl
    from twisted.internet import reactor
    from twisted.web import client

    class BugGetter:
        def __init__(self, url):
            '''
            This class attempts to verify that a bug ID is a legitimate
            '''
            self.url = url
            self.contextFactory = ssl.ClientContextFactory()
            self.scheme, self.host, self.port, path = client._parse(url)

        def getPage(self, bugId):
            self.bugUrl = ''.join([self.url, bugId])
            self.hcf = HTTPClientFactory(self.bugUrl)
            self.hcf.deferred.addCallback(self.parsePage)
            self.hcf.deferred.addErrback(self.errorCallback)
            # the parsed scheme has no trailing colon
            if self.scheme == 'https':
                reactor.connectSSL(self.host, self.port, self.hcf,
                                   self.contextFactory)
            else:
                reactor.connectTCP(self.host, self.port, self.hcf)
            reactor.run()

        def parsePage(self, page):
            # the deferred passes the page body to its callback
            print self.hcf.status
            print self.hcf.message
            reactor.stop()

        def errorCallback(self, failure):
            print failure.getErrorMessage()
            reactor.stop()

    bz = BugGetter('https://bugs.example.org/show_bug.cgi?id=')
    bz.getPage('9999')

--
http://www.smashthestack.org
http://www.unprotectedhex.com
I just realised that web proxy support would be much preferable to SOCKSv4 (long day, very tired), so, to alter my question: does Twisted have support for proxying connections through a web proxy?

On Wed, Jul 23, 2008 at 3:46 PM, nnp <version5@gmail.com> wrote:
On Wed, Jul 23, 2008 at 04:07:20PM +0100, nnp wrote:
I believe Twisted doesn't support HTTP proxies of any kind. There is support for SOCKSv4 in twisted.protocols.socks, but it's not immediately evident to me how it works. It seems to me that in an ideal world, there would be some kind of Proxy class available that would look much like a reactor to the outside world, but contain all the logic for reading proxy settings from the OS and making outgoing connections via those configured proxies. One day, if I get far enough down my list of potential weekend projects, I might even do it myself.
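For readers wondering what a SOCKSv4 client actually has to put on the wire, here is a minimal sketch in plain Python 3, independent of Twisted. The function names are hypothetical and are not part of twisted.protocols.socks; the byte layout is the SOCKS4 CONNECT request (version 4, command 1, destination port, IPv4 address, NUL-terminated user id) and the 8-byte reply whose second byte is 90 when the request is granted.

```python
import socket
import struct

SOCKS_VERSION = 4
CMD_CONNECT = 1
REPLY_GRANTED = 90  # result code meaning "request granted"


def build_socks4_connect(dest_ip, dest_port, user_id=b""):
    """Return the bytes a client sends to request a TCP connection via SOCKSv4."""
    # version (1 byte), command (1 byte), port (2 bytes, network byte order)
    header = struct.pack("!BBH", SOCKS_VERSION, CMD_CONNECT, dest_port)
    # then the 4-byte IPv4 address and a NUL-terminated user id
    return header + socket.inet_aton(dest_ip) + user_id + b"\x00"


def socks4_request_granted(reply):
    """Inspect the proxy's 8-byte reply; the second byte is the result code."""
    if len(reply) != 8:
        raise ValueError("a SOCKSv4 reply is exactly 8 bytes")
    return reply[1] == REPLY_GRANTED
```

Once the proxy grants the request, the same socket simply carries the application traffic, so an HTTP client would send its normal request over it.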
Hrm, OK. I'm not sure whether I have an odd view of the world or whether supporting HTTP proxies just isn't all that important. Aren't HTTP proxies fairly common in large organisations and universities? Looks like I won't be able to use Twisted for this particular project then.

In case anyone is wondering, I solved the problem (of needing access to https sites via a proxy) by using the code found here: http://aspn.activestate.com/ASPN/Cookbook/Python/Recipe/456195

As for the SOCKS proxy support, I looked at that and a few other modules but (unsurprisingly) found no real documentation, and seeing as I'm new to Twisted I also had no clue how to use the classes.

Cheers,
nnp

On Thu, Jul 24, 2008 at 11:23 AM, Tim Allen <screwtape@froup.com> wrote:
On Fri, Jul 25, 2008 at 04:05:30PM +1000, Tim Allen wrote:
It's not hard:

    #!/usr/bin/python
    from twisted.internet import reactor
    from twisted.web import client

    def got(page):
        print "got", repr(page)
        reactor.callLater(0.1, reactor.stop)

    def failed(err):
        print err.getErrorMessage()
        reactor.callLater(0.1, reactor.stop)

    class ProxyClientFactory(client.HTTPClientFactory):
        def setURL(self, url):
            # do the normal stuff
            client.HTTPClientFactory.setURL(self, url)
            # then re-set the path to be the full url
            self.path = url

    cf = ProxyClientFactory('http://www.google.com/')
    cf.deferred.addCallbacks(got, failed)
    # connect to the proxy itself, not to the target host
    reactor.connectTCP('wwwcache1.ic.ac.uk', 3128, cf)
    # start the event loop; got() or failed() stops it again
    reactor.run()
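The trick above works because a plain HTTP proxy expects the full URL in the request line, where an origin server expects only the path. The sketch below, in stdlib-only Python 3 with hypothetical helper names, shows the difference in the bytes sent. It also sketches the CONNECT request that is conventionally used to tunnel https through an HTTP proxy, which matters for the https URL in the original question; the thread does not say whether the subclass above was ever meant to handle that case.

```python
from urllib.parse import urlsplit


def http_request_head(url, via_proxy=False):
    """Build the request line and Host header for a GET request."""
    parts = urlsplit(url)
    path = parts.path or "/"
    if parts.query:
        path += "?" + parts.query
    # A proxy needs the absolute URL to know which server to contact;
    # an origin server only needs the path.
    target = url if via_proxy else path
    return "GET %s HTTP/1.0\r\nHost: %s\r\n\r\n" % (target, parts.netloc)


def connect_request_head(url):
    """Build a CONNECT request asking the proxy to open a tunnel for https."""
    parts = urlsplit(url)
    # default to port 443 when the URL does not name a port
    authority = "%s:%d" % (parts.hostname, parts.port or 443)
    return "CONNECT %s HTTP/1.1\r\nHost: %s\r\n\r\n" % (authority, authority)
```

After a successful CONNECT (a 2xx status from the proxy), the client would start the TLS handshake over the same connection and then send an ordinary path-only request inside the tunnel.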
"nnp" <version5@gmail.com> wrote:
Yeah, but except when authentication is required, most of them are set up to be transparent and hence don't really need support. I do realize that there are authenticating HTTP proxies (I have one set up at work), but I guess they're not common enough yet to garner support.

--
James Tanis
Technical Coordinator
Monsignor Donovan Catholic High School
e: jtanis@mdchs.org
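For the authenticating case, the simplest scheme a proxy may ask for is HTTP Basic, where the client adds a Proxy-Authorization header to each request. The following is a small sketch under that assumption, in plain Python 3 with a hypothetical helper name; proxies that require Digest, NTLM, or another scheme need more than this.

```python
import base64


def basic_proxy_auth_header(username, password):
    """Return the (name, value) pair for a Basic Proxy-Authorization header."""
    # Basic credentials are "user:password", base64-encoded (not encrypted)
    credentials = ("%s:%s" % (username, password)).encode("utf-8")
    token = base64.b64encode(credentials).decode("ascii")
    return ("Proxy-Authorization", "Basic " + token)
```

The resulting header would be sent alongside the usual request headers on every request made through the proxy.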
participants (4)
- James Tanis
- nnp
- Phil Mayers
- Tim Allen