[Twisted-Python] Running a HTTP client connection through a SOCKS proxy

Hey,

I'm working on an app at the moment, part of which needs to grab a page from a website and parse the results. It will have to work in environments where connections have to be routed through a SOCKSv4 proxy, but I can't find any way to specify a proxy using any of the Twisted classes. Is this possible? I assumed it was, seeing as it is a fairly common situation, but the documentation/API haven't provided any useful information.

Cheers,
nnp

Btw, my code currently looks like this:

    from twisted.internet import reactor
    from twisted.internet import ssl
    from twisted.web import client
    from twisted.web.client import HTTPClientFactory

    class BugGetter:

        def __init__(self, url):
            ''' This class attempts to verify that a bug ID is a legitimate '''
            self.url = url
            self.contextFactory = ssl.ClientContextFactory()
            self.scheme, self.host, self.port, path = client._parse(url)

        def getPage(self, bugId):
            self.bugUrl = ''.join([self.url, bugId])
            self.hcf = HTTPClientFactory(self.bugUrl)
            self.hcf.deferred.addCallback(self.parsePage)
            self.hcf.deferred.addErrback(self.errorCallback)

            # the parsed scheme has no trailing colon
            if self.scheme == 'https':
                reactor.connectSSL(self.host, self.port, self.hcf,
                                   self.contextFactory)
            else:
                reactor.connectTCP(self.host, self.port, self.hcf)

            reactor.run()

        def parsePage(self, page):
            # the callback receives the page body as its argument
            print self.hcf.status
            print self.hcf.message
            reactor.stop()

        def errorCallback(self, failure):
            print failure.getErrorMessage()
            reactor.stop()

    bz = BugGetter('https://bugs.example.org/show_bug.cgi?id=')
    bz.getPage('9999')

-- 
http://www.smashthestack.org
http://www.unprotectedhex.com

I just realised web proxy support would be much preferable to SOCKSv4 (long day, very tired), so, to alter my question: does Twisted have support for proxying connections through a web proxy?

On Wed, Jul 23, 2008 at 04:07:20PM +0100, nnp wrote:
I just realised web proxy support would be much preferable to SOCKSv4 (long day, very tired)...so...to alter my question....does twisted have support for proxying connections through a web proxy?
I believe Twisted doesn't support HTTP proxies of any kind. There is support for SOCKSv4 in twisted.protocols.socks, but it's not immediately evident to me how it works. It seems to me that in an ideal world, there would be some kind of Proxy class available that would look much like a reactor to the outside world, but contain all the logic for reading proxy settings from the OS and making outgoing connections via those configured proxies. One day, if I get far enough down my list of potential weekend projects, I might even do it myself.
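The "read proxy settings from the OS" half of that idea can be sketched with the Python standard library alone. This is a minimal sketch in Python 3, not part of Twisted: `proxy_for` and its `default_port` parameter are hypothetical names, and the fallback port of 8080 is an arbitrary assumption. `urllib.request.getproxies()` and `proxy_bypass()` are the stdlib calls that consult environment variables and, on some platforms, the system configuration.

```python
from urllib.parse import urlsplit
from urllib.request import getproxies, proxy_bypass


def proxy_for(url, proxies=None, default_port=8080):
    """Return (host, port) of the proxy to use for url, or None to connect directly.

    proxies maps a URL scheme to a proxy URL. When it is omitted, the
    settings are looked up from the environment / operating system.
    """
    parts = urlsplit(url)
    if proxies is None:
        # Honour "no proxy" exceptions only when using the system settings.
        if parts.hostname and proxy_bypass(parts.hostname):
            return None
        proxies = getproxies()

    proxy_url = proxies.get(parts.scheme)
    if not proxy_url:
        return None

    # Settings are sometimes written as "host:port" without a scheme.
    if "://" not in proxy_url:
        proxy_url = "http://" + proxy_url
    proxy = urlsplit(proxy_url)
    # default_port is an assumption for settings that omit the port.
    return proxy.hostname, proxy.port or default_port
```

The returned pair is what one would hand to something like `reactor.connectTCP` in place of the target host and port; a `None` result means a direct connection.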

Hrm, OK. I'm not sure if I have an odd view of the world or whether supporting HTTP proxies isn't all that important. Aren't HTTP proxies fairly common in large organisations and universities? Looks like I won't be able to use Twisted for this particular project then.

In case anyone is wondering, I solved the problem (needing access to https sites via a proxy) by using the code found here: http://aspn.activestate.com/ASPN/Cookbook/Python/Recipe/456195

As for the SOCKS proxy support: I looked at that and a few other modules but (unsurprisingly) found no real documentation, and seeing as I'm new to Twisted I also had no clue how to use the classes.

Cheers,
nnp
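For context on the https case: a common way to reach an https site through a web proxy is the HTTP CONNECT method, where the client asks the proxy for a raw tunnel and only then starts TLS over it. The thread does not say how the linked recipe works, so the following is only a minimal Python 3 sketch of that handshake. The function names `connect_request` and `tunnel_established` are hypothetical, and no network I/O is performed.

```python
def connect_request(host, port):
    """Build the bytes that ask an HTTP proxy to open a tunnel to host:port."""
    authority = "%s:%d" % (host, port)
    lines = [
        "CONNECT %s HTTP/1.1" % authority,
        "Host: %s" % authority,
        "",  # blank line terminates the header block
        "",
    ]
    return "\r\n".join(lines).encode("ascii")


def tunnel_established(status_line):
    """Return True if the proxy's status line reports a 2xx response."""
    parts = status_line.split(None, 2)
    return len(parts) >= 2 and len(parts[1]) == 3 and parts[1].startswith("2")
```

After a 2xx reply the connection carries opaque bytes, so the TLS handshake with the target site would run over the same socket.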

On Thu, Jul 24, 2008 at 11:49:56AM +0100, nnp wrote:
Hrm, OK....I'm not sure if I have an odd view of the world or whether supporting HTTP proxies isn't all that important. Aren't HTTP proxies fairly common in large organisations and universities?
HTTP proxies are very common; I guess nobody who's needed one has yet had time to sit down and write support.

On Fri, Jul 25, 2008 at 04:05:30PM +1000, Tim Allen wrote:
HTTP proxies are very common; I guess nobody who's needed one has yet had time to sit down and write support.
It's not hard:

    #!/usr/bin/python
    from twisted.internet import reactor
    from twisted.web import client

    def got(page):
        print "got", repr(page)
        reactor.callLater(0.1, reactor.stop)

    def failed(err):
        print err.getErrorMessage()
        reactor.callLater(0.1, reactor.stop)

    class ProxyClientFactory(client.HTTPClientFactory):
        def setURL(self, url):
            # do the normal stuff
            client.HTTPClientFactory.setURL(self, url)
            # then re-set the path to be the full url
            self.path = url

    cf = ProxyClientFactory('http://www.google.com/')
    cf.deferred.addCallbacks(got, failed)
    # connect to the proxy itself rather than to the target host
    reactor.connectTCP('wwwcache1.ic.ac.uk', 3128, cf)
    # start the event loop; nothing is sent until the reactor runs
    reactor.run()
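Why resetting the path to the full URL is enough: a client talking to the origin server sends only the path in its request line, while a client talking to an HTTP proxy sends the absolute URL so the proxy knows where to forward the request. A minimal Python 3 sketch of that difference follows. `request_line` is a hypothetical helper for illustration, not a Twisted API, and the HTTP/1.0 version string is an assumption.

```python
from urllib.parse import urlsplit


def request_line(url, via_proxy=False, method="GET"):
    """Return the first line of an HTTP request for url.

    Direct requests use the path (plus query); proxied requests use the
    absolute URL as the request target.
    """
    if via_proxy:
        target = url
    else:
        parts = urlsplit(url)
        target = parts.path or "/"
        if parts.query:
            target += "?" + parts.query
    return "%s %s HTTP/1.0" % (method, target)
```

For example, `request_line('http://www.google.com/')` gives `GET / HTTP/1.0`, while passing `via_proxy=True` gives `GET http://www.google.com/ HTTP/1.0`.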

"nnp" <version5@gmail.com> wrote:
Hrm, OK....I'm not sure if I have an odd view of the world or whether supporting HTTP proxies isn't all that important. Aren't HTTP proxies fairly common in large organisations and universities?
Yeah, but, except for when authentication is required, most of them are set up to be transparent and hence don't really need support. I do realize that there are authenticating http proxies, I have one set up at work, but I guess it's not common enough yet to garner support.

-- 
James Tanis
Technical Coordinator
Monsignor Donovan Catholic High School
e: jtanis@mdchs.org
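For the authenticating case, the simplest scheme a proxy may ask for is HTTP Basic, where the client adds a `Proxy-Authorization` header carrying the base64-encoded `username:password` pair. The thread does not say which scheme the proxy mentioned above uses, so this is only a minimal Python 3 sketch of the Basic variant; `proxy_authorization` is a hypothetical helper name.

```python
import base64


def proxy_authorization(username, password):
    """Return the value of a Proxy-Authorization header for Basic auth."""
    credentials = ("%s:%s" % (username, password)).encode("utf-8")
    return "Basic " + base64.b64encode(credentials).decode("ascii")
```

The value would be sent as `Proxy-Authorization: <value>` alongside the other request headers. Basic credentials are only encoded, not encrypted, so they are readable by anyone who can observe the connection to the proxy.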
participants (4): James Tanis, nnp, Phil Mayers, Tim Allen