[Twisted-Python] getPage using ssl proxy
Hello,

I am writing some scraper scripts and need to pass them through an intercepting proxy. getPage does not support a proxy argument, and this code I found on the internet does not work with an SSL proxy (it stalls indefinitely):

    from twisted.internet import reactor
    from twisted.web.client import HTTPClientFactory, _parse

    def getPage(url, contextFactory=None, *args, **kwargs):
        scheme, host, port, path = _parse(url)
        factory = HTTPClientFactory(url, *args, **kwargs)
        if 0:  # use a proxy (set to 1 to enable)
            host, port = 'localhost', 8080
            # Send the full URL in the request line, as a proxy expects.
            factory.path = url
        if scheme == 'https':
            from twisted.internet import ssl
            if contextFactory is None:
                contextFactory = ssl.ClientContextFactory()
            reactor.connectSSL(host, port, factory, contextFactory)
        else:
            reactor.connectTCP(host, port, factory)
        return factory.deferred

Plain HTTP proxying works. My guess is that there is an issue with the self-signed or otherwise invalid certificate that the HTTP proxy supplies. Any clues?

--
Konrads Smelkovs
Applied IT sorcery.
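To make the `factory.path = url` step concrete, here is a minimal sketch of how the request line differs between a direct request and one sent through an HTTP proxy. The helper name `request_line` is hypothetical and is not part of Twisted; it uses only the Python 3 standard library and illustrates the request-line format, not Twisted's internals.

```python
from urllib.parse import urlsplit


def request_line(url, via_proxy, method="GET"):
    """Build an HTTP/1.0 request line for `url` (hypothetical helper).

    A direct request names only the path and query; a request sent
    to a proxy names the full absolute URL so the proxy knows where
    to forward it.
    """
    if via_proxy:
        target = url
    else:
        parts = urlsplit(url)
        target = parts.path or "/"
        if parts.query:
            target += "?" + parts.query
    return "%s %s HTTP/1.0" % (method, target)


direct = request_line("http://example.com/a?b=1", via_proxy=False)
proxied = request_line("http://example.com/a?b=1", via_proxy=True)
```

Here `direct` is `GET /a?b=1 HTTP/1.0` and `proxied` is `GET http://example.com/a?b=1 HTTP/1.0`, which is the effect of overwriting `factory.path` with the full URL.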
I found the answer to my own question:

    from OpenSSL import SSL
    from twisted.internet import reactor
    from twisted.web.client import HTTPClientFactory, _parse

    class NoVerifyClientContextFactory:
        """A context factory for SSL clients that skips certificate verification."""
        isClient = 1
        method = SSL.SSLv3_METHOD

        def getContext(self):
            # Verification callback that accepts every certificate.
            def x(*args):
                return True
            ctx = SSL.Context(self.method)
            ctx.set_verify(SSL.VERIFY_NONE, x)
            return ctx

    def getPage(url, contextFactory=None, *args, **kwargs):
        scheme, host, port, path = _parse(url)
        factory = HTTPClientFactory(url, *args, **kwargs)
        if 1:  # use a proxy
            host, port = 'localhost', 8080
            # Send the full URL in the request line, as a proxy expects.
            factory.path = url
        if scheme == 'https':
            if contextFactory is None:
                contextFactory = NoVerifyClientContextFactory()
            reactor.connectSSL(host, port, factory, contextFactory)
        else:
            reactor.connectTCP(host, port, factory)
        return factory.deferred

--
Konrads Smelkovs
Applied IT sorcery.

On Thu, Jul 30, 2009 at 10:15 PM, Konrads Smelkovs <konrads@smelkovs.com> wrote:
> Hello,
>
> I am writing some scraper scripts and need to pass them through an intercepting proxy. getPage does not support a proxy argument and this code I found on internet won't work with SSL proxy (stalls indefinitely):
>
>     def getPage(url, contextFactory=None, *args, **kwargs):
>         scheme, host, port, path = _parse(url)
>         factory = HTTPClientFactory(url, *args, **kwargs)
>         if 0:  # use a proxy
>             host, port = 'localhost', 8080
>             factory.path = url
>         if scheme == 'https':
>             from twisted.internet import ssl
>             if contextFactory is None:
>                 contextFactory = ssl.ClientContextFactory()
>             reactor.connectSSL(host, port, factory, contextFactory)
>         else:
>             reactor.connectTCP(host, port, factory)
>         return factory.deferred
>
> Plain http proxying works. My guess is that there is an issue with self-signed or otherwise invalid certificate the http proxy supplies. Any clues?
>
> --
> Konrads Smelkovs
> Applied IT sorcery.
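
For readers without pyOpenSSL at hand, the same idea (a client context that does not verify the peer certificate) can be sketched with Python's standard `ssl` module. This is an analogue under stated assumptions, not the Twisted code from this thread: the function name `make_no_verify_context` is hypothetical, and the sketch targets Python 3.

```python
import ssl


def make_no_verify_context():
    """Return a client SSL context that accepts any certificate.

    Hypothetical helper: suitable only for debugging against a local
    intercepting proxy whose certificate is self-signed or invalid.
    """
    ctx = ssl.create_default_context()
    # Hostname checking must be switched off first; otherwise setting
    # verify_mode to CERT_NONE raises ValueError.
    ctx.check_hostname = False
    ctx.verify_mode = ssl.CERT_NONE
    return ctx


ctx = make_no_verify_context()
```

Such a context disables protection against man-in-the-middle attacks, which is the point when the proxy itself is the intermediary, so it is probably best kept out of code that talks to servers directly.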
Participants (1): Konrads Smelkovs